why sequential scan is used on indexed column ???

Started by Julius Tuskenisalmost 18 years ago8 messagesgeneral
Jump to latest
#1Julius Tuskenis
julius.tuskenis@gmail.com

Hello.

I have a question concerning performance. One of my queries take a long
to execute. I tried to do "explain analyse" and I see that the
sequential scan is being used, although I have indexes set on columns
that are used in joins. The question is - WHY, and how to change that
behavior???

The DBMS: pgSQL 8.1.4 on gentoo linux.

The query:

explain analyze
select *
FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id)
where sas_subjektas = 20190

result:
"Hash Join (cost=5.17..10185.89 rows=6047 width=138) (actual
time=10698.539..10698.539 rows=0 loops=1)"
" Hash Cond: ("outer".aps_saskaita = "inner".sas_id)"
" -> Seq Scan on apsilankymai (cost=0.00..8618.50 rows=300350
width=42) (actual time=2121.310..6470.721 rows=300350 loops=1)"
" -> Hash (cost=5.14..5.14 rows=9 width=96) (actual
time=31.545..31.545 rows=1 loops=1)"
" -> Bitmap Heap Scan on b_saskaita (cost=2.03..5.14 rows=9
width=96) (actual time=31.473..31.489 rows=1 loops=1)"
" Recheck Cond: (sas_subjektas = 20190)"
" -> Bitmap Index Scan on idx_sas_subjektas
(cost=0.00..2.03 rows=9 width=0) (actual time=25.552..25.552 rows=1
loops=1)"
" Index Cond: (sas_subjektas = 20190)"
"Total runtime: 10698.780 ms"

The tables with indexes:

CREATE TABLE b_saskaita
(
sas_id serial NOT NULL,
sas_tevas integer,
sas_kreditas numeric(8,2) NOT NULL DEFAULT 0,
sas_statusas smallint NOT NULL DEFAULT 1,
sas_subjektas integer,
sas_kam_naudojama integer,
sas_pastaba character varying(100),
CONSTRAINT b_saskaita_pkey PRIMARY KEY (sas_id),
CONSTRAINT fk_sas_subjektas FOREIGN KEY (sas_subjektas)
REFERENCES subjektas (sub_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT fk_saskaitos_tevas FOREIGN KEY (sas_tevas)
REFERENCES b_saskaita (sas_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE RESTRICT
)
WITHOUT OIDS;
ALTER TABLE b_saskaita OWNER TO postgres;
GRANT ALL ON TABLE b_saskaita TO postgres;
GRANT ALL ON TABLE b_saskaita TO public;

CREATE INDEX fki_sas_subjektas
ON b_saskaita
USING btree
(sas_subjektas);

CREATE TABLE apsilankymai
(
aps_id serial NOT NULL,
aps_abonementas integer NOT NULL,
aps_atejo timestamp(0) without time zone NOT NULL,
aps_isejo timestamp(0) without time zone,
aps_ileidimas integer,
aps_zetonas integer NOT NULL,
aps_padalinys integer NOT NULL,
aps_saskaita integer,
aps_statusas smallint DEFAULT 0,
CONSTRAINT apsilankymai_pkey PRIMARY KEY (aps_id),
CONSTRAINT fk_apsilankymo_abonementas FOREIGN KEY (aps_abonementas)
REFERENCES subjekto_abonementai (sab_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT fk_apsilankymo_padalinys FOREIGN KEY (aps_padalinys)
REFERENCES padalinys (pad_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE RESTRICT,
CONSTRAINT fk_apsilankymo_saskaita FOREIGN KEY (aps_saskaita)
REFERENCES b_saskaita (sas_id) MATCH FULL
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT fk_apsilankymo_zetonas FOREIGN KEY (aps_zetonas)
REFERENCES zetonai (zet_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE RESTRICT
)
WITHOUT OIDS;
ALTER TABLE apsilankymai OWNER TO postgres;
GRANT ALL ON TABLE apsilankymai TO postgres;
GRANT ALL ON TABLE apsilankymai TO public;
COMMENT ON COLUMN apsilankymai.aps_ileidimas IS 'jei apsilankymas neturi
skaitytis - nurodoma kuris apsilankymas yra pagrindinis';
COMMENT ON COLUMN apsilankymai.aps_padalinys IS 'kuriame padalinyje
lankesi zmogus. reikalingas, kai norim skaiciuoti kartus zmoniu turinciu
abonementa keliuose klubuose';
COMMENT ON COLUMN apsilankymai.aps_statusas IS '0 - neiejes, 1 - viduje,
2 - isejes';

CREATE INDEX idx_aps_saskaita
ON apsilankymai
USING btree
(aps_saskaita);

Thank you in advance.

--

Julius Tuskenis

#2Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Julius Tuskenis (#1)
Re: why sequential scan is used on indexed column ???

Julius Tuskenis <julius.tuskenis@gmail.com> schrieb:

Hello.

I have a question concerning performance. One of my queries take a long
to execute. I tried to do "explain analyse" and I see that the
sequential scan is being used, although I have indexes set on columns
that are used in joins. The question is - WHY, and how to change that
behavior???

Try to create an index on apsilankymai.sas_id

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082�, E 13.56889�

#3Michael Fuhr
mike@fuhr.org
In reply to: Andreas Kretschmer (#2)
Re: why sequential scan is used on indexed column ???

On Sat, Jun 14, 2008 at 04:59:44PM +0200, Andreas Kretschmer wrote:

Julius Tuskenis <julius.tuskenis@gmail.com> schrieb:

I have a question concerning performance. One of my queries take a long
to execute. I tried to do "explain analyse" and I see that the
sequential scan is being used, although I have indexes set on columns
that are used in joins. The question is - WHY, and how to change that
behavior???

Try to create an index on apsilankymai.sas_id

In the DDL that Julius posted apsilankymai doesn't have an sas_id
column.

The join is on apsilankymai.aps_saskaita = b_saskaita.sas_id. Both
columns have an index: b_saskaita.sas_id is a primary key so it
should have an index implicitly, and apsilankymai.aps_saskaita has
an explicit CREATE INDEX statement. The WHERE clause is on
b_saskaita.sas_subjektas, which also has an explicit CREATE INDEX
statement. Unless I'm mistaken all relevant columns have an index.

A few of the row count estimates differ from reality:

Hash Join (cost=5.17..10185.89 rows=6047 width=138) (actual time=10698.539..10698.539 rows=0 loops=1)

Bitmap Heap Scan on b_saskaita (cost=2.03..5.14 rows=9 width=96) (actual time=31.473..31.489 rows=1 loops=1)

However, that might not be entirely responsible for the questionable
plan. I created a test case that has close to the same estimated and
actual row counts and has the same plan if I disable enable_nestloop:

set enable_nestloop to off;

explain analyze
select *
FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id)
where sas_subjektas = 20190;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=6.54..5814.42 rows=5406 width=286) (actual time=3222.429..3222.429 rows=0 loops=1)
Hash Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
-> Seq Scan on apsilankymai (cost=0.00..4627.50 rows=300350 width=42) (actual time=0.085..1514.863 rows=300350 loops=1)
-> Hash (cost=6.43..6.43 rows=9 width=244) (actual time=0.122..0.122 rows=1 loops=1)
-> Bitmap Heap Scan on b_saskaita (cost=2.32..6.43 rows=9 width=244) (actual time=0.089..0.095 rows=1 loops=1)
Recheck Cond: (sas_subjektas = 20190)
-> Bitmap Index Scan on fki_sas_subjektas (cost=0.00..2.32 rows=9 width=0) (actual time=0.066..0.066 rows=1 loops=1)
Index Cond: (sas_subjektas = 20190)
Total runtime: 3222.786 ms

I get a better plan if I enable nested loops:

set enable_nestloop to on;

explain analyze
select *
FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id)
where sas_subjektas = 20190;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=79.93..4660.23 rows=5406 width=286) (actual time=1.000..1.000 rows=0 loops=1)
-> Seq Scan on b_saskaita (cost=0.00..10.25 rows=9 width=244) (actual time=0.116..0.870 rows=1 loops=1)
Filter: (sas_subjektas = 20190)
-> Bitmap Heap Scan on apsilankymai (cost=79.93..441.58 rows=6007 width=42) (actual time=0.084..0.084 rows=0 loops=1)
Recheck Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
-> Bitmap Index Scan on idx_aps_saskaita (cost=0.00..78.43 rows=6007 width=0) (actual time=0.068..0.068 rows=0 loops=1)
Index Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
Total runtime: 1.321 ms

Julius, do you perchance have enable_nestloop = off? If so, do you
get a better plan if you enable it? Also, have you run ANALYZE
lately?

--
Michael Fuhr

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Fuhr (#3)
Re: why sequential scan is used on indexed column ???

Michael Fuhr <mike@fuhr.org> writes:

I created a test case that has close to the same estimated and
actual row counts and has the same plan if I disable enable_nestloop:

There's something weird about this --- why does the second plan seqscan
b_saskaita, instead of using the bitmap scan that it had previously
estimated to be cheaper? What PG version are you testing, and can you
provide the full test case?

(As for the original question, the hash plan seems to me to be perfectly
reasonable for the estimated row counts --- fetching one row out of
fifty using an indexscan is going to be expensive. So I think the OP's
problem is purely a statistical one, or maybe he's in a situation where
he should reduce random_page_cost.)

regards, tom lane

#5Andreas Kretschmer
akretschmer@spamfence.net
In reply to: Michael Fuhr (#3)
Re: why sequential scan is used on indexed column ???

Michael Fuhr <mike@fuhr.org> schrieb:

On Sat, Jun 14, 2008 at 04:59:44PM +0200, Andreas Kretschmer wrote:

Julius Tuskenis <julius.tuskenis@gmail.com> schrieb:

I have a question concerning performance. One of my queries take a long
to execute. I tried to do "explain analyse" and I see that the
sequential scan is being used, although I have indexes set on columns
that are used in joins. The question is - WHY, and how to change that
behavior???

Try to create an index on apsilankymai.sas_id

In the DDL that Julius posted apsilankymai doesn't have an sas_id
column.

The join is on apsilankymai.aps_saskaita = b_saskaita.sas_id. Both
columns have an index: b_saskaita.sas_id is a primary key so it
should have an index implicitly, and apsilankymai.aps_saskaita has

Right, my mistake.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082�, E 13.56889�

#6Julius Tuskenis
julius.tuskenis@gmail.com
In reply to: Michael Fuhr (#3)
Re: why sequential scan is used on indexed column ???

Hi Michael.

Thank you for your answer. I've checked - enable_nestloop is true. I did
ANALYZE, but that didn't help. The sequential scan is still used.... Any
more ideas why?

Julius Tuskenis

Michael Fuhr rašė:

Show quoted text

On Sat, Jun 14, 2008 at 04:59:44PM +0200, Andreas Kretschmer wrote:

Julius Tuskenis <julius.tuskenis@gmail.com> schrieb:

I have a question concerning performance. One of my queries take a long
to execute. I tried to do "explain analyse" and I see that the
sequential scan is being used, although I have indexes set on columns
that are used in joins. The question is - WHY, and how to change that
behavior???

Try to create an index on apsilankymai.sas_id

In the DDL that Julius posted apsilankymai doesn't have an sas_id
column.

The join is on apsilankymai.aps_saskaita = b_saskaita.sas_id. Both
columns have an index: b_saskaita.sas_id is a primary key so it
should have an index implicitly, and apsilankymai.aps_saskaita has
an explicit CREATE INDEX statement. The WHERE clause is on
b_saskaita.sas_subjektas, which also has an explicit CREATE INDEX
statement. Unless I'm mistaken all relevant columns have an index.

A few of the row count estimates differ from reality:

Hash Join (cost=5.17..10185.89 rows=6047 width=138) (actual time=10698.539..10698.539 rows=0 loops=1)

Bitmap Heap Scan on b_saskaita (cost=2.03..5.14 rows=9 width=96) (actual time=31.473..31.489 rows=1 loops=1)

However, that might not be entirely responsible for the questionable
plan. I created a test case that has close to the same estimated and
actual row counts and has the same plan if I disable enable_nestloop:

set enable_nestloop to off;

explain analyze
select *
FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id)
where sas_subjektas = 20190;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=6.54..5814.42 rows=5406 width=286) (actual time=3222.429..3222.429 rows=0 loops=1)
Hash Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
-> Seq Scan on apsilankymai (cost=0.00..4627.50 rows=300350 width=42) (actual time=0.085..1514.863 rows=300350 loops=1)
-> Hash (cost=6.43..6.43 rows=9 width=244) (actual time=0.122..0.122 rows=1 loops=1)
-> Bitmap Heap Scan on b_saskaita (cost=2.32..6.43 rows=9 width=244) (actual time=0.089..0.095 rows=1 loops=1)
Recheck Cond: (sas_subjektas = 20190)
-> Bitmap Index Scan on fki_sas_subjektas (cost=0.00..2.32 rows=9 width=0) (actual time=0.066..0.066 rows=1 loops=1)
Index Cond: (sas_subjektas = 20190)
Total runtime: 3222.786 ms

I get a better plan if I enable nested loops:

set enable_nestloop to on;

explain analyze
select *
FROM b_saskaita JOIN apsilankymai ON (aps_saskaita = sas_id)
where sas_subjektas = 20190;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=79.93..4660.23 rows=5406 width=286) (actual time=1.000..1.000 rows=0 loops=1)
-> Seq Scan on b_saskaita (cost=0.00..10.25 rows=9 width=244) (actual time=0.116..0.870 rows=1 loops=1)
Filter: (sas_subjektas = 20190)
-> Bitmap Heap Scan on apsilankymai (cost=79.93..441.58 rows=6007 width=42) (actual time=0.084..0.084 rows=0 loops=1)
Recheck Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
-> Bitmap Index Scan on idx_aps_saskaita (cost=0.00..78.43 rows=6007 width=0) (actual time=0.068..0.068 rows=0 loops=1)
Index Cond: (apsilankymai.aps_saskaita = b_saskaita.sas_id)
Total runtime: 1.321 ms

Julius, do you perchance have enable_nestloop = off? If so, do you
get a better plan if you enable it? Also, have you run ANALYZE
lately?

#7Julius Tuskenis
julius.tuskenis@gmail.com
In reply to: Tom Lane (#4)
Re: why sequential scan is used on indexed column ???

Hello, Tom.

So I think the OP's
problem is purely a statistical one, or maybe he's in a situation where
he should reduce random_page_cost.)

What could be done solving that "statistical problem"? :) Current value
for random_page_cost is 4. What value would you suggest?

Julius Tuskenis

#8Michael Fuhr
mike@fuhr.org
In reply to: Tom Lane (#4)
Re: why sequential scan is used on indexed column ???

On Sat, Jun 14, 2008 at 02:35:38PM -0400, Tom Lane wrote:

Michael Fuhr <mike@fuhr.org> writes:

I created a test case that has close to the same estimated and
actual row counts and has the same plan if I disable enable_nestloop:

There's something weird about this --- why does the second plan seqscan
b_saskaita, instead of using the bitmap scan that it had previously
estimated to be cheaper?

Dunno.

What PG version are you testing, and can you provide the full test case?

My test was in 8.2.9, the only version I had handy at the time. I
later tested 8.1.13 (Julius said he was running 8.1.4) and got the
same plan that Julius got without messing with planner settings.

I don't have access to my test case right now but I'll post it when
I get a chance. I simply populated the tables with random data,
adjusting the amount and distribution until I got row count estimates
close to what Julius got. I don't know if my test case is close
enough to Julius's data to be relevant to his problem but if you think
my results are weird then maybe I've stumbled across something else
that's interesting.

(As for the original question, the hash plan seems to me to be perfectly
reasonable for the estimated row counts --- fetching one row out of
fifty using an indexscan is going to be expensive. So I think the OP's
problem is purely a statistical one, or maybe he's in a situation where
he should reduce random_page_cost.)

Hmmm...8.1.13 wants to do the hash join that you think would be
reasonable but 8.2.9 prefers the nested loop as in my second example.
I think I did have a reduced random_page_cost (2 as I recall).

--
Michael Fuhr