question about seq scan and index scan

Started by Sébastien PALLEAUover 23 years ago7 messagesgeneral
Jump to latest
#1Sébastien PALLEAU
spalleau@elma.fr

hi,
I tried to execute the following request on a postgresql 7.2.3 system :

select count(*) from table tablea a, tableb b, tablec c where a.id=b.id and a.id=c.id and a.mynumber > 5000;
tablea, tableb and tablec have primary on id that identify a unique member.

tablea contains 471413 records
tableb contains 471413 records
tablec contains 471413 records

a.id is tablea primary key
b.id is tableb primary key
c.id is tablec primary key

An explain on the request provides the following results.
seq scan on tableb.id (I dont agree)
seq scan on tablec.id (i dont agree)
seq scan on tablea.id (seems normal but why executed last ?)

and the most strange is that for the following request :
explain select count(*) from table tablea a, tableb b, tablec c where a.id=b.id and a.id=c.id and a.mynumber > 20000;
provides :
index scan using numberpoints_tablea_key
index scan using tableb_pkey
index scan using tablec_pkey

why doesn't postgres uses indexes in the first case ?
thanks for your anwers.

#2Hervé Piedvache
herve@elma.fr
In reply to: Sébastien PALLEAU (#1)
Re: question about seq scan and index scan

Hi Seb,

You omit to say one thing ...

We have an index on tablea on mynumber (integer).
Those tables are vacuumed every nights.

If we do :
explain select count(*) from tablea a, tableb b, tablec c where a.id=b.id and
a.id=c.id and a.mynumber>23000;
NOTICE: QUERY PLAN:

Aggregate (cost=51898.63..51898.63 rows=1 width=48)
-> Hash Join (cost=33905.40..51887.68 rows=4379 width=48)
-> Seq Scan on tableb b (cost=0.00..14391.60 rows=471460 width=16)
-> Hash (cost=33894.45..33894.45 rows=4379 width=32)
-> Hash Join (cost=13522.17..33894.45 rows=4379 width=32)
-> Seq Scan on tablec c (cost=0.00..16781.60
rows=471460 width=16)
-> Hash (cost=13511.23..13511.23 rows=4379 width=16)
-> Seq Scan on tablea a (cost=0.00..13511.23
rows=4379 width=16)

EXPLAIN

And If we do an :
explain select count(*) from tablea a, tableb b, tablec c where a.id=b.id and
a.id=c.id and a.mynumber>23350;
NOTICE: QUERY PLAN:

Aggregate (cost=50522.53..50522.53 rows=1 width=48)
-> Nested Loop (cost=13521.19..50512.57 rows=3987 width=48)
-> Hash Join (cost=13521.19..32709.93 rows=3987 width=32)
-> Seq Scan on tableb b (cost=0.00..16781.60 rows=471460
width=16)
-> Hash (cost=13511.23..13511.23 rows=3987 width=16)
-> Seq Scan on tablea a (cost=0.00..13511.23 rows=3987
width=16)
-> Index Scan using tablec_pkey on tablec c (cost=0.00..4.45 rows=1
width=16)

EXPLAIN

But If I do :
explain select count(*) from tablea a, tableb b, tablec c where a.id=b.id and
a.id=c.id and a.mynumber>24000;
NOTICE: QUERY PLAN:

Aggregate (cost=414.72..414.72 rows=1 width=48)
-> Nested Loop (cost=0.00..414.65 rows=29 width=48)
-> Nested Loop (cost=0.00..284.62 rows=29 width=32)
-> Index Scan using ix_mynumber_key on tablea a
(cost=0.00..116.44 rows=29 width=16)
-> Index Scan using tableb_pkey on tableb b (cost=0.00..5.76
rows=1 width=16)
-> Index Scan using tablec_pkey on tablec c (cost=0.00..4.45 rows=1
width=16)

EXPLAIN

Thanks for you help ... to "explain" us why Postgresql do not use all the
time the index scan for this kind of request for 3 tables with the same
primary key id.

regards,

Le Vendredi 25 Octobre 2002 18:18, Sᅵbastien PALLEAU a ᅵcrit :

hi,
I tried to execute the following request on a postgresql 7.2.3 system :

select count(*) from table tablea a, tableb b, tablec c where a.id=b.id and
a.id=c.id and a.mynumber > 5000; tablea, tableb and tablec have primary on
id that identify a unique member.

tablea contains 471413 records
tableb contains 471413 records
tablec contains 471413 records

a.id is tablea primary key
b.id is tableb primary key
c.id is tablec primary key

An explain on the request provides the following results.
seq scan on tableb.id (I dont agree)
seq scan on tablec.id (i dont agree)
seq scan on tablea.id (seems normal but why executed last ?)

and the most strange is that for the following request :
explain select count(*) from table tablea a, tableb b, tablec c where
a.id=b.id and a.id=c.id and a.mynumber > 20000; provides :
index scan using numberpoints_tablea_key
index scan using tableb_pkey
index scan using tablec_pkey

why doesn't postgres uses indexes in the first case ?
thanks for your anwers.

--
Hervᅵ Piedvache

Elma Ingᅵnierie Informatique
6 rue du Faubourg Saint-Honorᅵ
F-75008 - Paris - France
Tel. 33-144949901
fax. 33-144949902

#3Stephan Szabo
sszabo@megazone23.bigpanda.com
In reply to: Hervé Piedvache (#2)
Re: question about seq scan and index scan

On Fri, 25 Oct 2002, [iso-8859-15] Herv� Piedvache wrote:

Thanks for you help ... to "explain" us why Postgresql do not use all the
time the index scan for this kind of request for 3 tables with the same
primary key id.

Depending on the number of rows returned, index scan may not be faster
than a sequential scan. As you up the constant, the estimated number of
rows drops which make the index scan a better and better plan. Are
the estimated number of rows for the scan of a (4379, 3987, and 29)
realistic for mynumber>(23000, 23500, 24000)

#4Hervé Piedvache
herve@elma.fr
In reply to: Stephan Szabo (#3)
Re: question about seq scan and index scan

Le Vendredi 25 Octobre 2002 20:17, Stephan Szabo a ᅵcrit :

On Fri, 25 Oct 2002, [iso-8859-15] Hervᅵ Piedvache wrote:

Thanks for you help ... to "explain" us why Postgresql do not use all the
time the index scan for this kind of request for 3 tables with the same
primary key id.

Depending on the number of rows returned, index scan may not be faster
than a sequential scan. As you up the constant, the estimated number of
rows drops which make the index scan a better and better plan. Are
the estimated number of rows for the scan of a (4379, 3987, and 29)
realistic for mynumber>(23000, 23500, 24000)

I'm surpise about this answer for

23000

result is 98

23350

result is 96

24000

result is 93

With a table of 471 413 records and most of the values for mynumber are
< of 2500 result is 467 902 records ...

I do not understand the explain result sorry ! ;o/

Regards,
--
Hervᅵ Piedvache

Elma Ingᅵnierie Informatique
6 rue du Faubourg Saint-Honorᅵ
F-75008 - Paris - France
Tel. 33-144949901
fax. 33-144949902

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hervé Piedvache (#4)
Re: question about seq scan and index scan

=?iso-8859-15?q?Herv=E9=20Piedvache?= <herve@elma.fr> writes:

Le Vendredi 25 Octobre 2002 20:17, Stephan Szabo a �crit :

Are the estimated number of rows for the scan of a (4379, 3987, and 29)
realistic for mynumber>(23000, 23500, 24000)

I'm surpise about this answer for

23000

result is 98

23350

result is 96

24000

result is 93

Hmm, have you "ANALYZE"d this table recently? If so it would be
interesting to see the pg_stats row for tablea.mynumber.

If the distribution of mynumber is very uneven, it might be worth
your while to increase the statistics target for the column, so that
the planner has more data points with which to make a better estimate.
See ALTER TABLE SET STATISTICS.

regards, tom lane

#6Hervé Piedvache
herve@elma.fr
In reply to: Tom Lane (#5)
Re: question about seq scan and index scan

Hi Tom,

Le Samedi 26 Octobre 2002 16:31, Tom Lane a ᅵcrit :

=?iso-8859-15?q?Herv=E9=20Piedvache?= <herve@elma.fr> writes:

Le Vendredi 25 Octobre 2002 20:17, Stephan Szabo a ᅵcrit :

Are the estimated number of rows for the scan of a (4379, 3987, and 29)
realistic for mynumber>(23000, 23500, 24000)

I'm surpise about this answer for

23000

result is 98

23350

result is 96

24000

result is 93

Hmm, have you "ANALYZE"d this table recently? If so it would be
interesting to see the pg_stats row for tablea.mynumber.

Every night ...

If the distribution of mynumber is very uneven, it might be worth
your while to increase the statistics target for the column, so that
the planner has more data points with which to make a better estimate.
See ALTER TABLE SET STATISTICS.

Yep done like this ... I have deleted my index on tablea on mynumber, and put
an alter table set statistics 100 ... and now it's really better !

Many thanks !
--
Hervᅵ Piedvache

Elma Ingᅵnierie Informatique
6 rue du Faubourg Saint-Honorᅵ
F-75008 - Paris - France
Tel. 33-144949901
fax. 33-144949902

#7Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Sébastien PALLEAU (#1)
Re: question about seq scan and index scan

On 25 Oct 2002 at 18:18, S�bastien PALLEAU wrote:

hi,
I tried to execute the following request on a postgresql 7.2.3 system :

select count(*) from table tablea a, tableb b, tablec c where a.id=b.id and
a.id=c.id and a.mynumber > 5000;
tablea, tableb and tablec have primaryon id that identify a unique member.

tablea contains 471413 records tableb contains 471413 records tablec contains
471413 records

a.id is tablea primary key

b.id is tableb primary key c.id is tablec primary key

An explain on the request provides the following results.
seq scan on tableb.id (I dont agree)
seq scan on tablec.id (i dont agree)
seq scan on tablea.id (seems normal but why executed last ?)

and the most strange is that for the following request :
explain select count(*) from table tablea a, tableb b, tablec c where a.id=b.id
and a.id=c.id and a.mynumber > 20000;
provides :
index scan using numberpoints_tablea_key
index scan using tableb_pkey
index scan using tablec_pkey

why doesn't postgres uses indexes in the first case ?

umm. What happens if you phrase like

a.mynumber > 20000 and a.id=b.id and a.id=c.id

and BTW what's the difference between two queries? I failed to spot any..

Bye
Shridhar

--
Cynic, n.: One who looks through rose-colored glasses with a jaundiced eye.