Weird indices
Hi,
I'm trying to optimize our databases and I found a query that's not very
optimal:
sitefr=# explain select nomsession from session where nomsession='xxx';
NOTICE: QUERY PLAN:
Seq Scan on session (cost=0.00..16275.95 rows=10113 width=12)
EXPLAIN
Phew! I think there's an index missing but...
sitefr=# \d session
Table "session"
Attribute | Type | Modifier
------------+-----------+-------------------------------------------------
idsession | integer | not null default nextval('seq_idsession'::text)
nomsession | text |
idmembre | text |
referer | text |
ip | text |
datelog | timestamp |
Indices: ix_session_idmembre,
ix_session_nomsession,
session_idsession_key
So I look at the index itself:
sitefr=# \d ix_session_nomsession
Index "ix_session_nomsession"
Attribute | Type
------------+------
nomsession | text
btree
Did I miss something, or can't 'text' attributes (fields) be indexed?
That sounds crazy! (I vacuum analyzed many times.)
Just in case 'nomsession' would not be as dispersed as I would
think...
sitefr=# select count(nomsession) from session;
count
--------
510069
(1 row)
sitefr=# select count(distinct nomsession) from session;
count
--------
401094
(1 row)
Does anyone have an idea?
Thanks !
--
Jean-Christophe Boggio
cat@thefreecat.org
Independent Consultant and Developer
Delphi, Linux, Perl, PostgreSQL
Do you have a value that is not null that is very common?
It's estimating that there will be 10113 rows that match
nomsession='xxx' which makes a seq scan a much less bad plan.
On Sat, 17 Feb 2001, Jean-Christophe Boggio wrote:
Stephan,
Ref: Saturday, February 17, 2001 1:50:32 AM
SS> Do you have a value that is not null that is very common?
SS> It's estimating that there will be 10113 rows that match
SS> nomsession='xxx' which makes a seq scan a much less bad plan.
At first, I thought that couldn't be the case, but we happen to have
100000 records where nomsession=''.
I just updated them so that their value is null and everything runs
fine. Thanks for your help!
--
Jean-Christophe Boggio
cat@thefreecat.org
Independent Consultant and Developer
Delphi, Linux, Perl, PostgreSQL
Stephan Szabo wrote:
Do you have a value that is not null that is very common?
It's estimating that there will be 10113 rows that match
nomsession='xxx' which makes a seq scan a much less bad plan.
Err, why? There is an index, isn't there? Shouldn't the index allow
postgres to quickly find the 2% of rows that would match?
--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com
On Mon, 19 Feb 2001, Joseph Shraibman wrote:
Stephan Szabo wrote:
Do you have a value that is not null that is very common?
It's estimating that there will be 10113 rows that match
nomsession='xxx' which makes a seq scan a much less bad plan.
Err, why? There is an index, isn't there? Shouldn't the index allow
postgres to quickly find the 2% of rows that would match?
Right now it has to go to the heap file to find out whether or not
a row is currently visible to the transaction, which means potentially
a lot of seeks and reads from the heap file. That can be more expensive
than just sequentially reading the heap file, depending on a bunch
of things such as how wide the rows are. (If there are 100 rows per
block in the heap file and 500000 rows, you need to do 5000 reads.
If you are looking for 10000 rows in that file, you're likely (always?)
going to end up doing 10000 heap file reads, plus the reads on the
index file.)
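[Editor's note] Stephan's back-of-the-envelope arithmetic can be sketched numerically. The row and block counts are the ones from his message; the comparison is an illustration of the argument, not PostgreSQL's actual cost code:

```python
# Illustration of the seek-count arithmetic above; figures come from
# the message (500000 rows, 100 rows per heap block, 10000 matches).
rows_in_table = 500000
rows_per_block = 100
matching_rows = 10000

# A sequential scan reads each heap block exactly once, in order.
seq_scan_block_reads = rows_in_table // rows_per_block

# An index scan must visit the heap for every candidate row to check
# visibility; with matches scattered randomly, that is roughly one
# random heap read per matching row (plus the index reads themselves).
index_scan_heap_reads = matching_rows

print(seq_scan_block_reads)   # 5000
print(index_scan_heap_reads)  # 10000
```

So even before accounting for random reads being slower than sequential ones, the index scan touches twice as many blocks here.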
Joseph Shraibman <jks@selectacast.net> writes:
Stephan Szabo wrote:
Do you have a value that is not null that is very common?
It's estimating that there will be 10113 rows that match
nomsession='xxx' which makes a seq scan a much less bad plan.
Err, why? There is an index, isn't there? Shouldn't the index allow
postgres to quickly find the 2% of rows that would match?
Define "quickly".
sitefr=# explain select nomsession from session where nomsession='xxx';
NOTICE: QUERY PLAN:
Seq Scan on session (cost=0.00..16275.95 rows=10113 width=12)
We have here an estimate that 10113 rows will be matched (out of the
510069 in the table). The table contains something on the order of
16000 pages (guesstimate from the seqscan cost estimate). The planner
is assuming that the 10113 rows are randomly scattered in the table,
and therefore that the executor will have to fetch the majority of the
pages in the table. Under these circumstances a seqscan is cheaper
than an indexscan, because it works with the Unix kernel's preference
for sequential reads (to say nothing of the disk drive's ;-)), instead
of fighting that optimization. Random fetches are more than twice as
expensive as sequential fetches.
Of course, if the 10113-match estimate is wildly off (as it was in this
case), then the wrong plan may be chosen. But it IS NOT CORRECT to
suppose that indexscans always beat seqscans. The planner's job would
be a lot easier if that were true.
regards, tom lane
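[Editor's note] Tom's point reduces to a toy cost comparison. The page count and row estimates are the ones from the thread; the per-page costs are illustrative assumptions (a 4x random-read penalty, in the spirit of his "more than twice as expensive"), not the real planner's formulas:

```python
# Toy cost model: why a seq scan can beat an index scan when the
# planner expects many scattered matches. Costs are assumptions.
SEQ_PAGE_COST = 1.0     # cost of one sequential page read
RANDOM_PAGE_COST = 4.0  # assumed penalty for a random page read

def seq_scan_cost(table_pages):
    # Read every page of the table once, sequentially.
    return table_pages * SEQ_PAGE_COST

def index_scan_cost(expected_rows):
    # Roughly one random heap fetch per expected matching row
    # (ignoring the comparatively cheap index page reads).
    return expected_rows * RANDOM_PAGE_COST

pages = 16000  # Tom's guesstimate of the table size in pages
print(seq_scan_cost(pages))    # 16000.0
print(index_scan_cost(10113))  # 40452.0 -> seq scan wins on the estimate
print(index_scan_cost(1))      # 4.0     -> index scan wins in reality
```

With the (wrong) 10113-row estimate the seq scan really is the cheaper plan; the plan choice is only as good as the row estimate feeding it.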
Tom Lane wrote:
Of course, if the 10113-match estimate is wildly off (as it was in this
case), then the wrong plan may be chosen. But it IS NOT CORRECT to
suppose that indexscans always beat seqscans.
Can't postgres do the index lookup first and find out there are only a
few tuples that might match?
--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com
Joseph Shraibman wrote:
Can't postgres do the index lookup first and find out there are only a
few tuples that might match?
Actually it looks like postgres is doing this:
o=# explain select * from usertable where p = 33;
NOTICE: QUERY PLAN:
Seq Scan on usertable (cost=0.00..30.54 rows=502 width=72)
EXPLAIN
o=# explain select * from usertable where p = 1;
NOTICE: QUERY PLAN:
Index Scan using usertable_p_key on usertable (cost=0.00..25.68 rows=50
width=72)
EXPLAIN
o=# explain select count(*) from usertable where p = 1;
NOTICE: QUERY PLAN:
Aggregate (cost=25.81..25.81 rows=1 width=4)
-> Index Scan using usertable_p_key on usertable (cost=0.00..25.68
rows=50 width=4)
EXPLAIN
o=# explain select count(*) from usertable where p = 33;
NOTICE: QUERY PLAN:
Aggregate (cost=31.79..31.79 rows=1 width=4)
-> Seq Scan on usertable (cost=0.00..30.54 rows=502 width=4)
o=# select count(*) from usertable where p in(1,33) group by p;
count
-------
16
502
(2 rows)
This raises some other questions. Why can't postgres get the count(*)
from the index? Why doesn't it predict the correct number of rows in
the planner? (25 estimated vs 16 actual).
--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com
On Mon, 19 Feb 2001, Joseph Shraibman wrote:
Of course, if the 10113-match estimate is wildly off (as it was in this
case), then the wrong plan may be chosen. But it IS NOT CORRECT to
suppose that indexscans always beat seqscans. The planner's job would
be a lot easier if that were true.
Can't postgres do the index lookup first and find out there are only a
few tuples that might match?
Well, theoretically the estimate is supposed to match reality. There are
still some cases where there isn't enough information kept to allow that
to be true (the case where there is a single very common non-NULL value is
one such case).
Joseph Shraibman <jks@selectacast.net> writes:
This raises some other questions. Why can't postgres get the count(*)
from the index? Why doesn't it predict the correct number of rows in
the planner? (25 estimated vs 16 actual).
The name of the game here is to make a plan *without* actually going
out and expending large amounts of time to find out the true state of
affairs; by the time you know for sure, you've already done the query.
We have to do a certain amount of guessing, otherwise the planner will
be a net drag on performance. Accordingly, the estimates will never be
perfectly accurate.
regards, tom lane
On Mon, 19 Feb 2001, Joseph Shraibman wrote:
Joseph Shraibman wrote:
Can't postgres do the index lookup first and find out there are only a
few tuples that might match?
o=# select count(*) from usertable where p in(1,33) group by p;
count
-------
16
502
(2 rows)
This raises some other questions. Why can't postgres get the count(*)
from the index? Why doesn't it predict the correct number of rows in
the planner? (25 estimated vs 16 actual).
First question: Mostly the same reason. Not all of the index entries
are necessarily real active rows that you can see, so you would still
have to hit the heap file to get that data, so I'd guess you're already
hitting the entire heap file.
[Tom answered the second]
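[Editor's note] Stephan's visibility point can be illustrated with a toy model (hypothetical data structures, not PostgreSQL internals): index entries can still point at dead row versions, so a bare index count over-counts.

```python
# Toy MVCC sketch: the index points at row versions, but only the
# heap knows which versions this transaction can see. Hypothetical
# data structures, not PostgreSQL internals.

# Heap: row id -> (value, visible to our transaction?)
heap = {
    1: ('xxx', True),
    2: ('xxx', False),  # an expired version not yet vacuumed away
    3: ('xxx', True),
    4: ('yyy', True),
}

# Index: value -> heap row ids (dead versions still have entries)
index = {'xxx': [1, 2, 3], 'yyy': [4]}

index_only_count = len(index['xxx'])  # over-counts: includes row 2
visible_count = sum(1 for rid in index['xxx'] if heap[rid][1])

print(index_only_count, visible_count)  # 3 2
```

This is why every index probe still costs a heap read: the correct answer requires checking each pointed-to row's visibility.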
Tom Lane wrote:
Joseph Shraibman <jks@selectacast.net> writes:
This raises some other questions. Why can't postgres get the count(*)
from the index? Why doesn't it predict the correct number of rows in
the planner? (25 estimated vs 16 actual).
The name of the game here is to make a plan *without* actually going
out and expending large amounts of time to find out the true state of
affairs; by the time you know for sure, you've already done the query.
Well, I'd hope that extracting the count from the index would be very
low cost. That is what indices are for.
We have to do a certain amount of guessing, otherwise the planner will
be a net drag on performance. Accordingly, the estimates will never be
perfectly accurate.
But certain things could be done, like planning for the case of there
being a single non-null value, and updating the indices not to point at
expired rows. Isn't the point of a vacuum to get rid of old rows? Then
why doesn't it update the index as well?
I mean, the explain shows that getting the count(*) from the field that
is indexed has to do a seq scan, presumably to determine if the rows are
in fact valid. That is ridiculous.
--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com
Stephan Szabo wrote:
On Mon, 19 Feb 2001, Joseph Shraibman wrote:
Of course, if the 10113-match estimate is wildly off (as it was in this
case), then the wrong plan may be chosen. But it IS NOT CORRECT to
suppose that indexscans always beat seqscans. The planner's job would
be a lot easier if that were true.
Can't postgres do the index lookup first and find out there are only a
few tuples that might match?
Well, theoretically the estimate is supposed to match reality. There are
still some cases where there isn't enough information kept to allow that
to be true (the case where there is a single very common non-NULL value is
one such case).
But the index should give the upper bound of the query and show
that this query is not going to return 10113 rows. It appeared to
work like this in my query. I don't really know what his database is
like or how many times it was updated since he last vacuumed, but it
seems that postgres should have been able to tell that query would have
returned much less than 10113 entries.
--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com
On Mon, 19 Feb 2001, Joseph Shraibman wrote:
But the index should give the upper bound of the query and show
that this query is not going to return 10113 rows. It appeared to
work like this in my query.
The problem is that the stats that are kept are woefully inadequate for
these cases. Basically, IIRC, it's taking the
most common value's number of appearances and using a fraction of that
as the estimate for any other value. This is not a really meaningful
estimate of the number of rows to return, and there's been talk of how
to add more detailed statistics to make this number more meaningful.
And btree indexes really aren't all that good for getting the exact
number of entries - you'd be better off keeping that number somewhere
else, but MVCC would probably make that difficult, since I'd guess
that the different versions of rows would each have index entries
and not all of them apply to your transaction - which is why I think
it goes to the heap to test for visibility.
Stephan Szabo wrote:
You didn't address my point. The point was that an explain shows that
it is using the index to get an upper bound, so why isn't it using
that?
--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com
On Tue, 20 Feb 2001, Joseph Shraibman wrote:
You didn't address my point. The point was that an explain shows that
it is using the index to get an upper bound, so why isn't it using
that?
Where are you seeing something that says the estimator/planner is using
the index to get an upper bound? The estimator shouldn't be asking either the
index or the heap for anything, it should be working entirely with the
statistics that were generated from vacuum.
Stephan Szabo wrote:
Where are you seeing something that says the estimator/planner is using
the index to get an upper bound? The estimator shouldn't be asking either the
index or the heap for anything, it should be working entirely with the
statistics that were generated from vacuum.
Index Scan using usertable_p_key on usertable (cost=0.00..25.68 rows=50
width=72)
That rows=50, which is an overestimate by the way.
--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com
On Tue, 20 Feb 2001, Joseph Shraibman wrote:
Stephan Szabo wrote:
Where are you seeing something that says the estimator/planner is using
the index to get an upper bound? The estimator shouldn't be asking either the
index or the heap for anything, it should be working entirely with the
statistics that were generated from vacuum.
Index Scan using usertable_p_key on usertable (cost=0.00..25.68 rows=50
width=72)
That rows=50, which is an overestimate by the way.
That's because the estimate in this case was 50 and so it's estimating
that going through the index and checking the heap is faster than a
sequential scan. The *estimator* didn't use the index to figure that out,
it's just saying that the best plan to actually *run* the query uses
the index.
IIRC, there's something which is effectively:
estimated rows = <most common value's frequency> * <fraction>
I think fraction defaults to (is always?) 1/10 for the standard
index type. That's where the 50 comes from. And the frequency is
probably from the last vacuum analyze.
Stephan Szabo wrote:
IIRC, there's something which is effectively:
estimated rows = <most common value's frequency> * <fraction>
I think fraction defaults to (is always?) 1/10 for the standard
index type. That's where the 50 comes from. And the frequency is
probably from the last vacuum analyze.
Then it should do the same thing no matter what value I use, but when I
do different searches, in one case it estimates 50 when there are 16, and
in the other it estimates 502 where there are 502.
--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com
On Tue, 20 Feb 2001, Joseph Shraibman wrote:
Then it should do the same thing no matter what value I use, but when I
do different searches, in one case it estimates 50 when there are 16, and
in the other it estimates 502 where there are 502.
It knows enough to do the special case where you are searching for the
most common value. I'd guess that's what's happening on the 502.
I think it stores the most common value and the fraction of rows that
represents as of last vacuum analyze.
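[Editor's note] Stephan's description of the estimator can be sketched like this. The stored statistic (most common value and its count) and the flat 1/10 fraction are as he recalls them in the thread; this is a labeled simplification, not the actual selectivity code:

```python
# Sketch of the selectivity rule described above: exact stats for the
# most common value, a flat fraction of its count for everything else.
# Simplified assumption based on the discussion, not PostgreSQL code.

def estimate_rows(value, mcv, mcv_count, fraction=0.1):
    if value == mcv:
        return mcv_count              # special-cased: the stored stat
    return int(mcv_count * fraction)  # flat guess for any other value

# Joseph's usertable figures: p=33 is the most common value (502 rows).
print(estimate_rows(33, mcv=33, mcv_count=502))  # 502 (actual: 502)
print(estimate_rows(1,  mcv=33, mcv_count=502))  # 50  (actual: 16)
```

This reproduces exactly the behavior Joseph observed: a spot-on estimate for the most common value and the same flat 50-row guess for everything else.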