Wrong index chosen?
Hi all,
I have a query in this form:
empdb=# explain analyze select * from v_past_connections where id_user = 26195 and login_time > '2004-07-21';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_user_logs_login_time on user_logs (cost=0.00..14.10 rows=1 width=28) (actual time=66.890..198.998 rows=5 loops=1)
Index Cond: (login_time > '2004-07-21 00:00:00+02'::timestamp with time zone)
Filter: (id_user = 26195)
Total runtime: 199.083 ms
(4 rows)
As you can see, the index on the timestamp column is used.
The table has indexes on both columns:
empdb=# explain analyze select * from v_past_connections where login_time > '2004-07-21';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_user_logs_login_time on user_logs (cost=0.00..12.90 rows=481 width=28) (actual time=7.338..661.300 rows=22477 loops=1)
Index Cond: (login_time > '2004-07-21 00:00:00+02'::timestamp with time zone)
Total runtime: 676.472 ms
(3 rows)
empdb=# explain analyze select * from v_past_connections where id_user = 26195;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_user_user_logs on user_logs (cost=0.00..252.47 rows=320 width=28) (actual time=4.420..100.122 rows=221 loops=1)
Index Cond: (id_user = 26195)
Total runtime: 100.348 ms
(3 rows)
The row counts matched by the two conditions differ by two orders of magnitude,
and the estimated row counts are close to the real numbers:
empdb=# select count(*) from v_past_connections where id_user = 26195;
count
-------
221
(1 row)
empdb=# select count(*) from v_past_connections where login_time > '2004-07-21';
count
-------
22441
(1 row)
Why then does the planner choose to do an index scan using the index whose condition retrieves the larger number of rows? A bug?
Regards
Gaetano Mendola
On Fri, 23 Jul 2004, Gaetano Mendola wrote:
empdb=# explain analyze select * from v_past_connections where login_time > '2004-07-21';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_user_logs_login_time on user_logs (cost=0.00..12.90 rows=481 width=28) (actual time=7.338..661.300 rows=22477 loops=1)
Index Cond: (login_time > '2004-07-21 00:00:00+02'::timestamp with time zone)
Total runtime: 676.472 ms
(3 rows)
In this plan it estimates to get 481 but it got 22477. So the estimation
was very wrong. You can increase the statistics target on the login_time column
and it will probably be better (after the next analyze).
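For example (assuming the underlying table is user_logs, as in the plans above; the target of 100 is only illustrative):
ALTER TABLE user_logs ALTER COLUMN login_time SET STATISTICS 100;
ANALYZE user_logs;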
Why then does the planner choose to do an index scan using the index whose
condition retrieves the larger number of rows? A bug?
Because it has to decide on the plan before it knows exactly what the
result will be. As seen above, the estimation was wrong and thus the plan
was not as good as it could have been.
In this case you probably also want to create a combined index on both
columns:
CREATE INDEX foo ON user_logs (id_user, login_time);
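With id_user first, both the equality condition on id_user and the range condition on login_time can be applied as index conditions, so the scan only has to touch the handful of rows that actually match.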
P.S. This message belongs on pgsql-performance, not pgsql-hackers.
--
/Dennis Björklund
Dennis Bjorklund <db@zigo.dhs.org> writes:
In this plan it estimates to get 481 but it got 22477. So the estimation
was very wrong. You can increase the statistics target on the login_time column
and it will probably be better (after the next analyze).
Given the nature of the data (login times), I'd imagine that the problem
is simply that he hasn't analyzed recently enough. A bump in stats
target may not be needed, but he's going to have to re-analyze that
column often if he wants this sort of query to be estimated accurately,
because the fraction of entries later than a given time T is *always*
going to be changing.
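If a full ANALYZE of the table is too heavy to run that frequently, a per-column analyze of just that table can be scheduled instead, for example (table name assumed from the plans above):
ANALYZE user_logs (login_time);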
regards, tom lane
Tom Lane wrote:
| Dennis Bjorklund <db@zigo.dhs.org> writes:
|
|>In this plan it estimates to get 481 but it got 22477. So the estimation
|>was very wrong. You can increase the statistics target on the login_time column
|>and it will probably be better (after the next analyze).
|
|
| Given the nature of the data (login times), I'd imagine that the problem
| is simply that he hasn't analyzed recently enough. A bump in stats
| target may not be needed, but he's going to have to re-analyze that
| column often if he wants this sort of query to be estimated accurately,
| because the fraction of entries later than a given time T is *always*
| going to be changing.
Well, now that I think about it, I felt covered by pg_autovacuum, but looking
at the log I see that table was never analyzed! Aaargh.
I already applied the autovacuum patch, but evidently I have to make it more
aggressive; unfortunately I cannot make it more aggressive only for this table.
Thank you all.
Regards
Gaetano Mendola
Gaetano Mendola wrote:
Tom Lane wrote:
| Given the nature of the data (login times), I'd imagine that the problem
| is simply that he hasn't analyzed recently enough. A bump in stats
| target may not be needed, but he's going to have to re-analyze that
| column often if he wants this sort of query to be estimated accurately,
| because the fraction of entries later than a given time T is *always*
| going to be changing.
Well, now that I think about it, I felt covered by pg_autovacuum, but looking
at the log I see that table was never analyzed! Aaargh. I already applied the
autovacuum patch, but evidently I have to make it more aggressive;
unfortunately I cannot make it more aggressive only for this table.
Yeah, the version of autovacuum in 7.4 contrib doesn't allow table
specific settings. The patch I have submitted for 7.5 does, so
hopefully this will be better in the future.
You can however set the VACUUM and ANALYZE thresholds independently.
So perhaps it will help you if you set your ANALYZE settings to be very
aggressive and your VACUUM settings to something more standard.
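If I remember the contrib flags correctly, -v/-V control the vacuum threshold and -a/-A the analyze threshold, so something along these lines (the numbers are only illustrative):
pg_autovacuum -v 1000 -V 0.5 -a 1000 -A 0.01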
Matthew
Hi all,
Just a quick question.
Will there some day be a feature that lets you force the planner to use a
specific index, like Oracle does?
Of course the planner is smart enough most of the time, but sometimes such
an option would be useful, don't you think so?
Thanks in advance,
Jaime Casanova
On Fri, 2004-07-23 at 15:51, Jaime Casanova wrote:
Hi all,
Just a quick question.
Will there some day be a feature that lets you force the planner to use a
specific index, like Oracle does? Of course the planner is smart enough most
of the time, but sometimes such an option would be useful, don't you think so?
A planner that always made the right choice would be the most useful
thing. After that, the ability to "push" the planner towards an index
would be pretty nice.
Adding features that make PostgreSQL more error-prone (e.g. forcing use of a
particular index) and harder to drive, but that allow an expert to get exactly
what they want, is a dangerous road to tread.
Matthew T. O'Connor wrote:
| Gaetano Mendola wrote:
|
|> Tom Lane wrote:
|> | Given the nature of the data (login times), I'd imagine that the
|> problem
|> | is simply that he hasn't analyzed recently enough. A bump in stats
|> | target may not be needed, but he's going to have to re-analyze that
|> | column often if he wants this sort of query to be estimated accurately,
|> | because the fraction of entries later than a given time T is *always*
|> | going to be changing.
|>
|> Well, now that I think about it, I felt covered by pg_autovacuum, but
|> looking at the log I see that table was never analyzed!
|> Aaargh.
|>
|> I already applied the autovacuum patch, but evidently I have to make it
|> more aggressive; unfortunately I cannot make it more aggressive only for
|> this table.
|
|
| Yeah, the version of autovacuum in 7.4 contrib doesn't allow table
| specific settings. The patch I have submitted for 7.5 does, so
| hopefully this will be better in the future.
|
| You can however set the VACUUM and ANALYZE thresholds independently. So
| perhaps it will help you if you set your ANALYZE settings to be very
| aggressive and your VACUUM settings to something more standard.
Well, I think pg_autovacuum as it stands in 7.4 cannot help me with this
particular table.
The table has 4.8 million rows and gets almost 10,252 new entries per day.
I'm running pg_autovacuum with -a 200 -A 0.8, which gives a threshold for
that table of 3,849,008; if I understood how pg_autovacuum works correctly,
that means an analyze roughly every 375 days, and I need an analyze at least
once a day.
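(If I read the formula right, the analyze threshold is roughly the base value
plus the scaling factor times reltuples, i.e. 200 + 0.8 * ~4.8 million, about
3.85 million row changes; at ~10,252 inserts per day that works out to roughly
375 days between analyzes.)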
So I think it is better for me to put an ANALYZE for that table in cron.
Am I wrong?
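For example, something like this crontab entry (database and table names
assumed from the plans above):
0 4 * * * psql -d empdb -c "ANALYZE user_logs"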
Regards
Gaetano Mendola
Gaetano Mendola wrote:
Well, I think pg_autovacuum as it stands in 7.4 cannot help me with this
particular table. The table has 4.8 million rows and gets almost 10,252 new
entries per day. I'm running pg_autovacuum with -a 200 -A 0.8, which gives a
threshold for that table of 3,849,008; if I understood how pg_autovacuum
works correctly, that means an analyze roughly every 375 days, and I need an
analyze at least once a day. So I think it is better for me to put an ANALYZE
for that table in cron.
Am I wrong?
No, I think you are right. You could do something like -a 1000 -A .00185,
but that will probably force an analyze too often for most of your other
tables.
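(Roughly: 1000 + 0.00185 * 4.8 million is about 9,900 row changes, which is
close to the ~10,252 inserts that table gets per day, so that setting would
trigger an analyze on it about once a day.)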