Index not being used in MAX function (7.2.3)
Hi all:
I have here a table belonging to a message board (Phorum 3.3), and
there's an index in it that is not being used for reasons that I don't
understand. The table is:
Table "todocinetv"
Column | Type | Modifiers
-------------+-----------------------------+----------------------
id | integer | not null default '0'
datestamp | timestamp without time zone | not null
thread | integer | not null default '0'
parent | integer | not null default '0'
author | character(37) | not null default ''
subject | character(255) | not null default ''
email | character(200) | not null default ''
attachment | character(64) | default ''
host | character(50) | not null default ''
email_reply | character(1) | not null default 'N'
approved | character(1) | not null default 'N'
msgid | character(100) | not null default ''
modifystamp | integer | not null default '0'
userid | integer | not null default '0'
Indexes: todocinetv_approved,
todocinetv_author,
todocinetv_datestamp,
todocinetv_modifystamp,
todocinetv_msgid,
todocinetv_parent,
todocinetv_subject,
todocinetv_thread,
todocinetv_userid,
todocinetvpri_key
And the index "todocinetvpri_key" is created on the primary key (id).
Yet when I do:
explain select max(id) from todocinetv;
NOTICE: QUERY PLAN:
Aggregate (cost=30939.22..30939.22 rows=1 width=4)
-> Seq Scan on todocinetv (cost=0.00..30882.98 rows=22498 width=4)
It doesn't use the index, and surely, it takes forever. I have tried
with VACUUM ANALYZE, and also dropping the index, creating it again and
VACUUMing it, and it never uses it. The only explanation I can come up
with is that the MAX() function doesn't use indices; I have tried with
tables in other databases (running Postgres 7.2.1), and it doesn't use
the indices in any of them.
Is this the right behaviour? Or there is something else going on? The
table mentioned above is in a Postgres 7.2.3 server, while the other
ones that I used for testing were, as I said, in 7.2.1.
Paulo Jan.
DDnet.
Is your index a hash or btree?
Jon
On Tue, 10 Jun 2003, Paulo Jan wrote:
Show quoted text
Hi all:
I have here a table belonging to a message board (Phorum 3.3), and
there's an index in it that is not being used for reasons that I don't
understand. The table is:Table "todocinetv"
Column | Type | Modifiers
-------------+-----------------------------+----------------------
id | integer | not null default '0'
datestamp | timestamp without time zone | not null
thread | integer | not null default '0'
parent | integer | not null default '0'
author | character(37) | not null default ''
subject | character(255) | not null default ''
email | character(200) | not null default ''
attachment | character(64) | default ''
host | character(50) | not null default ''
email_reply | character(1) | not null default 'N'
approved | character(1) | not null default 'N'
msgid | character(100) | not null default ''
modifystamp | integer | not null default '0'
userid | integer | not null default '0'
Indexes: todocinetv_approved,
todocinetv_author,
todocinetv_datestamp,
todocinetv_modifystamp,
todocinetv_msgid,
todocinetv_parent,
todocinetv_subject,
todocinetv_thread,
todocinetv_userid,
todocinetvpri_keyAnd the index "todocinetvpri_key" is created on the primary key (id).
Yet when I do:explain select max(id) from todocinetv;
NOTICE: QUERY PLAN:Aggregate (cost=30939.22..30939.22 rows=1 width=4)
-> Seq Scan on todocinetv (cost=0.00..30882.98 rows=22498 width=4)It doesn't use the index, and surely, it takes forever. I have tried
with VACUUM ANALYZE, and also dropping the index, creating it again and
VACUUMing it, and it never uses it. The only explanation I can come up
with is that the MAX() function doesn't use indices; I have tried with
tables in other databases (running Postgres 7.2.1), and it doesn't use
the indices in any of them.
Is this the right behaviour? Or there is something else going on? The
table mentioned above is in a Postgres 7.2.3 server, while the other
ones that I used for testing were, as I said, in 7.2.1.Paulo Jan.
DDnet.---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
Yep. It's a "feature" :-)
I had hard time understanding why too...
Apparently, there is nothing special about max() aggregate from the
query planner's standpoint, compared to any other aggregate function. It
doesn't know its specifics, and thus is not able to figure out that all
it really needs is to grab the first entry from the index...
To mak it quicker do this instead:
select id from todocinetv order by id desc limit 1;
I hope, it helps...
Dima
Paulo Jan wrote:
Show quoted text
Hi all:
I have here a table belonging to a message board (Phorum 3.3), and
there's an index in it that is not being used for reasons that I don't
understand. The table is:Table "todocinetv"
Column | Type | Modifiers
-------------+-----------------------------+----------------------
id | integer | not null default '0'
datestamp | timestamp without time zone | not null
thread | integer | not null default '0'
parent | integer | not null default '0'
author | character(37) | not null default ''
subject | character(255) | not null default ''
email | character(200) | not null default ''
attachment | character(64) | default ''
host | character(50) | not null default ''
email_reply | character(1) | not null default 'N'
approved | character(1) | not null default 'N'
msgid | character(100) | not null default ''
modifystamp | integer | not null default '0'
userid | integer | not null default '0'
Indexes: todocinetv_approved,
todocinetv_author,
todocinetv_datestamp,
todocinetv_modifystamp,
todocinetv_msgid,
todocinetv_parent,
todocinetv_subject,
todocinetv_thread,
todocinetv_userid,
todocinetvpri_keyAnd the index "todocinetvpri_key" is created on the primary key
(id). Yet when I do:explain select max(id) from todocinetv;
NOTICE: QUERY PLAN:Aggregate (cost=30939.22..30939.22 rows=1 width=4)
-> Seq Scan on todocinetv (cost=0.00..30882.98 rows=22498 width=4)It doesn't use the index, and surely, it takes forever. I have
tried with VACUUM ANALYZE, and also dropping the index, creating it
again and VACUUMing it, and it never uses it. The only explanation I
can come up with is that the MAX() function doesn't use indices; I
have tried with tables in other databases (running Postgres 7.2.1),
and it doesn't use the indices in any of them.
Is this the right behaviour? Or there is something else going on?
The table mentioned above is in a Postgres 7.2.3 server, while the
other ones that I used for testing were, as I said, in 7.2.1.Paulo Jan.
DDnet.---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
At 18:39 10.06.2003, Paulo Jan said:
--------------------[snip]--------------------
Hi all:
I have here a table belonging to a message board (Phorum 3.3), and
there's an index in it that is not being used for reasons that I don't
understand. The table is:...
select max(id) from todocinetv;
--------------------[snip]--------------------
Your question has already been answered, but:
I suspect that you're doing this to retrieve the row ID of a newly inserted
row. This may not be foolproof since others could already have inserted
rows in between your insertion and this ID lookup.
Assuming that you're using a sequence to provide the primary key (which you
should) you may safely query its current value:
SELECT currval('todocinetv_id_seq') as "newid"
This is guaranteed to return the last value **for your connection** only,
regardless if the sequence has actually been incremented by others or not.
And it's lightning fast.
Just my 2c :)
--
O Ernest E. Vogelsinger
(\) ICQ #13394035
^ http://www.vogelsinger.at/
Jonathan Bartlett wrote:
Is your index a hash or btree?
Jon
It's a btree, but anyway, I see that others have already answered my
question. So it's a "feature" and not a bug? Hrmpf.
(BTW, the code I was running wasn't written by me; it was part of
Phorum, a PHP web posting board application. I'll try to patch it to
make it use "SELECT id... ORDER BY id DESC LIMIT 1" and see how it goes).
Paulo Jan.
DDnet.
Hi all,
in a psql script for GnuMed (www.gnumed.org) I am using a
snippet like the following for setting up predefined test
accounts:
CREATE USER "test-doc"
WITH PASSWORD 'test-doc'
IN GROUP "gm-doctors", "gm-public"
VALID UNTIL '2003-09-30'
;
I would like to constrain their validity to, say, six months. I
have tried but not found a way to tell the VALID UNTIL clause
something like
now() + '6 months'::interval
Anyone have a suggestion (short of calculating in the client at
runtime and substituting) on how to do this in plain SQL ?
Thanks a lot,
Karsten Hilbert, MD
GnuMed i18n coordinator
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346
Karsten Hilbert <Karsten.Hilbert@gmx.net> writes:
I have tried but not found a way to tell the VALID UNTIL clause
something like
now() + '6 months'::interval
CREATE USER, like pretty much all utility statements in Postgres,
won't do any expression evaluation --- the parameters have to be
simple literal constants.
The only workaround I can think of is to do the CREATE USER inside
a helper plpgsql function that does the time calculation and then
uses EXECUTE to run the command with the appropriate values plugged
in. It seems unlikely that this is a cleaner solution than doing it
in the client code ...
regards, tom lane
I think it has to be done like this: "now() + '@ 6 month' as
six_months_from_now;" which yields
six_months_from_now
------------------------
2003-12-11 09:38:24-05
(1 row)
Jim Apsey
--------------------------------------------------------------------------------------
Karsten Hilbert wrote:
Show quoted text
Hi all,
in a psql script for GnuMed (www.gnumed.org) I am using a
snippet like the following for setting up predefined test
accounts:CREATE USER "test-doc"
WITH PASSWORD 'test-doc'
IN GROUP "gm-doctors", "gm-public"
VALID UNTIL '2003-09-30'
;I would like to constrain their validity to, say, six months. I
have tried but not found a way to tell the VALID UNTIL clause
something likenow() + '6 months'::interval
Anyone have a suggestion (short of calculating in the client at
runtime and substituting) on how to do this in plain SQL ?Thanks a lot,
Karsten Hilbert, MD
GnuMed i18n coordinator
Wouldn't it make sense to optimize max() and min() for indexed columns? I don't
know if I'm barking up the wrong tree, but would it be possible to create an
aggregate (o_max, o_min) to make the query planner treat it differently from
other aggregates? IMO, (if possible...) this would be a more elegant solution
than SQL'ing around the "feature". If it is possible, it might be a nifty
contrib module, poised for inclusion in the production code. Any
takers/thoughts? :)
CG
--- Paulo Jan <admin@digital.ddnet.es> wrote:
Jonathan Bartlett wrote:
Is your index a hash or btree?
Jon
It's a btree, but anyway, I see that others have already answered my
question. So it's a "feature" and not a bug? Hrmpf.
(BTW, the code I was running wasn't written by me; it was part of
Phorum, a PHP web posting board application. I'll try to patch it to
make it use "SELECT id... ORDER BY id DESC LIMIT 1" and see how it goes).Paulo Jan.
DDnet.---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
__________________________________
Do you Yahoo!?
Yahoo! Calendar - Free online calendar with sync to Outlook(TM).
http://calendar.yahoo.com
I think this is a wonderful idea. I often wondered why max(column) makes a
full table scan.
B.T.W.: If your max column is indexed, use
select column from table order by column desc limit 1
which gives you the maximum value in no time at all.
Detlef
-----Ursprungliche Nachricht-----
Von: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org]Im Auftrag von Chris Gamache
Gesendet: Mittwoch, 11. Juni 2003 17:44
An: pgsql-general@postgresql.org
Betreff: Re: [GENERAL] Index not being used in MAX function (7.2.3)
Wouldn't it make sense to optimize max() and min() for indexed columns? I
don't
know if I'm barking up the wrong tree, but would it be possible to create an
aggregate (o_max, o_min) to make the query planner treat it differently from
other aggregates? IMO, (if possible...) this would be a more elegant
solution
than SQL'ing around the "feature". If it is possible, it might be a nifty
contrib module, poised for inclusion in the production code. Any
takers/thoughts? :)
CG
--- Paulo Jan <admin@digital.ddnet.es> wrote:
Jonathan Bartlett wrote:
Is your index a hash or btree?
Jon
It's a btree, but anyway, I see that others have already answered my
question. So it's a "feature" and not a bug? Hrmpf.
(BTW, the code I was running wasn't written by me; it was part of
Phorum, a PHP web posting board application. I'll try to patch it to
make it use "SELECT id... ORDER BY id DESC LIMIT 1" and see how it goes).Paulo Jan.
DDnet.---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
__________________________________
Do you Yahoo!?
Yahoo! Calendar - Free online calendar with sync to Outlook(TM).
http://calendar.yahoo.com
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
On Wed, Jun 11, 2003 at 02:21:00PM +0200, Paulo Jan wrote:
It's a btree, but anyway, I see that others have already answered my
question. So it's a "feature" and not a bug? Hrmpf.
(BTW, the code I was running wasn't written by me; it was part of
Phorum, a PHP web posting board application. I'll try to patch it to
make it use "SELECT id... ORDER BY id DESC LIMIT 1" and see how it goes).
Not to drag this out further, but you might want to hold off on that
patch. 7.4 is supposed to use indexes for max/min.
--
Jim C. Nasby (aka Decibel!) jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828
Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
"Jim C. Nasby" <jim@nasby.net> writes:
Not to drag this out further, but you might want to hold off on that
patch. 7.4 is supposed to use indexes for max/min.
Where did you get that idea?
There's been no change in the basic problem, which is that no one has
proposed a reasonably general method of translating aggregates into
index manipulations. Postgres has an extremely general, extensible
concept of aggregates, and we're not going to mess it up with some
poorly-designed hack. But show me a clean design and implementation,
and it'll go in.
regards, tom lane
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, June 11, 2003 10:03 AM
To: jim@nasby.net
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Index not being used in MAX function (7.2.3)"Jim C. Nasby" <jim@nasby.net> writes:
Not to drag this out further, but you might want to hold
off on that
patch. 7.4 is supposed to use indexes for max/min.
Where did you get that idea?
There's been no change in the basic problem, which is that no
one has proposed a reasonably general method of translating
aggregates into index manipulations. Postgres has an
extremely general, extensible concept of aggregates, and
we're not going to mess it up with some poorly-designed hack.
But show me a clean design and implementation, and it'll go in.
Is this a poorly designed hack:
Select max(expression) from <join> where <filter>
Becomes:
If (non_hashed_index_exists_on_expression) then
/* Transform expression to: */
select (expression) from <join> where <filter> order by
<expression> limit to 1 rows
else
do_what_you_are_doing_right_now
endif
?
Import Notes
Resolved by subject fallback
-----Original Message-----
From: Dann Corbit
Sent: Wednesday, June 11, 2003 10:12 AM
To: Tom Lane; jim@nasby.net
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Index not being used in MAX function (7.2.3)-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, June 11, 2003 10:03 AM
To: jim@nasby.net
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Index not being used in MAX function (7.2.3)"Jim C. Nasby" <jim@nasby.net> writes:
Not to drag this out further, but you might want to hold
off on that
patch. 7.4 is supposed to use indexes for max/min.
Where did you get that idea?
There's been no change in the basic problem, which is that no
one has proposed a reasonably general method of translating
aggregates into index manipulations. Postgres has an
extremely general, extensible concept of aggregates, and
we're not going to mess it up with some poorly-designed hack.
But show me a clean design and implementation, and it'll go in.Is this a poorly designed hack:
Select max(expression) from <join> where <filter>
Becomes:
If (non_hashed_index_exists_on_expression) then
/* Transform expression to: */
select (expression) from <join> where <filter> order
by <expression> limit to 1 rows
"Order by <expression> DESC" for max and ASC for min.
else
do_what_you_are_doing_right_now
endif
?---------------------------(end of
broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
Import Notes
Resolved by subject fallback
"Dann Corbit" <DCorbit@connx.com> writes:
Is this a poorly designed hack:
Select max(expression) from <join> where <filter>
Well, to start with, any nonempty WHERE probably invalidates the
optimization, and I doubt it works if there's a join rather than a
single table involved. But you're just handwaving --- the devil is in
the details. How will you identify which aggregates are MIN and MAX
(no, I don't like the idea of relying on the names; remember we have
an extensible set of aggregates)? What will you do when there are
multiple aggregates in the command --- or more generally, how complex a
context for the aggregate call are you going to be able to support?
Where exactly is this transformation going to occur in the planning
pipeline, and how many cycles will we waste checking for the possible
transform in cases where it doesn't apply? Questions like these are
where we've gotten bogged down in the past. You might care to consult
the pgsql-hackers archives for past discussions.
regards, tom lane
I wonder if a macro system might be warranted - then have max be a macro
instead of an aggregate. However, I don't know exactly how that would
work since it involves the whole statement. Anyway, just an idea to
hopefully spur someone else's thinking cap :)
Jon
On Wed, 11 Jun 2003, Tom Lane wrote:
Show quoted text
"Dann Corbit" <DCorbit@connx.com> writes:
Is this a poorly designed hack:
Select max(expression) from <join> where <filter>Well, to start with, any nonempty WHERE probably invalidates the
optimization, and I doubt it works if there's a join rather than a
single table involved. But you're just handwaving --- the devil is in
the details. How will you identify which aggregates are MIN and MAX
(no, I don't like the idea of relying on the names; remember we have
an extensible set of aggregates)? What will you do when there are
multiple aggregates in the command --- or more generally, how complex a
context for the aggregate call are you going to be able to support?
Where exactly is this transformation going to occur in the planning
pipeline, and how many cycles will we waste checking for the possible
transform in cases where it doesn't apply? Questions like these are
where we've gotten bogged down in the past. You might care to consult
the pgsql-hackers archives for past discussions.regards, tom lane
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
On Wed, Jun 11, 2003 at 10:44:22 -0700,
Jonathan Bartlett <johnnyb@eskimo.com> wrote:
I wonder if a macro system might be warranted - then have max be a macro
instead of an aggregate. However, I don't know exactly how that would
work since it involves the whole statement. Anyway, just an idea to
hopefully spur someone else's thinking cap :)
I don't think that would work. There are going to be some cases
where the aggregate is better than the subselect (and not just when
there isn't an appropiate index). And in some cases distinct on order by
may be the best way to get what you want.
-----Original Message-----
From: Bruno Wolff III [mailto:bruno@wolff.to]
Sent: Wednesday, June 11, 2003 11:30 AM
To: Jonathan Bartlett
Cc: Tom Lane; Dann Corbit; jim@nasby.net; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Index not being used in MAX function (7.2.3)On Wed, Jun 11, 2003 at 10:44:22 -0700,
Jonathan Bartlett <johnnyb@eskimo.com> wrote:I wonder if a macro system might be warranted - then have max be a
macro instead of an aggregate. However, I don't knowexactly how that
would work since it involves the whole statement. Anyway, just an
idea to hopefully spur someone else's thinking cap :)I don't think that would work. There are going to be some
cases where the aggregate is better than the subselect (and
not just when there isn't an appropiate index). And in some
cases distinct on order by may be the best way to get what you want.
Isn't that the optimizer's job to figure out? The whole idea of SQL is
to abstract the queries and allow the optimizer to make all the smart
choices about plans and stuff.
I do realize that it is very "non-trivial" to implement, but min() and
max() are used so often it seems it might be useful.
Here are some "free to use" templates for statistical functions:
ftp://cap.connx.com/tournament_software/Kahan.Hpp
ftp://cap.connx.com/tournament_software/STATS.HPP
The Kahan template is an extremely accurate adder (does not lose
precision like direct summation).
The Stats template (which uses the Kahan adder) does all sorts of things
like skew, kurtosis, min, max, stddev, average, count, sum etc. all
simultaneously.
Our product uses a similar template to produce all kinds of useful
statistical information. See:
http://www.connx.com/products/connx/Connx%208.8%20UserGuide/connxcdd32.h
tm
And look at the statistical functions book.
No, we don't do the optimization I have suggested for min/max, but I
hope to poke it into our tool set some day. However, we do have a
function called "sortfirst()" and a function called "sortlast() " both
of which do perform the suggested optimizations [when possible].
Perhaps PostgreSQL could do something similar.
Import Notes
Resolved by subject fallback
On Wed, Jun 11, 2003 at 11:41:26 -0700,
Dann Corbit <DCorbit@connx.com> wrote:
Isn't that the optimizer's job to figure out? The whole idea of SQL is
to abstract the queries and allow the optimizer to make all the smart
choices about plans and stuff.
In theory yes. My comment was specifically addressing the idea of using
a macro. I don't think this would work because subselects aren't
always the best way to do this.
There are other areas where how you write queries makes a big difference
in performance even though the queries are logically equivalent.
On Wed, Jun 11, 2003 at 11:59:36 -0700,
Dennis Gearon <gearond@cvc.net> wrote:
I guess the question is, are other big iron data bases using indexes on
MAX/MIN functions, and how are they doing it?
It is easier for them to do it because they don't have to worry about
a function named max not really being a maximum.
Import Notes
Reply to msg id not found: 3EE77C18.8080305@cvc.net