Indexes?
I have a table where I need to use "..where curdate between fromDate and toDate".
Is it best to have two indexes, one for FromDate and one for toDate or just one index for
both the fields?
Regards,
BTJ
--
-----------------------------------------------------------------------------------------------
Bj�rn T Johansen
btj@havleik.no
-----------------------------------------------------------------------------------------------
Someone wrote:
"I understand that if you play a Windows CD backwards you hear strange Satanic messages"
To which someone replied:
"It's even worse than that; play it forwards and it installs Windows"
-----------------------------------------------------------------------------------------------
I believe that it is better to have a concatenated key of (toDate,FromDate).
The reason the toDate should come first is that for more "recent" records,
finding curDates less than toDate is much more selective than finding
curDates greater than fromDate. Actually I'm not sure if fromDate is that
helpful either as part of the concatenated key (it probably depends) but
definitely not by itself.
If your usual query is someEarlyHistoricalDate between toDate and fromDate,
then the concatenated key should be (fromDate,toDate) instead.
If toDate is sometimes not known, I would use some fixed date far in the
future rather than a null.
Vincent
----- Original Message -----
From: "Bj�rn T Johansen" <btj@havleik.no>
To: <pgsql-general@postgresql.org>
Sent: Wednesday, December 01, 2004 10:11 PM
Subject: [GENERAL] Indexes?
Show quoted text
I have a table where I need to use "..where curdate between fromDate and
toDate".
Is it best to have two indexes, one for FromDate and one for toDate or
just one index for both the fields?Regards,
BTJ
--
-----------------------------------------------------------------------------------------------
Bj�rn T Johansenbtj@havleik.no
-----------------------------------------------------------------------------------------------
Someone wrote:
"I understand that if you play a Windows CD backwards you hear strange
Satanic messages"
To which someone replied:
"It's even worse than that; play it forwards and it installs Windows"
--------------------------------------------------------------------------------------------------------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
On Thu, Dec 02, 2004 at 07:11:29AM +0100, Bj�rn T Johansen wrote:
I have a table where I need to use "..where curdate between fromDate and
toDate".
Is it best to have two indexes, one for FromDate and one for toDate or just
one index for both the fields?
You could try it both ways and use EXPLAIN ANALYZE to see which
results in a faster plan, if that's what you mean by "best."
--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
On Thu, Dec 02, 2004 at 07:11:29 +0100,
Bj�rn T Johansen <btj@havleik.no> wrote:
I have a table where I need to use "..where curdate between fromDate and
toDate".
Is it best to have two indexes, one for FromDate and one for toDate or just
one index for both the fields?
Assuming that curdate is something like the date when the query is being
run and that FromDate and toDate are columns in the table you are
searching, then you probably want indexes on each column. A combined
index scan wouldn't be useful. An index scan on either FromDate or toDate
might be useful depending on the distribution of values in those columns
and the value of curdate.
On Wed, Dec 01, 2004 at 23:16:48 -0800,
Vincent Hikida <vhikida@inreach.com> wrote:
I believe that it is better to have a concatenated key of
(toDate,FromDate). The reason the toDate should come first is that for more
"recent" records, finding curDates less than toDate is much more selective
than finding curDates greater than fromDate. Actually I'm not sure if
fromDate is that helpful either as part of the concatenated key (it
probably depends) but definitely not by itself.
I combined index won't be very useful for the kind of search he is doing.
And not having an index on FromDate could hurt in some cases depending
on the distribution of values.
Well, then it's decided to try with two indexes...
Thx... :)
BTJ
Bruno Wolff III wrote:
Show quoted text
On Wed, Dec 01, 2004 at 23:16:48 -0800,
Vincent Hikida <vhikida@inreach.com> wrote:I believe that it is better to have a concatenated key of
(toDate,FromDate). The reason the toDate should come first is that for more
"recent" records, finding curDates less than toDate is much more selective
than finding curDates greater than fromDate. Actually I'm not sure if
fromDate is that helpful either as part of the concatenated key (it
probably depends) but definitely not by itself.I combined index won't be very useful for the kind of search he is doing.
And not having an index on FromDate could hurt in some cases depending
on the distribution of values.---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Perhaps I'm missing something but let's say that the index has the
following:
toDate fromDate
1992-03-02 1991-01-23
1992-04-03 1990-06-13
1993-05-03 1991-01-22
...
...
...
2004-12-01 2003-02-22
2005-03-04 2003-02-22 (a)
2005-03-05 2004-12-15 (b)
2005-03-05 2004-06-18 (c)
2007-04-12 2005-06-18 (d)
Let's say that there are a million entries where the toDate is less than
today 2004-12-02. That is less than (a) in the index. From the index then
only a, b, c, and d should be scanned further. a and c would be picked based
on the index values because 2004-12-02 is between the from and end date.
However, b and d would be excluded immediately because the the from date is
greater than 2004-12-02 and would save the optimizer from even reading the
table for these index entries because the fromDate is in the index.
This may be a somewhat extreme example but my experience is in most systems
old historical data makes up the bulk of the data and newer data is a much
smaller amount. In addition most people are interested in data from the most
recent month.
Of course I may be mistaken about the data distribution.
Vincent
----- Original Message -----
From: "Bruno Wolff III" <bruno@wolff.to>
To: "Vincent Hikida" <vhikida@inreach.com>
Cc: <pgsql-general@postgresql.org>
Sent: Thursday, December 02, 2004 12:10 PM
Subject: Re: [GENERAL] Indexes?
Show quoted text
On Wed, Dec 01, 2004 at 23:16:48 -0800,
Vincent Hikida <vhikida@inreach.com> wrote:I believe that it is better to have a concatenated key of
(toDate,FromDate). The reason the toDate should come first is that for
more
"recent" records, finding curDates less than toDate is much more
selective
than finding curDates greater than fromDate. Actually I'm not sure if
fromDate is that helpful either as part of the concatenated key (it
probably depends) but definitely not by itself.I combined index won't be very useful for the kind of search he is doing.
And not having an index on FromDate could hurt in some cases depending
on the distribution of values.---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
On Thu, Dec 02, 2004 at 22:37:38 -0800,
Vincent Hikida <vhikida@inreach.com> wrote:
2004-12-01 2003-02-22
2005-03-04 2003-02-22 (a)
2005-03-05 2004-12-15 (b)
2005-03-05 2004-06-18 (c)
2007-04-12 2005-06-18 (d)Let's say that there are a million entries where the toDate is less than
today 2004-12-02. That is less than (a) in the index. From the index then
only a, b, c, and d should be scanned further. a and c would be picked
That is correct, but that part relies only on the part of the index dependent
on toDate.
based on the index values because 2004-12-02 is between the from and end
date. However, b and d would be excluded immediately because the the from
date is greater than 2004-12-02 and would save the optimizer from even
reading the table for these index entries because the fromDate is in the
index.
That is not correct. Postgres currently doesn't have a way to skip ahead
on an index scan. So what will happen is that a, b, c, d and will all be
considered and b and d removed by a filter rule.
This may be a somewhat extreme example but my experience is in most systems
old historical data makes up the bulk of the data and newer data is a much
smaller amount. In addition most people are interested in data from the
most recent month.
The idea of having a toDate index is good, it is just that having a multicolumn
index doesn't help for this problem. In fact by making the index wider, it
will slow things down.
Of course I may be mistaken about the data distribution.
The distribution of values is what makes toDate or FromDate a better index
(if any) to use. You may very well be correct that for most people toDate
will more likely be the better index to use.
based on the index values because 2004-12-02 is between the from and end
date. However, b and d would be excluded immediately because the the
from
date is greater than 2004-12-02 and would save the optimizer from even
reading the table for these index entries because the fromDate is in the
index.That is not correct. Postgres currently doesn't have a way to skip ahead
on an index scan. So what will happen is that a, b, c, d and will all be
considered and b and d removed by a filter rule.
OK. I got it now. Thanks :)