Speeding up index scans by truncating timestamp?

Started by Derrick Riceabout 15 years ago6 messagesgeneral
Jump to latest
#1Derrick Rice
derrick.rice@gmail.com

Hey folks,

I've got a table of historical events that 10 million rows over 18+ months.
Currently there is an index for the event timestamp. I'm wondering if
someone can respond to my curiosity regarding the performance of indexes
(BTree).

Would creating an index on the timestamp truncated to the *day* make the
index more efficient for queries which are interested in events falling in a
range of 7+ days?

How much would it hurt when the planner needs to tease out rows which fall
within the day range but not the real time range?

Would having both indexes allow the planner to index by the day for
entirely-included days and by the actual time for the boundary days?

The performance of the index isn't terrible. It can take up to cost 15,000
(which I assume is 15 seconds) for a query looking at 5 weeks of data. I'd
like to improve it if it's simple to do, and changing the index to be on the
truncated time seems like it could possibly be useful.

Derrick

#2Derrick Rice
derrick.rice@gmail.com
In reply to: Derrick Rice (#1)
Re: Speeding up index scans by truncating timestamp?

Would creating an index on the timestamp truncated to the *day* make the
index more efficient for queries which are interested in events falling in a
range of 7+ days?

I gave this a shot, changing the index to be on date_trunc('day',
timestamp). PostgreSQL (8.2) then decided not to use the index at all. Not
sure if this is a limitation of PostgreSQL or if it decided it was going to
be more expensive than the alternative.

Is the query optimizer capable of using the relationship between an index on
date_trunc(foo) and a query with "where foo < bar and foo > baz" ? At this
point the question is to satisfy my own curiosity.

Derrick

#3Michael Glaesemann
grzm@seespotcode.net
In reply to: Derrick Rice (#2)
Re: Speeding up index scans by truncating timestamp?

On Feb 15, 2011, at 10:00, Derrick Rice wrote:

Would creating an index on the timestamp truncated to the *day* make the
index more efficient for queries which are interested in events falling in a
range of 7+ days?

I gave this a shot, changing the index to be on date_trunc('day',
timestamp). PostgreSQL (8.2) then decided not to use the index at all. Not
sure if this is a limitation of PostgreSQL or if it decided it was going to
be more expensive than the alternative.

Is the query optimizer capable of using the relationship between an index on
date_trunc(foo) and a query with "where foo < bar and foo > baz" ? At this
point the question is to satisfy my own curiosity.

Providing the table definition, queries, and EXPLAIN and EXPLAIN ANALYZE output would help people provide more specific feedback.

Michael Glaesemann
grzm seespotcode net

#4Vick Khera
vivek@khera.org
In reply to: Derrick Rice (#2)
Re: Speeding up index scans by truncating timestamp?

On Tue, Feb 15, 2011 at 10:00 AM, Derrick Rice <derrick.rice@gmail.com> wrote:

Is the query optimizer capable of using the relationship between an index on
date_trunc(foo) and a query with "where foo < bar and foo > baz" ?  At this
point the question is to satisfy my own curiosity.

No. The query has to use the same function as the index does.

I also don't think that the storage space will be any less. A
timestamp is always stored in the same amount of space. All you're
doing is zeroing out the higher resolution bits of time.

#5Derrick Rice
derrick.rice@gmail.com
In reply to: Vick Khera (#4)
Re: Speeding up index scans by truncating timestamp?

On Tue, Feb 15, 2011 at 10:16 AM, Michael Glaesemann
<grzm@seespotcode.net>wrote:

Providing the table definition, queries, and EXPLAIN and EXPLAIN ANALYZE
output would help people provide more specific feedback.

Seemed a general enough question that it wasn't necessary. That, and I
wanted to figure out as much of it on my own as I could, rather than just
get the end-result of an expert's answer without all the knowledge of the
leg work.

Thank you, though.

On Tue, Feb 15, 2011 at 10:20 AM, Vick Khera <vivek@khera.org> wrote:

On Tue, Feb 15, 2011 at 10:00 AM, Derrick Rice <derrick.rice@gmail.com>
wrote:

Is the query optimizer capable of using the relationship between an index

on

date_trunc(foo) and a query with "where foo < bar and foo > baz" ? At

this

point the question is to satisfy my own curiosity.

No. The query has to use the same function as the index does.

Well that settles it, then. Thanks, Vick.

I also don't think that the storage space will be any less. A
timestamp is always stored in the same amount of space. All you're
doing is zeroing out the higher resolution bits of time.

It's been a while since I was intimate with the implementation of a btree.
I was guessing that it might make the tree more "dense" if there were more
values that were equal. A "dense" tree would be easier to scan when
grabbing all of the children of a particular node (which is the case when
doing a wide range comparison).

The little bit of review that I was able to do reminded me that equal or
unequal values don't make a tree more or less "dense". It could arguably
make inserts easier (because there's more acceptable places to put an item)
but each node will have n to m items regardless of their relationship to
each other.

On that topic... are the details of PostgreSQL's b-tree implementation found
anywhere outside of the code? i.e. what n,m-tree values it uses? Searched
docs and wiki with no luck.

Having fun relearning this stuff

Derrick

#6Alban Hertroys
dalroi@solfertje.student.utwente.nl
In reply to: Vick Khera (#4)
Re: Speeding up index scans by truncating timestamp?

On 15 Feb 2011, at 16:20, Vick Khera wrote:

On Tue, Feb 15, 2011 at 10:00 AM, Derrick Rice <derrick.rice@gmail.com> wrote:

Is the query optimizer capable of using the relationship between an index on
date_trunc(foo) and a query with "where foo < bar and foo > baz" ? At this
point the question is to satisfy my own curiosity.

I also don't think that the storage space will be any less. A
timestamp is always stored in the same amount of space. All you're
doing is zeroing out the higher resolution bits of time.

I suppose it would help to cast the column to date, both in the index creation and in the queries. I don't have time right now to verify that the storage space of a date is actually less than a timestamp, but I expect it would be (although, dates in PG have an awful lot of range!).

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,4d5adbbf11734598963003!