Date indexing

Started by Ian Cass · almost 24 years ago · 5 messages · general
#1Ian Cass
ian.cass@mblox.com

Hi,

I've been trying to do date range index lookups. According to EXPLAIN, my
sql is using the appropriate index, but the result takes forever to return
(~10mins) even tho the dataset it's referencing is reasonably small (~5000
lines). Similar queries on a similar sized dataset on the same database that
are not referencing a date range, but a specific field (WHERE user_name =
'U12345' for example) take a matter of seconds to return.

The index is on a timestamp field, and the only way I could find to get it
to use this index was something like the following...

AND (messages.client_id = '89' AND messages.originator like '%' AND
messages.logtime >= timestamp '20/04/2002 00:00:00' AND messages.logtime <=
timestamp '20/04/2002 23:59:59')

Index is on logtime, client_id, originator
logtime = timezone
client id = int2
originator = text
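
For reference, the EXPLAIN check described above would look roughly like this; the full query is reconstructed from the WHERE fragment above and the table name from the index definition below, so treat it as a sketch rather than the exact statement:

-- sketch only: table name borrowed from the monthly table indexed below
EXPLAIN
SELECT *
FROM messages_200204
WHERE client_id = '89'
  AND originator LIKE '%'
  AND logtime >= timestamp '20/04/2002 00:00:00'
  AND logtime <= timestamp '20/04/2002 23:59:59';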

I'll only ever be referencing this data to a granularity of 1 day, so is
there a way I can use a function to index this so that the date column in
the index is text containing just DD/MM/YYYY? I did try to index something
like..

create index messages_200204_ix3 on messages_200204
(to_char(logtime,'DD/MM/YYYY'), client_id, originator)

...but it didn't work. Gives me parse errors on '.

Are date range index lookups slow on Postgres?

Can forward more details if required.

--
Ian Cass

#2Dmitry Tkach
dmitry@openratings.com
In reply to: Ian Cass (#1)
Re: Date indexing

Ian Cass wrote:

> Hi,
>
> I've been trying to do date range index lookups. According to EXPLAIN, my
> sql is using the appropriate index, but the result takes forever to return
> (~10mins) even tho the dataset it's referencing is reasonably small (~5000
> lines). Similar queries on a similar sized dataset on the same database that
> are not referencing a date range, but a specific field (WHERE user_name =
> 'U12345' for example) take a matter of seconds to return.
>
> The index is on a timestamp field, and the only way I could find to get it
> to use this index was something like the following...
>
> AND (messages.client_id = '89' AND messages.originator like '%' AND
> messages.logtime >= timestamp '20/04/2002 00:00:00' AND messages.logtime <=
> timestamp '20/04/2002 23:59:59')
>
> Index is on logtime, client_id, originator
> logtime = timezone
> client id = int2
> originator = text

This won't be able to make much use of the compound index, because you
are not searching for an exact match on the first column, and the other
columns are not in your criteria at all... Thus your index is only about as
good as an index on the timestamp alone.
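
(Not from the original message, just a minimal sketch of what such a reordering could look like; the index name here is made up:)

-- client_id is matched exactly, so putting it first lets the range
-- condition on logtime use the rest of the index
create index messages_200204_ix4 on messages_200204 (client_id, logtime);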

10 minutes still sounds pretty excessive though, even for a seq. scan
on a 5000-row table. Have you vacuum'ed it lately?
Try running 'vacuum full' on that table; it might help a lot if you
were doing lots of updates to it.

> I'll only ever be referencing this data to a granularity of 1 day, so is
> there a way I can use a function to index this so that the date column in
> the index is text containing just DD/MM/YYYY?

Perhaps you should use a date instead of a timestamp - that way you'd be
able to search for an exact match instead of a range...
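
A minimal sketch of that idea, assuming an extra date column is acceptable (the column and index names here are made up):

-- keep logtime for the detailed reports, add a plain date column for the
-- day-level ones, and index that
alter table messages_200204 add column logdate date;
update messages_200204 set logdate = cast(logtime as date);
create index messages_200204_logdate_ix on messages_200204 (logdate);

-- the date range then collapses to an exact match
select * from messages_200204
where client_id = '89' and logdate = '20/04/2002';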

> I did try to index something
> like..
>
> create index messages_200204_ix3 on messages_200204
> (to_char(logtime,'DD/MM/YYYY'), client_id, originator)
>
> ...but it didn't work. Gives me parse errors on '.

You can't do that - functional indexes can only be on a single column :-(
You could hack around that with something like this:

create function concat_fields (timestamp, int, text) returns text as
'select to_char($1, \'DD/MM/YYYY\') || \':\' || $2 || \':\' || $3'
language 'sql' with (iscachable);

And then,
create index ... on ... (concat_fields(logtime,client_id,originator));

BUT:

- There is a memory leak that will barf during the index creation if
your table is large (I have a patch for it, but Tom Lane thinks it is
not 'systemic' enough to put it in... I could send it to you if you
want to though)...

- You don't seem to be making use of the compound index anyway...
Perhaps just having an index on the timestamp alone will do better

- You'd have to modify your query to make it use the index like this:
select ... where
concat_fields (logtime,client_id,originator) like '20/04/2002:%:%'

Once again, this is only as good (actually, a little worse) as
just having logtime as a date and scanning a single-column
index for an exact match.

And don't forget to run the vacuum - I think that this alone should
fix your problem (in general, you shouldn't need any indexes AT ALL
on a 5000-row table).
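
For completeness, the maintenance commands being suggested would be something like this (the trailing ANALYZE is an extra step, not mentioned above, that refreshes the statistics the planner relies on):

vacuum full messages_200204;   -- reclaims the space left behind by updates and deletes
analyze messages_200204;       -- refreshes the planner's statistics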

I hope it helps...

Dima

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ian Cass (#1)
Re: Date indexing

"Ian Cass" <ian.cass@mblox.com> writes:

> I'll only ever be referencing this data to a granularity of 1 day, so is
> there a way I can use a function to index this so that the date column in
> the index is text containing just DD/MM/YYYY?

Don't use text, use a date.

regression=# create table foo (f1 timestamp without time zone);
CREATE
regression=# create index fooi on foo(date(f1));
CREATE
regression=# explain select * from foo where date(f1) = current_date;
QUERY PLAN
-----------------------------------------------------------------
Index Scan using fooi on foo (cost=0.00..17.09 rows=5 width=8)
Index Cond: (date(f1) = date('now'::text))
(2 rows)

(This is with current devel sources, as you can possibly tell from the
EXPLAIN format, but should work fine in 7.2; less sure about 7.1.)

Note that a coercion from timestamp *with* timezone to date will
not be allowed as an index function, because it's dependent on
external information --- viz, your timezone. So I had to use
timestamp without time zone in this example.

One might also wonder why you're not just storing the column as type
date rather than timestamp in the first place, if you're certain you
do not need finer resolution.

regards, tom lane

#4Ian Cass
ian.cass@mblox.com
In reply to: Ian Cass (#1)
Re: Date indexing

> Don't use text, use a date.

Thanks, I'll try that.

> One might also wonder why you're not just storing the column as type
> date rather than timestamp in the first place, if you're certain you
> do not need finer resolution.

Because in other instances I do need the extra resolution. This is a
high-level overview report I'm trying to write.

--
Ian Cass

#5Ian Cass
ian.cass@mblox.com
In reply to: Ian Cass (#1)
Re: Date indexing

> This won't be able to make much use of the compound index, because you
> are not searching for an exact match on the first column, and the other
> columns are not in your criteria at all... Thus your index is only about as
> good as an index on the timestamp alone.

This makes sense. I'll change the order of my indexing.

> 10 minutes still sounds pretty excessive though, even for a seq. scan
> on a 5000-row table. Have you vacuum'ed it lately?
> Try running 'vacuum full' on that table; it might help a lot if you
> were doing lots of updates to it.

No, the actual table contains more like 10 million rows; however, the dataset
I'd be referencing if the index were working as I intended would be a mere
5k rows.

Thanks for your help. I'll try this later today.

--
Ian Cass