index on timestamp performance

Started by Eric Choletover 23 years ago7 messagesgeneral

cholet@logilune.com

over 23 years ago

I have this schema:

The performance I'm getting from the index that contains
'date' is much slower than when using the objid index
(different queries of course). This is a 10 million row
table. Am I right to assume that postgres needs to do
more work because it has to convert the dates to some
internal (integer?) format?
Consequently would I gain performance if I changed the
schema so that 'date' is an integer (which would not
be a problem for my application) ?

Thanks,
--
Eric Cholet

Evan Zane Macosko

macosko@fas.harvard.edu

over 23 years ago

In reply to: Eric Cholet (#1)

UNSUBSCRIBE ME PLEASE!

To whomever runs this list:

I have tried repeatedly to get majordomo to unsubscribe me from this list,
to no avail. I do not currently have the means of filtering my inbox at
this address, so receiving these emails is EXTREMELY annoying for me.
PLEASE PLEASE PLEASE PLEASE unsubscribe me. PLEASE.

On Wed, 29 Jan 2003, Eric Cholet wrote:

Show quoted text

I have this schema:

motid | integer | not null
objid | integer | not null
date | timestamp without time zone | not null
Indexes: dico_frs_motid_date btree (motid, date)
dico_frs_objid btree (objid)

The performance I'm getting from the index that contains
'date' is much slower than when using the objid index
(different queries of course). This is a 10 million row
table. Am I right to assume that postgres needs to do
more work because it has to convert the dates to some
internal (integer?) format?
Consequently would I gain performance if I changed the
schema so that 'date' is an integer (which would not
be a problem for my application) ?

Thanks,
--
Eric Cholet

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 23 years ago

In reply to: Eric Cholet (#1)

Re: index on timestamp performance

On Wed, 29 Jan 2003, Eric Cholet wrote:

I have this schema:

motid | integer | not null
objid | integer | not null
date | timestamp without time zone | not null
Indexes: dico_frs_motid_date btree (motid, date)
dico_frs_objid btree (objid)

The performance I'm getting from the index that contains
'date' is much slower than when using the objid index
(different queries of course). This is a 10 million row
table. Am I right to assume that postgres needs to do
more work because it has to convert the dates to some
internal (integer?) format?

What does explain (analyze if possible) show for the two queries?
It could just be a difference in plans or estimates.

Tom Lane

tgl@sss.pgh.pa.us

over 23 years ago

In reply to: Eric Cholet (#1)

Re: index on timestamp performance

Eric Cholet <cholet@logilune.com> writes:

Consequently would I gain performance if I changed the
schema so that 'date' is an integer

No, you wouldn't, at least not enough to notice. (But if you only need
a date and not a time of day, consider changing the column type to
"date".)

I suspect the issue is selectivity of the queries, but with no actual
query examples or EXPLAIN ANALYZE results to look at, that's just idle
speculation.

regards, tom lane

Eric Cholet

cholet@logilune.com

over 23 years ago

In reply to: Stephan Szabo (#3)

Re: index on timestamp performance

--On Wednesday, January 29, 2003 08:51:30 -0800 Stephan Szabo
<sszabo@megazone23.bigpanda.com> wrote:

On Wed, 29 Jan 2003, Eric Cholet wrote:

I have this schema:

motid | integer | not null
objid | integer | not null
date | timestamp without time zone | not null
Indexes: dico_frs_motid_date btree (motid, date)
dico_frs_objid btree (objid)

The performance I'm getting from the index that contains
'date' is much slower than when using the objid index
(different queries of course). This is a 10 million row
table. Am I right to assume that postgres needs to do
more work because it has to convert the dates to some
internal (integer?) format?

What does explain (analyze if possible) show for the two queries?

I can't really run two equivalent queries that will each use
a different index. Here's a query that uses the index with 'date'
(output wrapped manually)

=> explain analyze select objid from dico_frs where motid=1247
and date <= '2003-01-29 17:55:17' and date >= '2002-10-29 17:55:17'
order by date desc limit 11;

Limit (cost=4752.14..4752.17 rows=11 width=12) (actual time=63.20..63.37
rows=11 loops=1)
-> Sort (cost=4752.14..4755.11 rows=1187 width=12) (actual
time=63.17..63.23 rows=12 loops=1)
Sort Key: date
-> Index Scan using dico_frs_motid_date on dico_frs
(cost=0.00..4691.50 rows=1187 width=12) (actual time=0.08..41.88 rows=2924
loops=1)
Index Cond: ((motid = 1247) AND (date <= '2003-01-29
17:55:17'::timestamp without time zone) AND (date >= '2002-10-29
17:55:17'::timestamp without time zone))
Total runtime: 63.93 msec
(6 rows)

It could just be a difference in plans or estimates.

Right, but still I'd like to know whether the timestamp datatype in the
index
results in more work than an integer datatype.

Thanks,
--
Eric Cholet

Stephan Szabo

sszabo@megazone23.bigpanda.com

over 23 years ago

In reply to: Eric Cholet (#5)

Re: index on timestamp performance

On Wed, 29 Jan 2003, Eric Cholet wrote:

--On Wednesday, January 29, 2003 08:51:30 -0800 Stephan Szabo
<sszabo@megazone23.bigpanda.com> wrote:

On Wed, 29 Jan 2003, Eric Cholet wrote:

I have this schema:

motid | integer | not null
objid | integer | not null
date | timestamp without time zone | not null
Indexes: dico_frs_motid_date btree (motid, date)
dico_frs_objid btree (objid)

The performance I'm getting from the index that contains
'date' is much slower than when using the objid index
(different queries of course). This is a 10 million row
table. Am I right to assume that postgres needs to do
more work because it has to convert the dates to some
internal (integer?) format?

What does explain (analyze if possible) show for the two queries?

I can't really run two equivalent queries that will each use
a different index. Here's a query that uses the index with 'date'
(output wrapped manually)

Well, I was wondering for example, the objid queries you were comparing
to, were they returning an equivalent number of about 3000 rows from the
index scan? I think part of the difference may be that here the plan
grabs the rows from the index and then sorts them all so the limit doesn't
actually save you significant time. For better speed on this particular
sort of query, you might be better off with an order of :
order by motid desc, date desc

=> explain analyze select objid from dico_frs where motid=1247
and date <= '2003-01-29 17:55:17' and date >= '2002-10-29 17:55:17'
order by date desc limit 11;

Limit (cost=4752.14..4752.17 rows=11 width=12) (actual time=63.20..63.37
rows=11 loops=1)
-> Sort (cost=4752.14..4755.11 rows=1187 width=12) (actual
time=63.17..63.23 rows=12 loops=1)
Sort Key: date
-> Index Scan using dico_frs_motid_date on dico_frs
(cost=0.00..4691.50 rows=1187 width=12) (actual time=0.08..41.88 rows=2924
loops=1)
Index Cond: ((motid = 1247) AND (date <= '2003-01-29
17:55:17'::timestamp without time zone) AND (date >= '2002-10-29
17:55:17'::timestamp without time zone))
Total runtime: 63.93 msec
(6 rows)

It could just be a difference in plans or estimates.

Right, but still I'd like to know whether the timestamp datatype in the
index
results in more work than an integer datatype.

Almost certainly a little bit, but not so much that I would expect to see
order of magnitude differences or anything.

Eric Cholet

cholet@logilune.com

over 23 years ago

In reply to: Stephan Szabo (#6)

Re: index on timestamp performance

--On Wednesday, January 29, 2003 10:45:34 -0800 Stephan Szabo
<sszabo@megazone23.bigpanda.com> wrote:

motid | integer | not null
objid | integer | not null
date | timestamp without time zone | not null
Indexes: dico_frs_motid_date btree (motid, date)
dico_frs_objid btree (objid)

Well, I was wondering for example, the objid queries you were comparing
to, were they returning an equivalent number of about 3000 rows from the
index scan? I think part of the difference may be that here the plan
grabs the rows from the index and then sorts them all so the limit doesn't
actually save you significant time. For better speed on this particular
sort of query, you might be better off with an order of :
order by motid desc, date desc

Indeed, the speed gain is amazing, thanks for your explanation, I understand
things a little better now. Note to self: next time don't make any
assumptions,
just post the explain analyze!

=> explain analyze select objid from dico_frs where motid=1247
and date <= '2003-01-29 17:55:17' and date >= '2002-10-29 17:55:17'
order by date desc limit 11;

Limit (cost=4752.14..4752.17 rows=11 width=12) (actual
time=63.20..63.37 rows=11 loops=1)
-> Sort (cost=4752.14..4755.11 rows=1187 width=12) (actual
time=63.17..63.23 rows=12 loops=1)
Sort Key: date
-> Index Scan using dico_frs_motid_date on dico_frs
(cost=0.00..4691.50 rows=1187 width=12) (actual time=0.08..41.88
rows=2924 loops=1)
Index Cond: ((motid = 1247) AND (date <= '2003-01-29
17:55:17'::timestamp without time zone) AND (date >= '2002-10-29
17:55:17'::timestamp without time zone))
Total runtime: 63.93 msec
(6 rows)

--
Eric Cholet