Why no performance boost although I added an index?
Hi *,
I have an accounting table on PostgreSQL 7.2.4, and my favourite select
gets no performance boost when I add an index on the date column, even
though I'd expect it to be faster.
Without index:
|db1=# explain select date,
|db1-# to_char(sum(in_local),'9 999 999 999') as in,
|db1-# to_char(sum(out_local),'9 999 999 999') as out,
|db1-# to_char(sum(in_forward),'9 999 999 999') as in_f,
|db1-# to_char(sum(out_forward),'9 999 999 999') as out_f,
|db1-# to_char(sum(out_local + out_forward + in_local + in_forward),'9 999 999 999 999') as total
|db1-# from netacct
|db1-# where date > date 'today' - interval '10 days'
|db1-# and interface = 'ppp0'
|db1-# group by date;
|NOTICE: QUERY PLAN:
|
|Aggregate (cost=214.29..236.19 rows=146 width=20)
|  ->  Group (cost=214.29..217.94 rows=1460 width=20)
|        ->  Sort (cost=214.29..214.29 rows=1460 width=20)
|              ->  Seq Scan on netacct (cost=0.00..137.55 rows=1460 width=20)

And the same query under "explain analyze":
|Aggregate (cost=215.13..237.13 rows=147 width=20) (actual time=3152.03..3161.54 rows=11 loops=1)
|  ->  Group (cost=215.13..218.80 rows=1467 width=20) (actual time=3150.96..3154.93 rows=265 loops=1)
|        ->  Sort (cost=215.13..215.13 rows=1467 width=20) (actual time=3150.93..3151.46 rows=265 loops=1)
|              ->  Seq Scan on netacct (cost=0.00..138.00 rows=1467 width=20) (actual time=2950.10..3147.15 rows=265 loops=1)
|Total runtime: 3162.27 msec
And now after a "create index netacct_ix1 on netacct(date)" and vacuum analyze:
|Aggregate (cost=0.00..176.40 rows=147 width=20)
|  ->  Group (cost=0.00..158.07 rows=1467 width=20)
|        ->  Index Scan using netacct_ix1 on netacct (cost=0.00..154.40 rows=1467 width=20)

And under "explain analyze":
|Aggregate (cost=0.00..176.40 rows=147 width=20) (actual time=3128.57..3337.59 rows=11 loops=1)
|  ->  Group (cost=0.00..158.07 rows=1467 width=20) (actual time=3108.24..3327.61 rows=265 loops=1)
|        ->  Index Scan using netacct_ix1 on netacct (cost=0.00..154.40 rows=1467 width=20) (actual time=3108.21..3322.22 rows=265 loops=1)
|Total runtime: 3338.37 msec
So the index is used, but the execution time is greater than without the
index. Why is that? Is the overhead of using an index the biggest factor
because there are only a few thousand rows in the table?
--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1
On Mon, 7 Apr 2003, Holger Marzen wrote:
> I have an accounting table on postgres 7.2.4, and my favourite select
> gets no performance boost if I add an index on the date column. But it
> should be faster.

Not necessarily (see below).

> So the index is used, but the execution time is greater than without
> index. Why that? Is the overhead using an index the biggest factor
> because there are only a few thousand rows in the table?
If the index scan is reading a large enough percentage of the rows (and
depending on the clustering of values), it may be reading enough pages
that there's no advantage (or even a disadvantage) to using the index.
This is due both to the reads of the index itself and to the fact that it
will often be reading the rows in the main table (it still needs to get
the commit information from the table data) in random rather than
sequential order, which loses some of the optimizations the OS gives to
sequential reads.
This is a really big lose if your index and table are on the same disk.
Jon
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
On 7 Apr 2003 at 15:05, Jonathan Bartlett wrote:
> This is a really big lose if your index and table are on the same
> disk.

Can you quantify "really big" for us please? Not all of us have more
than one disk per box. I haven't looked into PG's ability to put
tables and indexes on specific disks.
--
Dan Langille : http://www.langille.org/
> This is a really big lose if your index and table are on the same disk.

That implies there is an easy way to ensure that the index and table
aren't on the same disk. That isn't obvious from the documentation.
Maybe I just don't understand which subdirectories the data and indexes
are kept in?
--
Mike Nolan
The problem is that if you are iterating over a large portion of your
dataset using an index, your disk has to do this:
Seek to the next index entry
Seek to the corresponding table entry
Seek to the next index entry
Seek to the corresponding table entry
...
If you get a lot of that, your disk spends more time seeking back and
forth than actually reading data. Imagine a CD player constantly
switching between track 5 and track 9 every 2 seconds - you would spend
more time seeking than listening.
With two disks, you have one read-write head on the index, and one on the
table, so although they may be jumping around a little in the file, they
aren't moving nearly as much as if they are on the same disk. Buffering
may take the jumps out completely.
For small index scans, it makes little difference. For large index scans,
having the index on the same drive can really kill you.
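To get a feel for the magnitudes, here is a back-of-envelope sketch. The seek time and throughput figures are assumptions for a typical disk of that era, not measurements from the poster's machine; the row and page counts come from the plans earlier in the thread:

```shell
# Worst case: every indexed row costs two seeks (index page + heap page).
rows=1467                             # planner's row estimate from the plans above
seek_ms=9                             # assumed average seek time
random_ms=$(( rows * 2 * seek_ms ))

# One sequential pass: ~138 pages of 8 KB each (from the seq-scan cost),
# at an assumed 30 MB/s of sequential throughput.
table_kb=$(( 138 * 8 ))
seq_ms=$(( table_kb * 1000 / 30000 ))

echo "random fetches: ${random_ms} ms, sequential scan: ${seq_ms} ms"
```

Caching keeps the real cost far below the worst case, but the ratio is why the planner charges random page fetches a premium over sequential ones.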
To put them on different disks, you have to do the following:

1. Find the OID of the entity you wish to move.
2. Stop the database.
3. Move the entity to wherever you want it.
4. Symlink it to the original location.
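A sketch of those steps, using stand-in temporary directories and a made-up OID (on a real system you would look up the OID in pg_class, e.g. select oid from pg_class where relname = 'netacct_ix1', and stop the postmaster before touching any files):

```shell
# Stand-in directories; in reality these would be $PGDATA and a mount
# point on the second disk, with the database shut down first.
PGDATA=$(mktemp -d)
OTHERDISK=$(mktemp -d)
mkdir -p "$PGDATA/base/1"

# Pretend 16995 is the OID of the index we want to move (made-up number).
echo "index contents" > "$PGDATA/base/1/16995"

# Move the file to the other disk, then symlink it back into place:
mv "$PGDATA/base/1/16995" "$OTHERDISK/16995"
ln -s "$OTHERDISK/16995" "$PGDATA/base/1/16995"

# The old path still resolves, so the backend never notices the move:
cat "$PGDATA/base/1/16995"
```

Rebuilding the index or clustering the table recreates the file and silently discards the symlink, so the relocation has to be redone afterwards.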
The problems with this method are that:
a) The DB has to be down to do it
b) The symlink gets destroyed if you rebuild the index or cluster the
table.
Having said all that, I must say that I don't normally do this to my
databases; I just keep myself aware of the potential problems.
Jon
On Mon, 7 Apr 2003, Dan Langille wrote:
> On 7 Apr 2003 at 15:05, Jonathan Bartlett wrote:
>
> > This is a really big lose if your index and table are on the same
> > disk.
>
> Can you quantify "really big" for us please? Not all of us have more
> than one disk per box. I haven't looked into the ability for PG to
> put tables and indexes on specific disks.
> --
> Dan Langille : http://www.langille.org/
We have an FAQ item about this, 4.8.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073