Multicolumn index for single-column queries?

Started by rihadalmost 7 years ago12 messagesgeneral
Jump to latest
#1rihad
rihad@mail.ru

Hi. Say there are 2 indexes:

"foo_index" btree (foo_id)

"multi_index" btree (foo_id, approved, expires_at)

foo_id is an integer. Some queries involve all three columns in their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to updating them both, but wouldn't searches
on foo_id alone become slower?

Thanks.

#2Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: rihad (#1)
Re: Multicolumn index for single-column queries?

On 18/04/2019 18:52, rihad wrote:

Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)

foo_id is an integer. Some queries involve all three columns in their
WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index
and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to
updating them both, but wouldn't searches
on foo_id alone become slower?

Thanks.

The multi column index will require more RAM to hold it.  So if there is
memory contention, then there would be an increased risk of swapping,
leading to slower query times.

I suspect that if there is more than enough RAM, then a multi column
index will be slightly slower than a single column index. However, the
difference will probably be lost in the noise -- in other words, the
various things happening in the background will most likely to have far
more significant impact on query duration.  IMHO

Cheers,
Gavin

#3Andreas Kretschmer
andreas@a-kretschmer.de
In reply to: rihad (#1)
Re: Multicolumn index for single-column queries?

Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)

foo_id is an integer. Some queries involve all three columns in their
WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index
and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to
updating them both, but wouldn't searches
on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a
multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com

#4Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Andreas Kretschmer (#3)
Re: Multicolumn index for single-column queries?

Andreas Kretschmer wrote:

Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

"foo_index" btree (foo_id)

"multi_index" btree (foo_id, approved, expires_at)

foo_id is an integer. Some queries involve all three columns in their
WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index
and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to
updating them both, but wouldn't searches
on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a
multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.

To be more explicit: if you can live with a slightly less efficient
index scan and want fast data modifications, use only the second index.

If you hardly ever update the table, don't mind the wasted space and
want every bit of query speed (data warehouse), having both indexes
might be better.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

#5Ron
ronljohnsonjr@gmail.com
In reply to: Andreas Kretschmer (#3)
Re: Multicolumn index for single-column queries?

On 4/18/19 2:14 AM, Andreas Kretschmer wrote:

Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)

foo_id is an integer. Some queries involve all three columns in their
WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove foo_index
and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to
updating them both, but wouldn't searches
on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a
multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.

But two indexes are larger than one index, and updating two indexes requires
more disk IO than updating one index.

(Prefix compression would obviate the need for this question.  Then your
multi-column index would be *much* smaller.)

--
Angular momentum makes the world go 'round.

#6Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Ron (#5)
Re: Multicolumn index for single-column queries?

On 19/04/2019 01:24, Ron wrote:

On 4/18/19 2:14 AM, Andreas Kretschmer wrote:

Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)

foo_id is an integer. Some queries involve all three columns in
their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove
foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to
updating them both, but wouldn't searches
on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a
multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.

But two indexes are larger than one index, and updating two indexes
requires more disk IO than updating one index.

Agreed.

A key question would be: how often is the query run, compared to the
frequency Insertions, Updates, and Deletions -- wrt the table.

(Prefix compression would obviate the need for this question. Then
your multi-column index would be *much* smaller.)

True, but a multi column index will still be bigger than single column
index.

[...]

#7Harald Fuchs
hari.fuchs@gmail.com
In reply to: rihad (#1)
Re: Multicolumn index for single-column queries?

Andreas Kretschmer <andreas@a-kretschmer.de> writes:

Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)

foo_id is an integer. Some queries involve all three columns in
their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove
foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to
updating them both, but wouldn't searches
on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a
multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.

I think it also depends on the average number of rows having the same foo_id.

#8Ron
ronljohnsonjr@gmail.com
In reply to: Gavin Flower (#6)
Re: Multicolumn index for single-column queries?

On 4/18/19 8:45 AM, Gavin Flower wrote:

On 19/04/2019 01:24, Ron wrote:

On 4/18/19 2:14 AM, Andreas Kretschmer wrote:

[snip]

(Prefix compression would obviate the need for this question. Then your
multi-column index would be *much* smaller.)

True, but a multi column index will still be bigger than single column index.

TANSTAAFL.

--
Angular momentum makes the world go 'round.

#9Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Ron (#8)
Re: Multicolumn index for single-column queries?

On 19/04/2019 02:12, Ron wrote:

On 4/18/19 8:45 AM, Gavin Flower wrote:

On 19/04/2019 01:24, Ron wrote:

On 4/18/19 2:14 AM, Andreas Kretschmer wrote:

[snip]

(Prefix compression would obviate the need for this question. Then
your multi-column index would be *much* smaller.)

True, but a multi column index will still be bigger than single
column index.

TANSTAAFL.

QUOTE: [Oh, 'tanstaafl.' Means ~There ain't no such thing as a free lunch.']
From The Moon is a Harsh Mistress, by Robert Heinlein. Published 1966
Is where I first came across TANSTAAFL.

However, it appears to have been used at least as early as 1949.

Just adding this, as probably there are many people who don't know the
acronym.

#10Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Harald Fuchs (#7)
Re: Multicolumn index for single-column queries?

On 19/04/2019 01:47, Harald Fuchs wrote:

Andreas Kretschmer <andreas@a-kretschmer.de> writes:

Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)

foo_id is an integer. Some queries involve all three columns in
their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove
foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to
updating them both, but wouldn't searches
on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a
multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.

I think it also depends on the average number of rows having the same foo_id.

The number of rows referenced by an index entry for the multi_index will
always be less than or equal to those for the matching foo_index.

Also there will be fewer index entries per block for the multi_index. 
Which is why the I/O count will be higher; even in the best case, where
there is an equal row referenced by the index entries.

#11Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Harald Fuchs (#7)
Re: Multicolumn index for single-column queries?

On 19/04/2019 01:47, Harald Fuchs wrote:

Andreas Kretschmer<andreas@a-kretschmer.de> writes:

Am 18.04.19 um 08:52 schrieb rihad:

Hi. Say there are 2 indexes:

    "foo_index" btree (foo_id)

    "multi_index" btree (foo_id, approved, expires_at)

foo_id is an integer. Some queries involve all three columns in
their WHERE clauses, some involve only foo_id.
Would it be ok from general performance standpoint to remove
foo_index and rely only on multi_index? I know that
PG would have to do less work updating just one index compared to
updating them both, but wouldn't searches
on foo_id alone become slower?

it depends .

it depends on the queries you are using, on your workload. a
multi-column-index will be large than an index over just one column,
therefore you will have more disk-io when you read from such an index.

I think it also depends on the average number of rows having the same foo_id.

The number of rows referenced by an index entry for the multi_index will
always be less than or equal to those for the matching foo_index.

Also there will be fewer index entries per block for the multi_index,
which is why the I/O count will be higher even in the best case where
there is an equal number of rows referenced by each index entry.

#12Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Gavin Flower (#11)
Re: Multicolumn index for single-column queries?

On 19/04/2019 14:01, Gavin Flower wrote:
[...]

Also there will be fewer index entries per block for the multi_index,
which is why the I/O count will be higher even in the best case where
there is an equal number of rows referenced by each index entry.

Not sure why my system had this still in my draft folder!

Sorry, for the duplication...