How to use the BRIN index properly?

Started by Siddharth Jainabout 3 years ago9 messagesgeneral
Jump to latest
#1Siddharth Jain
siddhsql@gmail.com

Hello,

We have large tables with billions of rows in them and want to take
advantage of the BRIN index on them.

Issues we are facing:

- as I understand, BRIN index is useful only if the data is stored in
index order. As an example we want to create a composite BRIN index on 3
columns - integers and strings (varchar). How can we tell Postgres to store
data in index order as new records are inserted into the database?
- i understand that turning on autosummarize will keep the index fresh
and up-to-date as new records are inserted. is this correct?

Thanks for your help.

S.

#2Siddharth Jain
siddhsql@gmail.com
In reply to: Siddharth Jain (#1)
Re: How to use the BRIN index properly?

our insertion order is of course != index order otherwise the question
would have been trivial.
we use postgres 14

On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:

Show quoted text

Hello,

We have large tables with billions of rows in them and want to take
advantage of the BRIN index on them.

Issues we are facing:

- as I understand, BRIN index is useful only if the data is stored in
index order. As an example we want to create a composite BRIN index on 3
columns - integers and strings (varchar). How can we tell Postgres to store
data in index order as new records are inserted into the database?
- i understand that turning on autosummarize will keep the index fresh
and up-to-date as new records are inserted. is this correct?

Thanks for your help.

S.

#3Ron
ronljohnsonjr@gmail.com
In reply to: Siddharth Jain (#2)
Re: How to use the BRIN index properly?

Is the data in your tables stored in natural correlation with those *three*
columns?  I'm dubious that can even happen.

BRIN is best for *range queries* on tables who's data is added in the same
order as the key in the BRIN index (for example, a BRIN index on a timestamp
field in a log table where new records are always being appended in
"timestamp" order).

It would also be great for history tables where you can pre-sort the data
by, for example, customer_id, and then put the BRIN on customer_id.

On 2/8/23 13:58, Siddharth Jain wrote:

our insertion order is of course != index order otherwise the question
would have been trivial.
we use postgres 14

On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:

Hello,

We have large tables with billions of rows in them and want to take
advantage of the BRIN index on them.

Issues we are facing:

* as I understand, BRIN index is useful only if the data is stored
in index order. As an example we want to create a composite BRIN
index on 3 columns - integers and strings (varchar). How can we
tell Postgres to store data in index order as new records are
inserted into the database?
* i understand that turning on autosummarize will keep the index
fresh and up-to-date as new records are inserted. is this correct?

Thanks for your help.

S.

--
Born in Arizona, moved to Babylonia.

#4Siddharth Jain
siddhsql@gmail.com
In reply to: Ron (#3)
Re: How to use the BRIN index properly?

As I explained in my question that is indeed our dilemma. Our insertion
order will not be equal to index order. i.e., referring to your response:

who's data is added in the same order as the key in the BRIN index

does NOT hold.

On Wed, Feb 8, 2023 at 12:27 PM Ron <ronljohnsonjr@gmail.com> wrote:

Show quoted text

Is the data in your tables stored in natural correlation with those
*three* columns? I'm dubious that can even happen.

BRIN is best for *range queries* on tables who's data is added in the
same order as the key in the BRIN index (for example, a BRIN index on a
timestamp field in a log table where new records are always being appended
in "timestamp" order).

It would also be great for history tables where you can pre-sort the data
by, for example, customer_id, and then put the BRIN on customer_id.

On 2/8/23 13:58, Siddharth Jain wrote:

our insertion order is of course != index order otherwise the question
would have been trivial.
we use postgres 14

On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:

Hello,

We have large tables with billions of rows in them and want to take
advantage of the BRIN index on them.

Issues we are facing:

- as I understand, BRIN index is useful only if the data is stored in
index order. As an example we want to create a composite BRIN index on 3
columns - integers and strings (varchar). How can we tell Postgres to store
data in index order as new records are inserted into the database?
- i understand that turning on autosummarize will keep the index
fresh and up-to-date as new records are inserted. is this correct?

Thanks for your help.

S.

--
Born in Arizona, moved to Babylonia.

#5Christophe Pettus
xof@thebuild.com
In reply to: Siddharth Jain (#4)
Re: How to use the BRIN index properly?

On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:

As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e., referring to your response:

who's data is added in the same order as the key in the BRIN index

does NOT hold.

A BRIN index is not a good choice in this case. You can CLUSTER the data on an index, but that's a one-time operation: PostgreSQL will not maintain that order after the CLUSTER. If the number of rows in the table at the time of the CLUSTER is much larger than the number that are inserted between CLUSTER operations, then a BRIN index might be useful, but clustering a very large table is an expensive operation, and requires an exclusive lock on the table while it is being done.

#6Siddharth Jain
siddhsql@gmail.com
In reply to: Christophe Pettus (#5)
Re: How to use the BRIN index properly?

OK so in that case we are left with the B-Tree index.

If the B-Tree index will be so large that it cannot fit in memory, then is
it worth creating it at all? Are there any established patterns here?

On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:

Show quoted text

On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:

As I explained in my question that is indeed our dilemma. Our insertion

order will not be equal to index order. i.e., referring to your response:

who's data is added in the same order as the key in the BRIN index

does NOT hold.

A BRIN index is not a good choice in this case. You can CLUSTER the data
on an index, but that's a one-time operation: PostgreSQL will not maintain
that order after the CLUSTER. If the number of rows in the table at the
time of the CLUSTER is much larger than the number that are inserted
between CLUSTER operations, then a BRIN index might be useful, but
clustering a very large table is an expensive operation, and requires an
exclusive lock on the table while it is being done.

#7Christophe Pettus
xof@thebuild.com
In reply to: Siddharth Jain (#6)
Re: How to use the BRIN index properly?

On Feb 8, 2023, at 14:14, Siddharth Jain <siddhsql@gmail.com> wrote:

If the B-Tree index will be so large that it cannot fit in memory, then is it worth creating it at all?

Yes. Of course, more memory is better, and more recently versions of PostgreSQL have optimizations that are valuable for large B-tree indexes.

#8Ron
ronljohnsonjr@gmail.com
In reply to: Siddharth Jain (#6)
Re: How to use the BRIN index properly?

1. The whole index does not need to fit in memory, just the parts of it you
need at that time.
2. Partition the table by the primary key.  Each index will be *much*
smaller, since each child will be smaller.

On 2/8/23 16:14, Siddharth Jain wrote:

OK so in that case we are left with the B-Tree index.

If the B-Tree index will be so large that it cannot fit in memory, then is
it worth creating it at all? Are there any established patterns here?

On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:

On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:

As I explained in my question that is indeed our dilemma. Our

insertion order will not be equal to index order. i.e., referring to
your response:

who's data is added in the same order as the key in the BRIN index

does NOT hold.

A BRIN index is not a good choice in this case.  You can CLUSTER the
data on an index, but that's a one-time operation: PostgreSQL will not
maintain that order after the CLUSTER.  If the number of rows in the
table at the time of the CLUSTER is much larger than the number that
are inserted between CLUSTER operations, then a BRIN index might be
useful, but clustering a very large table is an expensive operation,
and requires an exclusive lock on the table while it is being done.

--
Born in Arizona, moved to Babylonia.

#9GF
phabriz@gmail.com
In reply to: Siddharth Jain (#6)
Re: How to use the BRIN index properly?

For membership/equality predicates (also partial, for multiple columns) you
could take a look at bloom indexes: they are quite efficient in terms of
space footprint, you can even choose how long is the signature for each
entry and how is distributed among the columns.
https://www.postgresql.org/docs/14/bloom.html
g

On Wed, 8 Feb 2023 at 23:15, Siddharth Jain <siddhsql@gmail.com> wrote:

Show quoted text

OK so in that case we are left with the B-Tree index.

If the B-Tree index will be so large that it cannot fit in memory, then is
it worth creating it at all? Are there any established patterns here?

On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:

On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:

As I explained in my question that is indeed our dilemma. Our insertion

order will not be equal to index order. i.e., referring to your response:

who's data is added in the same order as the key in the BRIN index

does NOT hold.

A BRIN index is not a good choice in this case. You can CLUSTER the data
on an index, but that's a one-time operation: PostgreSQL will not maintain
that order after the CLUSTER. If the number of rows in the table at the
time of the CLUSTER is much larger than the number that are inserted
between CLUSTER operations, then a BRIN index might be useful, but
clustering a very large table is an expensive operation, and requires an
exclusive lock on the table while it is being done.