How to use the BRIN index properly?
Hello,
We have large tables with billions of rows in them and want to take
advantage of the BRIN index on them.
Issues we are facing:
- as I understand, BRIN index is useful only if the data is stored in
index order. As an example we want to create a composite BRIN index on 3
columns - integers and strings (varchar). How can we tell Postgres to store
data in index order as new records are inserted into the database?
- i understand that turning on autosummarize will keep the index fresh
and up-to-date as new records are inserted. is this correct?
Thanks for your help.
S.
our insertion order is of course != index order otherwise the question
would have been trivial.
we use postgres 14
On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:
Show quoted text
Hello,
We have large tables with billions of rows in them and want to take
advantage of the BRIN index on them.Issues we are facing:
- as I understand, BRIN index is useful only if the data is stored in
index order. As an example we want to create a composite BRIN index on 3
columns - integers and strings (varchar). How can we tell Postgres to store
data in index order as new records are inserted into the database?
- i understand that turning on autosummarize will keep the index fresh
and up-to-date as new records are inserted. is this correct?Thanks for your help.
S.
Is the data in your tables stored in natural correlation with those *three*
columns? I'm dubious that can even happen.
BRIN is best for *range queries* on tables who's data is added in the same
order as the key in the BRIN index (for example, a BRIN index on a timestamp
field in a log table where new records are always being appended in
"timestamp" order).
It would also be great for history tables where you can pre-sort the data
by, for example, customer_id, and then put the BRIN on customer_id.
On 2/8/23 13:58, Siddharth Jain wrote:
our insertion order is of course != index order otherwise the question
would have been trivial.
we use postgres 14On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:
Hello,
We have large tables with billions of rows in them and want to take
advantage of the BRIN index on them.Issues we are facing:
* as I understand, BRIN index is useful only if the data is stored
in index order. As an example we want to create a composite BRIN
index on 3 columns - integers and strings (varchar). How can we
tell Postgres to store data in index order as new records are
inserted into the database?
* i understand that turning on autosummarize will keep the index
fresh and up-to-date as new records are inserted. is this correct?Thanks for your help.
S.
--
Born in Arizona, moved to Babylonia.
As I explained in my question that is indeed our dilemma. Our insertion
order will not be equal to index order. i.e., referring to your response:
who's data is added in the same order as the key in the BRIN index
does NOT hold.
On Wed, Feb 8, 2023 at 12:27 PM Ron <ronljohnsonjr@gmail.com> wrote:
Show quoted text
Is the data in your tables stored in natural correlation with those
*three* columns? I'm dubious that can even happen.BRIN is best for *range queries* on tables who's data is added in the
same order as the key in the BRIN index (for example, a BRIN index on a
timestamp field in a log table where new records are always being appended
in "timestamp" order).It would also be great for history tables where you can pre-sort the data
by, for example, customer_id, and then put the BRIN on customer_id.On 2/8/23 13:58, Siddharth Jain wrote:
our insertion order is of course != index order otherwise the question
would have been trivial.
we use postgres 14On Wed, Feb 8, 2023 at 11:51 AM Siddharth Jain <siddhsql@gmail.com> wrote:
Hello,
We have large tables with billions of rows in them and want to take
advantage of the BRIN index on them.Issues we are facing:
- as I understand, BRIN index is useful only if the data is stored in
index order. As an example we want to create a composite BRIN index on 3
columns - integers and strings (varchar). How can we tell Postgres to store
data in index order as new records are inserted into the database?
- i understand that turning on autosummarize will keep the index
fresh and up-to-date as new records are inserted. is this correct?Thanks for your help.
S.
--
Born in Arizona, moved to Babylonia.
On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:
As I explained in my question that is indeed our dilemma. Our insertion order will not be equal to index order. i.e., referring to your response:
who's data is added in the same order as the key in the BRIN index
does NOT hold.
A BRIN index is not a good choice in this case. You can CLUSTER the data on an index, but that's a one-time operation: PostgreSQL will not maintain that order after the CLUSTER. If the number of rows in the table at the time of the CLUSTER is much larger than the number that are inserted between CLUSTER operations, then a BRIN index might be useful, but clustering a very large table is an expensive operation, and requires an exclusive lock on the table while it is being done.
OK so in that case we are left with the B-Tree index.
If the B-Tree index will be so large that it cannot fit in memory, then is
it worth creating it at all? Are there any established patterns here?
On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:
Show quoted text
On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:
As I explained in my question that is indeed our dilemma. Our insertion
order will not be equal to index order. i.e., referring to your response:
who's data is added in the same order as the key in the BRIN index
does NOT hold.
A BRIN index is not a good choice in this case. You can CLUSTER the data
on an index, but that's a one-time operation: PostgreSQL will not maintain
that order after the CLUSTER. If the number of rows in the table at the
time of the CLUSTER is much larger than the number that are inserted
between CLUSTER operations, then a BRIN index might be useful, but
clustering a very large table is an expensive operation, and requires an
exclusive lock on the table while it is being done.
On Feb 8, 2023, at 14:14, Siddharth Jain <siddhsql@gmail.com> wrote:
If the B-Tree index will be so large that it cannot fit in memory, then is it worth creating it at all?
Yes. Of course, more memory is better, and more recently versions of PostgreSQL have optimizations that are valuable for large B-tree indexes.
1. The whole index does not need to fit in memory, just the parts of it you
need at that time.
2. Partition the table by the primary key. Each index will be *much*
smaller, since each child will be smaller.
On 2/8/23 16:14, Siddharth Jain wrote:
OK so in that case we are left with the B-Tree index.
If the B-Tree index will be so large that it cannot fit in memory, then is
it worth creating it at all? Are there any established patterns here?On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:
On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:
As I explained in my question that is indeed our dilemma. Our
insertion order will not be equal to index order. i.e., referring to
your response:who's data is added in the same order as the key in the BRIN index
does NOT hold.
A BRIN index is not a good choice in this case. You can CLUSTER the
data on an index, but that's a one-time operation: PostgreSQL will not
maintain that order after the CLUSTER. If the number of rows in the
table at the time of the CLUSTER is much larger than the number that
are inserted between CLUSTER operations, then a BRIN index might be
useful, but clustering a very large table is an expensive operation,
and requires an exclusive lock on the table while it is being done.
--
Born in Arizona, moved to Babylonia.
For membership/equality predicates (also partial, for multiple columns) you
could take a look at bloom indexes: they are quite efficient in terms of
space footprint, you can even choose how long is the signature for each
entry and how is distributed among the columns.
https://www.postgresql.org/docs/14/bloom.html
g
On Wed, 8 Feb 2023 at 23:15, Siddharth Jain <siddhsql@gmail.com> wrote:
Show quoted text
OK so in that case we are left with the B-Tree index.
If the B-Tree index will be so large that it cannot fit in memory, then is
it worth creating it at all? Are there any established patterns here?On Wed, Feb 8, 2023 at 1:21 PM Christophe Pettus <xof@thebuild.com> wrote:
On Feb 8, 2023, at 13:17, Siddharth Jain <siddhsql@gmail.com> wrote:
As I explained in my question that is indeed our dilemma. Our insertion
order will not be equal to index order. i.e., referring to your response:
who's data is added in the same order as the key in the BRIN index
does NOT hold.
A BRIN index is not a good choice in this case. You can CLUSTER the data
on an index, but that's a one-time operation: PostgreSQL will not maintain
that order after the CLUSTER. If the number of rows in the table at the
time of the CLUSTER is much larger than the number that are inserted
between CLUSTER operations, then a BRIN index might be useful, but
clustering a very large table is an expensive operation, and requires an
exclusive lock on the table while it is being done.