Creating big indexes

Started by Lok Palmost 2 years ago5 messagesgeneral
Jump to latest
#1Lok P
loknath.73@gmail.com

Hello,
We have a few tables having size ~5TB and are partitioned on a timestamp
column. They have ~90 partitions in them and are storing 90 days of data.
We want to create a couple of indexes on those tables. They are getting the
incoming transactions(mainly inserts) 24/7 , which are mostly happening on
the current day/live partition. Its RDS postgres version 15.4. So in this
situation

Should we go with below i.e one time create index command on the table..

CREATE INDEX CONCURRENTLY idx1 ON tab(column_name);
Or
create index on individual partitions from different sessions, say for
example create indexes on 30 partitions each from three different sessions
so as to finish all the 90 partitions faster?
CREATE INDEX CONCURRENTLY idx1 ON tab_part1(column_name);
CREATE INDEX CONCURRENTLY idx1 ON tab_part2(column_name);
.....
.....

Basically I have three questions:
1)If we can do this index creation activity online without impacting the
incoming transactions or do we have to take down time for this activity?
2)If we can't do it online then , what is the fastest method to do this
index creation activity ?
3)Should we change the DB parameters in a certain way to make the process
faster? We have currently set below parameters

max_parallel_workers-16
max_parallel_maintenance_workers-2
maintenance_work_mem- 4GB

Regards
Lok

#2sud
suds1434@gmail.com
In reply to: Lok P (#1)
Re: Creating big indexes

On Sat, Jun 8, 2024 at 12:53 PM Lok P <loknath.73@gmail.com> wrote:

Hello,
We have a few tables having size ~5TB and are partitioned on a timestamp
column. They have ~90 partitions in them and are storing 90 days of data.
We want to create a couple of indexes on those tables. They are getting the
incoming transactions(mainly inserts) 24/7 , which are mostly happening on
the current day/live partition. Its RDS postgres version 15.4. So in this
situation

Should we go with below i.e one time create index command on the table..

CREATE INDEX CONCURRENTLY idx1 ON tab(column_name);
Or
create index on individual partitions from different sessions, say for
example create indexes on 30 partitions each from three different sessions
so as to finish all the 90 partitions faster?
CREATE INDEX CONCURRENTLY idx1 ON tab_part1(column_name);
CREATE INDEX CONCURRENTLY idx1 ON tab_part2(column_name);
.....
.....

Basically I have three questions:
1)If we can do this index creation activity online without impacting the
incoming transactions or do we have to take down time for this activity?
2)If we can't do it online then , what is the fastest method to do this
index creation activity ?
3)Should we change the DB parameters in a certain way to make the process
faster? We have currently set below parameters

max_parallel_workers-16
max_parallel_maintenance_workers-2
maintenance_work_mem- 4GB

You can first create the index on the table using the "On ONLY"keyword,
something as below.

CREATE INDEX idx ON ONLY tab(col1);

Then create indexes on each partition in "concurrently" from multiple
sessions in chunks.

CREATE INDEX CONCURRENTLY idx_1 ON tab_part1(col1);
CREATE INDEX CONCURRENTLY idx_2 ON tab_part2(col1);

After this step finishes the table level index which was created in the
first step will be in valid state automatically.

#3Lok P
loknath.73@gmail.com
In reply to: sud (#2)
Re: Creating big indexes

On Sun, Jun 9, 2024 at 10:36 AM sud <suds1434@gmail.com> wrote:

You can first create the index on the table using the "On ONLY"keyword,
something as below.

CREATE INDEX idx ON ONLY tab(col1);

Then create indexes on each partition in "concurrently" from multiple
sessions in chunks.

CREATE INDEX CONCURRENTLY idx_1 ON tab_part1(col1);
CREATE INDEX CONCURRENTLY idx_2 ON tab_part2(col1);

After this step finishes the table level index which was created in the
first step will be in valid state automatically.

Thank you so much.
Should we also tweak the parameters related to the parallelism and memory
as I mentioned in the first post?

#4Lok P
loknath.73@gmail.com
In reply to: Lok P (#3)
Re: Creating big indexes

On Sun, Jun 9, 2024 at 10:39 AM Lok P <loknath.73@gmail.com> wrote:

On Sun, Jun 9, 2024 at 10:36 AM sud <suds1434@gmail.com> wrote:

You can first create the index on the table using the "On ONLY"keyword,
something as below.

CREATE INDEX idx ON ONLY tab(col1);

Then create indexes on each partition in "concurrently" from multiple
sessions in chunks.

CREATE INDEX CONCURRENTLY idx_1 ON tab_part1(col1);
CREATE INDEX CONCURRENTLY idx_2 ON tab_part2(col1);

After this step finishes the table level index which was created in the
first step will be in valid state automatically.

Thank you so much.
Should we also tweak the parameters related to the parallelism and memory
as I mentioned in the first post?

Additionally ,is it also possible to drop the indexes also from the big
partition table efficiently? To avoid the "transaction id wrap around" or
"table bloat" when the index drop runs for longer duration?

#5sud
suds1434@gmail.com
In reply to: Lok P (#4)
Re: Creating big indexes

On Sun, Jun 9, 2024 at 1:40 PM Lok P <loknath.73@gmail.com> wrote:

On Sun, Jun 9, 2024 at 10:39 AM Lok P <loknath.73@gmail.com> wrote:

On Sun, Jun 9, 2024 at 10:36 AM sud <suds1434@gmail.com> wrote:

You can first create the index on the table using the "On ONLY"keyword,
something as below.

CREATE INDEX idx ON ONLY tab(col1);

Then create indexes on each partition in "concurrently" from multiple
sessions in chunks.

CREATE INDEX CONCURRENTLY idx_1 ON tab_part1(col1);
CREATE INDEX CONCURRENTLY idx_2 ON tab_part2(col1);

After this step finishes the table level index which was created in the
first step will be in valid state automatically.

Thank you so much.
Should we also tweak the parameters related to the parallelism and memory
as I mentioned in the first post?

Additionally ,is it also possible to drop the indexes also from the big
partition table efficiently? To avoid the "transaction id wrap around" or
"table bloat" when the index drop runs for longer duration?

I have never tried , but I think you can do "drop index concurrently" from
multiple sessions at same time for each of the partitions to make the drop
index finish quicker, similar to the "create index" statement as mentioned
above. Others may comment.