Is the work done by vacuum in the current pass lost when interrupted?

Started by Jean-Marc Lessard over 5 years ago · 4 messages · general
#1 Jean-Marc Lessard
Jean-Marc.Lessard@ultra-ft.com

I have a large table (billions of records) which has not been vacuumed and is bloated.
The vacuum scale factor was left at the default.

I ran a vacuum on a DEV system and it made several passes (scanning heap / vacuuming indexes / vacuuming heap), each taking more than an hour.
On a PROD system, I may have to kill the job midway.
Should I reduce the autovacuum_work_mem of my session? It is currently 1GB.
Is the work done in the current pass lost when interrupted?
Correct me if I am wrong, but the index and heap vacuuming done by the current pass will go to disk, yet the heap must be rescanned by the next vacuum.
I guess that the visibility map is updated only at the end of each pass.
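
For reference, a running vacuum's passes can be observed through the progress view (available since PostgreSQL 9.6); this is a sketch of what I watch on DEV:

```sql
-- One row per backend currently running VACUUM (PostgreSQL 9.6+).
SELECT phase, heap_blks_scanned, heap_blks_total,
       index_vacuum_count,               -- completed index-vacuuming passes
       num_dead_tuples, max_dead_tuples  -- fill level of the dead-tuple space
FROM pg_stat_progress_vacuum;
```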

My best regards

Thank you

Jean-Marc Lessard
Administrateur de base de données / Database Administrator


Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com

#2 Allan Kamau
kamauallan@gmail.com
In reply to: Jean-Marc Lessard (#1)
Re: Is the work done by vacuum in the current pass lost when interrupted?

On Thu, Oct 8, 2020 at 10:16 PM Jean-Marc Lessard <
Jean-Marc.Lessard@ultra-ft.com> wrote:

I have a large table (billions of records) which has not been vacuumed and is
bloated.

The vacuum scale factor was left at the default.

I ran a vacuum on a DEV system and it made several passes (scanning heap /
vacuuming indexes / vacuuming heap), each taking more than an hour.

On a PROD system, I may have to kill the job midway.

Should I reduce the autovacuum_work_mem of my session? It is currently 1GB.

Is the work done in the current pass lost when interrupted?

Correct me if I am wrong, but the index and heap vacuuming done by the
current pass will go to disk, yet the heap must be rescanned by the next
vacuum.

I guess that the visibility map is updated only at the end of each pass.

My best regards

Thank you

*Jean-Marc Lessard*
Administrateur de base de données / Database Administrator

Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com

A suggestion: considering that you have a table with billions of tuples, you
could look into table partitioning; see
https://www.postgresql.org/docs/current/ddl-partitioning.html.

First you would need to identify a criterion by which to partition your data,
perhaps the record population timestamp binned to yyyymm resolution.

On a test environment, you could construct a table similar to the one
currently holding the many records, but with a different name.

Then write code to generate the child tables, and run that SQL to
create them.
If your child tables are based on the record date, you can create child
tables for the timestamps of the records you already have, as well as for
several months or timepoints in the future. A cron job could create new
tables for future timepoints.
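
The child-table setup above can be sketched with declarative partitioning; the table and column names here are hypothetical:

```sql
-- A sketch of the partitioned replacement table (names are hypothetical).
CREATE TABLE records_new (
    id         bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

-- One child table per yyyymm bin; a cron job could add future months.
CREATE TABLE records_y2020m10 PARTITION OF records_new
    FOR VALUES FROM ('2020-10-01') TO ('2020-11-01');
CREATE TABLE records_y2020m11 PARTITION OF records_new
    FOR VALUES FROM ('2020-11-01') TO ('2020-12-01');
```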

Then write code to populate the new tables piecemeal, using a WHERE clause
to select batches of data from the current production table.

You may choose to have that code use COPY or INSERT to populate each
specific partition.
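
One piecemeal population step might look like the following, again with hypothetical names:

```sql
-- Move one month at a time; the WHERE clause keeps each batch bounded.
INSERT INTO records_new
SELECT id, created_at, payload
FROM   records_old
WHERE  created_at >= '2020-10-01'
AND    created_at <  '2020-11-01';
```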

Clone your current application and modify the code so that it inserts
directly into the specific child table, or leave the routing to the child
table to an ON INSERT trigger.

Test the data population through the application to verify that the child
tables are being populated correctly.

If all is well, schedule downtime during which you can implement these
changes in your production environment.

Allan.

#3 Michael Lewis
mlewis@entrata.com
In reply to: Jean-Marc Lessard (#1)
Re: Is the work done by vacuum in the current pass lost when interrupted?

On Thu, Oct 8, 2020, 1:16 PM Jean-Marc Lessard <
Jean-Marc.Lessard@ultra-ft.com> wrote:

I have a large table (billions of records) which has not been vacuumed and is
bloated.

The vacuum scale factor was left at the default.

I ran a vacuum on a DEV system and it made several passes (scanning heap /
vacuuming indexes / vacuuming heap), each taking more than an hour.

On a PROD system, I may have to kill the job midway.

Should I reduce the autovacuum_work_mem of my session? It is currently 1GB.

Increase it if you can. You want to maximize the work being done before it
needs to pause.

Have you tuned any settings related to vacuum? If your I/O system can
handle it, turning the cost delay very low may be helpful, especially on
PG12+ where it can be less than 1ms. Otherwise you might reduce it to 1ms
and also increase the cost limit so more work is done before each stop.
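
In session form, assuming a hypothetical table named bigtable, that tuning might look like:

```sql
-- Session-level tuning for a manual VACUUM; values here are illustrative.
SET vacuum_cost_delay = 1;      -- 1ms; PG12+ also accepts fractions like 0.5
SET vacuum_cost_limit = 2000;   -- default is 200; more work per sleep cycle
VACUUM (VERBOSE) bigtable;
```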

#4 Jean-Marc Lessard
Jean-Marc.Lessard@ultra-ft.com
In reply to: Michael Lewis (#3)
RE: Is the work done by vacuum in the current pass lost when interrupted?

Thank you Michael for your suggestion.
I will reduce vacuum_cost_delay and increase vacuum_cost_limit to speed up the vacuum.
But the I/O system is under heavy load on the PROD system and I do not want to impact end-user performance.
If an aggressive vacuum_cost_delay and vacuum_cost_limit slow down end-user performance significantly, I may have to kill the vacuum, but I would like to keep as much of the vacuum work as possible.

My concern is about the size of the max_dead_tuples buffer, which is determined by autovacuum_work_mem:

* autovacuum_work_mem is currently 1GB (defaults to maintenance_work_mem)

When I ran a vacuum on a DEV system, it made a single “scanning heap/vacuuming indexes/vacuuming heap” cycle of about 12 hours.
The vacuuming indexes phase took half the total time (6 hrs).

Q1: I guess that if I have to kill the vacuum midway, I will lose all the work done during the scanning heap step. Right?
Q2: If I reduce autovacuum_work_mem to 32MB, are there any performance side effects on the vacuum?

thanks
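
For a rough sense of the Q2 trade-off, assuming the 6-byte-per-dead-tuple array vacuum uses in this era's releases: a smaller setting fills sooner, forcing more index-vacuuming passes, and each pass rescans every index in full. The approximate capacities (ignoring allocation overhead) can be computed in SQL:

```sql
-- Rough capacity of the dead-tuple array at each setting (6 bytes/tuple).
SELECT pg_size_bytes('1GB')  / 6 AS dead_tuples_at_1gb,   -- ~179 million
       pg_size_bytes('32MB') / 6 AS dead_tuples_at_32mb;  -- ~5.6 million
```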