AutoVacuum and growing transaction XID's
Hello Team,
We are using a PostgreSQL version -9.6.12 version and from last 4 weeks our
Transaction ID's (XID's) have increased by 195 million to 341 million
transactions. I see the below from pg_stat_activity from the postGreSQL DB.
1) Viewing the pg_stat-activity I noticed that the vacuum query is
running for a runtime interval of few hours to 3-5 days whenever I check
the pg_stat-activity. Is this a common process postgreSQL runs ? I have
noticed this running and show in the pg_stat activity from last few weeks
only. Also the query shows the table name with
(to prevent wrap around) for each of the tables in the vacuum query as
output. What does this mean ?
2) Does it mean I need to run a manual auto vacuum process for these
tables ? as the transaction ids have increased from 195 million to 341
million ?.
What other things I need to check in the database around this ?.
Thanks !!
It is trying to do a vacuum freeze. Do you have autovacuum turned off? Any
settings changed from default related to autovacuum?
https://www.postgresql.org/docs/9.6/routine-vacuuming.html
Read 24.1.5. Preventing Transaction ID Wraparound Failures
These may also be of help-
https://info.crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql
https://www.2ndquadrant.com/en/blog/managing-freezing/
Note that you need to ensure the server gets caught up, or you risk being
locked out to prevent data corruption.
On Thu, May 7, 2020 at 1:33 PM Michael Lewis <mlewis@entrata.com> wrote:
It is trying to do a vacuum freeze. Do you have autovacuum turned off? Any
settings changed from default related to autovacuum?https://www.postgresql.org/docs/9.6/routine-vacuuming.html
Read 24.1.5. Preventing Transaction ID Wraparound FailuresThese may also be of help-
https://info.crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql
https://www.2ndquadrant.com/en/blog/managing-freezing/Note that you need to ensure the server gets caught up, or you risk being
locked out to prevent data corruption.
Thanks Mike.
1) We haven't changed anything related to autovacuum except a work_mem
parameter which was increased to 4 GB which I believe is not related to
autovacuum
2) The vacuum was not turned off and few parameters we had on vacuum are
*autovacuum_analyze_scale_factor = 0.02* and
*autovacuum_vacuum_scale_factor
= 0.05*
*3) *The database curently we are running is 2 years old for now and we
have around close to 40 partitions and the datfrozenxid on the table is 343
million whereas the default is 200 million. I would try doing a manual
auto vacuum on those tables
where the autovacuum_freeze_max_age > 200 million. Do you think It's a
right thing to do ?.
I will also go through this documents.
Tahnks
On Thu, May 7, 2020 at 4:18 PM github kran <githubkran@gmail.com> wrote:
On Thu, May 7, 2020 at 1:33 PM Michael Lewis <mlewis@entrata.com> wrote:
It is trying to do a vacuum freeze. Do you have autovacuum turned off?
Any settings changed from default related to autovacuum?https://www.postgresql.org/docs/9.6/routine-vacuuming.html
Read 24.1.5. Preventing Transaction ID Wraparound FailuresThese may also be of help-
https://info.crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql
https://www.2ndquadrant.com/en/blog/managing-freezing/Note that you need to ensure the server gets caught up, or you risk being
locked out to prevent data corruption.Thanks Mike.
1) We haven't changed anything related to autovacuum except a work_mem
parameter which was increased to 4 GB which I believe is not related to
autovacuum
2) The vacuum was not turned off and few parameters we had on vacuum are
*autovacuum_analyze_scale_factor = 0.02* and *autovacuum_vacuum_scale_factor
= 0.05*
*3) *The database curently we are running is 2 years old for now and we
have around close to 40 partitions and the *datfrozenxid on the table is
343 million whereas the default is 200 million*. I would try doing a
manual auto vacuum on those tables
where the *autovacuum_freeze_max_age > 200 million*. Do you think It's a
right thing to do ?.I will also go through this documents.
* Few more things 5/7 - 8:40 PM CDT*
1) I see there are *8 Vacuum workers* ( Not sure what changed) running
in the background and the concern I have is all of these vacuum processes
are running with wrap around and while they are running
I can't either DROP or ALTER any other tables ( REMOVE Inheritance
for any of old tables where the WRITES are not getting written to).* Any of
the ALTER TABLE OR DROP TABLE DDL's arer not getting exeucted even I
WAITED FOR SEVERAL MINUTES , so I have terminated those queries as I didn't
have luck.*
2) T*he VACUUM Process wrap around is running for last 1 day and
several hrs on other tables. *
3) *Can I increase the autovacuum_freeze_max_age on the tables
on production system* ?
Show quoted text
Thanks
On Fri, 8 May 2020 at 09:18, github kran <githubkran@gmail.com> wrote:
1) We haven't changed anything related to autovacuum except a work_mem parameter which was increased to 4 GB which I believe is not related to autovacuum
It might want to look into increasing vacuum_cost_limit to something
well above 200 or dropping autovacuum_vacuum_cost_delay down from 20
to something much lower. However, you say you've not changed the
autovacuum settings, but you've also said:
1) I see there are 8 Vacuum workers ( Not sure what changed) running in the background and the concern I have is all of these vacuum processes are running with wrap around and while they are running
The default is 3, so if you have 8 then the settings are non-standard.
It might be good to supply the output of:
SELECT name,setting from pg_Settings where name like '%vacuum%';
You should know that the default speed that autovacuum runs at is
quite slow in 9.6. If you end up with all your autovacuum workers tied
up with anti-wraparound vacuums then other tables are likely to get
neglected and that could lead to stale stats or bloated tables. Best
to aim to get auto-vacuum running faster or aim to perform some manual
vacuums of tables that are over their max freeze age during an
off-peak period to make use of the lower load during those times.
Start with tables in pg_class with the largest age(relfrozenxid).
You'll still likely want to look at the speed autovacuum runs at
either way.
Please be aware that the first time a new cluster crosses the
autovacuum_freeze_max_age threshold can be a bit of a pain point as it
can mean that many tables require auto-vacuum activity all at once.
The impact of this is compounded if you have many tables that never
receive an UPDATE/DELETE as auto-vacuum, in 9.6, does not visit those
tables for any other reason. After the first time, the relfrozenxids
of tables tend to be more staggered so their vacuum freeze
requirements are also more staggered and that tends to cause fewer
problems.
David
On Fri, 8 May 2020 at 13:51, github kran <githubkran@gmail.com> wrote:
I can't either DROP or ALTER any other tables ( REMOVE Inheritance for any of old tables where the WRITES are not getting written to). Any of the ALTER TABLE OR DROP TABLE DDL's arer not getting exeucted even I WAITED FOR SEVERAL MINUTES , so I have terminated those queries as I didn't have luck.
The auto-vacuum freeze holds an SharedUpdateExclusiveLock on the table
being vacuumed. If you try any DDL that requires an
AccessExclusiveLock, it'll have to wait until the vacuum has
completed. If you leave the DDL running then all accesses to the table
will be queued behind the ungranted AccessExclusiveLock. It's likely
a good idea to always run DDL with a fairly short lock_timeout, just
in case this happens.
3) Can I increase the autovacuum_freeze_max_age on the tables on production system ?
Yes, but you cannot increase the per-table setting above the global
setting. Changing the global setting requires a restart.
David
Thanks David for your replies.
On Thu, May 7, 2020 at 11:01 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 8 May 2020 at 09:18, github kran <githubkran@gmail.com> wrote:
1) We haven't changed anything related to autovacuum except a work_mem
parameter which was increased to 4 GB which I believe is not related to
autovacuumIt might want to look into increasing vacuum_cost_limit to something
well above 200 or dropping autovacuum_vacuum_cost_delay down from 20
to something much lower. However, you say you've not changed the
autovacuum settings, but you've also said:1) I see there are 8 Vacuum workers ( Not sure what changed) running
in the background and the concern I have is all of these vacuum processes
are running with wrap around and while they are running
- Yes I said it was originally 3 but I noticed the work_mem parameter
was changed few weeks back to 4 GB and then from that day onwards there is
an increasing trend of the MaxUsedTransactionIds from 200 Million to 347
million ( It's growing day by day from last 2 -3 weeks)
- Do you think there could be a formula on how the workers could have
increased based on this increase in WORK_MEM controlled by database ?.
The default is 3, so if you have 8 then the settings are non-standard.
It might be good to supply the output of:
SELECT name,setting from pg_Settings where name like '%vacuum%';
Output of vacuum
name setting min_val max_val boot_val reset_val
autovacuum on null null on on
autovacuum_analyze_scale_factor 0.02 0 100 0.1 0.02
autovacuum_analyze_threshold 50 0 2147483647 50 50
autovacuum_freeze_max_age 200000000 100000 2000000000 200000000 200000000
autovacuum_max_workers 8 1 262143 3 8
autovacuum_multixact_freeze_max_age 400000000 10000 2000000000 400000000
400000000
autovacuum_naptime 5 1 2147483 60 5
autovacuum_vacuum_cost_delay 5 -1 100 20 5
autovacuum_vacuum_cost_limit -1 -1 10000 -1 -1
autovacuum_vacuum_scale_factor 0.05 0 100 0.2 0.05
autovacuum_vacuum_threshold 50 0 2147483647 50 50
autovacuum_work_mem -1 -1 2147483647 -1 -1
You should know that the default speed that autovacuum runs at is
quite slow in 9.6. If you end up with all your autovacuum workers tied
up with anti-wraparound vacuums then other tables are likely to get
neglected and that could lead to stale stats or bloated tables. Best
to aim to get auto-vacuum running faster or aim to perform some manual
vacuums of tables that are over their max freeze age during an
off-peak period to make use of the lower load during those times.
Start with tables in pg_class with the largest age(relfrozenxid).
You'll still likely want to look at the speed autovacuum runs at
either way.Please be aware that the first time a new cluster crosses the
autovacuum_freeze_max_age threshold can be a bit of a pain point as it
can mean that many tables require auto-vacuum activity all at once.
The impact of this is compounded if you have many tables that never
receive an UPDATE/DELETE as auto-vacuum, in 9.6, does not visit those
tables for any other reason. After the first time, the relfrozenxids
of tables tend to be more staggered so their vacuum freeze
requirements are also more staggered and that tends to cause fewer
problems.
The current situation I have is the auto vacuum kicked with 8 tables with
each of those tied to each worker and it's running very slow in 9.6 as you
mentioned
i observed VACUUM on those 8 tables is running from last 15 hrs and
other process are running for 1 hr+ and others for few minutes for
different tables.
Finally I would wait for your reply to see what could be done for this
VACUUM and growing TXIDs values.
- Do you think I should consider changing back the work_mem back to 4
MB what it was originally ?
- Can I apply your recommendations on a production instance directly
or you prefer me to apply initially in other environment before applying on
Prod ?
- Also like I said I want to clean up few unused tables OR MANUAL
VACUUM but current system doesn't allow me to do it considering these
factors.
- I will try to run VACUUM Manually during off peak hrs , Can I STOP
the Manual VACUUM process if its take more than 10 minutes or what is the
allowed time in mins I can have it running ?.
David
Show quoted text
On Thu, May 7, 2020 at 11:04 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Fri, 8 May 2020 at 13:51, github kran <githubkran@gmail.com> wrote:
I can't either DROP or ALTER any other tables ( REMOVE Inheritance
for any of old tables where the WRITES are not getting written to). Any of
the ALTER TABLE OR DROP TABLE DDL's arer not getting exeucted even I
WAITED FOR SEVERAL MINUTES , so I have terminated those queries as I didn't
have luck.The auto-vacuum freeze holds an SharedUpdateExclusiveLock on the table
being vacuumed. If you try any DDL that requires an
AccessExclusiveLock, it'll have to wait until the vacuum has
completed. If you leave the DDL running then all accesses to the table
will be queued behind the ungranted AccessExclusiveLock. It's likely
a good idea to always run DDL with a fairly short lock_timeout, just
in case this happens.
* How much value I can assign to lock_timeout so that I dont get into
trouble to test my DDL commands and without impacting other sessions.*
3) Can I increase the autovacuum_freeze_max_age on the tables on
production system ?
Yes, but you cannot increase the per-table setting above the global
setting. Changing the global setting requires a restart.How can I change the value of the global setting of the
autovacuum_freeze_max_Age value.
Show quoted text
David
autovacuum_naptime being only 5 seconds seems too frequent. A lock_timeout
might be 1-5 seconds depending on your system. Usually, DDL can fail and
wait a little time rather than lock the table for minutes and have all
reads back up behind the DDL.
Given you have autovacuum_vacuum_cost_limit set to unlimited (seems very
odd), I'm not sure a manual vacuum freeze command on the tables with high
age would perform differently. Still, issuing a vacuum freeze and then
killing the autovacuum process might be worth trying.
Thanks for yous suggestions Michael and David.
On Fri, May 8, 2020 at 4:11 PM Michael Lewis <mlewis@entrata.com> wrote:
Show quoted text
autovacuum_naptime being only 5 seconds seems too frequent. A lock_timeout
might be 1-5 seconds depending on your system. Usually, DDL can fail and
wait a little time rather than lock the table for minutes and have all
reads back up behind the DDL.Given you have autovacuum_vacuum_cost_limit set to unlimited (seems very
odd), I'm not sure a manual vacuum freeze command on the tables with high
age would perform differently. Still, issuing a vacuum freeze and then
killing the autovacuum process might be worth trying.