template0 needing vacuum freeze?

Started by Don Seilerabout 6 years ago8 messagesgeneral

don@seiler.us

about 6 years ago

PG 9.6 (9.6.8). Got an alert today when template0 and template1 were both
over 50% towards TXID wraparound. I could vacuum template1 but couldn't
vacuum template0 without first allowing connections. This is what it looked
like before:

# SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC;
datname | age | current_setting
--------------------+------------+-----------------
foo_db | 1022106099 | 200000000
template0 | 1000278345 | 200000000
postgres | 643729 | 200000000
template1 | 643729 | 200000000
(4 rows)

I've since allowed connections and ran "vacuumdb --freeze" on it and then
immediately disabled the connections to it again. But I'm curious how
template0 would be growing in age like this. Even now I see the template0
age growing. I can say that these DB has previously been altered for locale
changes as well.

I'm also running a long "vacuum freeze" on foo_db that will take a few days
after seeing that autovacuum on a big table had been running on it since
Feb 2 and making no progress, with over 850M dead tuples according to
pg_stat_all_tables. I estimate 3-4 more days to go on that one. Once that's
done I'll be scheduling manual vacuum jobs. Just wondering if that would
somehow affect regular template0 cleanup though.

I don't see anything in postgres log related to template0 other than my
manual interactions today.

--
Don Seiler
www.seiler.us

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Don Seiler (#1)

Re: template0 needing vacuum freeze?

Don Seiler <don@seiler.us> writes:

PG 9.6 (9.6.8). Got an alert today when template0 and template1 were both
over 50% towards TXID wraparound. I could vacuum template1 but couldn't
vacuum template0 without first allowing connections. This is what it looked
like before:

template0 shouldn't really need freezing, if it's unchanged since initdb,
but the autovacuum logic doesn't know that and will periodically scan it
anyway. That should be pretty cheap (since that DB is small and there's
not really any work to do), so we haven't considered it to be something
to prevent --- especially since it is a good safety valve in case
someone does change template0.

So it's unsurprising that the freeze age increases until autovacuum
decides to do something about it. I'm suspicious that your alert settings
are too aggressive and are notifying you before autovacuum kicks in.
You should *not* have had to do anything manual about this, unless you
have frobbed your autovac settings to the point of brokenness.

regards, tom lane

don@seiler.us

about 6 years ago

In reply to: Tom Lane (#2)

Re: template0 needing vacuum freeze?

On Sat, May 16, 2020 at 12:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

So it's unsurprising that the freeze age increases until autovacuum
decides to do something about it. I'm suspicious that your alert settings

are too aggressive and are notifying you before autovacuum kicks in.

You should *not* have had to do anything manual about this, unless you
have frobbed your autovac settings to the point of brokenness.

Shouldn't autovacuum have kicked in when the age of a table reaches 200M
(our autovacuum_freeze_max_age is left at that default)? I see other tables
in our app DB triggering the autovacuum "to prevent wrap-around" when they
reach 200M. That's what had me concerned to see template0 with an age over
1B and no autovacuum even trying to clean up for it.

Don.

--
Don Seiler
www.seiler.us

laurenz.albe@cybertec.at

about 6 years ago

In reply to: Don Seiler (#1)

Re: template0 needing vacuum freeze?

On Sat, 2020-05-16 at 12:19 -0500, Don Seiler wrote:

PG 9.6 (9.6.8). Got an alert today when template0 and template1 were both over 50%
towards TXID wraparound. I could vacuum template1 but couldn't vacuum template0
without first allowing connections. This is what it looked like before:

# SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC;
datname | age | current_setting
--------------------+------------+-----------------
foo_db | 1022106099 | 200000000
template0 | 1000278345 | 200000000
postgres | 643729 | 200000000
template1 | 643729 | 200000000
(4 rows)

I've since allowed connections and ran "vacuumdb --freeze" on it and then immediately
disabled the connections to it again. But I'm curious how template0 would be growing
in age like this. Even now I see the template0 age growing.

That is indeed strange.

Did you see any weird messages when you vacuumed "template0"?
Did "datfrozenxid" shrink after the operation?

"foo_db" seems to be the bigger problem.
Perhaps autovacuum never handled "template0" because it concluded (rightly) that
it has to deal with "foo_db" first.

I can say that these DB
has previously been altered for locale changes as well.

Would you care to explain that? You changed "template0"? How?

I'm also running a long "vacuum freeze" on foo_db that will take a few days after
seeing that autovacuum on a big table had been running on it since Feb 2 and making
no progress, with over 850M dead tuples according to pg_stat_all_tables.
I estimate 3-4 more days to go on that one. Once that's done I'll be scheduling
manual vacuum jobs. Just wondering if that would somehow affect regular template0
cleanup though.

As I said, perhaps.

What are your non-default autovacuum settings? Perhaps you should speed up autovacuum
by reducing "autovacuum_vacuum_cost_delay" to 2ms or less, and by increasing
"maintenance_work_mem".

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

don@seiler.us

about 6 years ago

In reply to: Laurenz Albe (#4)

Re: template0 needing vacuum freeze?

On Mon, May 18, 2020 at 1:40 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

Did you see any weird messages when you vacuumed "template0"?

No.

Did "datfrozenxid" shrink after the operation?

"foo_db" seems to be the bigger problem.

Perhaps autovacuum never handled "template0" because it concluded (rightly)

that
it has to deal with "foo_db" first.

Yes this DB had a table in it that had been autovacuuming since Feb 2. It's
age is half way to wraparound so I'm in the middle of a manual VACUUM
FREEZE on it. I'd be interested in knowing if that prevents template0 from
autovacuuming itself. There are no other autovacuum jobs running.

I can say that these DB
has previously been altered for locale changes as well.

Would you care to explain that? You changed "template0"? How?

It was changed before my time here to change the encoding from LATIN1 to
UTF manually. One of the fun treats I'm working to correct as I also
prepare these for upgrade to PG12.

What are your non-default autovacuum settings? Perhaps you should speed up

autovacuum
by reducing "autovacuum_vacuum_cost_delay" to 2ms or less, and by
increasing
"maintenance_work_mem".

All autovacuum settings on this DB are default. Cost delay is at the
default 20ms. maintenance_work_mem I've already increased to 512MB (this VM
has 8GB RAM).

--
Don Seiler
www.seiler.us

tgl@sss.pgh.pa.us

about 6 years ago

In reply to: Don Seiler (#5)

Re: template0 needing vacuum freeze?

Don Seiler <don@seiler.us> writes:

On Mon, May 18, 2020 at 1:40 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

Perhaps autovacuum never handled "template0" because it concluded (rightly)
that it has to deal with "foo_db" first.

Yes this DB had a table in it that had been autovacuuming since Feb 2. It's
age is half way to wraparound so I'm in the middle of a manual VACUUM
FREEZE on it. I'd be interested in knowing if that prevents template0 from
autovacuuming itself. There are no other autovacuum jobs running.

I think we did put in a change that would prevent any one database from
completely consuming autovacuum's attention, even in wraparound-hazard
situations. Don't recall when.

Do you have an idea why autovac was failing to clear the issue on that one
problem table, though?

regards, tom lane

don@seiler.us

about 6 years ago

In reply to: Tom Lane (#6)

Re: template0 needing vacuum freeze?

On Mon, May 18, 2020 at 8:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Do you have an idea why autovac was failing to clear the issue on that one
problem table, though?

Before I intervened, the maintenance_work_mem was only 16MB (they had
encoded and carried over PG 8 defaults in the chef recipe). I bumped it to
512MB before kicking off my freezeThat's the big factor I can think of.
This is a huge table obviously as well. By the time it caught my attention
there were 850M dead tuples to be cleaned up. My VACUUM FREEZE has been
running just just about 5 days and is probably half way done. We shouldn't
be at risk of hitting wraparound though (only 52% there).

Don.

--
Don Seiler
www.seiler.us

adrian.klaver@aklaver.com

about 6 years ago

In reply to: Tom Lane (#6)

Re: template0 needing vacuum freeze?

On 5/18/20 6:51 PM, Tom Lane wrote:

Don Seiler <don@seiler.us> writes:

On Mon, May 18, 2020 at 1:40 AM Laurenz Albe <laurenz.albe@cybertec.at>
wrote:

Perhaps autovacuum never handled "template0" because it concluded (rightly)
that it has to deal with "foo_db" first.

Yes this DB had a table in it that had been autovacuuming since Feb 2. It's
age is half way to wraparound so I'm in the middle of a manual VACUUM
FREEZE on it. I'd be interested in knowing if that prevents template0 from
autovacuuming itself. There are no other autovacuum jobs running.

I think we did put in a change that would prevent any one database from
completely consuming autovacuum's attention, even in wraparound-hazard
situations. Don't recall when.

This?:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=dd9ac7d5d80608a640bb82cffb6a805ce84cf112

which I believe is only for 12 and is in the 12.3 release.

Do you have an idea why autovac was failing to clear the issue on that one
problem table, though?

regards, tom lane

--
Adrian Klaver
adrian.klaver@aklaver.com