Transaction ID Wraparound Monitoring

Started by Jan Keirseover 10 years ago7 messagesgeneral
Jump to latest
#1Jan Keirse
jan.keirse@tvh.com

Hello,

we have some very write heavy databases and I have our monitoring
system watch the transaction age of my databases to be alerted before
we get into problems in case autovacuum can't keep up to avoid
transaction ID wraparound.

The query I am executing is this:
SELECT max(age(pg_database.datfrozenxid)) / 2147483648.0 * 100.0 AS
"Percentage of transaction ID's used" FROM pg_database;

My believe was that if this reaches 100 the database will stop
accepting writes and one must vacuum. I have set alerts on 50 and 90,
the result is around 9 so my believe was autovacuum is working fine
for my workload.
I often see autovacuum kicking in to prevent XID Wraparround, I
thought that was just to be on the safe side and vacuum well before
it's too late.

However today I saw this post:
http://blog.getsentry.com/2015/07/23/transaction-id-wraparound-in-postgres.html

The following line has me worried:
... that database is going to reach a situation where the XID counter
has reached its maximum value. The absolute peak is something around 2
billion, but it can be far lower than that in some situations...

Could someone shed some light on this? Is my query insufficient? Can
the transaction wrapparound freeze problem indeed occur earlier? And
if so, could someone suggest a better query to monitor?

Kind Regards,

Jan Keirse

--

**** DISCLAIMER ****

http://www.tvh.com/glob/en/email-disclaimer

"This message is delivered to all addressees subject to the conditions
set forth in the attached disclaimer, which is an integral part of this
message."

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Jan Keirse (#1)
Re: Transaction ID Wraparound Monitoring

On 07/30/2015 02:55 AM, Jan Keirse wrote:

Hello,

we have some very write heavy databases and I have our monitoring
system watch the transaction age of my databases to be alerted before
we get into problems in case autovacuum can't keep up to avoid
transaction ID wraparound.

The query I am executing is this:
SELECT max(age(pg_database.datfrozenxid)) / 2147483648.0 * 100.0 AS
"Percentage of transaction ID's used" FROM pg_database;

My believe was that if this reaches 100 the database will stop
accepting writes and one must vacuum. I have set alerts on 50 and 90,
the result is around 9 so my believe was autovacuum is working fine
for my workload.
I often see autovacuum kicking in to prevent XID Wraparround, I
thought that was just to be on the safe side and vacuum well before
it's too late.

However today I saw this post:
http://blog.getsentry.com/2015/07/23/transaction-id-wraparound-in-postgres.html

The following line has me worried:
... that database is going to reach a situation where the XID counter
has reached its maximum value. The absolute peak is something around 2
billion, but it can be far lower than that in some situations...

Could someone shed some light on this? Is my query insufficient? Can
the transaction wrapparound freeze problem indeed occur earlier? And
if so, could someone suggest a better query to monitor?

I would look at:

http://www.postgresql.org/docs/9.4/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

Which includes some query examples.

Kind Regards,

Jan Keirse

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Jan Keirse
jan.keirse@tvh.com
In reply to: Adrian Klaver (#2)
Re: Transaction ID Wraparound Monitoring

On Thu, Jul 30, 2015 at 2:56 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:

On 07/30/2015 02:55 AM, Jan Keirse wrote:

Hello,

we have some very write heavy databases and I have our monitoring
system watch the transaction age of my databases to be alerted before
we get into problems in case autovacuum can't keep up to avoid
transaction ID wraparound.

The query I am executing is this:
SELECT max(age(pg_database.datfrozenxid)) / 2147483648.0 * 100.0 AS
"Percentage of transaction ID's used" FROM pg_database;

My believe was that if this reaches 100 the database will stop
accepting writes and one must vacuum. I have set alerts on 50 and 90,
the result is around 9 so my believe was autovacuum is working fine
for my workload.
I often see autovacuum kicking in to prevent XID Wraparround, I
thought that was just to be on the safe side and vacuum well before
it's too late.

However today I saw this post:

http://blog.getsentry.com/2015/07/23/transaction-id-wraparound-in-postgres.html

The following line has me worried:
... that database is going to reach a situation where the XID counter
has reached its maximum value. The absolute peak is something around 2
billion, but it can be far lower than that in some situations...

Could someone shed some light on this? Is my query insufficient? Can
the transaction wrapparound freeze problem indeed occur earlier? And
if so, could someone suggest a better query to monitor?

I would look at:

http://www.postgresql.org/docs/9.4/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

Which includes some query examples.

Yes, I have seen that documentation and it is because of it that I
believed that my queries were ok, but now I think I may be
misinterpreting or misunderstanding the documentation and have to look
at more information, like autovacuum_multixact_freeze_max_age?

--

**** DISCLAIMER ****

http://www.tvh.com/glob/en/email-disclaimer

"This message is delivered to all addressees subject to the conditions
set forth in the attached disclaimer, which is an integral part of this
message."

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Jan Keirse (#3)
Re: Transaction ID Wraparound Monitoring

On 07/30/2015 08:41 AM, Jan Keirse wrote:

On Thu, Jul 30, 2015 at 2:56 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:

On 07/30/2015 02:55 AM, Jan Keirse wrote:

Hello,

we have some very write heavy databases and I have our monitoring
system watch the transaction age of my databases to be alerted before
we get into problems in case autovacuum can't keep up to avoid
transaction ID wraparound.

The query I am executing is this:
SELECT max(age(pg_database.datfrozenxid)) / 2147483648.0 * 100.0 AS
"Percentage of transaction ID's used" FROM pg_database;

My believe was that if this reaches 100 the database will stop
accepting writes and one must vacuum. I have set alerts on 50 and 90,
the result is around 9 so my believe was autovacuum is working fine
for my workload.
I often see autovacuum kicking in to prevent XID Wraparround, I
thought that was just to be on the safe side and vacuum well before
it's too late.

However today I saw this post:

http://blog.getsentry.com/2015/07/23/transaction-id-wraparound-in-postgres.html

The following line has me worried:
... that database is going to reach a situation where the XID counter
has reached its maximum value. The absolute peak is something around 2
billion, but it can be far lower than that in some situations...

Could someone shed some light on this? Is my query insufficient? Can
the transaction wrapparound freeze problem indeed occur earlier? And
if so, could someone suggest a better query to monitor?

I would look at:

http://www.postgresql.org/docs/9.4/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

Which includes some query examples.

Yes, I have seen that documentation and it is because of it that I
believed that my queries were ok, but now I think I may be
misinterpreting or misunderstanding the documentation and have to look
at more information, like autovacuum_multixact_freeze_max_age?

Well if you click on the parameters in the above page you will go to
their definitions:

So for autovacuum_multixact_freeze_max_age:

http://www.postgresql.org/docs/9.4/interactive/runtime-config-autovacuum.html#GUC-AUTOVACUUM-MULTIXACT-FREEZE-MAX-AGE

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5William Dunn
dunnwjr@gmail.com
In reply to: Adrian Klaver (#4)
Re: Transaction ID Wraparound Monitoring

Hello Jan,

I think your calculation is slightly off because per the docs when
PostgreSQL comes within 1 million of the age at which an actual wraparound
occurs it will go into the safety shutdown mode. Thus the calculation
should be ((2^32)-1)/2-1000000 rather than just ((2^32)-1)/2 as I think you
are using.

When I first started building out my group's PostgreSQL monitoring solution
I too found the wording of transaction freeze to be a bit difficult to
understand. For my team's internal documentation I have summarized it as
follows, I hope it might be more clear:

...normal XIDs are compared using modulo-2^32 arithmetic, which means that
~(2^32-1)/2- transactions appear in the future and ~(2^32-1)/2 transactions
appear in the past.

This [Transaction ID freeze] behavior of autovacuum is primarily dependent
on the settings autovacuum_freeze_table_age and autovacuum_freeze_max_age,
which are set as database defaults but can also be specified on a per table
basis (as storage parameters in CREATE TABLE or ALTER TABLE)

- When a table's oldest transaction reaches autovacuum_freeze_table_age,
the next autovacuum that is performed on that table will be a vacuum freeze
- PostgreSQL implicitly caps autovacuum_freeze_table_age at
0.95*autovacuum_freeze_max_age.
- When a table reaches autovacuum_freeze_max_age PostgreSQL will force
an autovacuum freeze on that table, even if the table would not otherwise
be autovacuumed or autovacuum is disabled.
- PostgreSQL implicitly caps autovacuum_freeze_max_age at 2 billion
(2000000000)

The actual age that a wraparound occurs is ((2^32)/2)-1. When a PostgreSQL
database comes within 1 million of this age (2^32/2-1-1000000) the database
will go into the safety shutdown mode" and no longer accept commands,
including the vacuum commands, and your only recovery option is to stop the
server and use a single-user backend (where shutdown mode is not enforced)
to execute VACUUM. This should, obviously, be avoided at all costs.

References:

-
http://www.PostgreSQL.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
-
http://www.PostgreSQL.org/docs/current/static/runtime-config-client.html#GUC-VACUUM-FREEZE-TABLE-AGE
-
http://www.PostgreSQL.org/docs/current/static/runtime-config-autovacuum.html#GUC-AUTOVACUUM-FREEZE-MAX-AGE

Based on the above explanation we consider the following to be the most
correct check for how close you are to an actual wraparound freeze:

CAST (age(relfrozenxid) AS real) / CAST(trunc(((2^32)/2)-1-1000000) AS
real) AS perc_until_wraparound_server_freeze

(Note that we do this at the table level rather than the database level
like you did, though, so that we have the information we need to tune the
settings for individual tables.)

However it is better to set autovacuum max freeze age well below that value
and monitor that instead. Autovacuum should always do a vacuum freeze for a
table that has exceeded max freeze age, and if you are monitoring for that
you should avoid a wrap around freeze:

CAST (age(relfrozenxid) AS real) / CAST ((least(autovacuum_freeze_max_age,
2000000000)) AS real) AS perc_until_freeze_max_age

And ensure that value does not exceed 100%. Though it is important to note
that max freeze age can be set on a per table basis, so to get the true
autovacuum_freeze_max_age of a table (or the real max of the database) you
would need to check the reloptions field of pg_class for that table and
only if there is no value specified for '%autovacuum_freeze_table_age%' use
current_setting('autovacuum_freeze_max_age')

Let me know if I missed something or can clarify it further.

*Will J. Dunn*
*willjdunn.com <http://willjdunn.com&gt;*

On Thu, Jul 30, 2015 at 11:44 AM, Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

Show quoted text

On 07/30/2015 08:41 AM, Jan Keirse wrote:

On Thu, Jul 30, 2015 at 2:56 PM, Adrian Klaver
<adrian.klaver@aklaver.com> wrote:

On 07/30/2015 02:55 AM, Jan Keirse wrote:

Hello,

we have some very write heavy databases and I have our monitoring
system watch the transaction age of my databases to be alerted before
we get into problems in case autovacuum can't keep up to avoid
transaction ID wraparound.

The query I am executing is this:
SELECT max(age(pg_database.datfrozenxid)) / 2147483648.0 * 100.0 AS
"Percentage of transaction ID's used" FROM pg_database;

My believe was that if this reaches 100 the database will stop
accepting writes and one must vacuum. I have set alerts on 50 and 90,
the result is around 9 so my believe was autovacuum is working fine
for my workload.
I often see autovacuum kicking in to prevent XID Wraparround, I
thought that was just to be on the safe side and vacuum well before
it's too late.

However today I saw this post:

http://blog.getsentry.com/2015/07/23/transaction-id-wraparound-in-postgres.html

The following line has me worried:
... that database is going to reach a situation where the XID counter
has reached its maximum value. The absolute peak is something around 2
billion, but it can be far lower than that in some situations...

Could someone shed some light on this? Is my query insufficient? Can
the transaction wrapparound freeze problem indeed occur earlier? And
if so, could someone suggest a better query to monitor?

I would look at:

http://www.postgresql.org/docs/9.4/interactive/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND

Which includes some query examples.

Yes, I have seen that documentation and it is because of it that I
believed that my queries were ok, but now I think I may be
misinterpreting or misunderstanding the documentation and have to look
at more information, like autovacuum_multixact_freeze_max_age?

Well if you click on the parameters in the above page you will go to their
definitions:

So for autovacuum_multixact_freeze_max_age:

http://www.postgresql.org/docs/9.4/interactive/runtime-config-autovacuum.html#GUC-AUTOVACUUM-MULTIXACT-FREEZE-MAX-AGE

--
Adrian Klaver
adrian.klaver@aklaver.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#6Jan Keirse
jan.keirse@tvh.com
In reply to: William Dunn (#5)
Re: Transaction ID Wraparound Monitoring

On Tue, Aug 4, 2015 at 6:24 AM, William Dunn <dunnwjr@gmail.com> wrote:

Hello Jan,

I think your calculation is slightly off because per the docs when
PostgreSQL comes within 1 million of the age at which an actual wraparound
occurs it will go into the safety shutdown mode. Thus the calculation should
be ((2^32)-1)/2-1000000 rather than just ((2^32)-1)/2 as I think you are
using.

When I first started building out my group's PostgreSQL monitoring solution
I too found the wording of transaction freeze to be a bit difficult to
understand. For my team's internal documentation I have summarized it as
follows, I hope it might be more clear:

...normal XIDs are compared using modulo-2^32 arithmetic, which means that
~(2^32-1)/2- transactions appear in the future and ~(2^32-1)/2 transactions
appear in the past.

This [Transaction ID freeze] behavior of autovacuum is primarily dependent
on the settings autovacuum_freeze_table_age and autovacuum_freeze_max_age,
which are set as database defaults but can also be specified on a per table
basis (as storage parameters in CREATE TABLE or ALTER TABLE)

When a table's oldest transaction reaches autovacuum_freeze_table_age, the
next autovacuum that is performed on that table will be a vacuum freeze

PostgreSQL implicitly caps autovacuum_freeze_table_age at
0.95*autovacuum_freeze_max_age.

When a table reaches autovacuum_freeze_max_age PostgreSQL will force an
autovacuum freeze on that table, even if the table would not otherwise be
autovacuumed or autovacuum is disabled.

PostgreSQL implicitly caps autovacuum_freeze_max_age at 2 billion
(2000000000)

The actual age that a wraparound occurs is ((2^32)/2)-1. When a PostgreSQL
database comes within 1 million of this age (2^32/2-1-1000000) the database
will go into the safety shutdown mode" and no longer accept commands,
including the vacuum commands, and your only recovery option is to stop the
server and use a single-user backend (where shutdown mode is not enforced)
to execute VACUUM. This should, obviously, be avoided at all costs.

References:

http://www.PostgreSQL.org/docs/current/static/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
http://www.PostgreSQL.org/docs/current/static/runtime-config-client.html#GUC-VACUUM-FREEZE-TABLE-AGE
http://www.PostgreSQL.org/docs/current/static/runtime-config-autovacuum.html#GUC-AUTOVACUUM-FREEZE-MAX-AGE

Based on the above explanation we consider the following to be the most
correct check for how close you are to an actual wraparound freeze:

CAST (age(relfrozenxid) AS real) / CAST(trunc(((2^32)/2)-1-1000000) AS real)
AS perc_until_wraparound_server_freeze

(Note that we do this at the table level rather than the database level like
you did, though, so that we have the information we need to tune the
settings for individual tables.)

Thanks for the correction regarding the 1.000.000 safety margin! I
chose to monitor only the total value value to limit the amount of
extra metrics in the monitoring database. In case the value increased
we'll execute the queries to find out what table(s) is/are causing the
problem interactively.

However it is better to set autovacuum max freeze age well below that value
and monitor that instead. Autovacuum should always do a vacuum freeze for a
table that has exceeded max freeze age, and if you are monitoring for that
you should avoid a wrap around freeze:

CAST (age(relfrozenxid) AS real) / CAST ((least(autovacuum_freeze_max_age,
2000000000)) AS real) AS perc_until_freeze_max_age

And ensure that value does not exceed 100%. Though it is important to note
that max freeze age can be set on a per table basis, so to get the true
autovacuum_freeze_max_age of a table (or the real max of the database) you
would need to check the reloptions field of pg_class for that table and only
if there is no value specified for '%autovacuum_freeze_table_age%' use
current_setting('autovacuum_freeze_max_age')

I'll see to add this one to the monitoring too. Thanks for your clarifications!

--

**** DISCLAIMER ****

http://www.tvh.com/glob/en/email-disclaimer

"This message is delivered to all addressees subject to the conditions
set forth in the attached disclaimer, which is an integral part of this
message."

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Jan Keirse (#6)
Re: Transaction ID Wraparound Monitoring

On 8/4/15 2:47 AM, Jan Keirse wrote:

CAST (age(relfrozenxid) AS real) / CAST(trunc(((2^32)/2)-1-1000000) AS real)

AS perc_until_wraparound_server_freeze

(Note that we do this at the table level rather than the database level like
you did, though, so that we have the information we need to tune the
settings for individual tables.)

Thanks for the correction regarding the 1.000.000 safety margin! I
chose to monitor only the total value value to limit the amount of
extra metrics in the monitoring database. In case the value increased
we'll execute the queries to find out what table(s) is/are causing the
problem interactively.

Something to be aware of; with certain workloads you can actually run
out of MXIDs faster than XIDs, and I don't think there's any easy way to
monitor MXID consumption.

BTW, you might find my talk from pgCon or the related video useful:

http://www.pgcon.org/2015/schedule/events/829.en.html
https://www.youtube.com/watch?v=b1fcvkl0ffQ
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Data in Trouble? Get it in Treble! http://BlueTreble.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general