Transaction wraparound problem with database postgres

Started by Markus Wollnyabout 18 years ago14 messagesgeneral
Jump to latest
#1Markus Wollny
Markus.Wollny@computec.de

Hi!

My database cluster has just stopped working. I get the following message:
psql: FATAL: Datenbank nimmt keine Befehle an, um Datenverlust in Datenbank »postgres« wegen Transaktionsnummernüberlauf zu vermeiden
TIP: Halten Sie den Postmaster an und verwenden Sie ein Standalone-Backend, um VACUUM in der Datenbank »postgres« auszuführen.

I did as suggested, stopped tzhe postmaster and started a single backend on database 'postgres'; I issued a "VACCUM" there.
su postgres -c "/opt/pgsql/bin/postgres --single -D /var/lib/pgsql/data postgres"

backend> VACUUM

However, this doesn't seem to help - I receive lots and lots of messages like this:

<2008-03-21 21:43:27 CET - 11845: @>WARNUNG: Datenbank »postgres« muss innerhalb von 4294805194 Transaktionen gevacuumt werden
<2008-03-21 21:43:27 CET - 11845: @>TIPP: Um ein Abschalten der Datenbank zu vermeiden, führen Sie in »postgres« ein VACUUM über die komplette Datenbank aus.

i.e. "database 'postgres' must be vacuumed within 4294805194 transactions."

That's what I just did, but the problem persists. Whenever I issue a 'vacuum', the number of transactions simply decreases.

This is PostgreSQL 8.2.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

I am absolutely lost about what to do now - and it's a puzzle how this could have happened in the first place. I have configured autovaccum AND I run a vacuum verbose analyze over all databases every single night. What do I do now? Is there some alternative to reinit and going back to the last dump?

Urgent help would be very much appreciated.

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#2Markus Wollny
Markus.Wollny@computec.de
In reply to: Markus Wollny (#1)
Re: Transaction wraparound problem with database postgres

Hi!

Just some more info, hoping that it helps with a diagnosis:

1: datname (typeid = 19, len = 64, typmod = -1, byval = f)
2: age (typeid = 23, len = 4, typmod = -1, byval = t)
3: datfrozenxid (typeid = 28, len = 4, typmod = -1, byval = t)
----
1: datname = "postgres" (typeid = 19, len = 64, typmod = -1, byval = f)
2: age = "-2147321465" (typeid = 23, len = 4, typmod = -1, byval = t)
3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod = -1, byval = t)
----
Then I issue a vacuum:
1: datname (typeid = 19, len = 64, typmod = -1, byval = f)
2: age (typeid = 23, len = 4, typmod = -1, byval = t)
3: datfrozenxid (typeid = 28, len = 4, typmod = -1, byval = t)
----
1: datname = "postgres" (typeid = 19, len = 64, typmod = -1, byval = f)
2: age = "-2147321383" (typeid = 23, len = 4, typmod = -1, byval = t)
3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod = -1, byval = t)

It worries me, that 'age' is negative.

Kind regards

Markus

________________________________

Von: Markus Wollny
Gesendet: Fr 21.03.2008 21:50
An: pgsql-general@postgresql.org
Betreff: Transaction wraparound problem with database postgres

Hi!

My database cluster has just stopped working. I get the following message:
psql: FATAL: Datenbank nimmt keine Befehle an, um Datenverlust in Datenbank »postgres« wegen Transaktionsnummernüberlauf zu vermeiden
TIP: Halten Sie den Postmaster an und verwenden Sie ein Standalone-Backend, um VACUUM in der Datenbank »postgres« auszuführen.

I did as suggested, stopped tzhe postmaster and started a single backend on database 'postgres'; I issued a "VACCUM" there.
su postgres -c "/opt/pgsql/bin/postgres --single -D /var/lib/pgsql/data postgres"

backend> VACUUM

However, this doesn't seem to help - I receive lots and lots of messages like this:

<2008-03-21 21:43:27 CET - 11845: @>WARNUNG: Datenbank »postgres« muss innerhalb von 4294805194 Transaktionen gevacuumt werden
<2008-03-21 21:43:27 CET - 11845: @>TIPP: Um ein Abschalten der Datenbank zu vermeiden, führen Sie in »postgres« ein VACUUM über die komplette Datenbank aus.

i.e. "database 'postgres' must be vacuumed within 4294805194 transactions."

That's what I just did, but the problem persists. Whenever I issue a 'vacuum', the number of transactions simply decreases.

This is PostgreSQL 8.2.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

I am absolutely lost about what to do now - and it's a puzzle how this could have happened in the first place. I have configured autovaccum AND I run a vacuum verbose analyze over all databases every single night. What do I do now? Is there some alternative to reinit and going back to the last dump?

Urgent help would be very much appreciated.

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#3Joshua D. Drake
jd@commandprompt.com
In reply to: Markus Wollny (#1)
Re: Transaction wraparound problem with database postgres

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, 21 Mar 2008 21:50:57 +0100
"Markus Wollny" <Markus.Wollny@computec.de> wrote:

That's what I just did, but the problem persists. Whenever I issue a
'vacuum', the number of transactions simply decreases.

This is PostgreSQL 8.2.4 on x86_64-unknown-linux-gnu, compiled by GCC
gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

I am absolutely lost about what to do now - and it's a puzzle how
this could have happened in the first place. I have configured
autovaccum AND I run a vacuum verbose analyze over all databases
every single night. What do I do now? Is there some alternative to
reinit and going back to the last dump?

Urgent help would be very much appreciated.

Vacuum every database. (template1,postgres too). This could happen if
you have long running transactions that are not allow VACUUM to
actually work.

Sincerely,

Joshua D. Drake

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels
Herrmann Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

- --
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFH5CcAATb/zqfZUUQRAl7CAJ9QGYJVqNVfHFgjVyCBswp1+d8kJgCfe+io
5d28sM4Gw4OkSBh/+U4jMDI=
=iBx9
-----END PGP SIGNATURE-----

#4Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: Markus Wollny (#1)
Re: Transaction wraparound problem with database postgres

Hello,

On Fri, 21 Mar 2008 21:50:57 +0100 Markus Wollny wrote:

My database cluster has just stopped working. I get the following message:
psql: FATAL: Datenbank nimmt keine Befehle an, um Datenverlust in Datenbank »postgres« wegen Transaktionsnummernüberlauf zu vermeiden
TIP: Halten Sie den Postmaster an und verwenden Sie ein Standalone-Backend, um VACUUM in der Datenbank »postgres« auszuführen.

First of all, it would help you and most of the readers on this list, if
you have the error messages in english. There is a german mailinglist
too, if you want to ask in german.

That's what I just did, but the problem persists. Whenever I issue a 'vacuum', the number of transactions simply decreases.

vacuum all databases, add the VERBOSE option to see, what actually
happens.

I am absolutely lost about what to do now - and it's a puzzle how this could have happened in the first place. I have configured autovaccum AND I run a vacuum verbose analyze over all databases every single night. What do I do now? Is there some alternative to reinit and going back to the last dump?

Are you using the database 'postgres' at all? And are you sure, that
you include all databases? Any error messages in the vacuum output?

Oh, and by the way: why do you have autovacuum and a manual vacuum run
every night plus the vacuum run with verbose?

Urgent help would be very much appreciated.

That's a bit late here ;-)

Kind regards

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Markus Wollny (#2)
Re: Transaction wraparound problem with database postgres

"Markus Wollny" <Markus.Wollny@computec.de> writes:

Just some more info, hoping that it helps with a diagnosis:

1: datname (typeid = 19, len = 64, typmod = -1, byval = f)
2: age (typeid = 23, len = 4, typmod = -1, byval = t)
3: datfrozenxid (typeid = 28, len = 4, typmod = -1, byval = t)
----
1: datname = "postgres" (typeid = 19, len = 64, typmod = -1, byval = f)
2: age = "-2147321465" (typeid = 23, len = 4, typmod = -1, byval = t)
3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod = -1, byval = t)

What are the datfrozenxid's of the other rows in pg_database?
Do the other fields of postgres' row look sane?

regards, tom lane

#6Markus Wollny
Markus.Wollny@computec.de
In reply to: Markus Wollny (#1)
Re: Transaction wraparound problem with database postgres

Hi!

Sorry for the quick updates to my own messages, but I didn't want to lean back and wait - so I took to more aggressive measures. All my other databases in this cluster are fine - and the 'postgres' database doesn't seem to do anything really useful except being the default database. I dropped it and recreated it with template1 as template, afterwards I could start up my cluster with no problems whatsoever. I'd still like to find out what exactly happened here so I can prevent the same from happening again in the future. The age(datfrozenxid) is positive again:

# SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database where datname='postgres';
datname | age | datfrozenxid
----------+-----------+--------------
postgres | 100291695 | 3882762765
(1 Zeile)

As I mentioned earlier, I'm running autovaccuum and use a nightly cron to run vacuum verbose analyze over all my databases. So lack of vacuum cannot be the issue, I think. But what else could have happened here? I regularly scan my logs, and there was no early warning for this issue.

The first event of this type in the server log was from today:

<2008-03-21 17:08:48 CET - 32161: xxx.xxx.xxx.xxx(52833)@magazine>WARNUNG: Datenbank »postgres« muss innerhalb von 11000000 Transaktionen gevacuumt werden
<2008-03-21 17:08:48 CET - 32161: xxx.xxx.xxx.xxx(52833)@magazine>TIPP: Um ein Abschalten der Datenbank zu vermeiden, führen Sie in »postgres« ein VACUUM über die komplette Datenbank aus.

(i.e. database 'postgres' need to be vacuumed within 11000000 transactions...)

A mere three hours later, the server already refused any further requests:
<2008-03-21 20:05:21 CET - 25184: xxx.xxx.xxx.xxx(60837)@magazine>FEHLER: Datenbank nimmt keine Befehle an, um Datenverlust in Datenbank »postgres« wegen Transaktionsnummernüberlauf zu vermeiden

(ie. database no longer accepts any commands in order to prevent data loss in database 'postgres' because of transaction id wraparound)

Now that the adrenaline level has dropped to normal, I'd still like to know what exactly has happened here; The cluster has been initdb'ed on 2007-04-27.

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#7Markus Wollny
Markus.Wollny@computec.de
In reply to: Tom Lane (#5)
Re: Transaction wraparound problem with database postgres

Hi!

Thanks for all the quick replies :)

Tom Lane wrote:

"Markus Wollny" <Markus.Wollny@computec.de> writes:

Just some more info, hoping that it helps with a diagnosis:

1: datname (typeid = 19, len = 64, typmod = -1, byval = f)
2: age (typeid = 23, len = 4, typmod = -1, byval = t)
3: datfrozenxid (typeid = 28, len = 4, typmod = -1,
byval = t) ---- 1: datname = "postgres"
(typeid = 19, len = 64, typmod = -1, byval = f) 2: age =
"-2147321465" (typeid = 23, len = 4, typmod = -1, byval =
t) 3: datfrozenxid = "1835116837" (typeid = 28, len = 4, typmod =
-1, byval = t)

What are the datfrozenxid's of the other rows in pg_database?
Do the other fields of postgres' row look sane?

Yes, there were no issues on any of the databases that are actually in use:

# select datname, age(datfrozenxid), datfrozenxid from pg_database;
datname | age | datfrozenxid
------------+-----------+--------------
rpfcms | 104213725 | 3881601233
rpfflash | 147289015 | 3838525943
postgres | 103052193 | 3882762765
template1 | 104213787 | 3881601171
template0 | 3052193 | 3982762765
ezpublish | 147419044 | 3838395914
community | 147566532 | 3838248426
abo | 147689637 | 3838125321
bluebox | 147679271 | 3838135687
cbox | 147582662 | 3838232296
mpo | 147309716 | 3838505242
newsletter | 147309110 | 3838505848
pcaction | 147297707 | 3838517251
pcgames | 147291588 | 3838523370
magazine | 147419044 | 3838395914

Only the 'postgres' db was affected - which is puzzling because we don't actually use this database actively for anything.

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Markus Wollny (#6)
Re: Transaction wraparound problem with database postgres

"Markus Wollny" <Markus.Wollny@computec.de> writes:

Sorry for the quick updates to my own messages, but I didn't want to
lean back and wait - so I took to more aggressive measures. All my
other databases in this cluster are fine - and the 'postgres' database
doesn't seem to do anything really useful except being the default
database. I dropped it and recreated it with template1 as template,
afterwards I could start up my cluster with no problems
whatsoever.

Yeah, if there were no other problems apparent in pg_database I was
going to suggest that as a recovery method.

I'd still like to find out what exactly happened here so I
can prevent the same from happening again in the future.

Me too. It would seem that something did a vacuum of postgres
with a strange choice of xid cutoff, but I can't think of what
would cause that.

Do you ever do VACUUM FREEZE on your databases?

regards, tom lane

#9Markus Wollny
Markus.Wollny@computec.de
In reply to: Andreas 'ads' Scherbaum (#4)
Re: Transaction wraparound problem with database postgres

Andreas 'ads' Scherbaum wrote:

Hello,
First of all, it would help you and most of the readers on this list,
if you have the error messages in english. There is a german
mailinglist too, if you want to ask in german.

Sorry, I tried to describe the issue as best as I could and included the actual log entries only for completeness, but was in too much of a hurry to find the correct translations.

vacuum all databases, add the VERBOSE option to see, what actually
happens.

Alas, too late, I got rid of the offending 'postgres' database already by dropping and recreating.

Are you using the database 'postgres' at all?

No, not at all. Didn't touch it ever after initdb.

And are you sure, that you include all databases?

Yes. I run the following every night:

su postgres -c '/opt/pgsql/bin/psql -t -c "select datname from pg_database order by datname;" template1 | xargs -n 1 /opt/pgsql/bin/psql -q -c "vacuum verbose analyze;"'

Any error messages in the vacuum output?

None.

Oh, and by the way: why do you have autovacuum and a manual vacuum
run every night plus the vacuum run with verbose?

Paranoia, mostly, I think. I'm using PostgreSQL since long before autovacuum was introduced and always thought that it couldn't do any harm to keep my original vacuum job running once every night, even though autovacuum does a remarkable job, especially for a couple of busy tables where the nightly vacuum was not quite enough. Plus, having the verbose output from the log, I get useful info for setting the 'max_fsm_pages'/'max_fsm_relations'-options to sensible values. Is it a problem to have cron'ed VACUUM-runs in parallel with autovacuum?

Urgent help would be very much appreciated.

That's a bit late here ;-)

Ah, well obviously it wasn't - it's always an extremely pleasant surprise when one is actually in dire need of help and gets an almost immediate and helpful response.

I wish you all happy Easter!

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#10Markus Wollny
Markus.Wollny@computec.de
In reply to: Tom Lane (#8)
Re: Transaction wraparound problem with database postgres

Tom Lane wrote:

"Markus Wollny" <Markus.Wollny@computec.de> writes:

I'd still like to find out what exactly happened here so I can
prevent the same from happening again in the future.

Me too. It would seem that something did a vacuum of postgres with a
strange choice of xid cutoff, but I can't think of what would cause
that.

Do you ever do VACUUM FREEZE on your databases?

No, I actually never heard of VACUUM FREEZE, I have to admit.

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Johannes S. Gözalan (Vorsitzender) und Niels Herrmann
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#11Markus Wollny
Markus.Wollny@computec.de
In reply to: Tom Lane (#8)
Re: Transaction wraparound problem with database postgres

Hi!

After going several months without such incidents, we now got bit by the same problem again. We have since upgraded the hardware we ran the database cluster on and currently use version 8.3.7. The general outline of the problem hasn't changed much though - we still don't use the database 'postgres' except for one lone pgagent-job which has only been configured a couple of weeks back and we do a nightly vacuum over all databases in addition the the running of the autovacuum daemon. As I expect that this might hit again in a couple of months: Any suggestions on what sort of forensic data might be required to actually find out the root of what's causing it? As I needed to get the cluster back up and running again, I used the same remedy as last time and simply dropped the database and recreated it from template1, so there's not much left to be looked into right now, but if I knew what kind of data to retain I mit be able to come up with some more useful info next time...

Kind regards

Markus

-----Ursprüngliche Nachricht-----
Von: Markus Wollny
Gesendet: Freitag, 21. März 2008 23:34
An: 'Tom Lane'
Cc: pgsql-general@postgresql.org
Betreff: AW: [GENERAL] Transaction wraparound problem with
database postgres

Tom Lane wrote:

"Markus Wollny" <Markus.Wollny@computec.de> writes:

I'd still like to find out what exactly happened here so I can
prevent the same from happening again in the future.

Me too. It would seem that something did a vacuum of

postgres with a

strange choice of xid cutoff, but I can't think of what would cause
that.

Do you ever do VACUUM FREEZE on your databases?

No, I actually never heard of VACUUM FREEZE, I have to admit.

Computec Media AG
Sitz der Gesellschaft und Registergericht: Fürth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jürg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#12Scott Marlowe
scott.marlowe@gmail.com
In reply to: Markus Wollny (#11)
Re: Transaction wraparound problem with database postgres

On Sun, Mar 7, 2010 at 6:06 PM, Markus Wollny <Markus.Wollny@computec.de> wrote:

Hi!

After going several months without such incidents, we now got bit by the same problem again. We have since upgraded the hardware we ran the database cluster on and currently use version 8.3.7. The general outline of the problem hasn't changed much though - we still don't use the database 'postgres' except for one lone pgagent-job which has only been configured a couple of weeks back and we do a nightly vacuum over all databases in addition the the running of the autovacuum daemon. As I expect that this might hit again in a couple of months: Any suggestions on what sort of forensic data might be required to actually find out the root of what's causing it? As I needed to get the cluster back up and running again, I used the same remedy as last time and simply dropped the database and recreated it from template1, so there's not much left to be looked into right now, but if I knew what kind of data to retain I mit be able to come up with some more useful info next time...

Do your logs show any kind of error when vacuuming about "only owner
can vacuum" a table or anything?

#13Markus Wollny
Markus.Wollny@computec.de
In reply to: Scott Marlowe (#12)
Re: Transaction wraparound problem with database postgres

Hi!

From: Scott Marlowe [mailto:scott.marlowe@gmail.com]

Do your logs show any kind of error when vacuuming about
"only owner can vacuum" a table or anything?

I grepped through the logs from the last four days and, no, there were
none such errors whatsoever. Last vacuum analyze run returned the
following:

INFO: free space map contains 1974573 pages in 9980 relations
DETAIL: A total of 2043408 page slots are in use (including overhead).
2043408 page slots are required to track all free space.
Current limits are: 2100000 page slots, 10000 relations, using 13376
kB.

I have since increased these limits by 50% as we've come quite close to
what was configured. But as they hadn't been reached yet anyway, so I
don't think we did have any sort of apparent problem with the running of
vaccuum as such.

Kind regards

Markus

Computec Media AG
Sitz der Gesellschaft und Registergericht: Furth (HRB 8818)
Vorstandsmitglieder: Albrecht Hengstenberg (Vorsitzender) und Rainer Rosenbusch
Vorsitzender des Aufsichtsrates: Jurg Marquard
Umsatzsteuer-Identifikationsnummer: DE 812 575 276

#14Scott Marlowe
scott.marlowe@gmail.com
In reply to: Markus Wollny (#11)
Re: Transaction wraparound problem with database postgres

On Sun, Mar 7, 2010 at 6:06 PM, Markus Wollny <Markus.Wollny@computec.de> wrote:

Hi!

After going several months without such incidents, we now got bit by the same problem again. We have since upgraded the hardware we ran the database cluster on and currently use version 8.3.7. The general outline of the problem hasn't changed much though

I'm not sure I'm entirely sure exactly what the problem was and now
again is. Could you fill me in on that?