Cannot use a standalone backend to VACUUM in "postgres""

Started by Manuel Sugawaraabout 18 years ago14 messagesgeneral
Jump to latest
#1Manuel Sugawara
masm@fciencias.unam.mx

We have a PostgreSQL 8.2.6 installation running for about six-months
now. There was a lot of log entries saying (sometimes 10 or more in
just one second):

WARNING: oldest xmin is far in the past
HINT: Close open transactions soon to avoid wraparound problems.

(actually it was in Spanish but I think that's irrelevant).

The funny thing is that there was no open transactions, even after
restarting the cluster the same message was logged. Today, the
database stopped working as expected:

ERROR: database is shut down to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and use a standalone backend to VACUUM in "postgres"

So, the postmaster was stopped to follow the hint but even in
stand-alone mode postgres keeps saying:

WARNING: database "postgres" must be vacuumed within 999805 transactions
HINT: To avoid a database shutdown, execute a full-database VACUUM in "postgres".
WARNING: oldest xmin is far in the past
HINT: Close open transactions soon to avoid wraparound problems.

Every time vacuum is run the number decreases by one but after a few
runs I still cannot access the cluster :-(. (My plan was to take a
pg_dumpall and then re-init the cluster.)

Attached is the output of:

SELECT relname, age(relfrozenxid) FROM pg_class WHERE relkind = 'r';
SELECT datname, age(datfrozenxid) FROM pg_database;

pg_controldata says:

pg_control version number: 822
Catalog version number: 200611241
Database system identifier: 5040396405114363383
Database cluster state: in production
pg_control last modified: Mon 07 Apr 2008 09:22:19 PM CDT
Current log file ID: 33
Next log file segment: 91
Latest checkpoint location: 21/5A8EC824
Prior checkpoint location: 21/5A8C8CDC
Latest checkpoint's REDO location: 21/5A8EC824
Latest checkpoint's UNDO location: 0/0
Latest checkpoint's TimeLineID: 1
Latest checkpoint's NextXID: 0/2280224912
Latest checkpoint's NextOID: 103405
Latest checkpoint's NextMultiXactId: 64513935
Latest checkpoint's NextMultiOffset: 154155767
Time of latest checkpoint: Mon 07 Apr 2008 09:20:54 PM CDT
Minimum recovery ending location: 0/0
Maximum data alignment: 4
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Date/time type storage: floating-point numbers
Maximum length of locale name: 128
LC_COLLATE: es_MX.ISO-8859-1
LC_CTYPE: es_MX.ISO-8859-1

Please let me know if there is more information needed.

Regards,
Manuel.

Attachments:

selectstext/plain; charset=big5Download
#2Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Manuel Sugawara (#1)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Manuel Sugawara wrote:

Hi Manuel,

The funny thing is that there was no open transactions, even after
restarting the cluster the same message was logged. Today, the
database stopped working as expected:

ERROR: database is shut down to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and use a standalone backend to VACUUM in "postgres"

I suggest you look for temp tables that have not been reclaimed. We've
had a couple of reports where leftover temp tables have stopped the
frozen-xid counter from advancing. (They would have a very old
relfrozenxid.)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#3Manuel Sugawara
masm@fciencias.unam.mx
In reply to: Alvaro Herrera (#2)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Alvaro Herrera <alvherre@commandprompt.com> writes:

Hi Manuel,

Hi Alvaro!

I suggest you look for temp tables that have not been reclaimed.
We've had a couple of reports where leftover temp tables have
stopped the frozen-xid counter from advancing. (They would have a
very old relfrozenxid.)

Thank you very much for the suggestion. Any pointers on how to do
that? A quick serch in google didn't show anything relevant.

Regards,
Manuel.

#4Manuel Sugawara
masm@fciencias.unam.mx
In reply to: Manuel Sugawara (#3)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Manuel Sugawara <masm@fciencias.unam.mx> writes:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Hi Manuel,

Hi Alvaro!

I suggest you look for temp tables that have not been reclaimed.
We've had a couple of reports where leftover temp tables have
stopped the frozen-xid counter from advancing. (They would have a
very old relfrozenxid.)

Thank you very much for the suggestion. Any pointers on how to do
that? A quick serch in google didn't show anything relevant.

Will look into pg_class, of course. Somehow I was thinking something
else. Thanks again.

Regards,
Manuel.

#5Manuel Sugawara
masm@fciencias.unam.mx
In reply to: Alvaro Herrera (#2)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Alvaro Herrera <alvherre@commandprompt.com> writes:

Manuel Sugawara wrote:

Hi Manuel,

The funny thing is that there was no open transactions, even after
restarting the cluster the same message was logged. Today, the
database stopped working as expected:

ERROR: database is shut down to avoid wraparound data loss in database "postgres"
HINT: Stop the postmaster and use a standalone backend to VACUUM in "postgres"

I suggest you look for temp tables that have not been reclaimed. We've
had a couple of reports where leftover temp tables have stopped the
frozen-xid counter from advancing. (They would have a very old
relfrozenxid.)

In each database executed:

select relname, age(relfrozenxid) from pg_class where relnamespace = '10406'::oid;

(note that 10406 is the oid of the pg_temp_1 namespace) none of them
showed temp tables, Is this the correct way?, Any other idea?.

Regards,
Manuel.

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Manuel Sugawara (#5)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Manuel Sugawara wrote:

In each database executed:

select relname, age(relfrozenxid) from pg_class where relnamespace = '10406'::oid;

(note that 10406 is the oid of the pg_temp_1 namespace) none of them
showed temp tables, Is this the correct way?, Any other idea?.

Hmm, nope -- take away the relnamespace check, because there is a
different namespace for each backend (pg_temp_2, pg_temp_3, etc). And
as far as I've seen, most leftover temp tables are on "high" temp
schemas (i.e. those belonging to backends that are only used when the
load is high).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#6)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Alvaro Herrera <alvherre@commandprompt.com> writes:

Hmm, nope -- take away the relnamespace check, because there is a
different namespace for each backend (pg_temp_2, pg_temp_3, etc). And
as far as I've seen, most leftover temp tables are on "high" temp
schemas (i.e. those belonging to backends that are only used when the
load is high).

Yeah, because the low-numbered ones get recycled first. To have a temp
table survive for long enough to create this problem, it's pretty much
got to be in a high-numbered temp schema.

regards, tom lane

#8Manuel Sugawara
masm@fciencias.unam.mx
In reply to: Alvaro Herrera (#6)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Alvaro Herrera <alvherre@commandprompt.com> writes:

Hmm, nope -- take away the relnamespace check, because there is a
different namespace for each backend (pg_temp_2, pg_temp_3, etc).

Still no luck, changed the query to:

select relname, age(relfrozenxid) from pg_class join pg_namespace n on (n.oid = relnamespace) where nspname ~ '^pg_temp'

and no temp tables showed in any database :-(. Any other idea?.

Regards,
Manuel.

#9Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Manuel Sugawara (#8)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Manuel Sugawara wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Hmm, nope -- take away the relnamespace check, because there is a
different namespace for each backend (pg_temp_2, pg_temp_3, etc).

Still no luck, changed the query to:

select relname, age(relfrozenxid) from pg_class join pg_namespace n on (n.oid = relnamespace) where nspname ~ '^pg_temp'

and no temp tables showed in any database :-(. Any other idea?.

None :-( Is there any table with a large age value, regardless of
temp-ness? Does the age of the oldest table correspond to the age of
pg_database.datfrozenxid? The interesting database is the one with the
largest age(pg_database.datfrozenxid).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Manuel Sugawara (#1)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Manuel Sugawara wrote:

Going back to your first message I see that I missed something
important:

1: relname = "pg_tablespace" (typeid = 19, len = 64, typmod = -1, byval = f)
2: age = "2146484675" (typeid = 23, len = 4, typmod = -1, byval = t)

1: relname = "pg_pltemplate" (typeid = 19, len = 64, typmod = -1, byval = f)
2: age = "2146484675" (typeid = 23, len = 4, typmod = -1, byval = t)

1: relname = "pg_shdepend" (typeid = 19, len = 64, typmod = -1, byval = f)
2: age = "2146484675" (typeid = 23, len = 4, typmod = -1, byval = t)

1: relname = "pg_shdescription" (typeid = 19, len = 64, typmod = -1, byval = f)
2: age = "2146484675" (typeid = 23, len = 4, typmod = -1, byval = t)

1: relname = "pg_database" (typeid = 19, len = 64, typmod = -1, byval = f)
2: age = "2146484675" (typeid = 23, len = 4, typmod = -1, byval = t)

1: relname = "pg_auth_members" (typeid = 19, len = 64, typmod = -1, byval = f)
2: age = "2146484675" (typeid = 23, len = 4, typmod = -1, byval = t)

These are all shared catalogs AFAIR. Have you vacuumed those?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#11Manuel Sugawara
masm@fciencias.unam.mx
In reply to: Alvaro Herrera (#9)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Alvaro Herrera <alvherre@commandprompt.com> writes:

Manuel Sugawara wrote:

Alvaro Herrera <alvherre@commandprompt.com> writes:

Hmm, nope -- take away the relnamespace check, because there is a
different namespace for each backend (pg_temp_2, pg_temp_3, etc).

Still no luck, changed the query to:

select relname, age(relfrozenxid) from pg_class join pg_namespace n on (n.oid = relnamespace) where nspname ~ '^pg_temp'

and no temp tables showed in any database :-(. Any other idea?.

None :-( Is there any table with a large age value, regardless of
temp-ness? Does the age of the oldest table correspond to the age of
pg_database.datfrozenxid? The interesting database is the one with the
largest age(pg_database.datfrozenxid).

A friend is doing the queries right now but it takes time :-(, mean
time I was wondering if it will be safe to apply the following patch
just to get the database up and be able to run pg_dumpall:

*** postgresql-8.2.6/src/backend/access/transam/varsup.c~       2006-11-05 16:42:07.000000000 -0600
--- postgresql-8.2.6/src/backend/access/transam/varsup.c        2008-04-08 18:34:51.000000000 -0500
***************
*** 225,231 ****
         * vacuuming requires one transaction per table cleaned, we had better be
         * sure there's lots of XIDs left...)
         */
!       xidStopLimit = xidWrapLimit - 1000000;
        if (xidStopLimit < FirstNormalTransactionId)
                xidStopLimit -= FirstNormalTransactionId;
--- 225,231 ----
         * vacuuming requires one transaction per table cleaned, we had better be
         * sure there's lots of XIDs left...)
         */
!       xidStopLimit = xidWrapLimit - 500000;
        if (xidStopLimit < FirstNormalTransactionId)
                xidStopLimit -= FirstNormalTransactionId;

Regards,
Manuel.

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Manuel Sugawara (#11)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Manuel Sugawara wrote:

A friend is doing the queries right now but it takes time :-(, mean
time I was wondering if it will be safe to apply the following patch
just to get the database up and be able to run pg_dumpall:

Yes, it is safe. Just make sure to get a copy of the database out in
500k transactions ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#13Manuel Sugawara
masm@fciencias.unam.mx
In reply to: Alvaro Herrera (#12)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Alvaro Herrera <alvherre@commandprompt.com> writes:

Manuel Sugawara wrote:

A friend is doing the queries right now but it takes time :-(, mean
time I was wondering if it will be safe to apply the following patch
just to get the database up and be able to run pg_dumpall:

Yes, it is safe. Just make sure to get a copy of the database out in
500k transactions ...

Ok, will do that. Thanks a lot Alvaro. Want me to send more details to
debug the problem?.

Regards,
Manuel.

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Manuel Sugawara (#13)
Re: Cannot use a standalone backend to VACUUM in "postgres""

Manuel Sugawara wrote:

Ok, will do that. Thanks a lot Alvaro. Want me to send more details to
debug the problem?.

Yes, it would be good to know why the shared catalogs were not being
vacuumed, if you can find that out. I would have guessed that they
weren't being vacuumed due to the fact that they can only be vacuumed by
a superuser, but in a standalone backend you are always superuser. So
if you ran a database-wide vacuum, they should have been processed.

What happens if you try "vacuum pg_database", etc? Does the
age(relfrozenxid) change?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.