Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"

Started by John Smith almost 18 years ago · 4 messages · bugs
#1John Smith
sodgodofall@gmail.com

Hi,

I hit an issue running PG 8.2.3 with the continuous archiving feature
where I was unable to recover from the backup. I was wondering if
this may be related to bug #3245?

These are the steps that occurred before I saw this problem:

1. A transaction was prepared (PREPARE TRANSACTION).
2. A base backup of the database was taken to a warm standby system.
3. COMMIT PREPARED was issued. It never finished because it hit a PANIC:

2008-06-17 23:53:53.206 Local time zone must be set--see zic manual
page PANIC: failed to re-find shared lock object
2008-06-17 23:53:53.207 Local time zone must be set--see zic manual
page STATEMENT: commit prepared '148969' ;

I believe this panic is probably bug #3245 based on the description of
that bug - http://archives.postgresql.org/pgsql-bugs/2007-04/msg00075.php

At this point I attempted to do a recovery using the continuous
archive backup on the warm standby system. Instead of recovering
correctly, it encountered this FATAL error where an AccessShareLock was
already held.

2008-06-18 00:05:34.045 Local time zone must be set--see zic manual
page LOG: database system was interrupted at 2008-06-17 23:53:16
Local time zone must be set--see zic manual page
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual
page LOG: checkpoint record is at 70/E600DC18
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual
page LOG: redo record is at 70/E600DC18; undo record is at 0/0;
shutdown FALSE
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual
page LOG: next transaction ID: 0/1099178; next OID: 413234
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual
page LOG: next MultiXactId: 1; next MultiXactOffset: 0
2008-06-18 00:05:34.077 Local time zone must be set--see zic manual
page LOG: database system was not properly shut down; automatic
recovery in progress
2008-06-18 00:05:34.105 Local time zone must be set--see zic manual
page LOG: redo starts at 70/E600DC68
2008-06-18 00:05:34.106 Local time zone must be set--see zic manual
page LOG: could not open file "pg_xlog/0000000100000070000000E7" (log
file 112, segment 231): No such file or directory
2008-06-18 00:05:34.106 Local time zone must be set--see zic manual
page LOG: redo done at 70/E600DC68
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099169
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099156
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099157
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099161
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099164
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099162
2008-06-18 00:05:34.293 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099166
2008-06-18 00:05:34.294 Local time zone must be set--see zic manual
page LOG: recovering prepared transaction 1099131
2008-06-18 00:05:34.298 Local time zone must be set--see zic manual
page FATAL: lock AccessShareLock on object 16477/244169/0 is already
held
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual
page LOG: startup process (PID 17377) exited with exit code 1
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual
page LOG: aborting startup due to startup process failure
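
For triaging a log like the one above, here is a minimal Python sketch that collects the XIDs of the prepared transactions being recovered and the first FATAL message. The function name `summarize_recovery_log` and the regular expressions are assumptions made for illustration; they are tuned to the wrapped log format shown in this message and are not part of PostgreSQL.

```python
import re

# Patterns are assumptions based on the log excerpt in this thread.
PREPARED_RE = re.compile(r"LOG:\s+recovering prepared transaction (\d+)")
# The FATAL text runs until the next timestamp (YYYY-MM-DD) or end of input.
FATAL_RE = re.compile(r"FATAL:\s+(.*?)(?=\s+\d{4}-\d{2}-\d{2}\s|\Z)")


def summarize_recovery_log(text):
    """Return the recovered prepared-transaction XIDs and the first FATAL message."""
    # Collapse all whitespace so messages wrapped across lines match as one string.
    joined = " ".join(text.split())
    xids = [int(x) for x in PREPARED_RE.findall(joined)]
    fatal = FATAL_RE.search(joined)
    return {
        "prepared_xids": xids,
        "fatal": fatal.group(1) if fatal else None,
    }
```

Feeding it the startup log of a failed recovery gives the list of XIDs processed before the failure, which may help narrow down which prepared transaction triggered the error.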

Is this FATAL error seen on recovery a different bug or is it just a
direct result of bug #3245?

Unfortunately, I do not have a way to deterministically reproduce this
problem, but I have seen it three times so far.

thanks,

John

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Smith (#1)
Re: Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"

"John Smith" <sodgodofall@gmail.com> writes:

2008-06-17 23:53:53.206 Local time zone must be set--see zic manual
page PANIC: failed to re-find shared lock object
2008-06-17 23:53:53.207 Local time zone must be set--see zic manual
page STATEMENT: commit prepared '148969' ;

I believe this panic is probably bug #3245 based on the description of
that bug - http://archives.postgresql.org/pgsql-bugs/2007-04/msg00075.php

Yeah, looks like it to me too.

At this point I attempted to do a recovery using the continuous
archive backup on the warm standby system. Instead of recovering
correctly, it encountered this FATAL error where an AccessShareLock was
already held.
2008-06-18 00:05:34.298 Local time zone must be set--see zic manual
page FATAL: lock AccessShareLock on object 16477/244169/0 is already
held
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual
page LOG: startup process (PID 17377) exited with exit code 1
2008-06-18 00:05:34.299 Local time zone must be set--see zic manual
page LOG: aborting startup due to startup process failure

Is this FATAL error seen on recovery a different bug or is it just a
direct result of bug #3245?

It probably is the same bug. The underlying cause of that bug is
explained here:
http://archives.postgresql.org/pgsql-bugs/2007-04/msg00129.php
I think what you are seeing is just a variant case caused by the same
lock being written out to the twophase file twice. In any case there's
probably little point in digging further until you've updated to a
version with that fix --- if you still see the problem afterward,
we can look closer.
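
To illustrate the suspected failure mode, here is a toy Python model of replaying lock records at recovery time. It is only a sketch under stated assumptions: the record layout (an object tag plus a lock mode), the function `reacquire_locks`, and the exception `LockAlreadyHeld` are all hypothetical and do not mirror PostgreSQL's actual lock manager or twophase file format. The point is simply that if the same lock appears twice among a prepared transaction's records, the second attempt to re-acquire it finds it already held.

```python
class LockAlreadyHeld(Exception):
    """Raised when a replayed lock record duplicates one already re-acquired."""


def reacquire_locks(lock_records):
    """Replay (object_tag, mode) records for one prepared transaction.

    Toy model: each record must be acquired exactly once, so a duplicate
    record makes the replay fail.
    """
    held = set()
    for object_tag, mode in lock_records:
        if (object_tag, mode) in held:
            raise LockAlreadyHeld(
                f"lock {mode} on object {object_tag} is already held"
            )
        held.add((object_tag, mode))
    return held
```

With a single record the replay succeeds; with the same record listed twice it raises an error whose text resembles the FATAL message in the report.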

BTW, what's with the bizarre "Local time zone must be set--see zic
manual" where the timezone should be? Are you intentionally selecting
the "Factory" zone?

regards, tom lane

#3John Smith
sodgodofall@gmail.com
In reply to: Tom Lane (#2)
Re: Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"

Thanks for the quick reply, Tom. I'll be updating my PG version to one
with a fix for bug #3245, so hopefully we won't see this anymore.

BTW, what's with the bizarre "Local time zone must be set--see zic
manual" where the timezone should be? Are you intentionally selecting
the "Factory" zone?

I don't think I've put the correct timezone file in /etc/localtime so
it is using some default file from the Gentoo install.

John

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: John Smith (#3)
Re: Recovery failed on a backup with " lock AccessShareLock on object 16477/244169/0 is already held"

"John Smith" <sodgodofall@gmail.com> writes:

BTW, what's with the bizarre "Local time zone must be set--see zic
manual" where the timezone should be? Are you intentionally selecting
the "Factory" zone?

I don't think I've put the correct timezone file in /etc/localtime so
it is using some default file from the Gentoo install.

Ah, yes, I was able to duplicate that behavior by overwriting
/etc/localtime with /usr/share/zoneinfo/Factory. I guess the Gentoo
folks failed in their intention to annoy you enough to make you set
the zone correctly ;-)
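
A quick way to check for this condition is to compare the two files byte for byte. The Python sketch below assumes the paths mentioned in this message; the helper name `uses_factory_zone` is made up for illustration, and the Factory file may live elsewhere or be absent on other systems.

```python
import filecmp
import os


def uses_factory_zone(localtime_path="/etc/localtime",
                      factory_path="/usr/share/zoneinfo/Factory"):
    """Return True if localtime_path has the same contents as the Factory zone file."""
    # If either file is missing, we cannot claim they are the same.
    if not (os.path.isfile(localtime_path) and os.path.isfile(factory_path)):
        return False
    # shallow=False forces a comparison of file contents, not just stat() data.
    return filecmp.cmp(localtime_path, factory_path, shallow=False)
```

Calling `uses_factory_zone()` with no arguments checks the default paths; a True result suggests the system time zone was never set.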

regards, tom lane