Re: BUG #3245: PANIC: failed to re-find shared lock object

Started by Dorochevsky,Michel · almost 19 years ago · 7 messages · bugs
#1Dorochevsky,Michel
michel.dorochevsky@softcon.de

Question: do you have any leftover files in $PGDATA/pg_twophase ?

I'm wondering why the log contains no warning messages about stale
two-phase state files. It looks to me like the system should have
found the two-phase file still there upon restart, but the transaction
should have been marked already committed.

BTW, can you tell whether the failing transactions actually were committed
--- are their effects still visible in the database?

regards, tom lane

Tom,
Thanks for your continuous support, I appreciate it a lot.

The failing transaction is visible in the database after restart, I have
checked three of the last inserts, e.g.
2007-04-21 18:06:18.921 20160 LOG: execute <unnamed>: insert into
CHECKRESULT (COMMENT, POSITIONINCHAIN, MDSD_OPT_LOCK, MDSD_CLASS, ID) values
($1, $2, $3, 'CheckResult', $4)
2007-04-21 18:06:18.921 20160 DETAIL: parameters: $1 = 'geht schon', $2 =
'2', $3 = '2007-04-21 18:06:18.64', $4 = '4046'
is visible. I can tell from the application, that this record will never be
updated later on and always has the current timestamp.

I have no leftover file in $PGDATA/pg_twophase, it is empty.

Best Regards
-- Michel
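
The check discussed above (is anything left in $PGDATA/pg_twophase?) can be sketched in C. This is only a minimal sketch, assuming a POSIX-style dirent.h is available; count_dir_entries and count_twophase_files are hypothetical helper names, not part of PostgreSQL, and the caller is assumed to pass the data directory path itself.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Count the entries in a directory, ignoring "." and "..".
 * Returns -1 if the directory cannot be opened.
 */
int
count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *de;
    int count = 0;

    if (dir == NULL)
        return -1;
    while ((de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);
    return count;
}

/*
 * Hypothetical wrapper: build "<pgdata>/pg_twophase" and count its entries.
 * Returns -1 if the path is too long or the directory cannot be opened.
 */
int
count_twophase_files(const char *pgdata)
{
    char path[1024];
    int n = snprintf(path, sizeof(path), "%s/pg_twophase", pgdata);

    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;
    return count_dir_entries(path);
}
```

A result of 0 corresponds to the "it is empty" report above; a positive count would mean state files are still present, and -1 means the directory could not be read at all.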

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dorochevsky,Michel (#1)
Re: BUG #3245: PANIC: failed to re-find shared lock object

"Dorochevsky,Michel" <michel.dorochevsky@softcon.de> writes:

The failing transaction is visible in the database after restart, I have
checked three of the last inserts, e.g.

Good, at least we're not losing data ;-). But I expected that because
this PANIC must be occurring after the RecordTransactionCommitPrepared
step.

I have no leftover file in $PGDATA/pg_twophase, it is empty.

[ digs in code some more... ] Oh, I see how that happens: the 2PC
state file is removed when the XLOG_XACT_COMMIT_PREPARED xlog entry
is replayed, so the various code paths that might emit a warning
won't be reached.

Heikki, have you been paying attention to this thread? You have any
idea what's happening? The whole thing seems pretty unexplainable
to me, especially since Michel's log shows this happening without any
concurrent activity that might confuse matters. I confess bafflement.

regards, tom lane

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#2)
Re: BUG #3245: PANIC: failed to re-find shared lock object

Tom Lane wrote:

"Dorochevsky,Michel" <michel.dorochevsky@softcon.de> writes:

The failing transaction is visible in the database after restart, I have
checked three of the last inserts, e.g.

Good, at least we're not losing data ;-). But I expected that because
this PANIC must be occurring after the RecordTransactionCommitPrepared
step.

I have no leftover file in $PGDATA/pg_twophase, it is empty.

[ digs in code some more... ] Oh, I see how that happens: the 2PC
state file is removed when the XLOG_XACT_COMMIT_PREPARED xlog entry
is replayed, so the various code paths that might emit a warning
won't be reached.

Heikki, have you been paying attention to this thread? You have any
idea what's happening? The whole thing seems pretty unexplainable
to me, especially since Michel's log shows this happening without any
concurrent activity that might confuse matters. I confess bafflement.

Oh, no I wasn't. I'm up to speed now.

I can't see any way that can happen either. There's some other
transactions running, but not at the time of prepare or commit. And
there's no other errors or unusual activity in the logs.

The only thing I can think of is that a lock is released between the
calls to AtPrepare_Locks and PostPrepare_Locks. But I don't see how that
could happen.

I think we need to see more debug-information. Is there a debug- and
assertion-enabled binary available for Windows?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#4Dave Page
dpage@pgadmin.org
In reply to: Heikki Linnakangas (#3)
Re: BUG #3245: PANIC: failed to re-find shared lock object

Heikki Linnakangas wrote:

I think we need to see more debug-information. Is there a debug- and
assertion-enabled binary available for Windows?

Unfortunately no - 95% of the time we've found that Mingw/gdb on windows
simply doesn't work. That's one of the major reasons why we're working
on moving to VC++.

I can build one tomorrow if you want to try for the 5%. What version was
this?

Regards, Dave.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dave Page (#4)
Re: BUG #3245: PANIC: failed to re-find shared lock object

Dave Page <dpage@postgresql.org> writes:

Heikki Linnakangas wrote:

I think we need to see more debug-information. Is there a debug- and
assertion-enabled binary available for Windows?

Unfortunately no - 95% of the time we've found that Mingw/gdb on windows
simply doesn't work. That's one of the major reasons why we're working
on moving to VC++.

I can build one tomorrow if you want to try for the 5%. What version was
this?

Having assertions turned on would be useful regardless of debug support.
What I was going to suggest was to add some detail printout to the PANIC
message --- in particular, dump the fields of the problem LOCKTAG, so
we can at least find out *what* lock is being lost. If you build a
custom copy for Michel, please add this patch (untested but should work):

*** src/backend/storage/lmgr/lock.c.orig	Thu Feb  1 15:09:33 2007
--- src/backend/storage/lmgr/lock.c	Sun Apr 22 16:17:01 2007
***************
*** 2430,2437 ****
  												HASH_FIND,
  												NULL);
  	if (!lock)
! 		elog(PANIC, "failed to re-find shared lock object");
! 
  	/*
  	 * Re-find the proclock object (ditto).
  	 */
--- 2430,2443 ----
  												HASH_FIND,
  												NULL);
  	if (!lock)
! 		elog(PANIC, "failed to re-find shared lock object: %u %u %u %u %u %u",
! 			 locktag->locktag_field1,
! 			 locktag->locktag_field2,
! 			 locktag->locktag_field3,
! 			 locktag->locktag_field4,
! 			 locktag->locktag_type,
! 			 locktag->locktag_lockmethodid);
! 
  	/*
  	 * Re-find the proclock object (ditto).
  	 */

regards, tom lane
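
Once a log line from the patched build is available, the six trailing numbers can be read back into named fields. The following is a minimal sketch, assuming the message format from the patch above; ParsedLockTag and parse_locktag_panic are hypothetical names for illustration, not PostgreSQL code, and the values are held as plain unsigned ints rather than whatever types the server uses internally.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the six values the patched PANIC prints. */
typedef struct ParsedLockTag
{
    unsigned int field1;
    unsigned int field2;
    unsigned int field3;
    unsigned int field4;
    unsigned int type;
    unsigned int lockmethodid;
} ParsedLockTag;

/*
 * Locate the patched message inside a log line and read its six numbers.
 * Returns 1 on success, 0 if the line does not carry all six values
 * (for example, a line from an unpatched server).
 */
int
parse_locktag_panic(const char *line, ParsedLockTag *out)
{
    static const char prefix[] = "failed to re-find shared lock object:";
    const char *p = strstr(line, prefix);

    if (p == NULL)
        return 0;
    p += sizeof(prefix) - 1;    /* skip past the prefix, excluding its NUL */
    return sscanf(p, "%u %u %u %u %u %u",
                  &out->field1, &out->field2, &out->field3,
                  &out->field4, &out->type, &out->lockmethodid) == 6;
}
```

For a made-up line such as "PANIC:  failed to re-find shared lock object: 1 2 3 4 5 6", the function returns 1 with field1 set to 1 and lockmethodid set to 6. The unpatched message has no colon or trailing numbers, so it yields 0.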

#6Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Dave Page (#4)
Re: BUG #3245: PANIC: failed to re-find shared lock object

Dave Page wrote:

Heikki Linnakangas wrote:

I think we need to see more debug-information. Is there a debug- and
assertion-enabled binary available for Windows?

Unfortunately no - 95% of the time we've found that Mingw/gdb on windows
simply doesn't work. That's one of the major reasons why we're working
on moving to VC++.

I'm not so much interested in using gdb, but in having assertions
enabled and getting the output of LOCK_DEBUG.

I can build one tomorrow if you want to try for the 5%. What version was
this?

Thanks, it was 8.2.3.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#7Dave Page
dpage@pgadmin.org
In reply to: Heikki Linnakangas (#6)
Re: BUG #3245: PANIC: failed to re-find shared lock object

Heikki Linnakangas wrote:

I can build one tomorrow if you want to try for the 5%. What version was
this?

Thanks, it was 8.2.3.

Actually, no reason this needs to wait until I'm in the office.

Michel; I've uploaded an 8.2.3 postgres.exe to
http://developer.pgadmin.org/~dpage/postgres-8.2.3-debug.zip. This is
the same as the release version, but configured with --enable-debug
--enable-cassert, and patched with Tom's patch.

Regards, Dave.