We've broken something in error recovery

Started by Tom Lane almost 16 years ago, 4 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

In a somewhat misguided attempt to test something else, I did this in
CVS HEAD:

do $$begin
  for i in 1 .. 10000 loop
    execute 'create table t' || i::text || ' (f1 int primary key)';
  end loop;
end$$;

This ran for a while and then ran out of lock table space, which was
not surprising in hindsight:

ERROR: out of shared memory
HINT: You might need to increase max_locks_per_transaction.

But what was surprising was what happened next: the autovac launcher
immediately crashed.

TRAP: FailedAssertion("!(nestLevel > 0 && nestLevel <= GUCNestLevel)", File: "guc.c", Line: 3907)
LOG: autovacuum launcher process (PID 25220) was terminated by signal 6

Stack trace looks like

#4 0x4e85b4 in ExceptionalCondition (
    conditionName=0x1ac4ac "!(nestLevel > 0 && nestLevel <= GUCNestLevel)",
    errorType=0x1abf44 "FailedAssertion", fileName=0x1abee4 "guc.c",
    lineNumber=3907) at assert.c:57
#5 0x501f48 in AtEOXact_GUC (isCommit=-86 '\252', nestLevel=84) at guc.c:3907
#6 0x20618c in AbortTransaction () at xact.c:2194
#7 0x20688c in AbortCurrentTransaction () at xact.c:2568
#8 0x3b0f84 in AutoVacLauncherMain (argc=2063670312, argv=0x7b03b94c)
    at autovacuum.c:491
#9 0x3b0bd8 in StartAutoVacLauncher () at autovacuum.c:371

Haven't dug any deeper yet --- who's touched this code lately?

regards, tom lane

#2 Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#1)
Re: We've broken something in error recovery

Tom Lane wrote:

> #4 0x4e85b4 in ExceptionalCondition (
>     conditionName=0x1ac4ac "!(nestLevel > 0 && nestLevel <= GUCNestLevel)",
>     errorType=0x1abf44 "FailedAssertion", fileName=0x1abee4 "guc.c",
>     lineNumber=3907) at assert.c:57
> #5 0x501f48 in AtEOXact_GUC (isCommit=-86 '\252', nestLevel=84) at guc.c:3907
> #6 0x20618c in AbortTransaction () at xact.c:2194

This looks like maybe a corrupted stack - the args to AtEOXact_GUC at
that location in xact.c are hardwired.

cheers

andrew

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#2)
Re: We've broken something in error recovery

Andrew Dunstan <andrew@dunslane.net> writes:

> Tom Lane wrote:
>> #5 0x501f48 in AtEOXact_GUC (isCommit=-86 '\252', nestLevel=84) at guc.c:3907
>
> This looks like maybe a corrupted stack - the args to AtEOXact_GUC at
> that location in xact.c are hardwired.

No, that's just a fairly typical behavior of debugging with -O greater
than zero --- the registers holding those parameter values got recycled
for something else. This is a rather old version of gdb and it doesn't
always print <<value optimized away>> when it should.

regards, tom lane

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: We've broken something in error recovery

I wrote:

> #4 0x4e85b4 in ExceptionalCondition (
>     conditionName=0x1ac4ac "!(nestLevel > 0 && nestLevel <= GUCNestLevel)",
>     errorType=0x1abf44 "FailedAssertion", fileName=0x1abee4 "guc.c",
>     lineNumber=3907) at assert.c:57
> #5 0x501f48 in AtEOXact_GUC (isCommit=-86 '\252', nestLevel=84) at guc.c:3907
> #6 0x20618c in AbortTransaction () at xact.c:2194
> #7 0x20688c in AbortCurrentTransaction () at xact.c:2568
> #8 0x3b0f84 in AutoVacLauncherMain (argc=2063670312, argv=0x7b03b94c)
>     at autovacuum.c:491

On investigation I think that Assert may just be overenthusiastic.
The problem is that StartTransaction is failing at
VirtualXactLockTableInsert, for lack of any shared memory to acquire
the lock with; and then we try to do AbortTransaction and GUC is
unhappy because it's not been initialized yet. So this isn't a
new bug at all, it's been there awhile ...

regards, tom lane
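
For readers following along, here is a minimal standalone sketch of the failure mode described in the last message. It is not the PostgreSQL source: the names guc_nest_level, start_transaction, abort_transaction and the lock_table_full flag are hypothetical stand-ins, and the only things taken from the thread are the asserted condition (nestLevel > 0 && nestLevel <= GUCNestLevel), the fact that transaction start can fail at the lock-table insert before GUC state is set up, and Andrew's remark that the abort path passes hardwired arguments. The nest level value of 1 used here is an assumption for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for GUCNestLevel: 0 means "no transaction level set up". */
static int guc_nest_level = 0;

/* Same condition as the assertion quoted in the trace. */
static bool guc_nest_level_valid(int nest_level)
{
    return nest_level > 0 && nest_level <= guc_nest_level;
}

/*
 * Hypothetical transaction start. The lock-table insert comes first; if it
 * fails (out of shared memory), we bail out before GUC state is initialized.
 */
static bool start_transaction(bool lock_table_full)
{
    if (lock_table_full)
        return false;           /* error raised here: GUC setup never runs */
    guc_nest_level = 1;         /* GUC state for the top-level transaction */
    return true;
}

/*
 * Hypothetical abort path. It always checks the hardwired level 1, whether or
 * not start_transaction got far enough to set it up. Returns whether the
 * assertion's condition would have held, then resets the level.
 */
static bool abort_transaction(void)
{
    bool ok = guc_nest_level_valid(1);
    guc_nest_level = 0;
    return ok;
}
```

Calling start_transaction(true) followed by abort_transaction() yields false from the abort step, which corresponds to the Assert firing in an assert-enabled build; after a successful start_transaction(false), the same abort_transaction() call yields true. Under these assumptions the condition fails only because cleanup runs for a transaction that never finished starting, which fits the reading that the Assert is overenthusiastic rather than that the stack is corrupted.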