Assertion failure in REL9_5_STABLE

Started by Marko Tiikkajaover 9 years ago14 messages
#1Marko Tiikkaja
marko@joh.to
1 attachment(s)

Hi,

Running one specific test from our application against REL9_5_STABLE
(current as of today) gives me this:

#2 0x00007effb59595be in ExceptionalCondition (
conditionName=conditionName@entry=0x7effb5b27a88
"!(CritSectionCount > 0 || TransactionIdIsCurrentTransactionId((
(!((tup)->t_infomask & 0x0800) && ((tup)->t_infomask & 0x1000) &&
!((tup)->t_infomask & 0x0080)) ? HeapTupleGetUpdateXid(tup) : (
(tup)-"..., errorType=errorType@entry=0x7effb599b74b "FailedAssertion",
fileName=fileName@entry=0x7effb5b2796c "combocid.c",
lineNumber=lineNumber@entry=132) at assert.c:54
#3 0x00007effb598e19b in HeapTupleHeaderGetCmax (tup=0x7effa85714c8) at
combocid.c:131
#4 0x00007effb55fb0c1 in heap_lock_tuple (relation=0x7effb5bf5d90,
tuple=tuple@entry=0x7fffcee73690, cid=346,
mode=mode@entry=LockTupleExclusive, wait_policy=LockWaitBlock,
follow_updates=follow_updates@entry=1 '\001',
buffer=buffer@entry=0x7fffcee7367c, hufd=hufd@entry=0x7fffcee73680)
at heapam.c:4813
#5 0x00007effb5753e82 in ExecLockRows (node=node@entry=0x7effb6cebb00)
at nodeLockRows.c:179
#6 0x00007effb573cbc8 in ExecProcNode (node=node@entry=0x7effb6cebb00)
at execProcnode.c:516
#7 0x00007effb5739432 in ExecutePlan (dest=0x7effb5dd8160
<spi_printtupDR>, direction=<optimized out>, numberTuples=0,
sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x7effb6cebb00,
estate=0x7effb6ceb8f8)
at execMain.c:1549
#8 standard_ExecutorRun (queryDesc=0x7effb6ae3b40, direction=<optimized
out>, count=0) at execMain.c:337
#9 0x00007effb57661db in _SPI_pquery (tcount=0, fire_triggers=1 '\001',
queryDesc=0x7effb6ae3b40) at spi.c:2411

The failure is a number of levels down a call stack of PL/PgSQL
functions, but I can reproduce it at will by calling the function. I
unfortunately can't narrow it down to a smaller test case, but attached
is an xlogdump of the session. The query that finally breaks the
elephant's back is a SELECT .. FOR UPDATE from relid=54511.

Any ideas on how to debug this?

.m

Attachments:

assert.xlogtext/plain; charset=UTF-8; name=assert.xlog; x-mac-creator=0; x-mac-type=0Download
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Tiikkaja (#1)
Re: Assertion failure in REL9_5_STABLE

Marko Tiikkaja <marko@joh.to> writes:

The failure is a number of levels down a call stack of PL/PgSQL
functions, but I can reproduce it at will by calling the function. I
unfortunately can't narrow it down to a smaller test case, but attached
is an xlogdump of the session. The query that finally breaks the
elephant's back is a SELECT .. FOR UPDATE from relid=54511.

Any ideas on how to debug this?

If this test case used to pass, it'd be pretty useful to git bisect
to find where it started to fail.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Marko Tiikkaja (#1)
Re: Assertion failure in REL9_5_STABLE

Marko Tiikkaja wrote:

Hi,

Running one specific test from our application against REL9_5_STABLE
(current as of today) gives me this:

#2 0x00007effb59595be in ExceptionalCondition (
conditionName=conditionName@entry=0x7effb5b27a88 "!(CritSectionCount > 0
|| TransactionIdIsCurrentTransactionId(( (!((tup)->t_infomask & 0x0800) &&
((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ?
HeapTupleGetUpdateXid(tup) : ( (tup)-"...,
errorType=errorType@entry=0x7effb599b74b "FailedAssertion",
fileName=fileName@entry=0x7effb5b2796c "combocid.c",
lineNumber=lineNumber@entry=132) at assert.c:54
#3 0x00007effb598e19b in HeapTupleHeaderGetCmax (tup=0x7effa85714c8) at
combocid.c:131
#4 0x00007effb55fb0c1 in heap_lock_tuple (relation=0x7effb5bf5d90,
tuple=tuple@entry=0x7fffcee73690, cid=346,
mode=mode@entry=LockTupleExclusive, wait_policy=LockWaitBlock,
follow_updates=follow_updates@entry=1 '\001',
buffer=buffer@entry=0x7fffcee7367c, hufd=hufd@entry=0x7fffcee73680) at
heapam.c:4813

Umm. AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after
already calling HeapTupleHeaderGetCmax() (which obviously hasn't caught
the same assertion). Something is odd there ...

Can you share the test case?

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Marko Tiikkaja
marko@joh.to
In reply to: Alvaro Herrera (#3)
Re: Assertion failure in REL9_5_STABLE

On 2016-08-10 19:28, Alvaro Herrera wrote:

Umm. AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after
already calling HeapTupleHeaderGetCmax() (which obviously hasn't caught
the same assertion). Something is odd there ...

HeapTupleSatisfiesUpdate() returns HeapTupleBeingUpdated in this case.
HeapTupleSelfUpdated comes from here (line 4749):

/* if there are updates, follow the update chain */
if (follow_updates && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
{
HTSU_Result res;
res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
GetCurrentTransactionId(),
mode);
if (res != HeapTupleMayBeUpdated)
{
result = res;
/* recovery code expects to have buffer lock held */
LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
goto failed;
}
}

Can you share the test case?

Not at this time, sorry.

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Marko Tiikkaja (#4)
Re: Assertion failure in REL9_5_STABLE

Marko Tiikkaja wrote:

On 2016-08-10 19:28, Alvaro Herrera wrote:

Umm. AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after
already calling HeapTupleHeaderGetCmax() (which obviously hasn't caught
the same assertion). Something is odd there ...

HeapTupleSatisfiesUpdate() returns HeapTupleBeingUpdated in this case.
HeapTupleSelfUpdated comes from here (line 4749):

/* if there are updates, follow the update chain */
if (follow_updates && !HEAP_XMAX_IS_LOCKED_ONLY(infomask))
{
HTSU_Result res;
res = heap_lock_updated_tuple(relation, tuple, &t_ctid,
GetCurrentTransactionId(),
mode);
if (res != HeapTupleMayBeUpdated)
{
result = res;
/* recovery code expects to have buffer lock held */
LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
goto failed;
}
}

Oh, I see ... so there's an update chain, and you're updating a earlier
tuple. But the later tuple has a multixact and one of the members is
the current transaction.

I wonder how can you lock a tuple that's not the latest, where that
update chain was produced by your own transaction. I suppose this is
because of plpgsql use of cursors.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Alvaro Herrera (#5)
Re: Assertion failure in REL9_5_STABLE

BTW this is not a regression, right? It's been broken all along. Or am
I mistaken?

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Marko Tiikkaja
marko@joh.to
In reply to: Alvaro Herrera (#5)
Re: Assertion failure in REL9_5_STABLE

On 2016-08-10 11:01 PM, Alvaro Herrera wrote:

Oh, I see ... so there's an update chain, and you're updating a earlier
tuple. But the later tuple has a multixact and one of the members is
the current transaction.

I wonder how can you lock a tuple that's not the latest, where that
update chain was produced by your own transaction. I suppose this is
because of plpgsql use of cursors.

There's a rolled back subtransaction that also did some magic on the
rows AFAICT. I can try and spend some time producing a smaller test
case over the weekend. No hurry since this missed the today's point
release anyway.

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Marko Tiikkaja
marko@joh.to
In reply to: Alvaro Herrera (#6)
Re: Assertion failure in REL9_5_STABLE

On 2016-08-11 12:09 AM, Alvaro Herrera wrote:

BTW this is not a regression, right? It's been broken all along. Or am
I mistaken?

You're probably right. I just hadn't realized I could run our app
against 9.5 with --enable-cassert until last week.

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Michael Paquier
michael.paquier@gmail.com
In reply to: Marko Tiikkaja (#8)
Re: Assertion failure in REL9_5_STABLE

On Thu, Aug 11, 2016 at 8:09 AM, Marko Tiikkaja <marko@joh.to> wrote:

On 2016-08-11 12:09 AM, Alvaro Herrera wrote:

BTW this is not a regression, right? It's been broken all along. Or am
I mistaken?

You're probably right. I just hadn't realized I could run our app against
9.5 with --enable-cassert until last week.

Just wondering... If you revert 1f9534b4 and/or b33e81cb do you still
see a problem?
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Marko Tiikkaja
marko@joh.to
In reply to: Michael Paquier (#9)
Re: Assertion failure in REL9_5_STABLE

On 11/08/16 8:48 AM, Michael Paquier wrote:

On Thu, Aug 11, 2016 at 8:09 AM, Marko Tiikkaja <marko@joh.to> wrote:

On 2016-08-11 12:09 AM, Alvaro Herrera wrote:

BTW this is not a regression, right? It's been broken all along. Or am
I mistaken?

You're probably right. I just hadn't realized I could run our app against
9.5 with --enable-cassert until last week.

Just wondering... If you revert 1f9534b4 and/or b33e81cb do you still
see a problem?

Yeah, no effect.

.m

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Andres Freund
andres@anarazel.de
In reply to: Marko Tiikkaja (#10)
Re: Assertion failure in REL9_5_STABLE

On 2016-08-11 11:01:18 +0200, Marko Tiikkaja wrote:

On 11/08/16 8:48 AM, Michael Paquier wrote:

On Thu, Aug 11, 2016 at 8:09 AM, Marko Tiikkaja <marko@joh.to> wrote:

On 2016-08-11 12:09 AM, Alvaro Herrera wrote:

BTW this is not a regression, right? It's been broken all along. Or am
I mistaken?

You're probably right. I just hadn't realized I could run our app against
9.5 with --enable-cassert until last week.

Just wondering... If you revert 1f9534b4 and/or b33e81cb do you still
see a problem?

Yeah, no effect.

Phew. Alvaro, are you tackling this?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Andres Freund (#11)
Re: Assertion failure in REL9_5_STABLE

Andres Freund wrote:

Phew. Alvaro, are you tackling this?

Sure.

Marko Tiikkaja wrote:

There's a rolled back subtransaction that also did some magic on the rows
AFAICT. I can try and spend some time producing a smaller test case over
the weekend. No hurry since this missed the today's point release anyway.

Marko, any luck?

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Marko Tiikkaja (#1)
Re: Assertion failure in REL9_5_STABLE

Marko Tiikkaja wrote:

Running one specific test from our application against REL9_5_STABLE
(current as of today) gives me this:

#2 0x00007effb59595be in ExceptionalCondition (
conditionName=conditionName@entry=0x7effb5b27a88 "!(CritSectionCount > 0
|| TransactionIdIsCurrentTransactionId(( (!((tup)->t_infomask & 0x0800) &&
((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ?
HeapTupleGetUpdateXid(tup) : ( (tup)-"...,
errorType=errorType@entry=0x7effb599b74b "FailedAssertion",
fileName=fileName@entry=0x7effb5b2796c "combocid.c",
lineNumber=lineNumber@entry=132) at assert.c:54
#3 0x00007effb598e19b in HeapTupleHeaderGetCmax (tup=0x7effa85714c8) at
combocid.c:131

Can you please do
frame 3
print tup->t_ctid
in gdb? That would print the tuple address with which to grep the xlog
sequence.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Marko Tiikkaja (#1)
Re: Assertion failure in REL9_5_STABLE

Marko Tiikkaja wrote:

Hi,

Running one specific test from our application against REL9_5_STABLE
(current as of today) gives me this:

Just pushed the fix. Thanks for the report!

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers