"invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Started by Frank van Vugtover 21 years ago30 messagesbugs
Jump to latest
#1Frank van Vugt
ftm.van.vugt@foxi.nl

L.S.

Situation:

- PostgreSQL v8.0.0beta5 running on x86 hardware

- a database schema that uses before/after/deferred triggers to do additional
processing on stuff like orders and invoices based on the order- and
invoice--lines; the deferred triggers are basically of the format 'update
order set ..... from (select aggregates from order_line where order_id = ...)
where id = ...' and use an updated_on field to make sure they do the
necessary work only once

The program causing the problem is a conversion tool that fills a database
thus firing the various triggers. This process is fairly lengthy (couple of
hours) and this morning I noticed that the conversion tool bailed out due to
a failed transaction commit, the log mentions:

PL/pgSQL function "tr_purchaseorder_def" line 42 at SQL statement
ERROR: invalid memory alloc request size 3248857576

The statement on line 42 is the earlier mentioned update statement.

Obviously, since the backend keeps running, it won't be easy to get my hands
on the backtrace from when this error occurs and I'm wondering about the best
way to proceed in order to produce some helpfull input to the developers.

In order to find out whether this was some hardware glitch, I immediately
restarted the process and just now it ended with exactly the same error on
exactly the same spot.

Any hints on how to dig deeper ?

--
Best,

Frank.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#1)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:

ERROR: invalid memory alloc request size 3248857576

Obviously, since the backend keeps running, it won't be easy to get my hands
on the backtrace from when this error occurs and I'm wondering about the best
way to proceed in order to produce some helpfull input to the developers.

Provide a reproducible test case maybe?

Also, building with --enable-cassert might help track it down.

regards, tom lane

#3Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#2)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

I'm wondering
about the best way to proceed in order to produce some helpfull input to
the developers.

Provide a reproducible test case maybe?

Sorry, won't be that easy. I didn't encounter the problem with earlier
(smaller) data-sets and even if I would put in the hours needed in order to
strip the schema and change the conversion tool both with the uncertainty
that I'd be reproducing the proper sequence of (trigger) events, I wouldn't
be able to provide the dataset itself (i.e. non-disclosure)

Also, building with --enable-cassert might help track it down.

Will do that for starters.

Hold on, I thought that this error message would be generated from numerous
places in the code but it isn't. I'll recompile with debugging enabled and
put breakpoints on the appropriate lines to produce a backtrace.

<some work is done>

Ok, using --enable-cassert might not be such a good idea for now, since it
triggers a trap very early on in the conversion:

TRAP: FailedAssertion("!(((ntp)->t_data)->t_infomask & 0x0010)", File:
"catcache.c", Line: 1729)

So for now I have disabled assertions again and will focus on creating the
backtrace for the memory alloc problem first.

When needed / wanted I can always try and go over any assertion failures
later.

Will be able to post a backtrace in a few hours, I hope.

--
Best,

Frank.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#3)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:

Ok, using --enable-cassert might not be such a good idea for now, since it
triggers a trap very early on in the conversion:

TRAP: FailedAssertion("!(((ntp)->t_data)->t_infomask & 0x0010)", File:
"catcache.c", Line: 1729)

[ raised eyebrow... ] That should definitely not be happening either.
In fact, I find that considerably more surprising than your original
report. I'd suggest chasing this first.

regards, tom lane

#5Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#4)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

[ raised eyebrow... ] That should definitely not be happening either.
In fact, I find that considerably more surprising than your original
report. I'd suggest chasing this first.

It was already well on its way, so....

Here's what's going on when the memory alloc problem occurs:

(gdb) b mcxt.c:502
Breakpoint 1 at 0x81c5667: file mcxt.c, line 502.
(gdb) b mcxt.c:523
Breakpoint 2 at 0x81c56af: file mcxt.c, line 523.
(gdb) b mcxt.c:548
Breakpoint 3 at 0x81c572f: file mcxt.c, line 548.
(gdb) b mcxt.c:612
Breakpoint 4 at 0x81c57ab: file mcxt.c, line 612.
(gdb) cont
Continuing.

Breakpoint 1, MemoryContextAlloc (context=0x86a6fa8, size=3248857576) at
mcxt.c:502
502 elog(ERROR, "invalid memory alloc request size %lu",

(gdb) bt
#0 MemoryContextAlloc (context=0x86a6fa8, size=3248857576) at mcxt.c:502
#1 0x81cb6ad in CopySnapshot (snapshot=0x8735038) at tqual.c:1205
#2 0x8101414 in postquel_start (es=0x87b52f8, fcache=0x86aa990) at
functions.c:317
#3 0x810175f in postquel_execute (es=0x87b52f8, fcinfo=0xbfffe498,
fcache=0x86aa990) at functions.c:464
#4 0x8101998 in fmgr_sql (fcinfo=0xbfffe498) at functions.c:633
#5 0x80fd261 in ExecMakeFunctionResult (fcache=0x86aa7d8, econtext=0x86aa920,
isNull=0xbfffe5d3 "", isDone=0x0) at execQual.c:1038
#6 0x80fd9c1 in ExecEvalFunc (fcache=0x86aa7d8, econtext=0x86aa920,
isNull=0xbfffe5d3 "", isDone=0x0) at execQual.c:1455
#7 0x80feee1 in ExecEvalExprSwitchContext (expression=0x86aa7d8,
econtext=0x86aa920, isNull=0xbfffe5d3 "", isDone=0x0) at execQual.c:2708
#8 0x8134cf3 in evaluate_expr (expr=0x86f1388, result_type=23) at
clauses.c:2390
#9 0x81345c7 in evaluate_function (funcid=8615860, result_type=23,
args=0x86f1370, func_tuple=0x417d98b0, context=0xbfffe788) at clauses.c:2007
#10 0x81344cc in simplify_function (funcid=8615860, result_type=23,
args=0x86f1370, allow_inline=1, context=0xbfffe788) at clauses.c:1913
#11 0x8133c2d in eval_const_expressions_mutator (node=0x87894b0,
context=0xbfffe788) at clauses.c:1245
#12 0x81359e3 in expression_tree_mutator (node=0x87881c8, mutator=0x8133b6c
<eval_const_expressions_mutator>, context=0xbfffe788) at clauses.c:3198
#13 0x8134060 in eval_const_expressions_mutator (node=0x87881a0,
context=0xbfffe788) at clauses.c:1574
#14 0x8135959 in expression_tree_mutator (node=0x86f05c8, mutator=0x8133b6c
<eval_const_expressions_mutator>, context=0xbfffe788) at clauses.c:3163
#15 0x8134343 in eval_const_expressions_mutator (node=0x86f05c8,
context=0xbfffe788) at clauses.c:1756
#16 0x813597f in expression_tree_mutator (node=0x86f1058, mutator=0x8133b6c
<eval_const_expressions_mutator>, context=0xbfffe788) at clauses.c:3184
#17 0x8134343 in eval_const_expressions_mutator (node=0x86f1058,
context=0xbfffe788) at clauses.c:1756
#18 0x8133b49 in eval_const_expressions (node=0x86f1058) at clauses.c:1152
#19 0x812d925 in preprocess_expression (parse=0x8734248, expr=0x86f1058,
kind=1) at planner.c:415
#20 0x812d6e5 in subquery_planner (parse=0x8734248, tuple_fraction=0) at
planner.c:240
#21 0x812d5bd in planner (parse=0x8734248, isCursor=0 '\000', cursorOptions=0,
boundParams=0x0) at planner.c:129
#22 0x815bb4f in pg_plan_query (querytree=0x8734248, boundParams=0x0) at
postgres.c:647
#23 0x815bbeb in pg_plan_queries (querytrees=0x86f10c8, boundParams=0x0,
needSnapshot=0 '\000') at postgres.c:715
#24 0x810b428 in _SPI_prepare_plan (
src=0x87c6a08 "UPDATE purchaseorder SET price_total_val = total_val,
valuta_id = total_valuta_id, status_id = CASE WHEN count_lines = 0 OR
count_pol_quotation > 0 THEN po_stat('PO_QUOTATION') WHEN
count_pol_ontransp"..., plan=0xbfffe8a4) at spi.c:1276
#25 0x810a291 in SPI_prepare (
src=0x87c6a08 "UPDATE purchaseorder SET price_total_val = total_val,
valuta_id = total_valuta_id, status_id = CASE WHEN count_lines = 0 OR
count_pol_quotation > 0 THEN po_stat('PO_QUOTATION') WHEN
count_pol_ontransp"..., nargs=1, argtypes=0x87b3480) at spi.c:407
#26 0x41500e02 in ?? ()
#27 0x41500fce in ?? ()
#28 0x414ffc2e in ?? ()
#29 0x414ffa4b in ?? ()
#30 0x414ff98b in ?? ()
#31 0x414ff36f in ?? ()
#32 0x414fbecc in ?? ()
#33 0x80ebadc in ExecCallTriggerFunc (trigdata=0xbfffec10, finfo=0x422da3b0,
per_tuple_context=0x8769ba0) at trigger.c:1149
#34 0x80ec860 in AfterTriggerExecute (event=0x41b84b38, rel=0x418097f0,
trigdesc=0x422d8ef0, finfo=0x422da200, per_tuple_context=0x8769ba0)
at trigger.c:1979
#35 0x80eca7f in afterTriggerInvokeEvents (events=0x87053c8, firing_id=9977,
delete_ok=1 '\001') at trigger.c:2175
#36 0x80ecc66 in AfterTriggerEndXact () at trigger.c:2376
#37 0x8090806 in CommitTransaction () at xact.c:1450
#38 0x8090c6d in CommitTransactionCommand () at xact.c:1938
#39 0x815ceba in finish_xact_command () at postgres.c:1843
#40 0x815bebf in exec_simple_query (query_string=0x82ebbd0 "COMMIT") at
postgres.c:965
#41 0x815e0db in PostgresMain (argc=4, argv=0x82a5478, username=0x82a5448
"megaconv") at postgres.c:2986
#42 0x813b85e in BackendRun (port=0x82b8538) at postmaster.c:2817
#43 0x813b121 in BackendStartup (port=0x82b8538) at postmaster.c:2453
#44 0x81399b4 in ServerLoop () at postmaster.c:1198
#45 0x81394c5 in PostmasterMain (argc=3, argv=0x82a4038) at postmaster.c:917
#46 0x8111e11 in main (argc=3, argv=0xbffff914) at main.c:268
#47 0x400d4577 in __libc_start_main () from /lib/libc.so.6

In the mean time, I have compiled postgres on another machine (different
version of slackware) with assertions on, but being a fresh install I needed
to re-run the first phase of conversion which basically is a copy tables /
create indexes loop. However, during the read of *some* of the tables I see
this :

Program received signal SIGUSR1, User defined signal 1.
0x40109ac1 in kill () from /lib/libc.so.6
(gdb) where
#0 0x40109ac1 in kill () from /lib/libc.so.6
#1 0x0818e9e1 in SendPostmasterSignal (reason=PMSIGNAL_PASSWORD_CHANGE) at
pmsignal.c:69
#2 0x081906a8 in SIInsertDataEntry (segP=0xdbd, data=0x837aa94) at
sinvaladt.c:229
#3 0x0818f7e2 in SendSharedInvalidMessage (msg=0x837aa94) at sinval.c:122
#4 0x081ffd18 in ProcessInvalidationMessages (hdr=0x837a73c, func=0x818f7c0
<SendSharedInvalidMessage>) at inval.c:344
#5 0x082001d6 in AtEOXact_Inval (isCommit=1 '\001') at inval.c:674
#6 0x080a2548 in CommitTransaction () at xact.c:1543
#7 0x080a2af5 in CommitTransactionCommand () at xact.c:1938
#8 0x0819caef in finish_xact_command () at postgres.c:1843
#9 0x0819b795 in exec_simple_query (query_string=0x834ee8c "create index
lijst03_table_idx15 on lijst03_table(fn16)") at postgres.c:965
#10 0x0819dce0 in PostgresMain (argc=4, argv=0x8308b0c, username=0x8308adc
"megaconv") at postgres.c:2986
#11 0x08171f31 in BackendRun (port=0x83177f0) at postmaster.c:2817
#12 0x08171991 in BackendStartup (port=0x83177f0) at postmaster.c:2453
#13 0x0816ff6f in ServerLoop () at postmaster.c:1198
#14 0x0816f4a3 in PostmasterMain (argc=3, argv=0x8307788) at postmaster.c:917
#15 0x0813d22d in main (argc=3, argv=0x8307788) at main.c:268
#16 0x400f5d06 in __libc_start_main () from /lib/libc.so.6

I'm able to issue 'cont' to move on, and it's not like the server load is too
high or something.

Will proceed to the first assertion TRAP mentioned earlier.

--
Best,

Frank.

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#5)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:

Here's what's going on when the memory alloc problem occurs:

AFAICS this can only be explained as a memory stomp on the static data
structures SerializableSnapshotData or LatestSnapshotData. That seems a
tad improbable. I wonder if what you have is a build problem. Have you
tried a full "make distclean", configure, rebuild from scratch?

regards, tom lane

#7Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#6)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Here's what's going on when the memory alloc problem occurs:

AFAICS this can only be explained as a memory stomp on the static data
structures SerializableSnapshotData or LatestSnapshotData. That seems a
tad improbable. I wonder if what you have is a build problem. Have you
tried a full "make distclean", configure, rebuild from scratch?

Yep, I always do a distclean when changing configuration options, I've been
bitten before ;)

Mind you, I am using the earlier discussed patch to avoid the glibc bug :

******************************************************
$> more oracle_compat.diff
--- src/backend/utils/adt/oracle_compat.c_orig  Mon Nov 29 12:25:20 2004
+++ src/backend/utils/adt/oracle_compat.c       Mon Nov 29 12:26:49 2004
@@ -43,7 +43,7 @@
  * We assume if we have these two functions, we have their friends too, and
  * can use the wide-character method.
  */
-#if defined(HAVE_WCSTOMBS) && defined(HAVE_TOWLOWER)
+#if defined(HAVE_WCSTOMBS) && defined(HAVE_TOWLOWER) && FALSE
 #define USE_WIDE_UPPER_LOWER
 #endif
******************************************************

I could reproduce this case tomorrow and inspect some variables or call
abort() if you think that could be of any help?

As far as the other problem is concerned, I just finished the first half of
the conversion on a different machine that was build with assertions enabled
and saw a number of reports like the ones below. What's causing these?

I'll start the second half in a minute, which should reproduce the TRAP pretty
soon. Will post a backtrace of that asap as well.

Program received signal SIGUSR1, User defined signal 1.
0x40109ac1 in kill () from /lib/libc.so.6
(gdb) where
#0 0x40109ac1 in kill () from /lib/libc.so.6
#1 0x0818e9e1 in SendPostmasterSignal (reason=PMSIGNAL_PASSWORD_CHANGE) at
pmsignal.c:69
#2 0x081906a8 in SIInsertDataEntry (segP=0xdbd, data=0x83e4af8) at
sinvaladt.c:229
#3 0x0818f7e2 in SendSharedInvalidMessage (msg=0x83e4af8) at sinval.c:122
#4 0x081ffcd8 in ProcessInvalidationMessages (hdr=0x83dd6a4, func=0x818f7c0
<SendSharedInvalidMessage>) at inval.c:343
#5 0x082001d6 in AtEOXact_Inval (isCommit=1 '\001') at inval.c:674
#6 0x080a2548 in CommitTransaction () at xact.c:1543
#7 0x080a2af5 in CommitTransactionCommand () at xact.c:1938
#8 0x0819caef in finish_xact_command () at postgres.c:1843
#9 0x0819b795 in exec_simple_query (query_string=0x834ee8c "drop table
lijst45_table") at postgres.c:965
#10 0x0819dce0 in PostgresMain (argc=4, argv=0x8308b0c, username=0x8308adc
"megaconv") at postgres.c:2986
#11 0x08171f31 in BackendRun (port=0x83177f0) at postmaster.c:2817
#12 0x08171991 in BackendStartup (port=0x83177f0) at postmaster.c:2453
#13 0x0816ff6f in ServerLoop () at postmaster.c:1198
#14 0x0816f4a3 in PostmasterMain (argc=3, argv=0x8307788) at postmaster.c:917
#15 0x0813d22d in main (argc=3, argv=0x8307788) at main.c:268
#16 0x400f5d06 in __libc_start_main () from /lib/libc.so.6

Program received signal SIGUSR1, User defined signal 1.
0x40109ac1 in kill () from /lib/libc.so.6
(gdb) where
#0 0x40109ac1 in kill () from /lib/libc.so.6
#1 0x0818e9e1 in SendPostmasterSignal (reason=PMSIGNAL_PASSWORD_CHANGE) at
pmsignal.c:69
#2 0x081906a8 in SIInsertDataEntry (segP=0xdbd, data=0x8408e20) at
sinvaladt.c:229
#3 0x0818f7e2 in SendSharedInvalidMessage (msg=0x8408e20) at sinval.c:122
#4 0x081ffcd8 in ProcessInvalidationMessages (hdr=0x8407844, func=0x818f7c0
<SendSharedInvalidMessage>) at inval.c:343
#5 0x082001d6 in AtEOXact_Inval (isCommit=1 '\001') at inval.c:674
#6 0x080a2548 in CommitTransaction () at xact.c:1543
#7 0x080a2af5 in CommitTransactionCommand () at xact.c:1938
#8 0x0819caef in finish_xact_command () at postgres.c:1843
#9 0x0819b795 in exec_simple_query (
query_string=0x834f0a4 "create table lijst55_table(invoice_id
int,serial_id int,payment_cond text,old_debtor_id text,debtor_descr
text,korting numeric,magazijncode int,pakbon_nr text,totaal_bruto
numeric,totaal_netto numeric"...) at postgres.c:965
#10 0x0819dce0 in PostgresMain (argc=4, argv=0x8308b0c, username=0x8308adc
"megaconv") at postgres.c:2986
#11 0x08171f31 in BackendRun (port=0x83177f0) at postmaster.c:2817
#12 0x08171991 in BackendStartup (port=0x83177f0) at postmaster.c:2453
#13 0x0816ff6f in ServerLoop () at postmaster.c:1198
#14 0x0816f4a3 in PostmasterMain (argc=3, argv=0x8307788) at postmaster.c:917
#15 0x0813d22d in main (argc=3, argv=0x8307788) at main.c:268
#16 0x400f5d06 in __libc_start_main () from /lib/libc.so.6
(gdb)

Program received signal SIGUSR1, User defined signal 1.
0x40109ac1 in kill () from /lib/libc.so.6
(gdb) where
#0 0x40109ac1 in kill () from /lib/libc.so.6
#1 0x0818e9e1 in SendPostmasterSignal (reason=PMSIGNAL_PASSWORD_CHANGE) at
pmsignal.c:69
#2 0x081906a8 in SIInsertDataEntry (segP=0xdbd, data=0x842bb40) at
sinvaladt.c:229
#3 0x0818f7e2 in SendSharedInvalidMessage (msg=0x842bb40) at sinval.c:122
#4 0x081ffcd8 in ProcessInvalidationMessages (hdr=0x842514c, func=0x818f7c0
<SendSharedInvalidMessage>) at inval.c:343
#5 0x082001d6 in AtEOXact_Inval (isCommit=1 '\001') at inval.c:674
#6 0x080a2548 in CommitTransaction () at xact.c:1543
#7 0x080a2af5 in CommitTransactionCommand () at xact.c:1938
#8 0x0819caef in finish_xact_command () at postgres.c:1843
#9 0x0819b795 in exec_simple_query (query_string=0x834ee8c "drop table
lijst61_table") at postgres.c:965
#10 0x0819dce0 in PostgresMain (argc=4, argv=0x8308b0c, username=0x8308adc
"megaconv") at postgres.c:2986
#11 0x08171f31 in BackendRun (port=0x83177f0) at postmaster.c:2817
#12 0x08171991 in BackendStartup (port=0x83177f0) at postmaster.c:2453
#13 0x0816ff6f in ServerLoop () at postmaster.c:1198
#14 0x0816f4a3 in PostmasterMain (argc=3, argv=0x8307788) at postmaster.c:917
#15 0x0813d22d in main (argc=3, argv=0x8307788) at main.c:268
#16 0x400f5d06 in __libc_start_main () from /lib/libc.so.6
(gdb) cont

--
Best,

Frank.

#8Alvaro Herrera
alvherre@dcc.uchile.cl
In reply to: Frank van Vugt (#7)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

On Fri, Dec 03, 2004 at 01:03:43AM +0100, Frank van Vugt wrote:

As far as the other problem is concerned, I just finished the first half of
the conversion on a different machine that was build with assertions enabled
and saw a number of reports like the ones below. What's causing these?

This happens when a transaction finishes that creates, drops, or
modifies a user. If it happens at a different time, something is wrong.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Oh, great altar of passive entertainment, bestow upon me thy discordant images
at such speed as to render linear thought impossible" (Calvin a la TV)

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#7)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:

I could reproduce this case tomorrow and inspect some variables or call
abort() if you think that could be of any help?

Do you have hardware watchpoints on that machine? What I'd be inclined
to try is tracing through the transaction to see where
SerializableSnapshotData.xcnt or LatestSnapshotData.xcnt gets clobbered
from. It'd be mighty tedious if you have to fall back on a software
watchpoint though.

Also, I gather from the xact.c traceback that the transaction looks like

begin;
savepoint a;
...
commit; -- without releasing the savepoint

Does anything happen before the SAVEPOINT? Now that you see which
plpgsql function is failing, do you have a better shot at making
a self-contained example?

regards, tom lane

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#8)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

On Fri, Dec 03, 2004 at 01:03:43AM +0100, Frank van Vugt wrote:

As far as the other problem is concerned, I just finished the first half of
the conversion on a different machine that was build with assertions enabled
and saw a number of reports like the ones below. What's causing these?

This happens when a transaction finishes that creates, drops, or
modifies a user. If it happens at a different time, something is wrong.

Given that the call site is SIInsertDataEntry, it's clear that
SendPostmasterSignal is actually being told PMSIGNAL_WAKEN_CHILDREN ...
the gdb traceback is reporting the wrong value for the parameter, which
is no big surprise for values in registers.

Which makes this a perfectly normal event; it's just trying to flush the
SI messaging queue before it overflows. Nothing to see here, move
along.

regards, tom lane

#11Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Alvaro Herrera (#8)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

#1 0x0818e9e1 in SendPostmasterSignal (reason=PMSIGNAL_PASSWORD_CHANGE)
at pmsignal.c:69

As far as the other problem is concerned, I just finished the first half
of the conversion on a different machine that was build with assertions
enabled and saw a number of reports like the ones below. What's causing
these?

This happens when a transaction finishes that creates, drops, or
modifies a user. If it happens at a different time, something is wrong.

The code for the first part of the conversion does not contain any statements
for create/alter/drop user, it doesn't even fire triggers. Basically it's
just looping over the following set of statements for a number of textfiles:

- read, validate and post-process the contents of a text file
- write the result into another text file
- (try to) drop the table
- create the table
- grant select on this table to public
- copy table from a textfile

That's it, so there is no explicit user-handling at all.

Anything particular I can take a look at?

--
Best,

Frank.

#12Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#9)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

I could reproduce this case tomorrow and inspect some variables or call
abort() if you think that could be of any help?

Do you have hardware watchpoints on that machine?

Mmm, don't know for sure, will have a look tomorrow.

What I'd be inclined
to try is tracing through the transaction to see where
SerializableSnapshotData.xcnt or LatestSnapshotData.xcnt gets clobbered
from. It'd be mighty tedious if you have to fall back on a software
watchpoint though.

A hint on the expression(s) to use would be appreciated !

Also, I gather from the xact.c traceback that the transaction looks like

begin;
savepoint a;
...
commit; -- without releasing the savepoint

Does anything happen before the SAVEPOINT?

Actually, the SQL involved is v7.4.6, so there are no savepoints to begin with
(or arguably, there's nothing 'before' a savepoint ;))

Now that you see which plpgsql function is failing, do you have a better
shot at making a self-contained example?

Not really, but if tracing the transaction won't reveal anything else I guess
I will have no choice..... ;-(

--
Best,

Frank.

#13Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#10)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Ok, here are the results of the assertion TRAP (mind you, same error on the
same line, but a different machine than the original)

database log:

TRAP: FailedAssertion("!(((ntp)->t_data)->t_infomask & 0x0010)", File:
"catcache.c", Line: 1729)

gdb:

Program received signal SIGABRT, Aborted.
0x40109ac1 in kill () from /lib/libc.so.6
(gdb) where
#0 0x40109ac1 in kill () from /lib/libc.so.6
#1 0x401096eb in raise () from /lib/libc.so.6
#2 0x4010b127 in abort () from /lib/libc.so.6
#3 0x08208036 in ExceptionalCondition (conditionName=0x40212d24 "8,\023",
errorType=0x6 <Address 0x6 out of bounds>, fileName=0x0, lineNumber=1729)
at assert.c:51
#4 0x081ff88f in build_dummy_tuple (cache=0x41469048, nkeys=1,
skeys=0xbfffe410) at catcache.c:1735
#5 0x081fe917 in SearchCatCache (cache=0x41469048, v1=0, v2=0, v3=0, v4=0) at
catcache.c:1240
#6 0x08173908 in pgstat_vacuum_tabstat () at pgstat.c:817
#7 0x081156d9 in vacuum (vacstmt=0x8342b5c) at vacuum.c:272
#8 0x0819f7cc in PortalRunUtility (portal=0x835bf2c, query=0x8342bb8,
dest=0x8342a3c, completionTag=0xbfffebd0 "") at pquery.c:934
#9 0x0819fb01 in PortalRunMulti (portal=0x835bf2c, dest=0x8342a3c,
altdest=0x8342a3c, completionTag=0xbfffebd0 "") at pquery.c:1001
#10 0x0819f174 in PortalRun (portal=0x835bf2c, count=2147483647,
dest=0x8342a3c, altdest=0x8342a3c, completionTag=0xbfffebd0 "") at
pquery.c:617
#11 0x0819b760 in exec_simple_query (query_string=0x8342824 "vacuum full
analyse ean_available") at postgres.c:933
#12 0x0819dce0 in PostgresMain (argc=4, argv=0x8308b04, username=0x8308adc
"vugtf") at postgres.c:2986
#13 0x08171f31 in BackendRun (port=0x83177f0) at postmaster.c:2817
#14 0x08171991 in BackendStartup (port=0x83177f0) at postmaster.c:2453
#15 0x0816ff6f in ServerLoop () at postmaster.c:1198
#16 0x0816f4a3 in PostmasterMain (argc=3, argv=0x8307788) at postmaster.c:917
#17 0x0813d22d in main (argc=3, argv=0x8307788) at main.c:268
#18 0x400f5d06 in __libc_start_main () from /lib/libc.so.6

Obviously, continuing was not an option ;)

--
Best,

Frank.

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#12)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:

What I'd be inclined
to try is tracing through the transaction to see where
SerializableSnapshotData.xcnt or LatestSnapshotData.xcnt gets clobbered
from. It'd be mighty tedious if you have to fall back on a software
watchpoint though.

A hint on the expression(s) to use would be appreciated !

(gdb) watch SerializableSnapshotData.xcnt
Hardware watchpoint 1: SerializableSnapshotData.xcnt
(gdb) watch LatestSnapshotData.xcnt
Hardware watchpoint 2: LatestSnapshotData.xcnt
(gdb) cont
Continuing.

The only "authorized" place to set these is the assignment near the
bottom of GetSnapshotData(), which would normally get hit about
once per command.

regards, tom lane

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#12)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:

Does anything happen before the SAVEPOINT?

Actually, the SQL involved is v7.4.6,

Not with that backtrace, it isn't.

regards, tom lane

#16Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#15)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Does anything happen before the SAVEPOINT?

Actually, the SQL involved is v7.4.6,

Not with that backtrace, it isn't.

Excuse me?

I'm running v8.0.0beta5, but the sql-statements that are in my source-tree are
all still 'based on v7.4.6', i.e. there is no occurence of the word
'savepoint' in the whole tree.

--
Best,

Frank.

#17Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#14)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

A hint on the expression(s) to use would be appreciated !

(gdb) watch SerializableSnapshotData.xcnt

I could have thought of that ;)

(I misinterpreted what you meant by SerializableSnapshotData.xcnt, wouldn't
have asked otherwise)

Anyway, I just confirmed that hardware watchpoints are available, so I'll
follow this up first thing in the morning.

--
Best,

Frank.

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#16)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:

Does anything happen before the SAVEPOINT?

Actually, the SQL involved is v7.4.6,

Not with that backtrace, it isn't.

Excuse me?

I'm running v8.0.0beta5,

Oh, I thought you meant the backend was 7.4.6.

but the sql-statements that are in my source-tree are
all still 'based on v7.4.6', i.e. there is no occurence of the word
'savepoint' in the whole tree.

Hmm. The line number referenced for CommitTransactionCommand() is
definitely in code that is only entered after a savepoint, but I see
it is an identical code path to some other arms of the switch()
command. Probably the compiler decided to merge those arms. You might
want to back off the compiler optimization level a step so you can get
more readable tracebacks ...

regards, tom lane

#19Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#18)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Hmm. The line number referenced for CommitTransactionCommand() is
definitely in code that is only entered after a savepoint, but I see
it is an identical code path to some other arms of the switch()
command. Probably the compiler decided to merge those arms. You might
want to back off the compiler optimization level a step so you can get
more readable tracebacks ...

Yup, will do that as well.

Will read any comments you may have on the TRAP backtrace in a couple of
hours, need to take myself offline for a while ;)

--
Best,

Frank.

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#13)
Re: "invalid memory alloc request size <n>" in deferred trigger causes transaction to fail, but the backend keeps running

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:

Ok, here are the results of the assertion TRAP (mind you, same error on the
same line, but a different machine than the original)

That's another can't-happen problem. As best I can tell, this suggests
that the tuple descriptor associated with the RELOID catalog cache has
gotten clobbered (it should have the HASOID bit set, but an Assert that
says it does is failing).

Either you've found some code path that clobbers random bits of memory,
or you've got flaky hardware. But the same error on two different
machines makes the second theory a bit hard to swallow.

What's the hardware platform exactly? Is it 64-bit, or anything at all
off the beaten path?

regards, tom lane

#21Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#1)
#22Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#21)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#22)
#24Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#23)
#25Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#24)
#26Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#25)
#27Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#25)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#27)
#29Frank van Vugt
ftm.van.vugt@foxi.nl
In reply to: Tom Lane (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Frank van Vugt (#29)