BUG #8245: Urgent:Query on slave failing with invalid memory alloc request size 18446744073709537559

Started by Kim Applegate, almost 13 years ago, 3 messages, list: bugs
#1 Kim Applegate
kapplegate@apsalar.com

The following bug has been logged on the website:

Bug reference: 8245
Logged by: Kim Applegate
Email address: kapplegate@apsalar.com
PostgreSQL version: 9.2.4
Operating system: OpenIndiana
Description:

One of our queries has started failing randomly on our slaves with

invalid memory alloc request size 18446744073709537559

This is eventually repeatable on any slave that has failed. Of the 64 slaves,
1-5 fail on any given run. The failing stored procedure queries a partitioned
table with 64 children and returns rows from only 4 of the child tables.

Turning on debug logging gives only this:

Jun 20 06:50:53 * postgres[14825]: [ID 748848 local2.warning] [27-1] ERROR:
invalid memory alloc request size 18446744073709532101
Jun 20 06:50:53 * postgres[14825]: [ID 748848 local2.warning] [27-2]
CONTEXT: SQL function "events_args_top500_week" statement 1
Jun 20 06:50:53 * postgres[14825]: [ID 748848 local2.debug] [28-1] DEBUG:
shmem_exit(0): 7 callbacks to make
Jun 20 06:50:53 * postgres[14825]: [ID 748848 local2.debug] [29-1] DEBUG:
proc_exit(0): 3 callbacks to make
Jun 20 06:50:53 * postgres[14825]: [ID 748848 local2.debug] [30-1] DEBUG:
exit(0)
Jun 20 06:50:53 * postgres[14825]: [ID 748848 local2.debug] [31-1] DEBUG:
shmem_exit(-1): 0 callbacks to make
Jun 20 06:50:53 * postgres[14825]: [ID 748848 local2.debug] [32-1] DEBUG:
proc_exit(-1): 0 callbacks to make

apsalar=# select version();

--------------------------------------------------------------------------------
PostgreSQL 9.2.4 on x86_64-pc-solaris2.11, compiled by gcc (GCC) 4.7.2,
64-bit
(1 row)
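
A note on the two sizes reported above, offered as an observation rather than a diagnosis: both sit just below 2^64, which is what a small negative number looks like when printed as an unsigned 64-bit value (this is a 64-bit build, per the version() output). The short Python sketch below does that reinterpretation. The helper name as_signed_64 is made up for illustration; MAX_ALLOC_SIZE mirrors PostgreSQL's MaxAllocSize limit of 1 GB - 1, above which an allocation request is rejected with this error.

```python
# Reinterpret the reported allocation sizes as signed 64-bit integers.
# as_signed_64 is a hypothetical helper, not part of PostgreSQL.

MAX_ALLOC_SIZE = 0x3FFFFFFF  # PostgreSQL's MaxAllocSize: 1 GB - 1


def as_signed_64(value: int) -> int:
    """Return the two's-complement signed reading of an unsigned 64-bit value."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 unsigned bits")
    return value - 2**64 if value >= 2**63 else value


# The two sizes that appear in the report and in the log excerpt.
reported_sizes = [18446744073709537559, 18446744073709532101]

for size in reported_sizes:
    print(size, "exceeds limit:", size > MAX_ALLOC_SIZE,
          "signed reading:", as_signed_64(size))
```

Read this way the two requests are -14057 and -19515 bytes. That is consistent with a length being computed as negative somewhere, but it does not by itself say whether the cause is corrupt data or a logic error.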

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kim Applegate (#1)
Re: BUG #8245: Urgent:Query on slave failing with invalid memory alloc request size 18446744073709537559

kapplegate@apsalar.com writes:

> One of our queries has started failing randomly on our slaves with
> invalid memory alloc request size 18446744073709537559

It's hard to say much with that amount of information. Is it always the
exact same number? The root cause is probably either corrupted data
(that is, a trashed length word for some variable-width field) or some
internal logic bug that's causing the server to miscompute how much
memory it needs for some transient allocation. You could confirm or
refute the corrupt-data hypothesis by seeing if you can pg_dump each of
the tables referenced by the failing procedure. If pg_dump fails with
the same error then it's corrupt data, else not. If it's a bug, though,
we'd still be needing more info to track it down. Don't suppose you'd
want to change that specific ERROR to a PANIC so we could get a stack
trace :-(

regards, tom lane
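
The per-table pg_dump check suggested above could be scripted roughly as follows. This is a sketch under assumptions, not something from the thread: the function names are invented for illustration, the table names in the usage note are hypothetical placeholders, and it assumes pg_dump is on PATH with connection settings supplied through the usual environment variables. The dump output is discarded, since only success or failure matters here.

```python
import os
import subprocess


def pg_dump_command(dbname, table):
    """Build a pg_dump invocation that dumps one table and discards the output."""
    return ["pg_dump", f"--table={table}", f"--file={os.devnull}", dbname]


def check_tables(dbname, tables, run=subprocess.run):
    """Dump each table in turn; return {table: stderr text} for the ones that fail.

    The `run` parameter exists only so the command runner can be replaced
    in a dry run or a test.
    """
    failures = {}
    for table in tables:
        result = run(pg_dump_command(dbname, table),
                     capture_output=True, text=True)
        if result.returncode != 0:
            failures[table] = result.stderr.strip()
    return failures
```

A call might look like check_tables("apsalar", ["events_args_p01", "events_args_p02"]), where the database name comes from the psql prompt in the report and the table names are made up. If any table fails with the same "invalid memory alloc request size" message, that points toward corrupt data; if every dump succeeds, the problem more likely lies in the code path the query exercises.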


#3 Kim Applegate
kapplegate@apsalar.com
In reply to: Kim Applegate (#1)
Re: BUG #8245: Urgent:Query on slave failing with invalid memory alloc request size 18446744073709537559

Thank you for the advice. Shortly after submitting this bug I found Tom Lane's suggestion here:

/messages/by-id/201109082233.p88MXbGE026996@wwwmaster.postgresql.org

Once we had the core dump we were quickly able to figure out that the
problem was in one of our custom functions.

Thank you for the help.

Kim Applegate

On Fri, Jun 21, 2013 at 7:05 AM, <kapplegate@apsalar.com> wrote:

> The following bug has been logged on the website:
>
> Bug reference: 8245
> Logged by: Kim Applegate
> [...]