memory leak regression 9.1 versus 8.1

Started by Joe Conwayover 13 years ago7 messages

mail@joeconway.com

over 13 years ago

1 attachment(s)

I'm working on an upgrade of PostgreSQL embedded in a product from
version 8.1.x to 9.1.x. One particular PL/pgSQL function is giving us an
issue as there seems to be a rather severe regression in memory usage --
a query that finishes in 8.1 causes an out of memory exception on 9.1.

Using the same data on the same machine I see memory use stay steady at
a reasonably low value on the 8.1 installation but steadily climb on 9.1
(I watched it go over 2 GB and canceled the query -- the production
machines are 32 bit)

The attached standalone script seems to reproduce the effect. On 8.1
memory usage remains steady and low, on 9.1 I watched it climb past 1.1
GB and canceled the query.

I suspect the append node to be the culprit because if I skip the "UNION
ALL", i.e. if I use one generate_series with 20 million rows instead of
10 with 2 million each, then I do not see the memory leak. The real
function is actually selecting over many inherited tables (i.e. a
partitioned table).

Thoughts?

Thanks,

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Joe Conway (#1)

Re: memory leak regression 9.1 versus 8.1

Joe Conway <mail@joeconway.com> writes:

I'm working on an upgrade of PostgreSQL embedded in a product from
version 8.1.x to 9.1.x. One particular PL/pgSQL function is giving us an
issue as there seems to be a rather severe regression in memory usage --
a query that finishes in 8.1 causes an out of memory exception on 9.1.

I see no memory leak at all in this example, either in HEAD or 9.1
branch tip. Perhaps whatever you're seeing is an already-fixed bug?

Another likely theory is that you've changed settings from the 8.1
installation. I would expect this example to eat about 10 times
work_mem (due to one tuplestore for each generate_series invocation),
and that's more or less what I see happening here. A large work_mem
could look like a leak, but it isn't.

If you need further help in debugging, try launching the postmaster
under a fairly restrictive memory ulimit, so that the backend will get a
malloc failure before it starts to swap too badly. The memory map it
will then print on stderr should point to where the memory is going.

regards, tom lane

Joe Conway

mail@joeconway.com

over 13 years ago

In reply to: Tom Lane (#2)

Re: memory leak regression 9.1 versus 8.1

On 05/09/2012 03:08 PM, Tom Lane wrote:

I see no memory leak at all in this example, either in HEAD or 9.1
branch tip. Perhaps whatever you're seeing is an already-fixed bug?

Another likely theory is that you've changed settings from the 8.1
installation. I would expect this example to eat about 10 times
work_mem (due to one tuplestore for each generate_series invocation),
and that's more or less what I see happening here. A large work_mem
could look like a leak, but it isn't.

Good call -- of course that just means my contrived example fails to
duplicate the real issue :-(
In the real example, even with work_mem = 1 MB I see the same behavior
on 9.1.

If you need further help in debugging, try launching the postmaster
under a fairly restrictive memory ulimit, so that the backend will get a
malloc failure before it starts to swap too badly. The memory map it
will then print on stderr should point to where the memory is going.

Thanks -- will try that.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

Joe Conway

mail@joeconway.com

over 13 years ago

In reply to: Joe Conway (#3)

1 attachment(s)

Re: memory leak regression 9.1 versus 8.1

On 05/09/2012 03:36 PM, Joe Conway wrote:

Good call -- of course that just means my contrived example fails to
duplicate the real issue :-(
In the real example, even with work_mem = 1 MB I see the same behavior
on 9.1.

OK, new script. This more faithfully represents the real life scenario,
and reproduces the issue on HEAD with out-of-the-box config settings,
versus 8.1 which completes the query having never exceeded a very modest
memory usage:

---------------
On pg 8.1 with out of the box config:
VIRT RES SHR
199m 11m 3032
---------------
On pg head with out of the box config:
VIRT RES SHR
1671m 1.5g 16m
---------------

I have not tried your ulimit suggestion yet but will do that next.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

Joe Conway

mail@joeconway.com

over 13 years ago

In reply to: Joe Conway (#4)

1 attachment(s)

Re: memory leak regression 9.1 versus 8.1

On 05/09/2012 05:06 PM, Joe Conway wrote:

OK, new script. This more faithfully represents the real life scenario,
and reproduces the issue on HEAD with out-of-the-box config settings,
versus 8.1 which completes the query having never exceeded a very modest
memory usage:

---------------
On pg 8.1 with out of the box config:
VIRT RES SHR
199m 11m 3032
---------------
On pg head with out of the box config:
VIRT RES SHR
1671m 1.5g 16m
---------------

The attached one-liner seems to plug up the majority (although not quite
all) of the leakage.

do_convert_tuple() is allocating a new tuple for every row in the loop
and exec_stmt_return_next() is leaking it.

The query now finishes successfully. On pg head with attached patch and
out of the box config:
VIRT RES SHR
196m 35m 31m

This look sane/correct?

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support

Attachments:

memory-leak.001.difftext/x-patch; name=memory-leak.001.diffDownload

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index de1aece..24346c2 100644
*** a/src/pl/plpgsql/src/pl_exec.c
--- b/src/pl/plpgsql/src/pl_exec.c
*************** exec_stmt_return_next(PLpgSQL_execstate 
*** 2469,2474 ****
--- 2469,2475 ----
  					{
  						tuple = do_convert_tuple(tuple, tupmap);
  						free_conversion_map(tupmap);
+ 						free_tuple = true;
  					}
  				}
  				break;

Tom Lane

tgl@sss.pgh.pa.us

over 13 years ago

In reply to: Joe Conway (#5)

Re: memory leak regression 9.1 versus 8.1

Joe Conway <mail@joeconway.com> writes:

The attached one-liner seems to plug up the majority (although not quite
all) of the leakage.

Looks sane to me. Are you planning to look for the remaining leakage?

regards, tom lane

Joe Conway

mail@joeconway.com

over 13 years ago

In reply to: Tom Lane (#6)

Re: memory leak regression 9.1 versus 8.1

On 05/09/2012 10:01 PM, Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

The attached one-liner seems to plug up the majority (although not quite
all) of the leakage.

Looks sane to me. Are you planning to look for the remaining leakage?

Actually, now I'm not so sure there really are any other leaks. The last
test I ran, on 9.1 with the original data and plpgsql function, grew to:

VIRT RES SHR
540m 327m 267m

but then stabilized there through the end of the query, which
successfully returned:

count
----------
28847766
(1 row)

This was with:

report_log=# show shared_buffers;
shared_buffers
----------------
256MB
(1 row)

report_log=# show work_mem;
work_mem
----------
16MB
(1 row)

So I think those memory usage numbers look reasonable.

The bug appears to go back through 8.4 -- kind of surprising no one has
complained before.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support