Memory leaks in record_out and record_send

Started by Tom Laneabout 13 years ago10 messages
#1Tom Lane
tgl@sss.pgh.pa.us

I looked into the problem complained of here:
http://archives.postgresql.org/pgsql-general/2012-11/msg00279.php
which turns out to have nothing to do with joins and everything to do
with the fact that record_out() leaks memory like mad. It leaks both
the strings returned by the per-column output functions and any column
values that have to be detoasted. You can easily reproduce this with
an example like

create table leak (f1 int, f2 text);

insert into leak select x, 'foo' from generate_series(1,1000000) x;

select leak from leak;

The attached patch against HEAD fixes this, as well as a similar
leakage in record_send(). The added code is lifted directly from
printtup() so it's not adding any new assumptions to the system.

I wonder though if we ought to think about running output functions in
a short-lived memory context instead of the executor's main context.
We've considered that before, I think, and it's always been the path
of least resistance to fix the output functions instead --- but there
will always be another leak I'm afraid.

OTOH I can't see trying to back-patch a solution like that. If we want
to fix this in the back branches (and note the complaint linked above is
against 8.3), I think we have to do it as attached.

Thoughts?

regards, tom lane

#2Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#1)
Re: Memory leaks in record_out and record_send

Tom Lane <tgl@sss.pgh.pa.us> writes:

OTOH I can't see trying to back-patch a solution like that. If we want
to fix this in the back branches (and note the complaint linked above is
against 8.3), I think we have to do it as attached.

Thoughts?

I've been using textin(record_out(NEW)) in generic partitioning
triggers, and you can find examples of that trick in the wiki, so I
think we have users of that in the field.

Please indeed do consider backpatching!

I don't have an opinion on the opportunity to use a shorter memory
context, I feel that would need some more involved analytics than my
brainpower of the moment allows me to consider.

Thanks,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dimitri Fontaine (#2)
Re: Memory leaks in record_out and record_send

Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

Thoughts?

I've been using textin(record_out(NEW)) in generic partitioning
triggers, and you can find examples of that trick in the wiki, so I
think we have users of that in the field.

I think explicit calls like that actually wouldn't be a problem,
since they'd be run in a per-tuple context anyway. The cases that
are problematic are hard-coded I/O function calls. I'm worried
about the ones like, say, plpgsql's built-in conversion operations.
We could probably fix printtup's usage with some confidence, but
there are a lot of other ones.

regards, tom lane

#4Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Tom Lane (#3)
Re: Memory leaks in record_out and record_send

Tom Lane <tgl@sss.pgh.pa.us> writes:

I think explicit calls like that actually wouldn't be a problem,
since they'd be run in a per-tuple context anyway. The cases that
are problematic are hard-coded I/O function calls. I'm worried
about the ones like, say, plpgsql's built-in conversion operations.
We could probably fix printtup's usage with some confidence, but
there are a lot of other ones.

That's a good reason to get them into a shorter memory context, but
which? per transaction maybe? shorter?

--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dimitri Fontaine (#4)
Re: Memory leaks in record_out and record_send

Dimitri Fontaine <dimitri@2ndQuadrant.fr> writes:

Tom Lane <tgl@sss.pgh.pa.us> writes:

I think explicit calls like that actually wouldn't be a problem,
since they'd be run in a per-tuple context anyway. The cases that
are problematic are hard-coded I/O function calls. I'm worried
about the ones like, say, plpgsql's built-in conversion operations.
We could probably fix printtup's usage with some confidence, but
there are a lot of other ones.

That's a good reason to get them into a shorter memory context, but
which? per transaction maybe? shorter?

It would have to be per-tuple to do any good. The existing behavior
is per-query and causes problems if lots of rows are output. In plpgsql
it would be a function-call-lifespan leak.

regards, tom lane

#6Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#1)
Re: Memory leaks in record_out and record_send

On Tue, Nov 13, 2012 at 12:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wonder though if we ought to think about running output functions in
a short-lived memory context instead of the executor's main context.
We've considered that before, I think, and it's always been the path
of least resistance to fix the output functions instead --- but there
will always be another leak I'm afraid.

Such is the lot of people who code in C. I worry that the number of
memory contexts we're kicking around already is imposing a significant
distributed overhead on the system that is hard to measure but
nevertheless real, and that this will add to it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: Memory leaks in record_out and record_send

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Nov 13, 2012 at 12:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wonder though if we ought to think about running output functions in
a short-lived memory context instead of the executor's main context.
We've considered that before, I think, and it's always been the path
of least resistance to fix the output functions instead --- but there
will always be another leak I'm afraid.

Such is the lot of people who code in C. I worry that the number of
memory contexts we're kicking around already is imposing a significant
distributed overhead on the system that is hard to measure but
nevertheless real, and that this will add to it.

Yeah, perhaps. I'd like to think that a MemoryContextReset is cheaper
than a bunch of retail pfree's, but it's hard to prove anything without
actually coding and testing it --- and on modern machines, effects like
cache locality could swamp pure instruction-count gains anyway.

Anyway, I committed the narrow fix for the moment.

regards, tom lane

#8Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#7)
Re: Memory leaks in record_out and record_send

On Tue, Nov 13, 2012 at 3:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Tue, Nov 13, 2012 at 12:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wonder though if we ought to think about running output functions in
a short-lived memory context instead of the executor's main context.
We've considered that before, I think, and it's always been the path
of least resistance to fix the output functions instead --- but there
will always be another leak I'm afraid.

Such is the lot of people who code in C. I worry that the number of
memory contexts we're kicking around already is imposing a significant
distributed overhead on the system that is hard to measure but
nevertheless real, and that this will add to it.

Yeah, perhaps. I'd like to think that a MemoryContextReset is cheaper
than a bunch of retail pfree's, but it's hard to prove anything without
actually coding and testing it --- and on modern machines, effects like
cache locality could swamp pure instruction-count gains anyway.

Yeah. The thing that concerns me is that I think we have a pretty
decent number of memory contexts where the expected number of
allocations is very small ... and we have the context *just in case*
we do more than that in certain instances. I've seen profiles where
the setup/teardown costs of memory contexts are significant ... which
doesn't mean that those examples would perform better with fewer
memory contexts, but it's enough to make me pause for thought.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Stephen Frost
sfrost@snowman.net
In reply to: Robert Haas (#8)
Re: Memory leaks in record_out and record_send

* Robert Haas (robertmhaas@gmail.com) wrote:

Yeah. The thing that concerns me is that I think we have a pretty
decent number of memory contexts where the expected number of
allocations is very small ... and we have the context *just in case*
we do more than that in certain instances. I've seen profiles where
the setup/teardown costs of memory contexts are significant ... which
doesn't mean that those examples would perform better with fewer
memory contexts, but it's enough to make me pause for thought.

So, for my 2c, I'm on the other side of this, personally. We have
memory contexts for more-or-less exactly this issue. It's one of the
great things about PG- it's resiliant and very unlikely to have large or
bad memory leaks in general, much of which can, imv, be attributed to
our use of memory contexts.

Thanks,

Stephen

#10Martijn van Oosterhout
kleptog@svana.org
In reply to: Stephen Frost (#9)
Re: Memory leaks in record_out and record_send

On Tue, Nov 13, 2012 at 05:50:08PM -0500, Stephen Frost wrote:

* Robert Haas (robertmhaas@gmail.com) wrote:

Yeah. The thing that concerns me is that I think we have a pretty
decent number of memory contexts where the expected number of
allocations is very small ... and we have the context *just in case*
we do more than that in certain instances. I've seen profiles where
the setup/teardown costs of memory contexts are significant ... which
doesn't mean that those examples would perform better with fewer
memory contexts, but it's enough to make me pause for thought.

So, for my 2c, I'm on the other side of this, personally. We have
memory contexts for more-or-less exactly this issue. It's one of the
great things about PG- it's resiliant and very unlikely to have large or
bad memory leaks in general, much of which can, imv, be attributed to
our use of memory contexts.

If the problem is that we create memory context overhead which is not
necessary in many cases, perhaps we can reduce the overhead somehow.
IIRC we have a seperate function for resetting a context and freeing it
entirely. If there was a quick test we could do such that resetting a
context did nothing unless at least (say) 16k had been allocated, that
might reduce the cost for many very small allocations.

Ofcourse, unless someone comes up with a way to measure the cost this
is all handwaving, but it might a nice project for someone interested
in learning to hack postgres.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

He who writes carelessly confesses thereby at the very outset that he does
not attach much importance to his own thoughts.

-- Arthur Schopenhauer