Re: Proposal for fixing intra-query memory leaks

Started by Karel Zakover 25 years ago8 messages

zakkr@zf.jcu.cz

over 25 years ago

On Sat, 29 Apr 2000, Tom Lane wrote:

Background
----------

We already do most of our memory allocation in "memory contexts", which
are usually AllocSets as implemented by backend/utils/mmgr/aset.c.
(Is there any value in allowing for other memory context types? We could

I have "aset" variant, that allow create and work with memory context in
a shared memory.

save some cycles by getting rid of a level of indirection here.) What
we need to do is create more contexts and define proper rules about when
they can be freed.

The basic operations on a memory context are:

* create a context

A questuon, need we really any name for context? I mean an example
"CreateGlobalMemory(char *name)". Global memory routines expect that
a context name is static. It is a litle problem if contexts are created
dynamic. I suggest change it and allocate context name in context memory.

About blocks deallocation, now if context is destroyed for all blocks are
call free(). And if some context needs new block the aset.c call malloc().
What create a "BlockFreeList" with limited size (and allow set size for
this pool as new postmaster command line switch), and instead call the free()
for each block, remove a block (standard block, not a large) to this
BlockFreeList and instead the malloc() try first look at to this list?

in that context. The MemoryContextSwitchTo() operation selects a new
current context (and returns the previous context, so that the caller can
restore the previous context before exiting).

Note: there is no really good reason for pfree() to be tied to the current
memory context; it ought to be possible to pfree() a chunk of memory no
matter which context it was allocated from. Currently we cannot do that

I not sure if I understent here.

What the chunk->aset pointer in all chunks? It is pointer to a setData and
the AllocSetFree() need this pointer only. The AllocSetFree() not use full
context struct.

IMHO is very easy implement pfree() and repalloc() independent on
CurrentMemoryContext setting. The MemoryContextSwitchTo() needs palloc()
only. Or not?

because of the possibility that there is more than one kind of memory
context. If they were all AllocSets then the problem goes away, which is
one reason I'd like to eliminate the provision for other kinds of
contexts.

Additions to the memory-context mechanism
-----------------------------------------

If we are going to have more contexts, we need more mechanism for keeping
track of them; else we risk leaking whole contexts under error conditions.
We can do this as follows:

1. There will be two kinds of contexts, "permanent" and "temporary".
Permanent contexts are never reset or deleted except by explicit caller
command (in practice, they probably won't ever be, period). There will
not be very many of these --- perhaps only the existing TopMemoryContext
and CacheMemoryContext. We should avoid having very much code run with
CurrentMemoryContext pointing at a permanent context, since any forgotten
palloc() represents a permanent memory leak.

And what you mean about total persistent contexts in shared memory?
I create for the QueryCache specific top context that is stored in the
QueryCache pool (shmem) and is independent on standard TopMemoryContext.

We probably don't know what we will need in future save to shared memory.
IMHO is good create some common mechanism for shmem, that allow use current
routines (an example copyObject()) for a work with shmem.

2. Temporary contexts are remembered by the context manager and are
guaranteed to be deleted at transaction end. (If we ever have nested
transactions, we'd probably want to tie each temporary context to a
particular transaction, but for now that's not necessary.) Most activity
will happen in temporary contexts.

What is the "context manager"?

Your suggestion/planns are very interesting. I agree that more contexts
(for error, query, transactions, ..etc.) is a good idea.

Karel

Import Notes

Reply to msg id not found: 17136.957047804@sss.pgh.pa.us

Tom Lane

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Karel Zak (#1)

Karel Zak <zakkr@zf.jcu.cz> writes:

(Is there any value in allowing for other memory context types? We could

I have "aset" variant, that allow create and work with memory context in
a shared memory.

Hmm. Well, maybe nailing things down to a single kind of context is
pushing it too far. What I mainly want is to be able to free a palloc'd
chunk without assuming that it came from the currently active context.
With AllocSets as aset.c does them, this is possible because every chunk
has a back-link to its owning context. We could still make it work with
multiple context types if we require that they all have the same kind of
back-link in the same place. Essentially, the overhead data for an
allocated chunk would have to be the same for all context types. But
we could still have different context types with different allocation
methods for each type. pfree macro would get the back-link from 8 bytes
before the given chunk address, verify that it points at something with
the right node type to be a memory context header, and then call through
a function pointer in the context header to get to the routine that
actually knows how to free a chunk of that context type.

A questuon, need we really any name for context? I mean an example
"CreateGlobalMemory(char *name)". Global memory routines expect that
a context name is static. It is a litle problem if contexts are created
dynamic. I suggest change it and allocate context name in context memory.

Good point. I don't believe that contexts have names at all at the
moment (portals do, but not contexts). But it seems like it would be
helpful for debugging to have some kind of name for each context. It
wouldn't have to be unique or anything, but when you were trying to
figure out where your memory leak is, being able to see which context
has gotten bloated would be useful...

About blocks deallocation, now if context is destroyed for all blocks are
call free(). And if some context needs new block the aset.c call malloc().
What create a "BlockFreeList" with limited size (and allow set size for
this pool as new postmaster command line switch), and instead call the free()
for each block, remove a block (standard block, not a large) to this
BlockFreeList and instead the malloc() try first look at to this list?

I think this is going in the wrong direction. We are not trying to
write a better malloc than malloc. What we are trying to do is make
allocation of small chunks very fast and make it possible to release a
whole bunch of related chunks without keeping track of them individually.
I don't believe that we need to improve on malloc at the level of
grabbing or releasing whole blocks.

IMHO is very easy implement pfree() and repalloc() independent on
CurrentMemoryContext setting. The MemoryContextSwitchTo() needs palloc()
only. Or not?

Yes, but *only if the chunk you are being asked to pfree came from an
AllocSet-style context*. Currently it could have come from a different
kind of context that isn't based on AllocSets. (I think we have some
contexts where palloc is just a direct malloc, for example.)

And what you mean about total persistent contexts in shared memory?
I create for the QueryCache specific top context that is stored in the
QueryCache pool (shmem) and is independent on standard TopMemoryContext.

QueryCache would probably be a permanent context ...

regards, tom lane

Karel Zak

zakkr@zf.jcu.cz

over 25 years ago

In reply to: Tom Lane (#2)

On Tue, 2 May 2000, Tom Lane wrote:

Karel Zak <zakkr@zf.jcu.cz> writes:

(Is there any value in allowing for other memory context types? We could

I have "aset" variant, that allow create and work with memory context in
a shared memory.

Hmm. Well, maybe nailing things down to a single kind of context is
pushing it too far. What I mainly want is to be able to free a palloc'd
chunk without assuming that it came from the currently active context.

With my QueryCache contexts it will possible too. I use same AllocSet
structs - different is only a block spring (not from malloc, but from
QueryCache shmem pool). Change chunks headers will easy if it will need.

A problem will, how select right method for free/realloc. This information
is only in context struct, but not in a chunk header or in AllocSet struct
(now).

With AllocSets as aset.c does them, this is possible because every chunk
has a back-link to its owning context. We could still make it work with
multiple context types if we require that they all have the same kind of
back-link in the same place. Essentially, the overhead data for an
allocated chunk would have to be the same for all context types. But
we could still have different context types with different allocation
methods for each type. pfree macro would get the back-link from 8 bytes
before the given chunk address, verify that it points at something with

OK. You need in chunk header chunk-size too, not the back-to-context-link
only.

the right node type to be a memory context header, and then call through
a function pointer in the context header to get to the routine that
actually knows how to free a chunk of that context type.

The idea it is very good, because if we will work with more context and
allocations types we need identify chunks.

The chunk header must be relevant and same for all allocation methods and
must be independent on AllocSel (aset.c) code. Now, it is not. Am I right?

Good point. I don't believe that contexts have names at all at the
moment (portals do, but not contexts). But it seems like it would be
helpful for debugging to have some kind of name for each context. It
wouldn't have to be unique or anything, but when you were trying to
figure out where your memory leak is, being able to see which context
has gotten bloated would be useful...

Well, but not expect that header must be static string, better will
_allocate_ it in a CreateContext() function.

Karel

Karel Zak

zakkr@zf.jcu.cz

over 25 years ago

In reply to: Karel Zak (#3)

On Tue, 2 May 2000, Tom Lane wrote:

Karel Zak <zakkr@zf.jcu.cz> writes:

The chunk header must be relevant and same for all allocation methods and
must be independent on AllocSel (aset.c) code. Now, it is not. Am I right?

Right. Actually it would work to have additional data before the
standard link-to-context-plus-chunk-size fields, if a particular method
really needed more per-chunk data. But I doubt we'd ever need that.

The chunk:

  +--------------------+----------------------+-------------------+
  | depend-header      | standard-header      | data............. | 
  +--------------------+----------------------+-------------------+
  ^                    ^                      ^
  |                    |                      |
  |                    |              pointer to chunk 
  |                    |
  |          begin of standard chunk header                          
  |
 particular method header, begin of 
 this header is unknown for common routines.

IMHO in standard chunk header not must be chunk size. If something needs
chunk size it cat use depend-header.

It will very nice :-)

Karel

Import Notes

Reply to msg id not found: 5011.957302769@sss.pgh.pa.us | Resolved by subject fallback

Tom Lane

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Karel Zak (#3)

Karel Zak <zakkr@zf.jcu.cz> writes:

The chunk header must be relevant and same for all allocation methods and
must be independent on AllocSel (aset.c) code. Now, it is not. Am I right?

Right. Actually it would work to have additional data before the
standard link-to-context-plus-chunk-size fields, if a particular method
really needed more per-chunk data. But I doubt we'd ever need that.

And, as you saw, the links to the pfree and prealloc routines would need
to be present in standard places in the context header.

regards, tom lane

Peter Eisentraut

peter_e@gmx.net

over 25 years ago

In reply to: Tom Lane (#5)

Thanks for the timely introduction to memory contexts btw. :)

1. There will be two kinds of contexts, "permanent" and "temporary".

Rather than making this an explicit distinction you could simply (hah)
make the transaction initiating command create a memory context under Top
and store a pointer to it in a global variable. Then per tuple contexts,
etc. are created as a child thereof. The transaction ending commands would
then destroy that context again. I guess this is sort of what you were
planning anyway but I just wanted to throw in that specially treating this
"everything lives and dies with the transaction" attitude is not the
be-all-end-all, IMHO. ("Statement" would be better than "transaction" in
many cases anyway.)

Functions that return pass-by-reference values will be required always
to palloc the returned space in the caller's memory context (ie, the
context that was CurrentMemoryContext at the time of call). It is not
OK to pass back an input pointer, even if we are returning an input value
verbatim, because we do not know the lifespan of the context the input
pointer points to.

ISTM that you can have the compiler help you here if you separate input
and output values in the function manager design. E.g., if you define the
function signature like

void my_func(const fmgr_in_t * in, fmgr_out_t * out);

then you establish the fact that copying is required.

--
Peter Eisentraut Sernanders vï¿½g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden

Import Notes

Reply to msg id not found: 17136.957047804@sss.pgh.pa.us | Resolved by subject fallback

Tom Lane

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Peter Eisentraut (#6)

Peter Eisentraut <peter_e@gmx.net> writes:

1. There will be two kinds of contexts, "permanent" and "temporary".

Rather than making this an explicit distinction you could simply (hah)
make the transaction initiating command create a memory context under Top
and store a pointer to it in a global variable. Then per tuple contexts,
etc. are created as a child thereof. The transaction ending commands would
then destroy that context again. I guess this is sort of what you were
planning anyway

Yes, it was. You're right that the lifespan of a context will be
determined by usage; there's not any real distinction between permanent
and temporary contexts as far as the mechanism goes. I thought it would
be clearer to describe it that way, but maybe not.

Also, yes, the topmost contexts in the tree of contexts will be
referenced via pointers in global variables. An alternative approach
would be to provide some kind of lookup-by-name facility (like portals
have) but I think that'd likely be overkill. We've gotten along fine
with pointers for TopMemoryContext, CacheMemoryContext, etc, so it seems
that's good enough.

Functions that return pass-by-reference values will be required always
to palloc the returned space in the caller's memory context (ie, the
context that was CurrentMemoryContext at the time of call). It is not
OK to pass back an input pointer, even if we are returning an input value
verbatim, because we do not know the lifespan of the context the input
pointer points to.

ISTM that you can have the compiler help you here if you separate input
and output values in the function manager design. E.g., if you define the
function signature like
void my_func(const fmgr_in_t * in, fmgr_out_t * out);
then you establish the fact that copying is required.

Given the amount of casting that we do between Datum and other types,
I don't think that const decorations would actually help much :-(.
On most compilers you can cast away const without noticing.

Jan is of the opinion that this you-must-palloc-your-result rule is
wrongheaded anyway, so the issue may go away if he persuades me...

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

over 25 years ago

In reply to: Karel Zak (#4)

Karel Zak <zakkr@zf.jcu.cz> writes:

IMHO in standard chunk header not must be chunk size. If something needs
chunk size it cat use depend-header.

We might as well put the size in the standard header, though, because
(a) all context management methods are going to need it (how you gonna
do repalloc otherwise?) and (b) there are alignment considerations.
The size of the headers has to be a multiple of MAXALIGN which is 8
on quite a few architectures. If you just make each of the
standard-header and depend-header a multiple of MAXALIGN then you end
up wasting 8 bytes per alloc chunk on these machines. You could get
around that with some notational ugliness (ie, standard header is not
declared as part of depend-header but you compute the pointer to it
separately). But I don't see the value of working that hard to keep
the size out of the standard header.

Also, since aset.c is going to be our standard memory allocator for
the foreseeable future, there's no good reason to make its life more
difficult by having to work with both a standard-header and a
depend-header. If it can get along with only a standard-header,
why not let it do so?

regards, tom lane