Expand palloc/pg_malloc API
This adds additional variants of palloc, pg_malloc, etc. that
encapsulate common usage patterns and provide more type safety.
Examples:
- result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult));
+ result = palloc_obj(IndexBuildResult);
- collector->tuples = (IndexTuple *) palloc(sizeof(IndexTuple) *
collector->lentuples);
+ collector->tuples = palloc_array(IndexTuple, collector->lentuples);
One common point is that the new interfaces all have a return type that
automatically matches what they are allocating, so you don't need any
casts nor have to manually make sure the size matches the expected
result. Besides the additional safety, the notation is also more
compact, as you can see above.
Inspired by the talloc library.
The interesting changes are in fe_memutils.h and palloc.h. The rest of
the patch is just randomly sprinkled examples to test/validate the new
additions.
Attachments:
0001-Expand-palloc-pg_malloc-API.patchtext/plain; charset=UTF-8; name=0001-Expand-palloc-pg_malloc-API.patchDownload+163-105
On Tue, May 17, 2022 at 5:11 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
This adds additional variants of palloc, pg_malloc, etc. that
encapsulate common usage patterns and provide more type safety.Examples:
- result = (IndexBuildResult *) palloc(sizeof(IndexBuildResult)); + result = palloc_obj(IndexBuildResult);- collector->tuples = (IndexTuple *) palloc(sizeof(IndexTuple) * collector->lentuples); + collector->tuples = palloc_array(IndexTuple, collector->lentuples);One common point is that the new interfaces all have a return type that
automatically matches what they are allocating, so you don't need any
casts nor have to manually make sure the size matches the expected
result. Besides the additional safety, the notation is also more
compact, as you can see above.Inspired by the talloc library.
The interesting changes are in fe_memutils.h and palloc.h. The rest of
the patch is just randomly sprinkled examples to test/validate the new
additions.
It seems interesting. Are we always type-casting explicitly the output
of palloc/palloc0? Does this mean the compiler takes care of
type-casting the returned void * to the target type?
I see lots of instances where there's no explicit type-casting to the
target variable type -
retval = palloc(sizeof(GISTENTRY));
Interval *p = palloc(sizeof(Interval));
macaddr *v = palloc0(sizeof(macaddr)); and so on.
Regards,
Bharath Rupireddy.
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
On Tue, May 17, 2022 at 5:11 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:This adds additional variants of palloc, pg_malloc, etc. that
encapsulate common usage patterns and provide more type safety.
I see lots of instances where there's no explicit type-casting to the
target variable type -
retval = palloc(sizeof(GISTENTRY));
Interval *p = palloc(sizeof(Interval));
macaddr *v = palloc0(sizeof(macaddr)); and so on.
Yeah. IMO the first of those is very poor style, because there's
basically nothing enforcing that you wrote the right thing in sizeof().
The others are a bit safer, in that at least a human can note that
the two types mentioned on the same line match --- but I doubt any
compiler would detect it if they don't. Our current preferred style
Interval *p = (Interval *) palloc(sizeof(Interval));
is really barely an improvement, because only two of the three types
mentioned are going to be checked against each other.
So I think Peter's got a good idea here (I might quibble with the details
of some of these macros). But it's not really going to move the
safety goalposts very far unless we make a concerted effort to make
these be the style everywhere. Are we willing to do that? What
will it do to back-patching difficulty? Dare I suggest back-patching
these changes?
regards, tom lane
On 17.05.22 20:43, Tom Lane wrote:
So I think Peter's got a good idea here (I might quibble with the details
of some of these macros). But it's not really going to move the
safety goalposts very far unless we make a concerted effort to make
these be the style everywhere. Are we willing to do that? What
will it do to back-patching difficulty? Dare I suggest back-patching
these changes?
I think it could go like the castNode() introduction: first we adopt it
sporadically for new code, then we change over some larger pieces of
code, then we backpatch the API, then someone sends in a big patch to
change the rest.
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 17.05.22 20:43, Tom Lane wrote:
So I think Peter's got a good idea here (I might quibble with the details
of some of these macros). But it's not really going to move the
safety goalposts very far unless we make a concerted effort to make
these be the style everywhere. Are we willing to do that? What
will it do to back-patching difficulty? Dare I suggest back-patching
these changes?
I think it could go like the castNode() introduction: first we adopt it
sporadically for new code, then we change over some larger pieces of
code, then we backpatch the API, then someone sends in a big patch to
change the rest.
OK, that seems like a reasonable plan.
I've now studied this a little more closely, and I think the
functionality is fine, but I have naming quibbles.
1. Do we really want distinct names for the frontend and backend
versions of the macros? Particularly since you're layering the
frontend ones over pg_malloc, which has exit-on-OOM behavior?
I think we've found that notational discrepancies between frontend
and backend code are generally more a hindrance than a help,
so I'm inclined to drop the pg_malloc_xxx macros and just use
"palloc"-based names across the board.
2. I don't like the "palloc_ptrtype" name at all. I see that you
borrowed that name from talloc, but I doubt that's a precedent that
very many people are familiar with. To me it sounds like it might
allocate something that's the size of a pointer, not the size of the
pointed-to object. I have to confess though that I don't have an
obviously better name to suggest. "palloc_pointed_to" would be
clear perhaps, but it's kind of long.
3. Likewise, "palloc_obj" is perhaps less clear than it could be.
I find palloc_array just fine though. Maybe palloc_object or
palloc_struct? (If "_obj" can be traced to talloc, I'm not
seeing where.)
One thought that comes to mind is that palloc_ptrtype is almost
surely going to be used in the style
myvariable = palloc_ptrtype(myvariable);
and if it's not that it's very likely wrong. So maybe we should cut
out the middleman and write something like
#define palloc_instantiate(ptr) ((ptr) = (typeof(ptr)) palloc(sizeof(*(ptr))))
...
palloc_instantiate(myvariable);
I'm not wedded to "instantiate" here, there's probably better names.
regards, tom lane
On Tue, Jul 26, 2022 at 2:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
2. I don't like the "palloc_ptrtype" name at all. I see that you
borrowed that name from talloc, but I doubt that's a precedent that
very many people are familiar with.
To me it sounds like it might
allocate something that's the size of a pointer, not the size of the
pointed-to object. I have to confess though that I don't have an
obviously better name to suggest. "palloc_pointed_to" would be
clear perhaps, but it's kind of long.
I agree that ptrtype reads "the type of a pointer".
This may not be a C-idiom but the pointed-to thing is a "reference" (hence
pass by value vs pass by reference). So:
palloc_ref(myvariablepointer)
will allocate using the type of the referenced object. Just like _array
and _obj, which name the thing being used as a size template as opposed to
instantiate which seems more like another word for "allocate/palloc".
David J.
P.S.
Admittedly I'm still getting my head around reading pointer-using code (I
get the general concept but haven't had to code them)....
- lockrelid = palloc(sizeof(*lockrelid));
+ lockrelid = palloc_ptrtype(lockrelid);
// This definitely seems like an odd idiom until I remembered about
short-lived memory contexts and the lost pointers are soon destroyed there.
So lockrelid (no star) is a pointer that has an underlying reference that
the macro (and the orignal code) resolves via the *
I cannot reason out whether the following would be equivalent to the above:
lockrelid = palloc_obj(*lockrelid);
I assume not because: typeof(lockrelid) != (*lockrelid *)
On 26.07.22 23:32, Tom Lane wrote:
1. Do we really want distinct names for the frontend and backend
versions of the macros? Particularly since you're layering the
frontend ones over pg_malloc, which has exit-on-OOM behavior?
I think we've found that notational discrepancies between frontend
and backend code are generally more a hindrance than a help,
so I'm inclined to drop the pg_malloc_xxx macros and just use
"palloc"-based names across the board.
This seems like a question that is independent of this patch. Given
that both pg_malloc() and palloc() do exist in fe_memutils, I think it
would be confusing to only extend one part of that and not the other.
The amount of code is ultimately not a lot.
If we wanted to get rid of pg_malloc() altogether, maybe we could talk
about that.
(Personally, I have always been a bit suspicious about using the name
palloc() without memory context semantics in frontend code, but I guess
this is wide-spread now.)
3. Likewise, "palloc_obj" is perhaps less clear than it could be.
I find palloc_array just fine though. Maybe palloc_object or
palloc_struct? (If "_obj" can be traced to talloc, I'm not
seeing where.)
In talloc, the talloc() function itself allocates an object of a given
type. To allocate something of a specified size, you'd use
talloc_size(). So those names won't map exactly. I'm fine with
palloc_object() if that is clearer.
One thought that comes to mind is that palloc_ptrtype is almost
surely going to be used in the stylemyvariable = palloc_ptrtype(myvariable);
and if it's not that it's very likely wrong. So maybe we should cut
out the middleman and write something like#define palloc_instantiate(ptr) ((ptr) = (typeof(ptr)) palloc(sizeof(*(ptr))))
...
palloc_instantiate(myvariable);
Right, this is sort of what you'd want, really. But it looks like
strange C code, since you are modifying the variable even though you are
passing it by value.
I think the _ptrtype variant isn't that useful anyway, so if it's
confusing we can leave it out.
On 27.07.22 01:58, David G. Johnston wrote:
Admittedly I'm still getting my head around reading pointer-using code
(I get the general concept but haven't had to code them)....- lockrelid = palloc(sizeof(*lockrelid)); + lockrelid = palloc_ptrtype(lockrelid);// This definitely seems like an odd idiom until I remembered about
short-lived memory contexts and the lost pointers are soon destroyed there.So lockrelid (no star) is a pointer that has an underlying reference
that the macro (and the orignal code) resolves via the *I cannot reason out whether the following would be equivalent to the above:
lockrelid = palloc_obj(*lockrelid);
I think that would also work.
Ultimately, it would be more idiomatic (in Postgres), to write this as
lockrelid = palloc(sizeof(LockRelId));
and thus
lockrelid = palloc_obj(LockRelId);
On 12.08.22 09:31, Peter Eisentraut wrote:
In talloc, the talloc() function itself allocates an object of a given
type. To allocate something of a specified size, you'd use
talloc_size(). So those names won't map exactly. I'm fine with
palloc_object() if that is clearer.
I think the _ptrtype variant isn't that useful anyway, so if it's
confusing we can leave it out.
I have updated this patch set to rename the _obj() functions to
_object(), and I have dropped the _ptrtype() variants.
I have also split the patch to put the new API and the example uses into
separate patches.
Attachments:
v2-0001-Expand-palloc-pg_malloc-API-for-more-type-safety.patchtext/plain; charset=UTF-8; name=v2-0001-Expand-palloc-pg_malloc-API-for-more-type-safety.patchDownload+50-1
v2-0002-Assorted-examples-of-expanded-type-safer-palloc-p.patchtext/plain; charset=UTF-8; name=v2-0002-Assorted-examples-of-expanded-type-safer-palloc-p.patchDownload+83-104
On Fri, Aug 12, 2022 at 3:31 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
(Personally, I have always been a bit suspicious about using the name
palloc() without memory context semantics in frontend code, but I guess
this is wide-spread now.)
I think it would be a good thing to add memory context support to the
frontend. We could just put everything in a single context for
starters, and then frontend utilities that wanted to create other
contexts could do so.
There are difficulties, though. For instance, memory contexts are
nodes, and have a NodeTag. And I'm pretty sure we don't want frontend
code to know about all the backend node types. My suspicion is that
memory context types really shouldn't be node types, but right now,
they are. Whether that's the correct view or not, this kind of problem
means it's not a simple lift-and-shift to move the memory context code
into src/common. Someone would need to spend some time thinking about
how to engineer it.
--
Robert Haas
EDB: http://www.enterprisedb.com
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
I have updated this patch set to rename the _obj() functions to
_object(), and I have dropped the _ptrtype() variants.
I have also split the patch to put the new API and the example uses into
separate patches.
This patch set seems fine to me, so I've marked it Ready for Committer.
I think serious consideration should be given to back-patching the
0001 part (that is, addition of the macros). Otherwise we'll have
to remember not to use these macros in code intended for back-patch,
and that'll be mighty annoying once we are used to them.
regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes:
On Fri, Aug 12, 2022 at 3:31 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:(Personally, I have always been a bit suspicious about using the name
palloc() without memory context semantics in frontend code, but I guess
this is wide-spread now.)
I think it would be a good thing to add memory context support to the
frontend. We could just put everything in a single context for
starters, and then frontend utilities that wanted to create other
contexts could do so.
Perhaps, but I think we should have at least one immediate use-case
for multiple contexts in frontend. Otherwise it's just useless extra
code. The whole point of memory contexts in the backend is that we
have well-defined lifespans for certain types of allocations (executor
state, function results, etc); but it's not very clear to me that the
same concept will be helpful in any of our frontend programs.
There are difficulties, though. For instance, memory contexts are
nodes, and have a NodeTag. And I'm pretty sure we don't want frontend
code to know about all the backend node types. My suspicion is that
memory context types really shouldn't be node types, but right now,
they are. Whether that's the correct view or not, this kind of problem
means it's not a simple lift-and-shift to move the memory context code
into src/common. Someone would need to spend some time thinking about
how to engineer it.
I don't really think that's much of an issue. We could replace the
nodetag fields with some sort of magic number and have just as much
wrong-pointer safety as in the backend. What I do take issue with
is moving the code into src/common. I think we'd be better off
just writing a distinct implementation for frontend. For one thing,
it's not apparent to me that aset.c is a good allocator for frontend
(and the other two surely are not).
This is all pretty off-topic for Peter's patch, though.
regards, tom lane
On 09.09.22 22:13, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
I have updated this patch set to rename the _obj() functions to
_object(), and I have dropped the _ptrtype() variants.I have also split the patch to put the new API and the example uses into
separate patches.This patch set seems fine to me, so I've marked it Ready for Committer.
committed
I think serious consideration should be given to back-patching the
0001 part (that is, addition of the macros). Otherwise we'll have
to remember not to use these macros in code intended for back-patch,
and that'll be mighty annoying once we are used to them.
Yes, the 0001 patch is kept separate so that we can do that when we feel
the time is right.
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 09.09.22 22:13, Tom Lane wrote:
I think serious consideration should be given to back-patching the
0001 part (that is, addition of the macros). Otherwise we'll have
to remember not to use these macros in code intended for back-patch,
and that'll be mighty annoying once we are used to them.
Yes, the 0001 patch is kept separate so that we can do that when we feel
the time is right.
I think the right time is now, or at least as soon as you're
satisfied that the buildfarm is happy.
regards, tom lane
On 12.09.22 15:49, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 09.09.22 22:13, Tom Lane wrote:
I think serious consideration should be given to back-patching the
0001 part (that is, addition of the macros). Otherwise we'll have
to remember not to use these macros in code intended for back-patch,
and that'll be mighty annoying once we are used to them.Yes, the 0001 patch is kept separate so that we can do that when we feel
the time is right.I think the right time is now, or at least as soon as you're
satisfied that the buildfarm is happy.
This has been done.
I have another little idea that fits well here: repalloc0 and
repalloc0_array. These zero out the space added by repalloc. This is a
common pattern in the backend code that is quite hairy to code by hand.
See attached patch.
Attachments:
0001-Add-repalloc0-and-repalloc0_array.patchtext/plain; charset=UTF-8; name=0001-Add-repalloc0-and-repalloc0_array.patchDownload+43-72
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
I have another little idea that fits well here: repalloc0 and
repalloc0_array. These zero out the space added by repalloc. This is a
common pattern in the backend code that is quite hairy to code by hand.
See attached patch.
+1 in general --- you've put your finger on something I felt was
missing, but couldn't quite identify.
However, I'm a bit bothered by the proposed API:
+extern pg_nodiscard void *repalloc0(void *pointer, Size size, Size oldsize);
It kind of feels that the argument order should be pointer, oldsize, size.
It feels even more strongly that people will get the ordering wrong,
whichever we choose. Is there a way to make that more bulletproof?
The only thought that comes to mind offhand is that the only plausible
use-case is with size >= oldsize, so maybe an assertion or even a
runtime check would help to catch getting it backwards. (I notice
that your proposed coding will fail rather catastrophically if the
caller gets it backwards. An assertion failure would be better.)
regards, tom lane
I wrote:
It kind of feels that the argument order should be pointer, oldsize, size.
It feels even more strongly that people will get the ordering wrong,
whichever we choose. Is there a way to make that more bulletproof?
Actually ... an even-more-terrifyingly-plausible misuse is that the
supplied oldsize is different from the actual previous allocation.
We should try to check that. In MEMORY_CONTEXT_CHECKING builds
it should be possible to assert that oldsize == requested_size.
We don't have that data if !MEMORY_CONTEXT_CHECKING, but we could
at least assert that oldsize <= allocated chunk size.
regards, tom lane
On 14.09.22 06:53, Tom Lane wrote:
I wrote:
It kind of feels that the argument order should be pointer, oldsize, size.
It feels even more strongly that people will get the ordering wrong,
whichever we choose. Is there a way to make that more bulletproof?Actually ... an even-more-terrifyingly-plausible misuse is that the
supplied oldsize is different from the actual previous allocation.
We should try to check that. In MEMORY_CONTEXT_CHECKING builds
it should be possible to assert that oldsize == requested_size.
We don't have that data if !MEMORY_CONTEXT_CHECKING, but we could
at least assert that oldsize <= allocated chunk size.
I'm not very familiar with MEMORY_CONTEXT_CHECKING. Where would one get
these values?
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
On 14.09.22 06:53, Tom Lane wrote:
Actually ... an even-more-terrifyingly-plausible misuse is that the
supplied oldsize is different from the actual previous allocation.
We should try to check that. In MEMORY_CONTEXT_CHECKING builds
it should be possible to assert that oldsize == requested_size.
We don't have that data if !MEMORY_CONTEXT_CHECKING, but we could
at least assert that oldsize <= allocated chunk size.
I'm not very familiar with MEMORY_CONTEXT_CHECKING. Where would one get
these values?
Hmm ... the individual allocators have that info, but mcxt.c doesn't
have access to it. I guess we could invent an additional "method"
to return the requested size of a chunk, which is only available in
MEMORY_CONTEXT_CHECKING builds, or maybe in !MEMORY_CONTEXT_CHECKING
it returns the allocated size instead.
regards, tom lane