A stack allocation API
Hi,
In the locale code we often use a 1KB array for copies of strings
where we need a NUL-terminated or transcoded version to give a library
function, with a fallback to palloc() + pfree() if we need more space
than that, but:
* we open code it repeatedly
* we often have two allocations but won't use the stack if we can't fit both
* we don't use it in nearby places that are obviously similar,
probably because it's a bit tedious to repeat
* in the past we've forgotten to pfree() large allocations and had to fix leaks
* it's not very type-safe
* we don't seem to consider alignment for non-char types, e.g. UChar,
wchar_t (apparently ASAN has never complained about that and I think I
see why it's always OK as written, but I suspect that might be UB)
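For readers who have not seen the pattern the list above is about, here is a minimal sketch of one common shape of it. It is not copied from the PostgreSQL tree: the names sketch_compare, sketch_xmalloc and SKETCH_BUFLEN are made up for illustration, malloc()/free() stand in for palloc()/pfree(), and plain strcoll() stands in for whichever library function needs the NUL-terminated copy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_BUFLEN 1024      /* hypothetical name for the 1KB limit */

/* malloc() stand-in for palloc(): never returns NULL to the caller. */
static void *
sketch_xmalloc(size_t size)
{
    void       *p = malloc(size);

    if (p == NULL)
        abort();
    return p;
}

/* Compare two length-counted, not NUL-terminated strings with strcoll(). */
static int
sketch_compare(const char *s1, size_t len1, const char *s2, size_t len2)
{
    char        buf1[SKETCH_BUFLEN];
    char        buf2[SKETCH_BUFLEN];
    char       *c1 = buf1;
    char       *c2 = buf2;
    int         result;

    assert(s1 != NULL && s2 != NULL);

    /* Use the heap only when a string plus its terminator does not fit. */
    if (len1 >= SKETCH_BUFLEN)
        c1 = sketch_xmalloc(len1 + 1);
    if (len2 >= SKETCH_BUFLEN)
        c2 = sketch_xmalloc(len2 + 1);

    memcpy(c1, s1, len1);
    c1[len1] = '\0';
    memcpy(c2, s2, len2);
    c2[len2] = '\0';

    result = strcoll(c1, c2);

    /* Easy to forget: each fallback allocation needs a matching free. */
    if (c1 != buf1)
        free(c1);
    if (c2 != buf2)
        free(c2);
    return result;
}
```

Every call site that wants this behaviour has to repeat the declare, test, copy and conditional-free steps, which is the repetition the first bullet complains about.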
In the attached, I tried to tidy that up with an interface that lets you write:
    DECLARE_STACK_BUFFER();

    p = stack_buffer_alloc(n);
    ...
    stack_buffer_free(p);
The point of the _free() call is that it might need to call pfree() if
it was a large allocation and not from the stack.
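To make the idea concrete, here is a rough sketch of how such an interface could be put together. It is not the attached patch: every identifier prefixed with sketch_ or SKETCH_ is hypothetical, malloc()/free() stand in for palloc()/pfree(), and the 1KB size and the bump-pointer layout are assumptions. The declaration macro creates a hidden local buffer, allocation carves from it while there is room, and the free call releases only pointers that lie outside the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_STACK_BUFFER_SIZE 1024   /* assumed cap, per the 1KB arrays */

typedef struct SketchStackBuffer
{
    size_t      used;           /* bytes handed out so far */
    union
    {
        max_align_t force_align;    /* keeps bytes[] maximally aligned */
        char        bytes[SKETCH_STACK_BUFFER_SIZE];
    }           u;
} SketchStackBuffer;

/* Declares the hidden per-function buffer used by the macros below. */
#define SKETCH_DECLARE_STACK_BUFFER() \
    SketchStackBuffer sketch_stack_buffer_; \
    sketch_stack_buffer_.used = 0

/* Does p point into the on-stack buffer? */
static int
sketch_in_buffer(const SketchStackBuffer *buf, const void *p)
{
    uintptr_t   addr = (uintptr_t) p;
    uintptr_t   start = (uintptr_t) buf->u.bytes;

    return addr >= start && addr < start + SKETCH_STACK_BUFFER_SIZE;
}

static void *
sketch_alloc_impl(SketchStackBuffer *buf, size_t size)
{
    size_t      align = _Alignof(max_align_t);
    size_t      start = (buf->used + align - 1) & ~(align - 1);
    void       *p;

    if (size == 0)
        size = 1;               /* so the result is always inside bytes[] */
    if (size <= SKETCH_STACK_BUFFER_SIZE &&
        start <= SKETCH_STACK_BUFFER_SIZE - size)
    {
        buf->used = start + size;
        return buf->u.bytes + start;
    }
    p = malloc(size);           /* palloc() in PostgreSQL */
    if (p == NULL)
        abort();
    return p;
}

static void
sketch_free_impl(SketchStackBuffer *buf, void *p)
{
    if (p != NULL && !sketch_in_buffer(buf, p))
        free(p);                /* pfree() in PostgreSQL */
}

#define sketch_stack_buffer_alloc(n) \
    sketch_alloc_impl(&sketch_stack_buffer_, (n))
#define sketch_stack_buffer_free(p) \
    sketch_free_impl(&sketch_stack_buffer_, (p))

/* Returns 1 if an n-byte request was served from the stack, else 0. */
static int
sketch_demo(size_t n)
{
    SKETCH_DECLARE_STACK_BUFFER();
    char       *p = sketch_stack_buffer_alloc(n);
    int         on_stack = sketch_in_buffer(&sketch_stack_buffer_, p);

    assert(p != NULL);
    memset(p, 0, n);
    sketch_stack_buffer_free(p);
    return on_stack;
}
```

In this sketch the free call never returns stack space for reuse; it only decides whether free() is needed. Whether the patch works the same way is not stated above.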
Or slightly higher level and supporting the most common use cases with
a one-liner:
    cstr1 = stack_buffer_strdup_with_len(str1, len1);
    cstr2 = stack_buffer_strdup_with_len(str2, len2);
    result = strcoll_l(cstr1, cstr2, locale);
    stack_buffer_free(cstr1);
    stack_buffer_free(cstr2);
Or for non-char cases without casts or pointer/size arithmetic, in the
style of recent palloc() variants:
    wcstr = stack_buffer_alloc_array(wchar_t, len);
    uchar = stack_buffer_alloc_array(UChar, len);
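As an illustration of what a typed variant buys, here is a small sketch of an array macro that derives both the size and the alignment from the element type, so a wchar_t array carved after an odd-length char string still starts on a suitable boundary. The names SketchArena, sketch_carve and sketch_alloc_array are hypothetical and this is not the patch's implementation. The sketch returns NULL when the buffer is full where a real version would fall back to palloc(), and it does not check sizeof(type) * count for overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

#define SKETCH_BUFSIZE 1024

typedef struct SketchArena
{
    size_t      used;           /* bytes handed out so far */
    union
    {
        max_align_t force_align;    /* keeps bytes[] maximally aligned */
        char        bytes[SKETCH_BUFSIZE];
    }           u;
} SketchArena;

/* Carve size bytes at a power-of-two alignment, or return NULL if full. */
static void *
sketch_carve(SketchArena *a, size_t size, size_t align)
{
    size_t      start;

    assert(align != 0 && (align & (align - 1)) == 0);
    start = (a->used + align - 1) & ~(align - 1);
    if (size > SKETCH_BUFSIZE || start > SKETCH_BUFSIZE - size)
        return NULL;            /* a real version would use the heap here */
    a->used = start + size;
    return a->u.bytes + start;
}

/* No casts or size arithmetic at the call site. */
#define sketch_alloc_array(arena, type, count) \
    ((type *) sketch_carve((arena), sizeof(type) * (count), _Alignof(type)))

/* Returns 1 if a wchar_t array placed after 3 chars is properly aligned. */
static int
sketch_alignment_demo(void)
{
    SketchArena arena;
    char       *s;
    wchar_t    *w;

    arena.used = 0;
    s = sketch_alloc_array(&arena, char, 3);
    w = sketch_alloc_array(&arena, wchar_t, 8);
    return s != NULL && w != NULL &&
        ((uintptr_t) w % _Alignof(wchar_t)) == 0;
}
```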
Better names/ideas welcome.
I also wondered if we might have a reasonable case for using alloca(),
where available. It's pretty much the thing we are emulating, but
keeps the stack nice and compact without big holes to step over for
the following call to strcoll_l() or whatever it might be. Though
it's non-standard and often discouraged due to the inherent danger of
overflow, our usage is metered. I don't see why it's any more
dangerous than the existing code as long as our cap is applied to it,
or am I missing some other problem with that idea? One issue with
USE_ALLOCA is that we have no systems where that wouldn't be used, so
the fallback code would be untested unless you comment the #define
out...
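A minimal sketch of what capped alloca() usage might look like, assuming a platform that provides <alloca.h> (glibc and macOS do; others may declare alloca() elsewhere). The names SKETCH_ALLOCA_CAP, sketch_metered_alloc and sketch_metered_free are hypothetical, malloc()/free() stand in for palloc()/pfree(), and the cap here applies per allocation rather than to a running total. The allocation has to be a macro, because memory from alloca() belongs to the frame of the function that calls it. To keep the sketch short, the free macro takes the size again to decide whether the heap was used, unlike the one-argument stack_buffer_free() shown earlier.

```c
#include <alloca.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_ALLOCA_CAP 1024  /* assumed per-allocation cap */

/* malloc() stand-in for palloc(): never returns NULL to the caller. */
static void *
sketch_heap_alloc(size_t size)
{
    void       *p = malloc(size);

    if (p == NULL)
        abort();
    return p;
}

/* Evaluates size twice, so pass an expression without side effects. */
#define sketch_metered_alloc(size) \
    ((size) <= SKETCH_ALLOCA_CAP ? alloca(size) : sketch_heap_alloc(size))

/* Only allocations above the cap came from the heap. */
#define sketch_metered_free(p, size) \
    do { if ((size) > SKETCH_ALLOCA_CAP) free(p); } while (0)

/* Make a NUL-terminated copy of src[0..len) and return its strlen(). */
static size_t
sketch_copy_len(const char *src, size_t len)
{
    char       *copy;
    size_t      result;

    assert(src != NULL);
    copy = sketch_metered_alloc(len + 1);
    memcpy(copy, src, len);
    copy[len] = '\0';
    result = strlen(copy);
    sketch_metered_free(copy, len + 1);
    return result;
}

/* Exercises the heap path with a 5000-byte input. */
static size_t
sketch_large_demo(void)
{
    static char big[5000];

    memset(big, 'x', sizeof(big));
    return sketch_copy_len(big, sizeof(big));
}
```

With this shape a short string costs only its own length in stack space, rather than a fixed 1KB array per buffer.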
Attachments:
v1-0001-Provide-stack-allocation-API.patch (application/octet-stream, +226 -145)
Thomas Munro <thomas.munro@gmail.com> writes:
> In the locale code we often use a 1KB array for copies of strings
> where we need a NUL-terminated or transcoded version to give a library
> function, with a fallback to palloc() + pfree() if we need more space
> than that, but:
Yeah, I think there are some other use-cases too.
> I also wondered if we might have a reasonable case for using alloca(),
> where available. It's pretty much the thing we are emulating, but
> keeps the stack nice and compact without big holes to step over for
> the following call to strcoll_l() or whatever it might be.
+1 for investigating alloca(). The one disadvantage I can see
to making this coding pattern more common is that it'll result in
increased stack usage, which is not great now and will become
considerably less great in our hypothetical multithreaded future.
If we can fix it so the typical stack consumption is a good deal
less than 1KB, that worry would be alleviated.
regards, tom lane
Hi,
On 2026-02-27 10:35:39 -0500, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > I also wondered if we might have a reasonable case for using alloca(),
> > where available. It's pretty much the thing we are emulating, but
> > keeps the stack nice and compact without big holes to step over for
> > the following call to strcoll_l() or whatever it might be.
>
> +1 for investigating alloca(). The one disadvantage I can see
> to making this coding pattern more common is that it'll result in
> increased stack usage, which is not great now and will become
> considerably less great in our hypothetical multithreaded future.
Yeah, that's what I immediately thought about too. IIRC, on Linux, the
stack for the "main" thread is allocated on demand, but the stack for threads
is mapped entirely upon creation (I think because it'd be hard to ensure
there's space for the stack otherwise). So there's more benefit in keeping the
stack small-ish with threads than there is in a process-based model.
That said, I've thought about accelerating a few things with an
'on-stack-if-small-palloc-otherwise' approach as well. Particularly things
like small StringInfos could really benefit from it - but it'd be a nontrivial
conversion, due to code calling pfree on the memory. I guess we could introduce
a memory context that'd do nothing for pfree(), which could be used when using
the stack version, but IDK, that seems mighty ugly.
However, I'm pretty unconvinced by this argument:
> in the past we've forgotten to pfree() large allocations and had to fix leaks
because we'll continue to rely on calling something to free anyway (due to
large objects), and using a different path for smaller objects will just make
those leaks harder to find.
I wish MSVC implemented something akin to gcc/clang's
__attribute__((cleanup(cleanup_function))), but it doesn't look like it
does. Obviously it would if we were to compile with C++, but I don't think
anybody has the appetite for the work it'd take to get there.
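For reference, a small sketch of what the gcc/clang cleanup attribute does, assuming one of those compilers. The names sketch_free_charp, sketch_frees and sketch_scoped_demo are made up for illustration, and malloc()/free() stand in for palloc()/pfree(). The cleanup function is called with a pointer to the annotated variable when that variable goes out of scope, so no explicit free call is needed on any return path. Note that a longjmp() out of the scope does not run the cleanup function.

```c
#include <assert.h>
#include <stdlib.h>

static int  sketch_frees = 0;   /* counts cleanup calls, for demonstration */

/* Called with the address of the annotated variable at scope exit. */
static void
sketch_free_charp(char **p)
{
    free(*p);                   /* free(NULL) is harmless */
    sketch_frees++;
}

/* Returns how many times the cleanup function ran for one scoped buffer. */
static int
sketch_scoped_demo(size_t n)
{
    int         before = sketch_frees;

    assert(n > 0);
    {
        __attribute__((cleanup(sketch_free_charp))) char *buf = malloc(n);

        if (buf != NULL)
            buf[0] = '\0';
        /* no explicit free(): the cleanup runs when buf leaves scope */
    }
    return sketch_frees - before;
}
```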
Greetings,
Andres Freund