Add bump memory context type and use it for tuplesorts

Started by David Rowleyalmost 3 years ago58 messageshackers
Jump to latest
#1David Rowley
dgrowleyml@gmail.com

Background:

The idea with 40af10b57 (Use Generation memory contexts to store
tuples in sorts) was to reduce the memory wastage in tuplesort.c
caused by aset.c's power-of-2 rounding up behaviour and allow more
tuples to be stored per work_mem in tuplesort.c

Later, in (v16's) c6e0fe1f2a (Improve performance of and reduce
overheads of memory management) that commit reduced the palloc chunk
header overhead down to 8 bytes. For generation.c contexts (as is now
used by non-bounded tuplesorts as of 40af10b57), the overhead was 24
bytes. So this allowed even more tuples to be stored in a work_mem by
reducing the chunk overheads for non-bounded tuplesorts by 2/3rds down
to 16 bytes.

1083f94da (Be smarter about freeing tuples during tuplesorts) removed
the useless pfree() calls from tuplesort.c which pfree'd the tuples
just before we reset the context. So, as of now, we never pfree()
memory allocated to store tuples in non-bounded tuplesorts.

My thoughts are, if we never pfree tuples in tuplesorts, then why
bother having a chunk header at all?

Proposal:

Because of all of what is mentioned above about the current state of
tuplesort, there does not really seem to be much need to have chunk
headers in memory we allocate for tuples at all. Not having these
saves us a further 8 bytes per tuple.

In the attached patch, I've added a bump memory allocator which
allocates chunks without and chunk header. This means the chunks
cannot be pfree'd or realloc'd. That seems fine for the use case for
storing tuples in tuplesort. I've coded bump.c in such a way that when
built with MEMORY_CONTEXT_CHECKING, we *do* have chunk headers. That
should allow us to pick up any bugs that are introduced by any code
which accidentally tries to pfree a bump.c chunk.

I'd expect a bump.c context only to be used for fairly short-lived and
memory that's only used by a small amount of code (e.g. only accessed
from a single .c file, like tuplesort.c). That should reduce the risk
of any code accessing the memory which might be tempted into calling
pfree or some other unsupported function.

Getting away from the complexity of freelists (aset.c) and tracking
allocations per block (generation.c) allows much better allocation
performance. All we need to do is check there's enough space then
bump the free pointer when performing an allocation. See the attached
time_to_allocate_10gbs_memory.png to see how bump.c compares to aset.c
and generation.c to allocate 10GBs of memory resetting the context
after 1MB. It's not far off twice as fast in raw allocation speed.
(See 3rd tab in the attached spreadsheet)

Performance:

In terms of the speed of palloc(), the performance tested on an AMD
3990x CPU on Linux for 8-byte chunks:

aset.c 9.19 seconds
generation.c 8.68 seconds
bump.c 4.95 seconds

These numbers were generated by calling:
select stype,chksz,pg_allocate_memory_test_reset(chksz,1024*1024,10::bigint*1024*1024*1024,stype)
from (values(8),(16),(32),(64),(128)) t1(chksz) cross join
(values('aset'),('generation'),('bump')) t2(stype) order by
stype,chksz;

This allocates a total of 10GBs of chunks but calls a context reset
after 1MB so as not to let the memory usage get out of hand. The
function is in the attached membench.patch.txt file.

In terms of performance of tuplesort, there's a small (~5-6%)
performance gain. Not as much as I'd hoped, but I'm also doing a bit
of other work on tuplesort to make it more efficient in terms of CPU,
so I suspect the cache efficiency improvements might be more
pronounced after those.
Please see the attached bump_context_tuplesort_2023-06-27.ods for my
complete benchmark.

One thing that might need more thought is that we're running a bit low
on MemoryContextMethodIDs. I had to use an empty slot that has a bit
pattern like glibc malloc'd chunks sized 128kB. Maybe it's worth
freeing up a bit from the block offset in MemoryChunk. This is
currently 30 bits allowing 1GB offset, but these offsets are always
MAXALIGNED, so we could free up a couple of bits since those 2
lowest-order bits will always be 0 anyway.

I've attached the bump allocator patch and also the script I used to
gather the performance results in the first 2 tabs in the attached
spreadsheet.

David

Attachments:

membench.patch.txttext/plain; charset=US-ASCII; name=membench.patch.txtDownload+229-0
time_to_allocate_10gbs_memory.pngimage/png; name=time_to_allocate_10gbs_memory.pngDownload
bump_context_tuplesort_2023-06-27.odsapplication/vnd.oasis.opendocument.spreadsheet; name=bump_context_tuplesort_2023-06-27.odsDownload+2-3
bump_allocator.patchtext/plain; charset=US-ASCII; name=bump_allocator.patchDownload+849-35
sortbenchall.sh.txttext/plain; charset=US-ASCII; name=sortbenchall.sh.txtDownload
#2David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#1)
Re: Add bump memory context type and use it for tuplesorts

On Tue, 27 Jun 2023 at 21:19, David Rowley <dgrowleyml@gmail.com> wrote:

I've attached the bump allocator patch and also the script I used to
gather the performance results in the first 2 tabs in the attached
spreadsheet.

I've attached a v2 patch which changes the BumpContext a little to
remove some of the fields that are not really required. There was no
need for the "keeper" field as the keeper block always comes at the
end of the BumpContext as these are allocated in a single malloc().
The pointer to the "block" also isn't really needed. This is always
the same as the head element in the blocks dlist.

David

Attachments:

bump_allocator_v2.patchtext/plain; charset=US-ASCII; name=bump_allocator_v2.patchDownload+845-35
#3Nathan Bossart
nathandbossart@gmail.com
In reply to: David Rowley (#1)
Re: Add bump memory context type and use it for tuplesorts

On Tue, Jun 27, 2023 at 09:19:26PM +1200, David Rowley wrote:

Because of all of what is mentioned above about the current state of
tuplesort, there does not really seem to be much need to have chunk
headers in memory we allocate for tuples at all. Not having these
saves us a further 8 bytes per tuple.

In the attached patch, I've added a bump memory allocator which
allocates chunks without and chunk header. This means the chunks
cannot be pfree'd or realloc'd. That seems fine for the use case for
storing tuples in tuplesort. I've coded bump.c in such a way that when
built with MEMORY_CONTEXT_CHECKING, we *do* have chunk headers. That
should allow us to pick up any bugs that are introduced by any code
which accidentally tries to pfree a bump.c chunk.

This is a neat idea.

In terms of performance of tuplesort, there's a small (~5-6%)
performance gain. Not as much as I'd hoped, but I'm also doing a bit
of other work on tuplesort to make it more efficient in terms of CPU,
so I suspect the cache efficiency improvements might be more
pronounced after those.

Nice.

One thing that might need more thought is that we're running a bit low
on MemoryContextMethodIDs. I had to use an empty slot that has a bit
pattern like glibc malloc'd chunks sized 128kB. Maybe it's worth
freeing up a bit from the block offset in MemoryChunk. This is
currently 30 bits allowing 1GB offset, but these offsets are always
MAXALIGNED, so we could free up a couple of bits since those 2
lowest-order bits will always be 0 anyway.

I think it'd be okay to steal those bits. AFAICT it'd complicate the
macros in memutils_memorychunk.h a bit, but that doesn't seem like such a
terrible price to pay to allow us to keep avoiding the glibc bit patterns.

+	if (base->sortopt & TUPLESORT_ALLOWBOUNDED)
+		tuplen = GetMemoryChunkSpace(tuple);
+	else
+		tuplen = MAXALIGN(tuple->t_len);

nitpick: I see this repeated in a few places, and I wonder if it might
deserve a comment.

I haven't had a chance to try out your benchmark, but I'm hoping to do so
in the near future.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#4Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: David Rowley (#2)
Re: Add bump memory context type and use it for tuplesorts

On Tue, 11 Jul 2023 at 01:51, David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 27 Jun 2023 at 21:19, David Rowley <dgrowleyml@gmail.com> wrote:

I've attached the bump allocator patch and also the script I used to
gather the performance results in the first 2 tabs in the attached
spreadsheet.

I've attached a v2 patch which changes the BumpContext a little to
remove some of the fields that are not really required. There was no
need for the "keeper" field as the keeper block always comes at the
end of the BumpContext as these are allocated in a single malloc().
The pointer to the "block" also isn't really needed. This is always
the same as the head element in the blocks dlist.

Neat idea, +1.

I think it would make sense to split the "add a bump allocator"
changes from the "use the bump allocator in tuplesort" patches.

Tangent: Do we have specific notes on worst-case memory usage of
memory contexts with various allocation patterns? This new bump
allocator seems to be quite efficient, but in a worst-case allocation
pattern it can still waste about 1/3 of its allocated memory due to
never using free space on previous blocks after an allocation didn't
fit on that block.
It probably isn't going to be a huge problem in general, but this
seems like something that could be documented as a potential problem
when you're looking for which allocator to use and compare it with
other allocators that handle different allocation sizes more
gracefully.

+++ b/src/backend/utils/mmgr/bump.c
+BumpBlockIsEmpty(BumpBlock *block)
+{
+    /* it's empty if the freeptr has not moved */
+    return (block->freeptr == (char *) block + Bump_BLOCKHDRSZ);
[...]
+static inline void
+BumpBlockMarkEmpty(BumpBlock *block)
+{
+#if defined(USE_VALGRIND) || defined(CLOBBER_FREED_MEMORY)
+    char       *datastart = ((char *) block) + Bump_BLOCKHDRSZ;

These two use different definitions of the start pointer. Is that deliberate?

+++ b/src/include/utils/tuplesort.h
@@ -109,7 +109,8 @@ typedef struct TuplesortInstrumentation
* a pointer to the tuple proper (might be a MinimalTuple or IndexTuple),
* which is a separate palloc chunk --- we assume it is just one chunk and
* can be freed by a simple pfree() (except during merge, when we use a
- * simple slab allocator).  SortTuples also contain the tuple's first key
+ * simple slab allocator and when performing a non-bounded sort where we
+ * use a bump allocator).  SortTuples also contain the tuple's first key

I'd go with something like the following:

+ * ...(except during merge *where* we use a
+ * simple slab allocator, and during a non-bounded sort where we
+ * use a bump allocator).

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#5Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Matthias van de Meent (#4)
Re: Add bump memory context type and use it for tuplesorts

On Mon, 6 Nov 2023 at 19:54, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

On Tue, 11 Jul 2023 at 01:51, David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 27 Jun 2023 at 21:19, David Rowley <dgrowleyml@gmail.com> wrote:

I've attached the bump allocator patch and also the script I used to
gather the performance results in the first 2 tabs in the attached
spreadsheet.

I've attached a v2 patch which changes the BumpContext a little to
remove some of the fields that are not really required. There was no
need for the "keeper" field as the keeper block always comes at the
end of the BumpContext as these are allocated in a single malloc().
The pointer to the "block" also isn't really needed. This is always
the same as the head element in the blocks dlist.

+++ b/src/backend/utils/mmgr/bump.c
[...]
+MemoryContext
+BumpContextCreate(MemoryContext parent,
+                  const char *name,
+                  Size minContextSize,
+                  Size initBlockSize,
+                  Size maxBlockSize)
[...]
+    /* Determine size of initial block */
+    allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
+    if (minContextSize != 0)
+        allocSize = Max(allocSize, minContextSize);
+    else
+        allocSize = Max(allocSize, initBlockSize);

Shouldn't this be the following, considering the meaning of "initBlockSize"?

+    allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
+        Bump_CHUNKHDRSZ + initBlockSize;
+    if (minContextSize != 0)
+        allocSize = Max(allocSize, minContextSize);
+ * BumpFree
+ *        Unsupported.
[...]
+ * BumpRealloc
+ *        Unsupported.

Rather than the error, can't we make this a no-op (potentially
optionally, or in a different memory context?)

What I mean is, I get that it is an easy validator check that the code
that uses this context doesn't accidentally leak memory through
assumptions about pfree, but this does make this memory context
unusable for more generic operations where leaking a little memory is
preferred over the overhead of other memory contexts, as
MemoryContextReset is quite cheap in the grand scheme of things.

E.g. using a bump context in btree descent could speed up queries when
we use compare operators that do allocate memory (e.g. numeric, text),
because btree operators must not leak memory and thus always have to
manually keep track of all allocations, which can be expensive.

I understand that allowing pfree/repalloc in bump contexts requires
each allocation to have a MemoryChunk prefix in overhead, but I think
it's still a valid use case to have a very low overhead allocator with
no-op deallocator (except context reset). Do you have performance
comparison results between with and without the overhead of
MemoryChunk?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#6Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Matthias van de Meent (#5)
Re: Add bump memory context type and use it for tuplesorts

Hi,

I wanted to take a look at this patch but it seems to need a rebase,
because of a seemingly trivial conflict in MemoryContextMethodID:

--- src/include/utils/memutils_internal.h
+++ src/include/utils/memutils_internal.h
@@ -123,12 +140,13 @@ typedef enum MemoryContextMethodID
 {
   MCTX_UNUSED1_ID,      /* 000 occurs in never-used memory */
   MCTX_UNUSED2_ID,      /* glibc malloc'd chunks usually match 001 */
-  MCTX_UNUSED3_ID,      /* glibc malloc'd chunks > 128kB match 010 */
+  MCTX_BUMP_ID,        /* glibc malloc'd chunks > 128kB match 010
+                 * XXX? */
   MCTX_ASET_ID,
   MCTX_GENERATION_ID,
   MCTX_SLAB_ID,
   MCTX_ALIGNED_REDIRECT_ID,
-  MCTX_UNUSED4_ID        /* 111 occurs in wipe_mem'd memory */
+  MCTX_UNUSED3_ID        /* 111 occurs in wipe_mem'd memory */
 } MemoryContextMethodID;

I wasn't paying much attention to these memcontext reworks in 2022, so
my instinct was simply to use one of those "UNUSED" IDs. But after
looking at the 80ef92675823 a bit more, are those IDs really unused? I
mean, we're relying on those values to detect bogus pointers, right? So
if we instead start using those values for a new memory context, won't
we lose the ability to detect those issues?

Maybe I'm completely misunderstanding the implication of those limits,
but doesn't this mean the claim that we can support 8 memory context
types is not quite true, and the limit is 4, because the 4 IDs are
already used for malloc stuff?

One thing that confuses me a bit is that the comments introduced by
80ef92675823 talk about glibc, but the related discussion in [1]/messages/by-id/2910981.1665080361@sss.pgh.pa.us talks a
lot about FreeBSD, NetBSD, ... which don't actually use glibc (AFAIK).
So how portable are those unused IDs, actually?

Or am I just too caffeine-deprived and missing something obvious?

regards

[1]: /messages/by-id/2910981.1665080361@sss.pgh.pa.us

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Matthias van de Meent (#4)
Re: Add bump memory context type and use it for tuplesorts

On 11/6/23 19:54, Matthias van de Meent wrote:

...

Tangent: Do we have specific notes on worst-case memory usage of
memory contexts with various allocation patterns? This new bump
allocator seems to be quite efficient, but in a worst-case allocation
pattern it can still waste about 1/3 of its allocated memory due to
never using free space on previous blocks after an allocation didn't
fit on that block.
It probably isn't going to be a huge problem in general, but this
seems like something that could be documented as a potential problem
when you're looking for which allocator to use and compare it with
other allocators that handle different allocation sizes more
gracefully.

I don't think it's documented anywhere, but I agree it might be an
interesting piece of information. It probably did not matter too much
when we had just AllocSet, but now we have 3 very different allocators,
so maybe we should explain this.

When implementing these allocators, it didn't feel that important,
because the new allocators started as intended for a very specific part
of the code (as in "This piece of code has a very unique allocation
pattern, let's develop a custom allocator for it."), but if we feel we
want to make it simpler to use the allocators elsewhere ...

I think there are two obvious places where to document this - either in
the header of each memory context .c file, or a README in the mmgr
directory. Or some combination of it.

At some point I was thinking about writing a "proper paper" comparing
these allocators in a more scientific / thorough way, but I never got to
do it. I wonder if that'd be interesting for enough people.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#6)
Re: Add bump memory context type and use it for tuplesorts

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

Maybe I'm completely misunderstanding the implication of those limits,
but doesn't this mean the claim that we can support 8 memory context
types is not quite true, and the limit is 4, because the 4 IDs are
already used for malloc stuff?

Well, correct code would still work, but we will take a hit in our
ability to detect bogus chunk pointers if we convert any of the four
remaining bit-patterns to valid IDs. That has costs for debugging.
The particular bit patterns we left unused were calculated to make it
likely that we could detect a malloced-instead-of-palloced chunk (at
least with glibc); but in general, reducing the number of invalid
patterns makes it more likely that a just-plain-bad pointer would
escape detection.

I am less concerned about that than I was in 2022, because people
have already had some time to flush out bugs associated with the
GUC malloc->palloc conversion. Still, maybe we should think harder
about whether we can free up another ID bit before we go eating
more ID types. It's not apparent to me that the "bump context"
idea is valuable enough to foreclose ever adding more context types,
yet it will be pretty close to doing that if we commit it as-is.

If we do kick this can down the road, then I concur with eating 010
next, as it seems the least likely to occur in glibc-malloced
chunks.

One thing that confuses me a bit is that the comments introduced by
80ef92675823 talk about glibc, but the related discussion in [1] talks a
lot about FreeBSD, NetBSD, ... which don't actually use glibc (AFAIK).

The conclusion was that the specific invalid values didn't matter as
much on the other platforms as they do with glibc. But right now you
have a fifty-fifty chance that a pointer to garbage will look valid.
Do we want to increase those odds?

regards, tom lane

#9Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tom Lane (#8)
Re: Add bump memory context type and use it for tuplesorts

On 2/17/24 00:14, Tom Lane wrote:

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

Maybe I'm completely misunderstanding the implication of those limits,
but doesn't this mean the claim that we can support 8 memory context
types is not quite true, and the limit is 4, because the 4 IDs are
already used for malloc stuff?

Well, correct code would still work, but we will take a hit in our
ability to detect bogus chunk pointers if we convert any of the four
remaining bit-patterns to valid IDs. That has costs for debugging.
The particular bit patterns we left unused were calculated to make it
likely that we could detect a malloced-instead-of-palloced chunk (at
least with glibc); but in general, reducing the number of invalid
patterns makes it more likely that a just-plain-bad pointer would
escape detection.

I am less concerned about that than I was in 2022, because people
have already had some time to flush out bugs associated with the
GUC malloc->palloc conversion. Still, maybe we should think harder
about whether we can free up another ID bit before we go eating
more ID types. It's not apparent to me that the "bump context"
idea is valuable enough to foreclose ever adding more context types,
yet it will be pretty close to doing that if we commit it as-is.

If we do kick this can down the road, then I concur with eating 010
next, as it seems the least likely to occur in glibc-malloced
chunks.

I don't know if the bump context for tuplesorts alone is worth it, but
I've been thinking it's not the only place doing something like that.
I'm aware of two other places doing this "dense allocation" - spell.c
and nodeHash.c. And in those cases it definitely made a big difference
(ofc, the question is how big the difference would be now, with all the
palloc improvements).

But maybe we could switch all those places to a proper memcontext
(instead of something built on top of a memory context) ... Of course,
the code in spell.c/nodeHash.c is quite stable, so the custom code does
not cost much.

One thing that confuses me a bit is that the comments introduced by
80ef92675823 talk about glibc, but the related discussion in [1] talks a
lot about FreeBSD, NetBSD, ... which don't actually use glibc (AFAIK).

The conclusion was that the specific invalid values didn't matter as
much on the other platforms as they do with glibc. But right now you
have a fifty-fifty chance that a pointer to garbage will look valid.
Do we want to increase those odds?

Not sure. The ability to detect bogus pointers seems valuable, but is
the difference between 4/8 and 3/8 really qualitatively different? If it
is, maybe we should try to increase it by simply adding a bit.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#9)
Re: Add bump memory context type and use it for tuplesorts

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

On 2/17/24 00:14, Tom Lane wrote:

The conclusion was that the specific invalid values didn't matter as
much on the other platforms as they do with glibc. But right now you
have a fifty-fifty chance that a pointer to garbage will look valid.
Do we want to increase those odds?

Not sure. The ability to detect bogus pointers seems valuable, but is
the difference between 4/8 and 3/8 really qualitatively different? If it
is, maybe we should try to increase it by simply adding a bit.

I think it'd be worth taking a fresh look at the bit allocation in the
header word to see if we can squeeze another bit without too much
pain. There's basically no remaining headroom in the current design,
and it starts to seem like we want some. (I'm also wondering whether
the palloc_aligned stuff should have been done some other way than
by consuming a context type ID.)

regards, tom lane

#11Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#10)
Re: Add bump memory context type and use it for tuplesorts

Hi,

On 2024-02-16 20:10:48 -0500, Tom Lane wrote:

Tomas Vondra <tomas.vondra@enterprisedb.com> writes:

On 2/17/24 00:14, Tom Lane wrote:

The conclusion was that the specific invalid values didn't matter as
much on the other platforms as they do with glibc. But right now you
have a fifty-fifty chance that a pointer to garbage will look valid.
Do we want to increase those odds?

Not sure. The ability to detect bogus pointers seems valuable, but is
the difference between 4/8 and 3/8 really qualitatively different? If it
is, maybe we should try to increase it by simply adding a bit.

I think it'd be worth taking a fresh look at the bit allocation in the
header word to see if we can squeeze another bit without too much
pain. There's basically no remaining headroom in the current design,
and it starts to seem like we want some.

I think we could fairly easily "move" some bits around, by restricting the
maximum size of a non-external chunk (i.e. allocations coming out of a larger
block, not a separate allocation). Right now we reserve 30 bits for the offset
from the block header to the allocation.

It seems unlikely that it's ever worth having an undivided 1GB block. Even if
we wanted something that large - say because we want to use 1GB huge pages to
back the block - we could just add a few block headers ever couple hundred
MBs.

Another avenue is that presumably the chunk<->block header offset always has
at least the two lower bits set to zero, so perhaps we could just shift
blockoffset right by two bits in MemoryChunkSetHdrMask() and left in
MemoryChunkGetBlock()?

(I'm also wondering whether the palloc_aligned stuff should have been done
some other way than by consuming a context type ID.)

Possibly, I just don't quite know how.

Greetings,

Andres Freund

#12David Rowley
dgrowleyml@gmail.com
In reply to: Matthias van de Meent (#4)
Re: Add bump memory context type and use it for tuplesorts

Thanks for having a look at this.

On Tue, 7 Nov 2023 at 07:55, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

I think it would make sense to split the "add a bump allocator"
changes from the "use the bump allocator in tuplesort" patches.

I've done this and will post updated patches after replying to the
other comments.

Tangent: Do we have specific notes on worst-case memory usage of
memory contexts with various allocation patterns? This new bump
allocator seems to be quite efficient, but in a worst-case allocation
pattern it can still waste about 1/3 of its allocated memory due to
never using free space on previous blocks after an allocation didn't
fit on that block.
It probably isn't going to be a huge problem in general, but this
seems like something that could be documented as a potential problem
when you're looking for which allocator to use and compare it with
other allocators that handle different allocation sizes more
gracefully.

It might be a good idea to document this. The more memory allocator
types we add, the harder it is to decide which one to use when writing
new code.

+++ b/src/backend/utils/mmgr/bump.c
+BumpBlockIsEmpty(BumpBlock *block)
+{
+    /* it's empty if the freeptr has not moved */
+    return (block->freeptr == (char *) block + Bump_BLOCKHDRSZ);
[...]
+static inline void
+BumpBlockMarkEmpty(BumpBlock *block)
+{
+#if defined(USE_VALGRIND) || defined(CLOBBER_FREED_MEMORY)
+    char       *datastart = ((char *) block) + Bump_BLOCKHDRSZ;

These two use different definitions of the start pointer. Is that deliberate?

hmm, I'm not sure if I follow what you mean. Are you talking about
the "datastart" variable and the assignment of block->freeptr (which
you've not quoted?)

+++ b/src/include/utils/tuplesort.h
@@ -109,7 +109,8 @@ typedef struct TuplesortInstrumentation
* a pointer to the tuple proper (might be a MinimalTuple or IndexTuple),
* which is a separate palloc chunk --- we assume it is just one chunk and
* can be freed by a simple pfree() (except during merge, when we use a
- * simple slab allocator).  SortTuples also contain the tuple's first key
+ * simple slab allocator and when performing a non-bounded sort where we
+ * use a bump allocator).  SortTuples also contain the tuple's first key

I'd go with something like the following:

+ * ...(except during merge *where* we use a
+ * simple slab allocator, and during a non-bounded sort where we
+ * use a bump allocator).

Adjusted.

#13David Rowley
dgrowleyml@gmail.com
In reply to: Nathan Bossart (#3)
Re: Add bump memory context type and use it for tuplesorts

On Wed, 26 Jul 2023 at 12:11, Nathan Bossart <nathandbossart@gmail.com> wrote:

I think it'd be okay to steal those bits. AFAICT it'd complicate the
macros in memutils_memorychunk.h a bit, but that doesn't seem like such a
terrible price to pay to allow us to keep avoiding the glibc bit patterns.

I've not adjusted anything here and I've kept the patch using the

128KB glibc bit pattern. I think it was a good idea to make our

lives easier if someone came to us with a bug report, but it's not
like the reserved patterns are guaranteed to cover all malloc
implementations. What's there is just to cover the likely candidates.
I'd like to avoid adding any bit shift instructions in the code that
decodes the hdrmask.

+     if (base->sortopt & TUPLESORT_ALLOWBOUNDED)
+             tuplen = GetMemoryChunkSpace(tuple);
+     else
+             tuplen = MAXALIGN(tuple->t_len);

nitpick: I see this repeated in a few places, and I wonder if it might
deserve a comment.

I ended up adding a macro and a comment in each location that does this.

I haven't had a chance to try out your benchmark, but I'm hoping to do so
in the near future.

Great. It would be good to get a 2nd opinion.

David

#14David Rowley
dgrowleyml@gmail.com
In reply to: Matthias van de Meent (#5)
Re: Add bump memory context type and use it for tuplesorts

On Fri, 26 Jan 2024 at 01:29, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

+    allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
+    if (minContextSize != 0)
+        allocSize = Max(allocSize, minContextSize);
+    else
+        allocSize = Max(allocSize, initBlockSize);

Shouldn't this be the following, considering the meaning of "initBlockSize"?

No, we want to make the blocksize exactly initBlockSize if we can. Not
initBlockSize plus all the header stuff. We do it that way for all
the other contexts and I agree that it's a good idea as it keeps the
malloc request sizes powers of 2.

+ * BumpFree
+ *        Unsupported.
[...]
+ * BumpRealloc
+ *        Unsupported.

Rather than the error, can't we make this a no-op (potentially
optionally, or in a different memory context?)

Unfortunately not. There are no MemoryChunks on bump chunks so we've
no way to determine the context type a given pointer belongs to. I've
left the MemoryChunk on there for debug builds so we can get the
ERRORs to allow us to fix the broken code that is pfreeing these
chunks.

I understand that allowing pfree/repalloc in bump contexts requires
each allocation to have a MemoryChunk prefix in overhead, but I think
it's still a valid use case to have a very low overhead allocator with
no-op deallocator (except context reset). Do you have performance
comparison results between with and without the overhead of
MemoryChunk?

Oh right, you've taken this into account. I was hoping not to have
the headers otherwise the only gains we see over using generation.c is
that of the allocation function being faster.

I certainly did do benchmarks in [1]/messages/by-id/CAApHDvoH4ASzsAOyHcxkuY01Qf++8JJ0paw+03dk+W25tQEcNQ@mail.gmail.com and saw the 338% increase due to
the reduction in memory. That massive jump happened by accident as
the sort on tenk1 went from not fitting into default 4MB work_mem to
fitting in, so what I happened to measure there was the difference of
spilling to disk and not. The same could happen for this case, so the
overhead of having the chunk headers really depends on what the test
is. Probably, "potentially large" is likely a good way to describe the
overhead of having chunk headers. However, to a lesser extent, there
will be a difference for large sorts as we'll be able to fit more
tuples per batch and do fewer batches. The smaller the tuple, the
more that will be noticeable as the chunk header is a larger portion
of the overall allocation with those.

David

[1]: /messages/by-id/CAApHDvoH4ASzsAOyHcxkuY01Qf++8JJ0paw+03dk+W25tQEcNQ@mail.gmail.com

#15David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#6)
Re: Add bump memory context type and use it for tuplesorts

Thanks for taking an interest in this.

On Sat, 17 Feb 2024 at 11:46, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:

I wasn't paying much attention to these memcontext reworks in 2022, so
my instinct was simply to use one of those "UNUSED" IDs. But after
looking at the 80ef92675823 a bit more, are those IDs really unused? I
mean, we're relying on those values to detect bogus pointers, right? So
if we instead start using those values for a new memory context, won't
we lose the ability to detect those issues?

I wouldn't say we're "relying" on them. Really there just there to
improve debugability. If we call any code that tries to look at the
MemoryChunk header of a malloc'd chunk, then we can expect bad things
to happen. We no longer have any code which does this.
MemoryContextContains() did, and it's now gone.

Maybe I'm completely misunderstanding the implication of those limits,
but doesn't this mean the claim that we can support 8 memory context
types is not quite true, and the limit is 4, because the 4 IDs are
already used for malloc stuff?

I think we all expected a bit more pain from the memory context
change. I was happy that Tom did the extra work to look at the malloc
patterns of glibc, but I think there's been very little gone wrong.
The reserved MemoryContextMethodIDs do seem to have allowed [1]/messages/by-id/796b65c3-57b7-bddf-b0d5-a8afafb8b627@gmail.com to be
found, but I guess there'd have been a segfault instead of an ERROR
without the reserved IDs.

I've attached version 2, now split into 2 patches.

0001 for the bump allocator
0002 to use the new allocator for tuplesorts

David

[1]: /messages/by-id/796b65c3-57b7-bddf-b0d5-a8afafb8b627@gmail.com

Attachments:

v3-0001-Introduce-a-bump-memory-allocator.patchtext/plain; charset=US-ASCII; name=v3-0001-Introduce-a-bump-memory-allocator.patchDownload+794-10
v3-0002-Use-bump-memory-context-for-tuplesorts.patchtext/plain; charset=US-ASCII; name=v3-0002-Use-bump-memory-context-for-tuplesorts.patchDownload+78-34
#16Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: David Rowley (#12)
Re: Add bump memory context type and use it for tuplesorts

On Tue, 20 Feb 2024 at 10:41, David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 7 Nov 2023 at 07:55, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

+++ b/src/backend/utils/mmgr/bump.c
+BumpBlockIsEmpty(BumpBlock *block)
+{
+    /* it's empty if the freeptr has not moved */
+    return (block->freeptr == (char *) block + Bump_BLOCKHDRSZ);
[...]
+static inline void
+BumpBlockMarkEmpty(BumpBlock *block)
+{
+#if defined(USE_VALGRIND) || defined(CLOBBER_FREED_MEMORY)
+    char       *datastart = ((char *) block) + Bump_BLOCKHDRSZ;

These two use different definitions of the start pointer. Is that deliberate?

hmm, I'm not sure if I follow what you mean. Are you talking about
the "datastart" variable and the assignment of block->freeptr (which
you've not quoted?)

What I meant was that

(char *) block + Bump_BLOCKHDRSZ

vs

((char *) block) + Bump_BLOCKHDRSZ

, when combined with my little experience with pointer addition and
precedence, and a lack of compiler at the ready at that point in time,
I was afraid that "(char *) block + Bump_BLOCKHDRSZ" would be parsed
as "(char *) (block + Bump_BLOCKHDRSZ)", which would get different
offsets across the two statements.
Godbolt has since helped me understand that both are equivalent.

Kind regards,

Matthias van de Meent

#17Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: David Rowley (#14)
Re: Add bump memory context type and use it for tuplesorts

On Tue, 20 Feb 2024 at 11:02, David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 26 Jan 2024 at 01:29, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

+    allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
+    if (minContextSize != 0)
+        allocSize = Max(allocSize, minContextSize);
+    else
+        allocSize = Max(allocSize, initBlockSize);

Shouldn't this be the following, considering the meaning of "initBlockSize"?

No, we want to make the blocksize exactly initBlockSize if we can. Not
initBlockSize plus all the header stuff. We do it that way for all
the other contexts and I agree that it's a good idea as it keeps the
malloc request sizes powers of 2.

One part of the reason of my comment was that initBlockSize was
ignored in favour of minContextSize if that was configured, regardless
of the value of initBlockSize. Is it deliberately ignored when
minContextSize is set?

+ * BumpFree
+ *        Unsupported.
[...]
+ * BumpRealloc
+ *        Unsupported.

Rather than the error, can't we make this a no-op (potentially
optionally, or in a different memory context?)

Unfortunately not. There are no MemoryChunks on bump chunks so we've
no way to determine the context type a given pointer belongs to. I've
left the MemoryChunk on there for debug builds so we can get the
ERRORs to allow us to fix the broken code that is pfreeing these
chunks.

I understand that allowing pfree/repalloc in bump contexts requires
each allocation to have a MemoryChunk prefix in overhead, but I think
it's still a valid use case to have a very low overhead allocator with
no-op deallocator (except context reset). Do you have performance
comparison results between with and without the overhead of
MemoryChunk?

Oh right, you've taken this into account. I was hoping not to have
the headers otherwise the only gains we see over using generation.c is
that of the allocation function being faster.

[...] The smaller the tuple, the
more that will be noticeable as the chunk header is a larger portion
of the overall allocation with those.

I see. Thanks for the explanation.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

#18David Rowley
dgrowleyml@gmail.com
In reply to: Matthias van de Meent (#16)
Re: Add bump memory context type and use it for tuplesorts

On Tue, 20 Feb 2024 at 23:52, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

What I meant was that

(char *) block + Bump_BLOCKHDRSZ

vs

((char *) block) + Bump_BLOCKHDRSZ

, when combined with my little experience with pointer addition and
precedence, and a lack of compiler at the ready at that point in time,
I was afraid that "(char *) block + Bump_BLOCKHDRSZ" would be parsed
as "(char *) (block + Bump_BLOCKHDRSZ)", which would get different
offsets across the two statements.
Godbolt has since helped me understand that both are equivalent.

I failed to notice this. I've made them the same regardless to
prevent future questions from being raised about the discrepancy
between the two.

David

#19David Rowley
dgrowleyml@gmail.com
In reply to: Matthias van de Meent (#17)
Re: Add bump memory context type and use it for tuplesorts

On Wed, 21 Feb 2024 at 00:02, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

On Tue, 20 Feb 2024 at 11:02, David Rowley <dgrowleyml@gmail.com> wrote:

On Fri, 26 Jan 2024 at 01:29, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:

+    allocSize = MAXALIGN(sizeof(BumpContext)) + Bump_BLOCKHDRSZ +
+    if (minContextSize != 0)
+        allocSize = Max(allocSize, minContextSize);
+    else
+        allocSize = Max(allocSize, initBlockSize);

Shouldn't this be the following, considering the meaning of "initBlockSize"?

No, we want to make the blocksize exactly initBlockSize if we can. Not
initBlockSize plus all the header stuff. We do it that way for all
the other contexts and I agree that it's a good idea as it keeps the
malloc request sizes powers of 2.

One part of the reason of my comment was that initBlockSize was
ignored in favour of minContextSize if that was configured, regardless
of the value of initBlockSize. Is it deliberately ignored when
minContextSize is set?

Ok, it's a good question. It's to allow finer-grained control over the
initial block as it allows it to be a fixed given size without
affecting the number that we double for the subsequent blocks.

e.g BumpContextCreate(64*1024, 8*1024, 1024*1024);

would make the first block 64K and the next block 16K, followed by
32K, 64K ... 1MB.

Whereas, BumpContextCreate(0, 8*1024, 1024*1024) will start at 8K, 16K ... 1MB.

It seems useful as you might have a good idea of how much memory the
common case has and want to do that without having to allocate
subsequent blocks, but if slightly more memory is required sometimes,
you probably don't want the next malloc to be double the common size,
especially if the common size is large.

Apart from slab.c, this is how all the other contexts work. It seems
best to keep this and not to go against the grain on this as there's
more to consider if we opt to change the context types of existing
contexts.

David

#20David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#19)
Re: Add bump memory context type and use it for tuplesorts

There've been a few changes to the memory allocators in the past week
and some of these changes also need to be applied to bump.c. So, I've
rebased the patches on top of today's master. See attached.

I also re-ran the performance tests to check the allocation
performance against the recently optimised aset, generation and slab
contexts. The attached graph shows the time it took in seconds to
allocate 1GB of memory performing a context reset after 1MB. The
function I ran the test on is in the attached
pg_allocate_memory_test.patch.txt file.

The query I ran was:

select chksz,mtype,pg_allocate_memory_test_reset(chksz,
1024*1024,1024*1024*1024, mtype) from (values(8),(16),(32),(64))
sizes(chksz),(values('aset'),('generation'),('slab'),('bump'))
cxt(mtype) order by mtype,chksz;

David

Attachments:

v4-0001-Introduce-a-bump-memory-allocator.patchtext/plain; charset=US-ASCII; name=v4-0001-Introduce-a-bump-memory-allocator.patchDownload+865-10
v4-0002-Use-bump-memory-context-for-tuplesorts.patchtext/plain; charset=US-ASCII; name=v4-0002-Use-bump-memory-context-for-tuplesorts.patchDownload+78-34
pg_allocate_memory_test.patch.txttext/plain; charset=US-ASCII; name=pg_allocate_memory_test.patch.txtDownload+234-0
bump_performance_2024-03-05.pngimage/png; name=bump_performance_2024-03-05.pngDownload+1-0
#21John Naylor
john.naylor@enterprisedb.com
In reply to: David Rowley (#20)
#22Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: John Naylor (#21)
#23David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#22)
#24David Rowley
dgrowleyml@gmail.com
In reply to: John Naylor (#21)
#25John Naylor
john.naylor@enterprisedb.com
In reply to: David Rowley (#24)
#26Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#23)
#27David Rowley
dgrowleyml@gmail.com
In reply to: Tomas Vondra (#26)
#28Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#27)
#29David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#20)
#30Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#30)
#32David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#32)
#34Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Tom Lane (#33)
#35Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthias van de Meent (#34)
#36Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Tom Lane (#35)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matthias van de Meent (#36)
#38David Rowley
dgrowleyml@gmail.com
In reply to: Tom Lane (#37)
#39David Rowley
dgrowleyml@gmail.com
In reply to: Matthias van de Meent (#36)
#40Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: David Rowley (#39)
#41David Rowley
dgrowleyml@gmail.com
In reply to: Matthias van de Meent (#40)
#42Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: David Rowley (#41)
#43John Naylor
john.naylor@enterprisedb.com
In reply to: David Rowley (#39)
#44David Rowley
dgrowleyml@gmail.com
In reply to: John Naylor (#43)
#45Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: David Rowley (#44)
#46Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Tomas Vondra (#45)
#47Andres Freund
andres@anarazel.de
In reply to: Tomas Vondra (#45)
#48Tomas Vondra
tomas.vondra@2ndquadrant.com
In reply to: Andres Freund (#47)
#49Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tomas Vondra (#48)
#50Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#49)
#51Andres Freund
andres@anarazel.de
In reply to: Daniel Gustafsson (#50)
#52David Rowley
dgrowleyml@gmail.com
In reply to: Andres Freund (#47)
#53Daniel Gustafsson
daniel@yesql.se
In reply to: Andres Freund (#51)
#54Amul Sul
sulamul@gmail.com
In reply to: Daniel Gustafsson (#53)
#55Daniel Gustafsson
daniel@yesql.se
In reply to: Amul Sul (#54)
#56David Rowley
dgrowleyml@gmail.com
In reply to: Amul Sul (#54)
#57David Rowley
dgrowleyml@gmail.com
In reply to: David Rowley (#44)
#58Amul Sul
sulamul@gmail.com
In reply to: David Rowley (#56)