PATCH: two slab-like memory allocators
Hi,
Back in the bug #14231 thread [1], dealing with performance issues in
reorderbuffer due to excessive number of expensive free() calls, I've
proposed to resolve that with a custom slab-like memory allocator,
suitable for fixed-size allocations. I'd like to put this into the next
CF, and it's probably too invasive a change to count as a bugfix anyway.
[1]: /messages/by-id/20160706185502.1426.28143@wrigleys.postgresql.org
This patch actually includes two new memory allocators (not one). Very
brief summary (for more detailed explanation of the ideas, see comments
at the beginning of slab.c and genslab.c):
Slab
----
* suitable for fixed-length allocations (other pallocs fail)
* much simpler than AllocSet (no global freelist management etc.)
* free space is tracked per block (using a simple bitmap)
* which allows freeing the block once all chunks are freed (AllocSet
will hold the memory forever, in the hope of reusing it)
GenSlab
-------
* suitable for non-fixed-length allocations, but with chunks of mostly
the same size (initially unknown, the context will tune itself)
* a combination of AllocSet and Slab (or a sequence of Slab allocators)
* the goal is to do most allocations in Slab context
* there's always a single 'current' Slab context, and every now and
then it's replaced with a new generation (with the chunk size computed
from recent requests)
* the AllocSet context is used for chunks too large for current Slab
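The routing and generation logic described above might look roughly like this (an illustrative sketch only - the struct, the GENERATION_PERIOD constant and the recomputation policy here are simplified assumptions, not the patch's actual GenSlab code):

```c
#include <stddef.h>

/* Illustrative sketch of the generational idea, not the patch's actual
 * GenSlab: requests that fit the current Slab chunk size go to the Slab,
 * larger ones fall through to AllocSet; every few requests a new
 * "generation" picks a chunk size covering recent requests. */
#define GENERATION_PERIOD 4		/* hypothetical; the real tuning differs */

typedef struct GenSlabSketch
{
	size_t	chunkSize;	/* chunk size of the current Slab generation */
	size_t	maxSeen;	/* largest request since the last generation */
	int		nallocs;	/* requests since the last generation */
} GenSlabSketch;

/* Returns 1 if the request would be served by the Slab part,
 * 0 if it would fall through to AllocSet. */
static int
genslab_request(GenSlabSketch *g, size_t size)
{
	int		fits = (size <= g->chunkSize);

	if (size > g->maxSeen)
		g->maxSeen = size;

	if (++g->nallocs >= GENERATION_PERIOD)
	{
		/* start a new generation sized for recent requests */
		g->chunkSize = g->maxSeen;
		g->maxSeen = 0;
		g->nallocs = 0;
	}
	return fits;
}
```

The real GenSlab computes the new generation's chunk size from recent requests; the sketch simply uses the maximum size seen.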
So none of this is meant as a universal replacement of AllocSet, but in
the suitable cases the results seem really promising. For example for
the simple test query in [1], the performance improvement is this:
      N |   master | patched
-----------------------------
  10000 |    100ms |   100ms
  50000 |  15000ms |   350ms
 100000 | 146000ms |   700ms
 200000 |        ? |  1400ms
That's a fairly significant improvement, and the submitted version of
the patches should perform even better (~2x, IIRC).
There's a bunch of TODOs - e.g. handling of realloc() calls in the
GenSlab, and probably things I haven't thought about.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-simple-slab-allocator-fixed-size-allocations.patch
0002-generational-slab-auto-tuning-allocator.patch
Hi Tomas,
On 02/08/16 17:44, Tomas Vondra wrote:
This patch actually includes two new memory allocators (not one). Very
brief summary (for more detailed explanation of the ideas, see comments
at the beginning of slab.c and genslab.c):

Slab
----
* suitable for fixed-length allocations (other pallocs fail)
* much simpler than AllocSet (no global freelist management etc.)
* free space is tracked per block (using a simple bitmap)
* which allows freeing the block once all chunks are freed (AllocSet
will hold the memory forever, in the hope of reusing it)

GenSlab
-------
* suitable for non-fixed-length allocations, but with chunks of mostly
the same size (initially unknown, the context will tune itself)
* a combination AllocSet and Slab (or a sequence of Slab allocators)
* the goal is to do most allocations in Slab context
* there's always a single 'current' Slab context, and every now and
then it's replaced with a new generation (with the chunk size computed
from recent requests)
* the AllocSet context is used for chunks too large for current Slab
So it's just a wrapper around the other two allocators to make this
use case easier to handle. Do you expect there will be use for
GenSlab eventually outside of the reorderbuffer?
So none of this is meant as a universal replacement of AllocSet, but in
the suitable cases the results seem really promising. For example for
the simple test query in [1], the performance improvement is this:

      N |   master | patched
-----------------------------
  10000 |    100ms |   100ms
  50000 |  15000ms |   350ms
 100000 | 146000ms |   700ms
 200000 |        ? |  1400ms

That's a fairly significant improvement, and the submitted version of
the patches should perform even better (~2x, IIRC).
I agree that it improves performance quite nicely and that reorderbuffer
could use this.
About the code. I am not quite sure that this needs to be split into
two patches, especially if 1/3 of the second patch is the removal of
the code added by the first one and it's otherwise quite small and
straightforward. That is, unless you expect the GenSlab not to go in.
Slab:
In general it seems understandable, the initial description helps to
understand what's happening well enough.
One thing I don't understand however is why the freelist is both array
and doubly linked list and why there is new implementation of said
doubly linked list given that we have dlist.
+/*
+ * SlabContext is our standard implementation of MemoryContext.
+ *
Really?
+/*
+ * SlabChunk
+ *    The prefix of each piece of memory in an SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ */
Is this true? Why? And if it is then why doesn't the SlabChunk actually
match the StandardChunkHeader?
+#define SlabPointerIsValid(pointer) PointerIsValid(pointer)
What's the point of this given that it's defined in the .c file?
+static void *
+SlabAlloc(MemoryContext context, Size size)
+{
+    Slab       set = (Slab) context;
+    SlabBlock  block;
+    SlabChunk  chunk;
+    int        idx;
+
+    AssertArg(SlabIsValid(set));
+
+    Assert(size == set->chunkSize);
I wonder if there should be stronger protection (ie, elog) for the size
matching.
+static void *
+SlabRealloc(MemoryContext context, void *pointer, Size size)
+{
+    Slab    set = (Slab) context;
+
+    /* can't do actual realloc with slab, but let's try to be gentle */
+    if (size == set->chunkSize)
+        return pointer;
+
+    /* we can't really do repalloc with this allocator */
+    Assert(false);
This IMHO should definitely be elog.
+static void
+SlabCheck(MemoryContext context)
+{
+    /* FIXME */
+}
Do you plan to implement this interface?
+#define SLAB_DEFAULT_BLOCK_SIZE  8192
+#define SLAB_LARGE_BLOCK_SIZE    (8 * 1024 * 1024)
I am guessing this is based on max_cached_tuplebufs? Maybe these could
be written with the same style?
GenSlab:
Since this is relatively simple wrapper it looks mostly ok to me. The
only issue I have here is that I am not quite sure about those FIXME
functions (Free, Realloc, GetChunkSpace). It's slightly weird to call to
mcxt but I guess the alternative there is to error out so this is
probably preferable. Would want to hear other opinions here.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 09/25/2016 08:48 PM, Petr Jelinek wrote:
Hi Tomas,
On 02/08/16 17:44, Tomas Vondra wrote:
This patch actually includes two new memory allocators (not one). Very
brief summary (for more detailed explanation of the ideas, see comments
at the beginning of slab.c and genslab.c):

Slab
----
* suitable for fixed-length allocations (other pallocs fail)
* much simpler than AllocSet (no global freelist management etc.)
* free space is tracked per block (using a simple bitmap)
* which allows freeing the block once all chunks are freed (AllocSet
will hold the memory forever, in the hope of reusing it)

GenSlab
-------
* suitable for non-fixed-length allocations, but with chunks of mostly
the same size (initially unknown, the context will tune itself)
* a combination AllocSet and Slab (or a sequence of Slab allocators)
* the goal is to do most allocations in Slab context
* there's always a single 'current' Slab context, and every now and
then it's replaced with a new generation (with the chunk size computed
from recent requests)
* the AllocSet context is used for chunks too large for current Slab

So it's just a wrapper around the other two allocators to make this
use case easier to handle. Do you expect there will be use for
GenSlab eventually outside of the reorderbuffer?
Yes, you might say it's just a wrapper around the other two allocators,
but it *also* includes the logic of recomputing chunk size etc.
I haven't thought about other places that might benefit from these new
allocators very much - in general, it's useful for places that produce a
stream of equally-sized items (GenSlab relaxes this), that are pfreed()
in ~FIFO manner (i.e. roughly in the order of allocation).
So none of this is meant as a universal replacement of AllocSet,
but in the suitable cases the results seem really promising. For
example for the simple test query in [1], the performance
improvement is this:

      N |   master | patched
-----------------------------
  10000 |    100ms |   100ms
  50000 |  15000ms |   350ms
 100000 | 146000ms |   700ms
 200000 |        ? |  1400ms

That's a fairly significant improvement, and the submitted version
of the patches should perform even better (~2x, IIRC).

I agree that it improves performance quite nicely and that
reorderbuffer could use this.

About the code. I am not quite sure that this needs to be split into
two patches especially if 1/3 of the second patch is the removal of
the code added by the first one and otherwise it's quite small and
straightforward. That is unless you expect the GenSlab to not go in.
I don't know - it seemed natural to first introduce the Slab, as it's
easier to discuss it separately, and it works for 2 of the 3 contexts
needed in reorderbuffer.
GenSlab is an improvement of Slab, or rather based on it, so that it
works for the third context. And it introduces some additional ideas
(particularly the generational design, etc.)
Of course, none of this means it has to be committed in two separate
chunks, or that I don't expect GenSlab to get committed ...
Slab:
In general it seems understandable, the initial description helps to
understand what's happening well enough.

One thing I don't understand however is why the freelist is both
array and doubly linked list and why there is new implementation of
said doubly linked list given that we have dlist.
Hmm, perhaps that should be explained better.
In AllocSet, we only have a global freelist of chunks, i.e. we have a
list of free chunks for each possible size (there's 11 sizes, starting
with 8 bytes and then doubling the size). So freelist[0] is a list of
free 8B chunks, freelist[1] is a list of free 16B chunks, etc.
In Slab, the freelist has two levels - first there's a bitmap on each
block (which is possible, as the chunks have constant size), tracking
which chunks of that particular block are free. This makes it trivial to
check that all chunks on the block are free, and free the whole block
(which is impossible with AllocSet).
Second, the freelist at the context level tracks blocks with a given
number of free chunks - so freelist[0] tracks completely full blocks,
freelist[1] is a list of blocks with 1 free chunk, etc. This is used to
reuse space on almost full blocks first, in the hope that some of the
less full blocks will get completely empty (and freed to the OS).
Is that clear?
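To illustrate the two-level scheme, here is a minimal sketch (hypothetical code, not the actual slab.c; the struct, the tiny CHUNKS_PER_BLOCK, and the bucket-count array are simplifications for illustration):

```c
#include <stddef.h>

/* Illustrative sketch of the two-level scheme: a per-block bitmap of
 * free chunks, plus a context-level freelist indexed by nfree. */
#define CHUNKS_PER_BLOCK 8		/* tiny, for illustration only */

typedef struct SketchBlock
{
	int				nfree;		/* number of free chunks on this block */
	unsigned char	bitmap;		/* bit set = chunk is free */
} SketchBlock;

/* Freeing chunk 'idx' sets its bitmap bit; the block then moves to
 * freelist[nfree]. Returns the new freelist index for the block. */
static int
sketch_free_chunk(SketchBlock *block, int idx)
{
	block->bitmap |= (unsigned char) (1 << idx);
	block->nfree++;
	return block->nfree;
}

/* Allocation scans freelist[1], freelist[2], ... so the fullest blocks
 * (fewest free chunks) are reused first; freelist[0] holds full blocks. */
static int
sketch_pick_bucket(const int *bucket_counts, int nbuckets)
{
	int		i;

	for (i = 1; i < nbuckets; i++)
		if (bucket_counts[i] > 0)
			return i;
	return -1;					/* no free chunk: must allocate a new block */
}
```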
+/*
+ * SlabContext is our standard implementation of MemoryContext.
+ *

Really?
Meh, that's clearly a bogus comment.
+/*
+ * SlabChunk
+ *    The prefix of each piece of memory in an SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ */

Is this true? Why? And if it is then why doesn't the SlabChunk
actually match the StandardChunkHeader?
It is true, a lot of stuff in MemoryContext infrastructure relies on
that. For example when you do pfree(ptr), we actually do something like
header = (StandardChunkHeader*)(ptr - sizeof(StandardChunkHeader))
to get the chunk header - which includes pointer to the memory context
and other useful stuff.
This also means we can put additional fields before StandardChunkHeader
as that does not break this pointer arithmetic, i.e. SlabChunkData is
effectively defined like this:
typedef struct SlabChunkData
{
/* block owning this chunk */
void *block;
/* standard header */
StandardChunkHeader header;
} SlabChunkData;
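A compilable sketch of that pointer arithmetic (the field layout here is an assumption for illustration, not the real PostgreSQL definitions):

```c
#include <stddef.h>

/* Simplified stand-ins for the real structs, only to illustrate the
 * arithmetic pfree() relies on. */
typedef struct HeaderSketch
{
	void   *context;	/* owning MemoryContext in the real header */
	size_t	size;
} HeaderSketch;

typedef struct SlabChunkSketch
{
	void		 *block;	/* extra field placed *before* the header */
	HeaderSketch  header;	/* must directly precede the user data */
} SlabChunkSketch;

/* pfree(ptr) steps back by sizeof(header) to find the header; fields
 * placed in front of the header don't disturb this arithmetic. */
static HeaderSketch *
header_of(void *ptr)
{
	return (HeaderSketch *) ((char *) ptr - sizeof(HeaderSketch));
}
```

Stepping back sizeof(header) from the user pointer lands on the header regardless of how many fields precede it in the chunk struct.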
+#define SlabPointerIsValid(pointer) PointerIsValid(pointer)
What's the point of this given that it's defined in the .c file?
Meh, I've copied this from aset.c, but I see it's useless there too.
+static void *
+SlabAlloc(MemoryContext context, Size size)
+{
+    Slab       set = (Slab) context;
+    SlabBlock  block;
+    SlabChunk  chunk;
+    int        idx;
+
+    AssertArg(SlabIsValid(set));
+
+    Assert(size == set->chunkSize);

I wonder if there should be stronger protection (ie, elog) for the size
matching.
Perhaps, I'm not opposed to that.
+static void *
+SlabRealloc(MemoryContext context, void *pointer, Size size)
+{
+    Slab    set = (Slab) context;
+
+    /* can't do actual realloc with slab, but let's try to be gentle */
+    if (size == set->chunkSize)
+        return pointer;
+
+    /* we can't really do repalloc with this allocator */
+    Assert(false);

This IMHO should definitely be elog.
Yeah, you're probably right.
+static void
+SlabCheck(MemoryContext context)
+{
+    /* FIXME */
+}

Do you plan to implement this interface?
Yes, although I'm not sure what checks should go there. The only thing I
can think of right now is checking that the number of free chunks on a
block (according to the bitmap) matches the freelist index.
+#define SLAB_DEFAULT_BLOCK_SIZE  8192
+#define SLAB_LARGE_BLOCK_SIZE    (8 * 1024 * 1024)

I am guessing this is based on max_cached_tuplebufs? Maybe these
could be written with the same style?
Not sure I understand what you mean by "based on"? I don't quite
remember how I came up with those constants, but I guess 8kB and 8MB
seemed like good values.
Also, what style do you mean? I've used the same style as for ALLOCSET_*
constants in the same file.
GenSlab:
Since this is relatively simple wrapper it looks mostly ok to me. The
only issue I have here is that I am not quite sure about those FIXME
functions (Free, Realloc, GetChunkSpace). It's slightly weird to call to
mcxt but I guess the alternative there is to error out so this is
probably preferable. Would want to hear other opinions here.
Yeah, I'd like to get some opinions on that too - that's why I left
the FIXMEs there, actually.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 25/09/16 22:17, Tomas Vondra wrote:
On 09/25/2016 08:48 PM, Petr Jelinek wrote:
Slab:
In general it seems understandable, the initial description helps to
understand what's happening well enough.

One thing I don't understand however is why the freelist is both
array and doubly linked list and why there is new implementation of
said doubly linked list given that we have dlist.

Hmm, perhaps that should be explained better.
In AllocSet, we only have a global freelist of chunks, i.e. we have a
list of free chunks for each possible size (there's 11 sizes, starting
with 8 bytes and then doubling the size). So freelist[0] is a list of
free 8B chunks, freelist[1] is a list of free 16B chunks, etc.

In Slab, the freelist has two levels - first there's a bitmap on each
block (which is possible, as the chunks have constant size), tracking
which chunks of that particular block are free. This makes it trivial to
check that all chunks on the block are free, and free the whole block
(which is impossible with AllocSet).

Second, the freelist at the context level tracks blocks with a given
number of free chunks - so freelist[0] tracks completely full blocks,
freelist[1] is a list of blocks with 1 free chunk, etc. This is used to
reuse space on almost full blocks first, in the hope that some of the
less full blocks will get completely empty (and freed to the OS).

Is that clear?
Ah okay, that makes sense. The documentation of this could be improved,
though, as it's all squashed into a single sentence that wasn't quite
clear to me.
+/*
+ * SlabChunk
+ *    The prefix of each piece of memory in an SlabBlock
+ *
+ * NB: this MUST match StandardChunkHeader as defined by utils/memutils.h.
+ */

Is this true? Why? And if it is then why doesn't the SlabChunk
actually match the StandardChunkHeader?

It is true, a lot of stuff in MemoryContext infrastructure relies on
that. For example when you do pfree(ptr), we actually do something like

header = (StandardChunkHeader *) (ptr - sizeof(StandardChunkHeader))
to get the chunk header - which includes pointer to the memory context
and other useful stuff.

This also means we can put additional fields before StandardChunkHeader
as that does not break this pointer arithmetic, i.e. SlabChunkData is
effectively defined like this:

typedef struct SlabChunkData
{
/* block owning this chunk */
void *block;

/* standard header */
StandardChunkHeader header;
} SlabChunkData;
Yes, but your struct then does not match StandardChunkHeader exactly, so
it should be explained in more detail (aset.c, where this comment is
also present, has a struct that matches StandardChunkHeader, so it's
sufficient there).
+static void
+SlabCheck(MemoryContext context)
+{
+    /* FIXME */
+}

Do you plan to implement this interface?
Yes, although I'm not sure what checks should go there. The only thing I
can think of right now is checking that the number of free chunks on a
block (according to the bitmap) matches the freelist index.
Yeah this context does not seem like it needs too much checking. The
freelist vs free chunks check sounds ok to me. I guess GenSlab then will
call the checks for the underlying contexts.
+#define SLAB_DEFAULT_BLOCK_SIZE  8192
+#define SLAB_LARGE_BLOCK_SIZE    (8 * 1024 * 1024)

I am guessing this is based on max_cached_tuplebufs? Maybe these
could be written with the same style?

Not sure I understand what you mean by "based on"? I don't quite
remember how I came up with those constants, but I guess 8kB and 8MB
seemed like good values.

Also, what style do you mean? I've used the same style as for ALLOCSET_*
constants in the same file.
I mean using 8 * 1024 for SLAB_DEFAULT_BLOCK_SIZE so that it's more
readable. ALLOCSET_* does that too (with the exception of
ALLOCSET_SEPARATE_THRESHOLD, which I have no idea why it's different
from the rest of the code).
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
Attached is v2 of the patch, updated based on the review. That means:
- Better comment explaining how free chunks are tracked in Slab context.
- Removed the unused SlabPointerIsValid macro.
- Modified the comment before SlabChunkData, explaining how it relates
to StandardChunkHeader.
- Replaced the two Assert() calls with elog().
- Implemented SlabCheck(). I've ended up with quite a few checks there,
checking pointers between the context, block and chunks, checks due
to MEMORY_CONTEXT_CHECKING etc. And of course, cross-checking the
number of free chunks (bitmap, freelist vs. chunk header).
- I've also modified SlabContextCreate() to compute chunksPerBlock a
bit more efficiently (use a simple formula instead of the loop, which
might be a bit too expensive for large blocks / small chunks).
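The closed-form computation might look something like this (a sketch; the header size parameter and the one-bitmap-bit-per-chunk accounting are assumptions based on the description, not the actual slab.c code):

```c
#include <stddef.h>

/* How many chunks (each needing one bitmap bit) fit into a block of
 * blockSize bytes, after the block header? Solves
 *   n * chunkSize + ceil(n / 8) + headerSize <= blockSize   for n. */
static int
chunks_per_block(size_t blockSize, size_t chunkSize, size_t headerSize)
{
	size_t	avail = blockSize - headerSize;
	int		n = (int) ((8 * avail) / (8 * chunkSize + 1));

	/* integer division may overshoot by one due to bitmap byte rounding */
	while ((size_t) n * chunkSize + ((size_t) n + 7) / 8 > avail)
		n--;
	return n;
}
```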
I haven't done any changes to GenSlab, but I do have a few notes:
Firstly, I've realized there's an issue when chunkSize gets too large -
once it exceeds blockSize, the SlabContextCreate() fails as it's
impossible to place a single chunk into the block. In reorderbuffer,
this may happen when the tuples (allocated in tup_context) get larger
than 8MB, as the context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB).
For Slab the elog(ERROR) is fine as both parameters are controlled by
the developer directly, but GenSlab computes the chunkSize on the fly,
so we must not let it fail like that - that'd result in unpredictable
failures, which is not very nice.
I see two ways to fix this. We may either increase the block size
automatically - e.g. instead of specifying chunkSize and
blockSize when creating the Slab, specify chunkSize and chunksPerBlock
(and then choose the smallest 2^k block large enough). For example with
chunkSize=96 and chunksPerBlock=1000, we'd get 128kB blocks, as that's
the closest 2^k block larger than 96000 bytes.
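That sizing rule can be sketched as follows (a hypothetical helper; it ignores the block header and bitmap overhead for simplicity):

```c
#include <stddef.h>

/* Smallest power-of-two block size holding chunksPerBlock chunks;
 * e.g. 96 B chunks x 1000 = 96000 B payload -> 128 kB block. */
static size_t
pow2_block_size(size_t chunkSize, int chunksPerBlock)
{
	size_t	payload = chunkSize * (size_t) chunksPerBlock;
	size_t	block = 1;

	while (block < payload)
		block <<= 1;
	return block;
}
```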
But maybe there's a simpler solution - we may simply cap the chunkSize
(in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because AllocSet handles
those requests in a special way - for example instead of tracking them
in a freelist, those chunks get freed immediately.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
0001-simple-slab-allocator-fixed-size-allocations.patch
0002-generational-slab-auto-tuning-allocator.patch
I reproduced the quadratic pfree performance problem and verified that
these patches solved it.
The slab.c data structures and functions contain no quadratic components.
I noticed the sizing loop in SlabContextCreate() and came up with
a similar formula to determine chunksPerBlock that you arrived at.
Firstly, I've realized there's an issue when chunkSize gets too
large - once it exceeds blockSize, the SlabContextCreate() fails
as it's impossible to place a single chunk into the block. In
reorderbuffer, this may happen when the tuples (allocated in
tup_context) get larger than 8MB, as the context uses
SLAB_LARGE_BLOCK_SIZE (which is 8MB).

But maybe there's a simpler solution - we may simply cap the
chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
AllocSet handles those requests in a special way - for example
instead of tracking them in freelist, those chunks got freed
immediately.
I like this approach because it fixes the performance problems
with smaller allocations and doesn't change how larger
allocations are handled.
In slab.c it looks like a line in the top comments could be clearer.
Perhaps this is what is meant.
< * (plus alignment), now wasting memory.
> * (plus alignment), not wasting memory.
In slab.c some lines are over 80 characters and could be folded.
It would be nice to give each patch version a unique file name.
Nice patch, I enjoyed reading it!
Best, John
John Gorman
On Mon, Sep 26, 2016 at 10:10 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
Hi,
Attached is v2 of the patch, updated based on the review. That means:
- Better comment explaining how free chunks are tracked in Slab context.
- Removed the unused SlabPointerIsValid macro.
- Modified the comment before SlabChunkData, explaining how it relates
to StandardChunkHeader.

- Replaced the two Assert() calls with elog().
- Implemented SlabCheck(). I've ended up with quite a few checks there,
checking pointers between the context, block and chunks, checks due
to MEMORY_CONTEXT_CHECKING etc. And of course, cross-checking the
number of free chunks (bitmap, freelist vs. chunk header).

- I've also modified SlabContextCreate() to compute chunksPerBlock a
bit more efficiently (use a simple formula instead of the loop, which
might be a bit too expensive for large blocks / small chunks).

I haven't done any changes to GenSlab, but I do have a few notes:
Firstly, I've realized there's an issue when chunkSize gets too large -
once it exceeds blockSize, the SlabContextCreate() fails as it's impossible
to place a single chunk into the block. In reorderbuffer, this may happen
when the tuples (allocated in tup_context) get larger than 8MB, as the
context uses SLAB_LARGE_BLOCK_SIZE (which is 8MB).

For Slab the elog(ERROR) is fine as both parameters are controlled by the
developer directly, but GenSlab computes the chunkSize on the fly, so we
must not let it fail like that - that'd result in unpredictable failures,
which is not very nice.

I see two ways to fix this. We may either increase the block size
automatically - e.g. instead of specifying chunkSize and
blockSize when creating the Slab, specify chunkSize and chunksPerBlock (and
then choose the smallest 2^k block large enough). For example with
chunkSize=96 and chunksPerBlock=1000, we'd get 128kB blocks, as that's the
closest 2^k block larger than 96000 bytes.

But maybe there's a simpler solution - we may simply cap the chunkSize (in
GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because AllocSet handles those
requests in a special way - for example instead of tracking them in
a freelist, those chunks get freed immediately.

regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/26/16 9:10 PM, Tomas Vondra wrote:
Attached is v2 of the patch, updated based on the review. That means:
+ /* make sure the block can store at least one chunk (with 1B for a
bitmap)? */
(and the comment below it)
I find the question to be confusing... I think these would be better as
+ /* make sure the block can store at least one chunk (with 1B for a
bitmap) */
and
+ /* number of chunks can we fit into a block, including header and
bitmap */
I'm also wondering if a 1B freespace bitmap is actually useful. If
nothing else, there should probably be a #define for the initial bitmap
size.
+ /* otherwise add it to the proper freelist bin */
Looks like something went missing... :)
Should zeroing the block in SlabAlloc be optional like it is with
palloc/palloc0?
+ /*
+ * If there are no free chunks in any existing block, create a new block
+ * and put it to the last freelist bucket.
+ */
+ if (set->minFreeCount == 0)
Couldn't there be blocks that have a free count > minFreeCount? (In
which case you don't want to just alloc a new block, no?)
Nevermind, after reading further down I understand what's going on. I
got confused by "we've decreased nfree for a block with the minimum
count" until I got to the part that deals with minFreeCount = 0. I think
it'd be helpful if the "we've decreased nfree" comment mentioned that if
nfree is 0 we'll look for the correct value after updating the freelists.
+ block->bitmapptr[idx/8] |= (0x01 << (idx % 8));
Did you consider using a pre-defined map of values to avoid the shift? I
know I've seen that somewhere in the code...
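For what it's worth, such a table-based variant might look like this (illustrative only; the table name is made up, and whether it actually beats a shift on modern CPUs is debatable):

```c
/* Precomputed single-bit masks, indexed by bit position within a byte. */
static const unsigned char bit_mask[8] = {
	0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80
};

/* Equivalent to: bitmap[idx / 8] |= (0x01 << (idx % 8)); */
static void
mark_chunk_free(unsigned char *bitmap, int idx)
{
	bitmap[idx / 8] |= bit_mask[idx % 8];
}
```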
Patch 2...
Doesn't GenSlabReset() need to actually free something? If the call
magically percolates to the other contexts it'd be nice to note that in
the comment.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
On 10/02/2016 01:53 AM, Jim Nasby wrote:
On 9/26/16 9:10 PM, Tomas Vondra wrote:
Attached is v2 of the patch, updated based on the review. That means:
+ /* make sure the block can store at least one chunk (with 1B for a
bitmap)? */
(and the comment below it)

I find the question to be confusing... I think these would be better as

+ /* make sure the block can store at least one chunk (with 1B for a bitmap) */

and

+ /* number of chunks can we fit into a block, including header and bitmap */
Thanks, will rephrase the comments a bit.
I'm also wondering if a 1B freespace bitmap is actually useful. If
nothing else, there should probably be a #define for the initial
bitmap size.
That's not the point. The point is that we need to store at least one
entry per block, with one bit in a bitmap. But we can't store just a
single byte - we always allocate at least 1 byte. So it's more an
explanation for the "1" literal in the check, nothing more.
+ /* otherwise add it to the proper freelist bin */
Looks like something went missing... :)
Ummm? The patch contains this:
+ /* otherwise add it to the proper freelist bin */
+ if (set->freelist[block->nfree])
+ set->freelist[block->nfree]->prev = block;
+
+ block->next = set->freelist[block->nfree];
+ set->freelist[block->nfree] = block;
Which does exactly the thing it should do. Or what is missing?
Should zeroing the block in SlabAlloc be optional like it is with
palloc/palloc0?
Good catch. The memset(,0) should not be in SlabAlloc() at all, as the
zeroing is handled in mctx.c, independently of the implementation.
+    /*
+     * If there are no free chunks in any existing block, create a new block
+     * and put it to the last freelist bucket.
+     */
+    if (set->minFreeCount == 0)

Couldn't there be blocks that have a free count > minFreeCount? (In
which case you don't want to just alloc a new block, no?)

Nevermind, after reading further down I understand what's going on. I
got confused by "we've decreased nfree for a block with the minimum
count" until I got to the part that deals with minFreeCount = 0. I think
it'd be helpful if the "we've decreased nfree" comment mentioned that if
nfree is 0 we'll look for the correct value after updating the freelists.
Right. I think it'd be good to add an assert ensuring the minFreeCount
value matches the block freelist. Or at least SlabCheck() could check this.
+ block->bitmapptr[idx/8] |= (0x01 << (idx % 8));
Did you consider using a pre-defined map of values to avoid the shift? I
know I've seen that somewhere in the code...

Patch 2...
Doesn't GenSlabReset() need to actually free something? If the call
magically percolates to the other contexts it'd be nice to note that in
the comment.
No, the other contexts are created as children of GenSlab, so the memory
context infrastructure resets them automatically. I don't think this
needs to be mentioned in the comments - it's a pretty basic part of the
parent-child context relationship.
Thanks!
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/02/2016 12:23 AM, John Gorman wrote:
I reproduced the quadratic pfree performance problem and verified
that these patches solved it.

The slab.c data structures and functions contain no quadratic
components.

I noticed the sizing loop in SlabContextCreate() and came up with a
similar formula to determine chunksPerBlock that you arrived at.
;-)
Firstly, I've realized there's an issue when chunkSize gets too
large - once it exceeds blockSize, the SlabContextCreate() fails
as it's impossible to place a single chunk into the block. In
reorderbuffer, this may happen when the tuples (allocated in
tup_context) get larger than 8MB, as the context uses
SLAB_LARGE_BLOCK_SIZE (which is 8MB).

But maybe there's a simpler solution - we may simply cap the
chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
AllocSet handles those requests in a special way - for example
instead of tracking them in freelist, those chunks got freed
immediately.

I like this approach because it fixes the performance problems
with smaller allocations and doesn't change how larger
allocations are handled.
Right.
In slab.c it looks like a line in the top comments could be clearer.
Perhaps this is what is meant.

< * (plus alignment), now wasting memory.
> * (plus alignment), not wasting memory.
In slab.c some lines are over 80 characters and could be folded.
It would be nice to give each patch version a unique file name.
OK, will fix.
Nice patch, I enjoyed reading it!
;-)
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/02/2016 12:23 AM, John Gorman wrote:
I reproduced the quadratic pfree performance problem and verified that
these patches solved it. The slab.c data structures and functions
contain no quadratic components.

I noticed the sizing loop in SlabContextCreate() and came up with
a similar formula to determine chunksPerBlock that you arrived at.

Firstly, I've realized there's an issue when chunkSize gets too
large - once it exceeds blockSize, the SlabContextCreate() fails
as it's impossible to place a single chunk into the block. In
reorderbuffer, this may happen when the tuples (allocated in
tup_context) get larger than 8MB, as the context uses
SLAB_LARGE_BLOCK_SIZE (which is 8MB).

But maybe there's a simpler solution - we may simply cap the
chunkSize (in GenSlab) to ALLOC_CHUNK_LIMIT. That's fine, because
AllocSet handles those requests in a special way - for example
instead of tracking them in freelist, those chunks get freed
immediately.

I like this approach because it fixes the performance problems
with smaller allocations and doesn't change how larger
allocations are handled.
One more comment about GenSlab, particularly about unpredictability of
the repalloc() behavior. It works for "large" chunks allocated in the
AllocSet part, and mostly does not work for "small" chunks allocated in
the SlabContext. Moreover, the chunkSize changes over time, so for two
chunks of the same size, repalloc() may work on one of them and fail on
the other, depending on time of allocation.
With SlabContext it's perfectly predictable - a repalloc() call fails
unless the chunk size is exactly the same as before (which is perhaps a
bit pointless, but if we decide to fail even in this case it'll work
100% of the time).
But with GenSlabContext it's unclear whether the call fails or not.
I don't like this unpredictability - I'd much rather have consistent
failures (making sure people don't do repalloc() with GenSlab). But I
don't see a nice way to achieve that - the repalloc() call does not go
through GenSlabRealloc() at all, but directly to SlabRealloc() or
AllocSetRealloc().
The best solution I can think of is adding an alternate version of
AllocSetMethods, pointing to a different AllocSetReset implementation.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Oct 2, 2016 at 10:15 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
One more comment about GenSlab, particularly about unpredictability of the
repalloc() behavior. It works for "large" chunks allocated in the AllocSet
part, and mostly does not work for "small" chunks allocated in the
SlabContext. Moreover, the chunkSize changes over time, so for two chunks of
the same size, repalloc() may work on one of them and fail on the other,
depending on time of allocation.

With SlabContext it's perfectly predictable - a repalloc() call fails unless
the chunk size is exactly the same as before (which is perhaps a bit
pointless, but if we decide to fail even in this case it'll work 100% of
the time).

But with GenSlabContext it's unclear whether the call fails or not.
I don't like this unpredictability - I'd much rather have consistent
failures (making sure people don't do repalloc() with GenSlab). But I
don't see a nice way to achieve that - the repalloc() call does not go
through GenSlabRealloc() at all, but directly to SlabRealloc() or
AllocSetRealloc().

The best solution I can think of is adding an alternate version of
AllocSetMethods, pointing to a different AllocSetReset implementation.
You guys are still playing with this patch, so moved to next CF with
"waiting on author".
--
Michael
SlabContext has a parent context. It can delegate requests that
it cannot handle to the parent context which is GenSlab. Genslab
can send them to the sister aset context.
This could handle all reallocs so there will be no surprises.
Remind me again why we cannot realloc in place for sizes
smaller than chunkSize? GenSlab is happy to allocate smaller
sizes and put them into the fixed size chunks.
Maybe SlabAlloc can be happy with sizes up to chunkSize.
if (size <= set->chunkSize)
return MemoryContextAlloc(set->slab, size);
else
return MemoryContextAlloc(set->aset, size);
On Sat, Oct 1, 2016 at 10:15 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com>
wrote:
On 10/04/2016 09:44 PM, John Gorman wrote:
SlabContext has a parent context. It can delegate requests that
it cannot handle to the parent context which is GenSlab. Genslab
can send them to the sister aset context.
But Slab may also be used separately, not just as part of GenSlab
(actually, reorderbuffer has two such contexts). That complicates things
quite a bit, and it also seems a bit awkward, because:
(a) It'd require a flag in SlabContext (or perhaps a pointer to the
second context), which introduces coupling between the contexts.
(b) SlabContext was meant to be extremely simple (based on the "single
chunk size" idea), and this contradicts that a bit.
(c) It'd move chunks between the memory contexts in unpredictable ways
(although the user should treat it as a single context, and not reset
the parts independently for example).
This could handle all reallocs so there will be no surprises.
Yeah, but it's also
Remind me again why we cannot realloc in place for sizes
smaller than chunkSize? GenSlab is happy to allocate smaller
sizes and put them into the fixed size chunks.

Maybe SlabAlloc can be happy with sizes up to chunkSize.

if (size <= set->chunkSize)
return MemoryContextAlloc(set->slab, size);
else
return MemoryContextAlloc(set->aset, size);
That'd be certainly possible, but it seems a bit strange as the whole
Slab is based on the idea that all chunks have the same size. Moreover,
people usually realloc() to a larger chunk, so it does not really fix
anything in practice.
In GenSlab, the situation is more complicated. But I don't like the
coupling / moving chunks between contexts, etc.
If realloc() support is a hard requirement, it immediately rules out
SlabContext() as an independent memory context. Which seems stupid, as
independent Slab contexts are quite useful for reorderbuffer use case.
For GenSlab the situation is less clear, as there probably are ways to
make it work, but I'd vote to keep it simple for now, and simply do
elog(ERROR) in the realloc() methods - both for Slab and GenSlab. The
current use case (reorderbuffer) does not need that, and it seems like a
can of worms to me.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
attached is v3 of the patches, with a few minor fixes in Slab, and much
larger fixes in GenSlab.
Slab (minor fixes)
------------------
- Removed the unnecessary memset() of new blocks in SlabAlloc(),
although we still need to zero the free bitmap at the end of the block.
- Renamed minFreeCount to minFreeChunks, added a few comments explaining
why/how firstFreeChunk and minFreeChunks are maintained.
- Fixed / improved a bunch of additional comments, based on feedback.
GenSlab
-------
Fixed a bunch of bugs that made GenSlab utterly useless. Firstly,
chunkSize was not stored in GenSlabContextCreate(), so this check in
SlabAlloc()
if (size <= set->chunkSize)
return MemoryContextAlloc(set->slab, set->chunkSize);
else
return MemoryContextAlloc(set->aset, size);
always fell through to the set->aset case, not allocating stuff in the
Slab at all.
Secondly, nallocations / nbytes counters were not updated at all, so the
Slab was never recreated, so GenSlab was not really generational.
This only affected 1 of 3 contexts in ReorderBuffer, but apparently
those were the "important ones" affecting performance.
Both issues are fixed in the attached v3, which also introduces two
additional improvements discussed in this thread:
- The chunk size is limited by ALLOCSET_SEPARATE_THRESHOLD, as for large
chunks AllocSet works just fine (thanks to keeping them out of free list
etc.)
- Instead of specifying blockSize and chunkSize, GenSlabCreate() now
accepts three parameters - minBlockSize, minChunkCount and chunkSize,
and computes the minimum block size (>= minBlockSize), sufficient to
store minChunkCount chunks, each chunkSize bytes. This works much better
in the auto-tuning scenario.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
slab-allocators-v3.tgz (application/x-compressed-tar)
On 05/10/16 03:11, Tomas Vondra wrote:
On 10/04/2016 09:44 PM, John Gorman wrote:
Remind me again why we cannot realloc in place for sizes
smaller than chunkSize? GenSlab is happy to allocate smaller
sizes and put them into the fixed size chunks.

Maybe SlabAlloc can be happy with sizes up to chunkSize.

if (size <= set->chunkSize)
return MemoryContextAlloc(set->slab, size);
else
return MemoryContextAlloc(set->aset, size);

That'd be certainly possible, but it seems a bit strange as the whole
Slab is based on the idea that all chunks have the same size. Moreover,
people usually realloc() to a larger chunk, so it does not really fix
anything in practice.

In GenSlab, the situation is more complicated. But I don't like the
coupling / moving chunks between contexts, etc.

If realloc() support is a hard requirement, it immediately rules out
SlabContext() as an independent memory context. Which seems stupid, as
independent Slab contexts are quite useful for the reorderbuffer use case.

For GenSlab the situation is less clear, as there probably are ways to
make it work, but I'd vote to keep it simple for now, and simply do
elog(ERROR) in the realloc() methods - both for Slab and GenSlab. The
current use case (reorderbuffer) does not need that, and it seems like a
can of worms to me.
Hmm, so this in practice means that the caller still has to know the
details of what chunks go where. I would prefer if the realloc just
failed always and "don't do realloc on GenSlab" would be part of the spec
of that context; the randomness that you described originally is the main
problem IMHO. Maybe you could add new "constructor" function for Aset
that would create Aset which can't realloc for use inside the GenSlab?
Alternative would be of course having the individual API calls behind
Aset and Slab exported and used by GenSlab directly instead of using
child contexts. Then all the calls would go to GenSlab which could
decide what to do (and move the chunks between the allocators).
But honestly given the usecases for GenSlab, I would at the moment
prefer just to have predictable error as it can be done more cleanly and
nobody needs the functionality so far, it can be revisited once we
actually do need it.
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 4, 2016 at 10:11 PM, Tomas Vondra
For GenSlab the situation is less clear, as there probably are ways to make
it work, but I'd vote to keep it simple for now, and simply do elog(ERROR)
in the realloc() methods - both for Slab and GenSlab. The current use case
(reorderbuffer) does not need that, and it seems like a can of worms to me.
Good plan. Realloc can be added later if there is an actual use case.
On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
attached is v3 of the patches, with a few minor fixes in Slab, and much
larger fixes in GenSlab.

Slab (minor fixes)
------------------
- Removed the unnecessary memset() of new blocks in SlabAlloc(), although we
still need to zero the free bitmap at the end of the block.
- Renamed minFreeCount to minFreeChunks, added a few comments explaining
why/how firstFreeChunk and minFreeChunks are maintained.
- Fixed / improved a bunch of additional comments, based on feedback.
I had a look at 0001 today, but it seems to me that it still needs
work. It's still got a lot of remnants of where you've
copy-and-pasted aset.c. I dispute this allegation:
+ * SlabContext is our standard implementation of MemoryContext.
And all of this is just a direct copy-paste; I don't really want two copies:
+ * When running under Valgrind, we want a NOACCESS memory region both before
+ * and after the allocation. The chunk header is tempting as the preceding
+ * region, but mcxt.c expects to able to examine the standard chunk header
+ * fields. Therefore, we use, when available, the requested_size field and
+ * any subsequent padding. requested_size is made NOACCESS before returning
+ * a chunk pointer to a caller. However, to reduce client request traffic,
+ * it is kept DEFINED in chunks on the free list.
And then there's this:
+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+ fprintf(stderr, "AllocFree: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+ fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)
Well, it's pretty stupid that AllocSetAlloc is reporting its name as
AllocAlloc, a thing that, as far as I can tell, is not real. But
having this new context type also pretend to be AllocAlloc is even
dumber.
+static void
+randomize_mem(char *ptr, size_t size)
+{
+ static int save_ctr = 1;
+ size_t remaining = size;
+ int ctr;
+
+ ctr = save_ctr;
+ VALGRIND_MAKE_MEM_UNDEFINED(ptr, size);
+ while (remaining-- > 0)
+ {
+ *ptr++ = ctr;
+ if (++ctr > 251)
+ ctr = 1;
+ }
+ VALGRIND_MAKE_MEM_UNDEFINED(ptr - size, size);
+ save_ctr = ctr;
+}
+#endif /* RANDOMIZE_ALLOCATED_MEMORY */
Another copy of this doesn't seem like a good idea, either.
More broadly, I'm not sure I like this design very much. The whole
point of a slab context is that all of the objects are the same size.
I wouldn't find it too difficult to support this patch if we were
adding an allocator for fixed-size objects that was then being used to
allocate objects which are fixed size. However, what we seem to be
doing is creating an allocator for fixed-size objects and then using
it for variable-size tuples. That's really pretty weird. Isn't the
root of this problem that aset.c is utterly terrible at handling large
number of allocations? Maybe we should try to attack that problem
more directly.
On a related note, the autodestruct thing is a weird hack that's only
necessary because of the hijinks already discussed in the previous
paragraph. The context has no fixed lifetime; we're just trying to
find a way of coping with possibly-shifting tuple sizes over time.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 18/10/16 22:25, Robert Haas wrote:
On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

attached is v3 of the patches, with a few minor fixes in Slab, and much
larger fixes in GenSlab.

Slab (minor fixes)
------------------
- Removed the unnecessary memset() of new blocks in SlabAlloc(), although we
still need to zero the free bitmap at the end of the block.
- Renamed minFreeCount to minFreeChunks, added a few comments explaining
why/how firstFreeChunk and minFreeChunks are maintained.
- Fixed / improved a bunch of additional comments, based on feedback.

I had a look at 0001 today, but it seems to me that it still needs
work. It's still got a lot of remnants of where you've
copy-and-pasted aset.c. I dispute this allegation:

+ * SlabContext is our standard implementation of MemoryContext.
Are you looking at old version of the patch? I complained about this as
well and Tomas has changed that.
And then there's this:

+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+ fprintf(stderr, "AllocFree: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+ fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)

Well, it's pretty stupid that AllocSetAlloc is reporting its name as
AllocAlloc, a thing that, as far as I can tell, is not real. But
having this new context type also pretend to be AllocAlloc is even
dumber.
You are definitely looking at old version.
More broadly, I'm not sure I like this design very much. The whole
point of a slab context is that all of the objects are the same size.
I wouldn't find it too difficult to support this patch if we were
adding an allocator for fixed-size objects that was then being used to
allocate objects which are fixed size. However, what we seem to be
doing is creating an allocator for fixed-size objects and then using
it for variable-size tuples. That's really pretty weird. Isn't the
root of this problem that aset.c is utterly terrible at handling large
number of allocations? Maybe we should try to attack that problem
more directly.
It's used for TXNs which are fixed and some tuples (there is an assumption
that the decoded tuples have more or less normal distribution).
I agree though that the usability beyond the ReoderBuffer is limited
because everything that will want to use it for part of allocations will
get much more complicated by the fact that it will have to use two
different allocators.
I was wondering if rather than trying to implement new allocator we
should maybe implement palloc_fixed which would use some optimized
algorithm for fixed sized objects in our current allocator. The
advantage of that would be that we could for example use that for things
like ListCell easily (memory management of which I see quite often in
profiles).
--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/19/2016 12:27 AM, Petr Jelinek wrote:
On 18/10/16 22:25, Robert Haas wrote:
On Wed, Oct 5, 2016 at 12:22 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:

attached is v3 of the patches, with a few minor fixes in Slab, and much
larger fixes in GenSlab.

Slab (minor fixes)
------------------
- Removed the unnecessary memset() of new blocks in SlabAlloc(), although we
still need to zero the free bitmap at the end of the block.
- Renamed minFreeCount to minFreeChunks, added a few comments explaining
why/how firstFreeChunk and minFreeChunks are maintained.
- Fixed / improved a bunch of additional comments, based on feedback.

I had a look at 0001 today, but it seems to me that it still needs
work. It's still got a lot of remnants of where you've
copy-and-pasted aset.c. I dispute this allegation:

+ * SlabContext is our standard implementation of MemoryContext.

Are you looking at old version of the patch? I complained about this as
well and Tomas has changed that.

And then there's this:

+#ifdef HAVE_ALLOCINFO
+#define SlabFreeInfo(_cxt, _chunk) \
+ fprintf(stderr, "AllocFree: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)
+#define SlabAllocInfo(_cxt, _chunk) \
+ fprintf(stderr, "AllocAlloc: %s: %p, %d\n", \
+ (_cxt)->header.name, (_chunk), (_chunk)->size)

Well, it's pretty stupid that AllocSetAlloc is reporting its name as
AllocAlloc, a thing that, as far as I can tell, is not real. But
having this new context type also pretend to be AllocAlloc is even
dumber.

You are definitely looking at old version.
Yeah.
More broadly, I'm not sure I like this design very much. The whole
point of a slab context is that all of the objects are the same size.
I wouldn't find it too difficult to support this patch if we were
adding an allocator for fixed-size objects that was then being used to
allocate objects which are fixed size. However, what we seem to be
doing is creating an allocator for fixed-size objects and then using
it for variable-size tuples. That's really pretty weird. Isn't the
root of this problem that aset.c is utterly terrible at handling large
number of allocations? Maybe we should try to attack that problem
more directly.

It's used for TXNs which are fixed and some tuples (there is an
assumption that the decoded tuples have more or less normal
distribution).
Yeah. There are three contexts in reorder buffers:
- changes (fixed size)
- txns (fixed size)
- tuples (variable size)
The first two work perfectly fine with Slab.
The last one (tuples) is used to allocate variable-sized bits, so I've
tried to come up with something smart - a sequence of Slabs + overflow
AllocSet. I agree that in hindsight it's a bit strange, and that the
"generational" aspect is the key aspect here - i.e. it might be possible
to implement a memory context that allocates variable-length chunks but
still segregates them into generations. That is, don't build this on top
of Slab. That would also fix the issue with two allocators in GenSlab.
I'll think about this.
I agree though that the usability beyond the ReoderBuffer is limited
because everything that will want to use it for part of allocations will
get much more complicated by the fact that it will have to use two
different allocators.
It wasn't my (primary) goal to provide allocators usable outside
ReorderBuffer. I've intended to show that perhaps using AllocSet and
then trying to compensate for the pfree() issues is the wrong direction,
and that perhaps different allocation strategy (exploiting the
ReorderBuffer specifics) would work much better. And I think the two
allocators prove that.
I was wondering if rather than trying to implement new allocator we
should maybe implement palloc_fixed which would use some optimized
algorithm for fixed sized objects in our current allocator. The
advantage of that would be that we could for example use that for things
like ListCell easily (memory management of which I see quite often in
profiles).
I don't see how inventing palloc_fixed() solves any of the problems, and
I think it's going to be much more complicated than you think. The idea
of injecting this into AllocSet seems like a dead-end to me, as the code
is already complex enough and it's likely to cause regressions no matter
what you do.
I prefer the idea of implementing separate specialized memory contexts.
If the bar is moved to "implement palloc_fixed()" or something like
that, someone else will have to do that - I'm not all that interested in
ReorderBuffer (this was the first time I actually saw that code), so my
motivation to spend much more time on this is rather small.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10/19/2016 02:51 PM, Tomas Vondra wrote:
...
Yeah. There are three contexts in reorder buffers:
- changes (fixed size)
- txns (fixed size)
- tuples (variable size)

The first two work perfectly fine with Slab.
The last one (tuples) is used to allocate variable-sized bits, so I've
tried to come up with something smart - a sequence of Slabs + overflow
AllocSet. I agree that in hindsight it's a bit strange, and that the
"generational" aspect is the key aspect here - i.e. it might be possible
to implement a memory context that allocates variable-length chunks but
still segregates them into generations. That is, don't build this on top
of Slab. That would also fix the issue with two allocators in GenSlab.
I'll think about this.
And here is a fairly complete prototype of this idea, adding "Gen"
generational memory context based only on the "similar lifespan"
assumption (and abandoning the fixed-size assumption). It's much simpler
than GenSlab (which it's supposed to replace), and abandoning the idea
of composing two memory contexts also fixed the warts with some of the
API methods (e.g. repalloc).
I've also been thinking that perhaps "Gen" would be useful for all three
contexts in ReorderBuffer - so I've done a quick test comparing the
various combinations (using the test1() function used before).
master slabs slabs+genslab slabs+gen gens
----------------------------------------------------------------
50k 18700 210 220 190 190
100k 160000 380 470 350 350
200k N/A 750 920 740 679
500k N/A 2250 2240 1790 1740
1000k N/A 4600 5000 3910 3700
Where:
* master - 23ed2ba812117
* slabs - all three contexts use Slab (patch 0001)
* slabs+genslab - third context is GenSlab (patch 0002)
* slabs+gen - third context is the new Gen (patch 0003)
* gens - all three contexts use Gen
The results are a bit noisy, but I think it's clear the new Gen context
performs well - it actually seems a bit faster than GenSlab, and using
only Gen for all three contexts does not hurt performance.
This is most likely due to the trivial (practically absent) freespace
management in Gen context, compared to both Slab and GenSlab. So the
speed is not the only criterion - I haven't measured memory consumption,
but I'm pretty sure there are cases where Slab consumes much less memory
than Gen, thanks to reusing free space.
I'd say throwing away GenSlab and keeping Slab+Gen is the way to go.
There's still a fair bit of work on this, particularly implementing the
missing API methods in Gen - GenCheck() and GenStats(). As Robert
pointed out, there's also quite a bit of duplicated code between the
different memory contexts (randomization and valgrind-related), so this
needs to be moved to a shared place.
I'm also thinking that we need better names, particularly for the Gen
allocator. It's supposed to mean Generational, although there are no
explicit generations anymore. Slab is probably OK - it does not match
any of the existing kernel slab allocators exactly, but it follows the
same principles, which is what matters.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services