Changing types of block and chunk sizes in memory contexts
Hi hackers,
In memory contexts, block and chunk sizes are limited by certain upper
bounds, for example MEMORYCHUNK_MAX_BLOCKOFFSET and
MEMORYCHUNK_MAX_VALUE. Both values are only 1 less than 1GB.
This means memory contexts can only have blocks/chunks smaller than
1GB, and such sizes fit in 32 bits. Currently the 64-bit "Size" type is
used, but a 32-bit integer is enough to store any value less than 1GB.
Attached patch is an attempt to change the types of some fields to
uint32 from Size in aset, slab and generation memory contexts.
I tried to find most of the places that needed to be changed to
uint32, but I probably missed some. I can add more places if you feel
like it.
I would appreciate any feedback.
Thanks,
--
Melih Mutlu
Microsoft
Attachments:
0001-Change-memory-context-fields-to-uint32.patch (+28, -29)
> In memory contexts, block and chunk sizes are limited by certain upper
> bounds, for example MEMORYCHUNK_MAX_BLOCKOFFSET and
> MEMORYCHUNK_MAX_VALUE. Both values are only 1 less than 1GB.
> This means memory contexts can only have blocks/chunks smaller than
> 1GB, and such sizes fit in 32 bits. Currently the 64-bit "Size" type is
> used, but a 32-bit integer is enough to store any value less than 1GB.
size_t (= Size) is the correct type in C to store the size of an object
in memory. This is partially a self-documentation issue: If I see
size_t in a function signature, I know what is intended; if I see
uint32, I have to wonder what the intent was.
You could make an argument that using shorter types would save space for
some internal structs, but then you'd have to show some more information
where and why that would be beneficial. (But again, self-documentation:
If one were to do that, I would argue for introducing a custom type like
pg_short_size_t.)
Absent any strong performance argument, I don't see the benefit of this
change. People might well want to experiment with MEMORYCHUNK_...
settings larger than 1GB.
On Wed, 28 Jun 2023 at 20:13, Peter Eisentraut <peter@eisentraut.org> wrote:
> size_t (= Size) is the correct type in C to store the size of an object
> in memory. This is partially a self-documentation issue: If I see
> size_t in a function signature, I know what is intended; if I see
> uint32, I have to wonder what the intent was.
Perhaps it's ok to leave the context creation functions with Size
typed parameters and then just Assert the passed-in sizes are not
larger than 1GB within the context creation function. That way we
could keep this change self-contained in the .c file for the given
memory context. That way readability isn't reduced. If we
ever wanted to lift the 1GB limit on block sizes then we'd not need to
switch the function signature again. There's documentation where the
struct's field is declared, so having a smaller type in the struct
itself does not seem like a reduction of documentation quality.
> You could make an argument that using shorter types would save space for
> some internal structs, but then you'd have to show some more information
> where and why that would be beneficial.
I think there's not much need to go proving this speeds something up.
There's just simply no point in the struct fields being changed in
Melih's patch to be bigger than 32 bits as we never need to store more
than 1GB in them. Reducing these down means we may have to touch
fewer cache lines and we'll also have more space on the keeper blocks
to store allocations. Memory allocation performance is fairly
fundamental to Postgres's performance. In my view, we shouldn't have
fields that are twice as large as they need to be in code as hot as
this.
> Absent any strong performance argument, I don't see the benefit of this
> change. People might well want to experiment with MEMORYCHUNK_...
> settings larger than 1GB.
Anyone doing so will be editing C code anyway. They can adjust these
fields then.
David
David Rowley <dgrowleyml@gmail.com> writes:
> Perhaps it's ok to leave the context creation functions with Size
> typed parameters and then just Assert the passed-in sizes are not
> larger than 1GB within the context creation function.
Yes, I'm strongly opposed to not using Size/size_t in the mmgr APIs.
If we go that road, we're going to have a problem when someone
inevitably wants to pass a larger-than-GB value for some context
type.
What happens in semi-private structs is a different matter, although
I'm a little dubious that shaving a couple of bytes from context
headers is a useful activity. The self-documentation argument
still has some force there, so I agree with Peter that some positive
benefit has to be shown.
regards, tom lane
On 6/28/23 12:59, Tom Lane wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> Perhaps it's ok to leave the context creation functions with Size
>> typed parameters and then just Assert the passed-in sizes are not
>> larger than 1GB within the context creation function.
>
> Yes, I'm strongly opposed to not using Size/size_t in the mmgr APIs.
> If we go that road, we're going to have a problem when someone
> inevitably wants to pass a larger-than-GB value for some context
> type.
+1
> What happens in semi-private structs is a different matter, although
> I'm a little dubious that shaving a couple of bytes from context
> headers is a useful activity. The self-documentation argument
> still has some force there, so I agree with Peter that some positive
> benefit has to be shown.
Yeah. FWIW I was interested in what the patch does in practice, so I
checked what pahole says about the impact on struct sizes:
AllocSetContext 224B -> 208B (4 cachelines)
GenerationContext 152B -> 136B (3 cachelines)
SlabContext 200B -> 200B (no change, adds 4B hole)
Nothing else changes, AFAICS. I find it hard to believe this could have
any sort of positive benefit - I doubt we ever have enough contexts for
this to matter.
When I first saw the patch I was thinking it's probably changing how we
store the per-chunk requested_size. Maybe that'd make a difference,
although 4B is tiny compared to what we waste due to the doubling.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> ... 4B is tiny compared to what we waste due to the doubling.
Yeah. I've occasionally wondered if we should rethink aset.c's
"only power-of-2 chunk sizes" rule. Haven't had the bandwidth
to pursue the idea though.
regards, tom lane
Hi,
On 2023-06-28 23:26:00 +0200, Tomas Vondra wrote:
> Yeah. FWIW I was interested in what the patch does in practice, so I
> checked what pahole says about the impact on struct sizes:
>
> AllocSetContext 224B -> 208B (4 cachelines)
> GenerationContext 152B -> 136B (3 cachelines)
> SlabContext 200B -> 200B (no change, adds 4B hole)
>
> Nothing else changes, AFAICS. I find it hard to believe this could have
> any sort of positive benefit - I doubt we ever have enough contexts for
> this to matter.
I don't think it's that hard to believe. We create a lot of memory contexts
that we never or just barely use. Just reducing the number of cachelines
touched for that can't hurt. This doesn't quite get us to reducing the size to
a lower number of cachelines, but it's a good step.
There are a few other fields that we can get rid of.
- Afaics AllocSet->keeper is unnecessary these days, as it is always allocated
together with the context itself. Saves 8 bytes.
- The set of memory context types isn't runtime extensible. We could replace
MemoryContextData->methods with a small integer index into mcxt_methods. I
think that might actually end up being as-cheap or even cheaper than the
current approach. Saves 8 bytes.
That's sufficient for going to 3 cachelines.
- We could store the power of 2 for initBlockSize, nextBlockSize,
maxBlockSize, instead of the "raw" value. That'd make them one byte
each. Which also would get rid of the concerns around needing a
"mini_size_t" type.
- freeListIndex could be a single byte as well (saving 7 bytes, as right now
we lose 4 trailing bytes due to padding).
That would save another 12 bytes, if I calculate correctly. 25% shrinkage
together ain't bad.
Greetings,
Andres Freund
Hi,
On 2023-06-28 17:56:55 -0400, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
>> ... 4B is tiny compared to what we waste due to the doubling.
>
> Yeah. I've occasionally wondered if we should rethink aset.c's
> "only power-of-2 chunk sizes" rule. Haven't had the bandwidth
> to pursue the idea though.
Me too. It'd not be trivial to do without also incurring performance overhead.
A somewhat easier thing we could try is to carve the "rounding up" space into
smaller chunks, similar to what we do for full blocks. It wouldn't make sense
to do that for the smaller size classes, but above 64-256 bytes or such, I
think the wins might be big enough to outweigh the costs?
Of course that doesn't guarantee that that memory in those smaller size
classes is going to be used...
Greetings,
Andres Freund
On Thu, 29 Jun 2023 at 09:26, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> AllocSetContext 224B -> 208B (4 cachelines)
> GenerationContext 152B -> 136B (3 cachelines)
> SlabContext 200B -> 200B (no change, adds 4B hole)
>
> Nothing else changes, AFAICS.
I don't think a lack of a reduction in the number of cache lines is
the important part. Allowing more space on the keeper block, which is
at the end of the context struct seems more useful. I understand that
the proposal is just to shave off 12 bytes and that's not exactly huge
when it's just once per context, but we do create quite a large number
of contexts with ALLOCSET_SMALL_SIZES which have a 1KB initial block
size. 12 bytes in 1024 is not terrible.
It's not exactly an invasive change. It does not add any complexity
to the code and as far as I can see, about zero risk of it slowing
anything down.
David
On 6/29/23 01:34, Andres Freund wrote:
> Hi,
>
> On 2023-06-28 23:26:00 +0200, Tomas Vondra wrote:
>> Yeah. FWIW I was interested in what the patch does in practice, so I
>> checked what pahole says about the impact on struct sizes:
>>
>> AllocSetContext 224B -> 208B (4 cachelines)
>> GenerationContext 152B -> 136B (3 cachelines)
>> SlabContext 200B -> 200B (no change, adds 4B hole)
>>
>> Nothing else changes, AFAICS. I find it hard to believe this could have
>> any sort of positive benefit - I doubt we ever have enough contexts for
>> this to matter.
>
> I don't think it's that hard to believe. We create a lot of memory contexts
> that we never or just barely use. Just reducing the number of cachelines
> touched for that can't hurt. This doesn't quite get us to reducing the size to
> a lower number of cachelines, but it's a good step.
>
> There are a few other fields that we can get rid of.
>
> - Afaics AllocSet->keeper is unnecessary these days, as it is always allocated
>   together with the context itself. Saves 8 bytes.
>
> - The set of memory context types isn't runtime extensible. We could replace
>   MemoryContextData->methods with a small integer index into mcxt_methods. I
>   think that might actually end up being as-cheap or even cheaper than the
>   current approach. Saves 8 bytes.
>
> That's sufficient for going to 3 cachelines.
>
> - We could store the power of 2 for initBlockSize, nextBlockSize,
>   maxBlockSize, instead of the "raw" value. That'd make them one byte
>   each. Which also would get rid of the concerns around needing a
>   "mini_size_t" type.
>
> - freeListIndex could be a single byte as well (saving 7 bytes, as right now
>   we lose 4 trailing bytes due to padding).
>
> That would save another 12 bytes, if I calculate correctly. 25% shrinkage
> together ain't bad.
I don't oppose these changes, but I still don't quite believe it'll make
a measurable difference (even if we manage to save a cacheline or two).
I'd definitely like to see some measurements demonstrating it's worth
the extra complexity.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
On 2023-06-29 11:58:27 +0200, Tomas Vondra wrote:
> On 6/29/23 01:34, Andres Freund wrote:
>> On 2023-06-28 23:26:00 +0200, Tomas Vondra wrote:
>>> Yeah. FWIW I was interested in what the patch does in practice, so I
>>> checked what pahole says about the impact on struct sizes:
>>>
>>> AllocSetContext 224B -> 208B (4 cachelines)
>>> GenerationContext 152B -> 136B (3 cachelines)
>>> SlabContext 200B -> 200B (no change, adds 4B hole)
>> ...
>> That would save another 12 bytes, if I calculate correctly. 25% shrinkage
>> together ain't bad.
>
> I don't oppose these changes, but I still don't quite believe it'll make
> a measurable difference (even if we manage to save a cacheline or two).
> I'd definitely like to see some measurements demonstrating it's worth
> the extra complexity.
I hacked (emphasis on that) a version together that shrinks AllocSetContext
down to 176 bytes.
There seem to be some minor performance gains, and some not too shabby memory
savings.
E.g. a backend after running readonly pgbench goes from (results repeat
precisely across runs):
pgbench: Grand total: 1361528 bytes in 289 blocks; 367480 free (206 chunks); 994048 used
to:
pgbench: Grand total: 1339000 bytes in 278 blocks; 352352 free (188 chunks); 986648 used
Running a total over all connections in the main regression tests gives less
of a win (best of three):
backends grand blocks free chunks used
690 1046956664 111373 370680728 291436 676275936
to:
backends grand blocks free chunks used
690 1045226056 111099 372972120 297969 672253936
the latter is produced with this beauty:
ninja && m test --suite setup --no-rebuild && \
  m test --no-rebuild --print-errorlogs regress/regress -v && \
  grep "Grand total" testrun/regress/regress/log/postmaster.log | \
  sed -E -e 's/.*Grand total: (.*) bytes in (.*) blocks; (.*) free \((.*) chunks\); (.*) used/\1\t\2\t\3\t\4\t\5/' | \
  awk '{backends += 1; grand += $1; blocks += $2; free += $3; chunks += $4; used += $5} END{print backends, grand, blocks, free, chunks, used}'
There's more to get. The overhead of AllocSetBlock also plays into this. Both
due to the keeper block and obviously separate blocks getting allocated
subsequently. We e.g. don't need AllocBlockData->next,prev as 8 byte pointers
(some trickiness would be required for external blocks, but they could combine
both).
Greetings,
Andres Freund
Hi,
Thanks for your comments.
On Wed, 28 Jun 2023 at 13:59, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rowley <dgrowleyml@gmail.com> writes:
>> Perhaps it's ok to leave the context creation functions with Size
>> typed parameters and then just Assert the passed-in sizes are not
>> larger than 1GB within the context creation function.
>
> Yes, I'm strongly opposed to not using Size/size_t in the mmgr APIs.
> If we go that road, we're going to have a problem when someone
> inevitably wants to pass a larger-than-GB value for some context
> type.
I reverted changes in the context creation functions and only changed
the types in structs.
I believe there are already lines to assert whether the sizes are less
than 1GB, so we should be safe there.
On Thu, 29 Jun 2023 at 02:34, Andres Freund <andres@anarazel.de> wrote:
> There are a few other fields that we can get rid of.
>
> - Afaics AllocSet->keeper is unnecessary these days, as it is always allocated
>   together with the context itself. Saves 8 bytes.
This seemed like a safe change and removed the keeper field in
AllocSet and Generation contexts. It saves an additional 8 bytes.
Best,
--
Melih Mutlu
Microsoft
Attachments:
v2-0001-Change-memory-context-fields-to-uint32.patch (+62, -48)
On Tue, 11 Jul 2023 at 02:41, Melih Mutlu <m.melihmutlu@gmail.com> wrote:
>> - Afaics AllocSet->keeper is unnecessary these days, as it is always allocated
>>   together with the context itself. Saves 8 bytes.
>
> This seemed like a safe change and removed the keeper field in
> AllocSet and Generation contexts. It saves an additional 8 bytes.
Seems like a good idea for an additional 8-bytes.
I looked at your v2 patch. The only thing that really looked wrong
were the (Size) casts in the context creation functions. These should
have been casts to uint32 rather than Size. Basically, all the casts
do is say to the compiler "Yes, I know this could cause truncation due
to assigning to a size smaller than the source type's size". Some
compilers will likely warn without that and the cast will stop them.
We know there can't be any truncation due to the Asserts. There's also
the fundamental limitation that MemoryChunk can't store block offsets
larger than 1GB anyway, so things will go bad if we tried to have
blocks bigger than 1GB.
Aside from that, I thought a couple of other slab.c fields could be
shrunk to uint32, as the v2 patch only reduces the size of 1 field,
which just creates a 4-byte hole in SlabContext. The fullChunkSize
field is just the MAXALIGN(chunkSize) + sizeof(MemoryChunk). We
should never be using slab contexts for any chunks anywhere near that
size. aset.c would be a better context for that, so it seems fine to
me to further restrict the maximum supported chunk size by another 8
bytes.
I've attached your patch again along with a small delta of what I adjusted.
My thoughts on these changes are that it's senseless to have Size
typed fields for storing a value that's never larger than 2^30.
Getting rid of the keeper pointer seems like a cleanup as it's pretty
much a redundant field. For small sized contexts like the ones used
for storing index relcache entries, I think it makes sense to save 20
more bytes. Each backend can have many thousand of those and there
could be many hundred backends. If we can fit more allocations on that
initial 1 kilobyte keeper block without having to allocate any
additional blocks, then that's great.
I feel that Andres's results showing several hundred fewer block
allocations shows this working. Albeit, his patch reduced the size of
the structs even further than what v3 does. I think v3 is enough for
now as the additional changes Andres mentioned require some more
invasive code changes to make work.
If nobody objects or has other ideas about this, modulo commit
message, I plan to push the attached on Monday.
David
Attachments:
v3-0001-Change-memory-context-fields-to-uint32.patch (+62, -48)
v3-0002-fixup-Change-memory-context-fields-to-uint32.patch (+37, -36)
Hi David,
On Thu, 13 Jul 2023 at 08:04, David Rowley <dgrowleyml@gmail.com> wrote:
> I looked at your v2 patch. The only thing that really looked wrong
> were the (Size) casts in the context creation functions. These should
> have been casts to uint32 rather than Size. Basically, all the casts
> do is say to the compiler "Yes, I know this could cause truncation due
> to assigning to a size smaller than the source type's size". Some
> compilers will likely warn without that and the cast will stop them.
> We know there can't be any truncation due to the Asserts. There's also
> the fundamental limitation that MemoryChunk can't store block offsets
> larger than 1GB anyway, so things will go bad if we tried to have
> blocks bigger than 1GB.
Right! I don't know why I cast them to Size. Thanks for the fix.
Best,
--
Melih Mutlu
Microsoft
On Fri, 14 Jul 2023 at 18:53, Melih Mutlu <m.melihmutlu@gmail.com> wrote:
> On Thu, 13 Jul 2023 at 08:04, David Rowley <dgrowleyml@gmail.com> wrote:
>> I looked at your v2 patch. The only thing that really looked wrong
>> were the (Size) casts in the context creation functions. These should
>> have been casts to uint32 rather than Size. Basically, all the casts
>> do is say to the compiler "Yes, I know this could cause truncation due
>> to assigning to a size smaller than the source type's size". Some
>> compilers will likely warn without that and the cast will stop them.
>> We know there can't be any truncation due to the Asserts. There's also
>> the fundamental limitation that MemoryChunk can't store block offsets
>> larger than 1GB anyway, so things will go bad if we tried to have
>> blocks bigger than 1GB.
>
> Right! I don't know why I cast them to Size. Thanks for the fix.
Pushed.
David