Shared memory size computation oversight?

Started by Julien Rouhaud · about 5 years ago · 16 messages · pgsql-hackers
#1Julien Rouhaud
rjuju123@gmail.com

Hi,

While investigating a strange "out of shared memory" error reported on
the French BBS [1], I noticed that 09adc9a8c09 (adding Robert in Cc) changed
the ShmemAlloc alignment to CACHELINEALIGN but didn't update any related code
that depended on it.

Most of the core code isn't impacted as it doesn't have to reserve any shared
memory, but AFAICT pg_prewarm and pg_stat_statements can now slightly
underestimate the amount of shared memory they'll use, and similarly for any
third-party extension that still relies on MAXALIGN alignment. As a consequence
those extensions can consume a few hundred bytes more than they reserved, which
will probably be borrowed from the lock hashtables' reserved size, since that
isn't allocated immediately. This can later lead to ShmemAlloc failing when
trying to acquire a lock while the hash table is almost full but should still
have enough room to store it, which could explain the error reported.

PFA a patch that fixes pg_prewarm's and pg_stat_statements' explicit alignment
to CACHELINEALIGN, and also updates the alignment in hash_estimate_size() to
what I think ShmemInitHash() will actually consume.

[1] https://forums.postgresql.fr/viewtopic.php?pid=32138#p32138 (sorry, it's all
in French)

Attachments:

v1-0001-Fix-various-shared-memory-computation.patch (text/x-diff, +6/-7)
#2Georgios Kokolatos
gkokolatos@protonmail.com
In reply to: Julien Rouhaud (#1)
Re: Shared memory size computation oversight?

Hi,

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, February 27, 2021 9:08 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

Hi,

While investigating on some strange "out of shared memory" error reported on
the french BBS [1], I noticed that 09adc9a8c09 (adding Robert in Cc) changed
ShmemAlloc alignment to CACHELINEALIGN but didn't update any related code that
depended on it.

Nice catch.

Most of the core code isn't impacted as it doesn't have to reserve any shared
memory, but AFAICT pg_prewarm and pg_stat_statements can now slightly
underestimate the amount of shared memory they'll use, and similarly for any
third party extension that still rely on MAXALIGN alignment. As a consequence
those extension can consume a few hundred bytes more than they reserved, which
probably will be borrowed from the lock hashtables reserved size that isn't
alloced immediately. This can later lead to ShmemAlloc failing when trying to
acquire a lock while the hash table is almost full but should still have enough
room to store it, which could explain the error reported.

PFA a patch that fixes pg_prewarm and pg_stat_statements explicit alignment to
CACHELINEALIGN, and also update the alignment in hash_estimate_size() to what I
think ShmemInitHash will actually consume.

Please excuse me, as I speak mostly from the side of ignorance. I am slightly
curious to understand something in your patch, if you would be kind enough to
explain it to me.

The commit 09adc9a8c09 you pointed to, as far as I can see, changed the total
size of the shared memory, not individual bits. It did ask that the users of
that space properly pad their data, but even assuming that they did not do so
as asked, the result would (or rather should) be cache misses, not running out
of reserved memory.

My limited understanding is also based on a comment in CreateSharedMemoryAndSemaphores():

/*
* Size of the Postgres shared-memory block is estimated via
* moderately-accurate estimates for the big hogs, plus 100K for the
* stuff that's too small to bother with estimating.
*
* We take some care during this phase to ensure that the total size
* request doesn't overflow size_t. If this gets through, we don't
* need to be so careful during the actual allocation phase.
*/
size = 100000;

Of course, my argument falls flat if the size estimate of each of the elements
is underestimated. Indeed this is the argument of the present patch, expressed
in code most prominently in hash_estimate_size(), here:

        size = add_size(size,
-                                       mul_size(nElementAllocs,
-                                                        mul_size(elementAllocCnt, elementSize)));
+                                       CACHELINEALIGN(Nmul_size(nElementAllocs,
+                                                                  mul_size(elementAllocCnt, elementSize))));

(Small note: Nmul_size is a typo of mul_size, but that is minor; after amending
it the code compiles.)

To conclude, the running out of shared memory seems to me to be fixed rather
vaguely by this patch. I am not claiming that increasing the memory used by
the elements is not needed; I am claiming that I cannot clearly see how that
specific increase is needed.

Thank you for your patience,
//Georgios -- https://www.vmware.com


[1] https://forums.postgresql.fr/viewtopic.php?pid=32138#p32138 (sorry, it's all
in French)

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Georgios Kokolatos (#2)
Re: Shared memory size computation oversight?

Georgios <gkokolatos@protonmail.com> writes:

My limited understanding is also based in a comment in CreateSharedMemoryAndSemaphores()

* Size of the Postgres shared-memory block is estimated via
* moderately-accurate estimates for the big hogs, plus 100K for the
* stuff that's too small to bother with estimating.

Right. That 100K slop factor is capable of hiding a multitude of sins.

I have not looked at this patch, but I think the concern is basically that
if we have space-estimation infrastructure that misestimates what it is
supposed to estimate, then if that infrastructure is used to estimate the
size of any of the "big hog" data structures, we might misestimate by
enough that the slop factor wouldn't hide it.

regards, tom lane

#4Julien Rouhaud
rjuju123@gmail.com
In reply to: Georgios Kokolatos (#2)
Re: Shared memory size computation oversight?

On Wed, Mar 03, 2021 at 03:37:20PM +0000, Georgios wrote:

Please excuse me as I speak most from the side of ignorance. I am slightly curious
to understand something in your patch, if you can be kind enough to explain it to
me.

The commit 09adc9a8c09 you pointed out to, as far as I can see, changed the total
size of the shared memory, not individual bits. It did explain that the users of
that space had properly padded their data, but even assuming that they did not do
that as asked, the result would (or rather should) be cache misses, not running out
of reserved memory.

It actually changed the allocation of individual bits inside the fixed-size
shared memory, as the underlying users end up calling ShmemAlloc(),
but did not accurately change the actual total size of shared memory, as many
estimates are still done using MAXALIGN or simply don't account for the
padding.

My limited understanding is also based in a comment in CreateSharedMemoryAndSemaphores()

/*
* Size of the Postgres shared-memory block is estimated via
* moderately-accurate estimates for the big hogs, plus 100K for the
* stuff that's too small to bother with estimating.
*
* We take some care during this phase to ensure that the total size
* request doesn't overflow size_t. If this gets through, we don't
* need to be so careful during the actual allocation phase.
*/
size = 100000;

Of course, my argument falls bare, if the size estimates of each of the elements is
rather underestimated. Indeed this is the argument in the present patch expressed in
code in hash_estimate_size most prominently, here:

size = add_size(size,
-                                       mul_size(nElementAllocs,
-                                                        mul_size(elementAllocCnt, elementSize)));
+                                       CACHELINEALIGN(Nmul_size(nElementAllocs,
+                                                                  mul_size(elementAllocCnt, elementSize))));

(small note, Nmul_size is a typo of mul_size, but that is minor, by amending it the
code compiles).

Oops, I'll fix that.

To conclude, the running out of shared memory, seems to me to be fixed rather
vaguely with this patch. I am not claiming that increasing the memory used by
the elements is not needed, I am claiming that I can not clearly see how is that
specific increase needed.

I'm also not entirely convinced that those fixes are enough to explain the "out
of shared memory" issue originally reported, but that's the only class of
problem that I could spot which could possibly explain it.

As Robert initially checked, it should be at worst a 6kB underestimate with
default parameters (it may be slightly more now as we have more shared memory
users, but it should be the same order of magnitude), and I don't think that
it will vary a lot with huge shared_buffers and/or max_connections.

Those 6kB should be compared to how much room the initial 100kB gives vs. how
much the small stuff actually consumes.

Note that there are still some similar issues in the code. A quick look at
ProcGlobalShmemSize() vs InitProcGlobal() seems to indicate that it's missing 5
CACHELINEALIGNs in the estimate. Unless someone objects, I'll soon do a full
review of all the estimates in CreateSharedMemoryAndSemaphores() and fix all
additional occurrences that I can find.

#5Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#3)
Re: Shared memory size computation oversight?

On Wed, Mar 03, 2021 at 11:23:54AM -0500, Tom Lane wrote:

Georgios <gkokolatos@protonmail.com> writes:

My limited understanding is also based in a comment in CreateSharedMemoryAndSemaphores()

* Size of the Postgres shared-memory block is estimated via
* moderately-accurate estimates for the big hogs, plus 100K for the
* stuff that's too small to bother with estimating.

Right. That 100K slop factor is capable of hiding a multitude of sins.

I have not looked at this patch, but I think the concern is basically that
if we have space-estimation infrastructure that misestimates what it is
supposed to estimate, then if that infrastructure is used to estimate the
size of any of the "big hog" data structures, we might misestimate by
enough that the slop factor wouldn't hide it.

Exactly. And now that I've looked deeper I can see that multiple estimates are
entirely ignoring the padding alignment (e.g. ProcGlobalShmemSize()), and the
resulting underestimate can exceed the 6kB originally estimated by Robert.

#6Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#5)
Re: Shared memory size computation oversight?

On Thu, Mar 04, 2021 at 12:40:52AM +0800, Julien Rouhaud wrote:

On Wed, Mar 03, 2021 at 11:23:54AM -0500, Tom Lane wrote:

I have not looked at this patch, but I think the concern is basically that
if we have space-estimation infrastructure that misestimates what it is
supposed to estimate, then if that infrastructure is used to estimate the
size of any of the "big hog" data structures, we might misestimate by
enough that the slop factor wouldn't hide it.

Exactly. And now that I looked deeper I can see that multiple estimates are
entirely ignoring the padding alignment (e.g. ProcGlobalShmemSize()), which can
exceed the 6kB originally estimated by Robert.

We are going to have a hard time catching all the places that are
doing an incorrect estimation, and an even harder time making
sure that similar errors don't happen in the future. Should we add
{add,mul}_shmem_size() variants that apply CACHELINEALIGN before returning
the result, to make sure that the calculated size is correctly aligned?
--
Michael

#7Julien Rouhaud
rjuju123@gmail.com
In reply to: Michael Paquier (#6)
Re: Shared memory size computation oversight?

On Thu, Mar 04, 2021 at 04:05:10PM +0900, Michael Paquier wrote:

On Thu, Mar 04, 2021 at 12:40:52AM +0800, Julien Rouhaud wrote:

On Wed, Mar 03, 2021 at 11:23:54AM -0500, Tom Lane wrote:

I have not looked at this patch, but I think the concern is basically that
if we have space-estimation infrastructure that misestimates what it is
supposed to estimate, then if that infrastructure is used to estimate the
size of any of the "big hog" data structures, we might misestimate by
enough that the slop factor wouldn't hide it.

Exactly. And now that I looked deeper I can see that multiple estimates are
entirely ignoring the padding alignment (e.g. ProcGlobalShmemSize()), which can
exceed the 6kB originally estimated by Robert.

We are going to have a hard time catching up all the places that are
doing an incorrect estimation, and have an even harder time making
sure that similar errors don't happen in the future. Should we add a
{add,mul}_shmem_size() to make sure that the size calculated is
correctly aligned, that uses CACHELINEALIGN before returning the
result size?

I was also considering adding new (add|mul)_*_size functions to avoid having
too messy code. I'm not terribly happy with xxx_shm_size(), as not all calls to
those functions would require an alignment. Maybe (add|mul)_shmem_align_size?

But before modifying dozens of calls, should we really fix those, or just
increase the "slop factor" a bit, or a mix of both?

For instance, I can see that BackendStatusShmemSize() never had
any padding consideration, while others do.

Maybe only fixing the contribs, some macros like PredXactListDataSize that
already do a MAXALIGN, SimpleLruShmemSize() and hash_estimate_size() would be a
short patch and should significantly improve the estimation.

#8Michael Paquier
michael@paquier.xyz
In reply to: Julien Rouhaud (#7)
Re: Shared memory size computation oversight?

On Thu, Mar 04, 2021 at 03:23:33PM +0800, Julien Rouhaud wrote:

I was also considering adding new (add|mull)_*_size functions to avoid having
too messy code. I'm not terribly happy with xxx_shm_size(), as not all call to
those functions would require an alignment. Maybe (add|mull)shmemalign_size?

But before modifying dozens of calls, should we really fix those or only
increase a bit the "slop factor", or a mix of it?

For instance, I can see that for instance BackendStatusShmemSize() never had
any padding consideration, while others do.

Maybe only fixing contribs, some macro like PredXactListDataSize that already
do a MAXALIGN, SimpleLruShmemSize and hash_estimate_size() would be a short
patch and should significantly improve the estimation.

The lack of complaints in this area looks to me like a sign that we
may not really need to backpatch something, so I would not be against
precise surgery, with a separate set of {add,mul}_size() routines
that are used where appropriate, so that it is easy to track down which size
estimations expect extra padding. I would be curious to hear more
thoughts from others here.

That said, calling a new routine something like add_shmem_align_size
makes its purpose clear.
--
Michael

#9Georgios Kokolatos
gkokolatos@protonmail.com
In reply to: Michael Paquier (#8)
Re: Shared memory size computation oversight?

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, March 4, 2021 9:21 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Mar 04, 2021 at 03:23:33PM +0800, Julien Rouhaud wrote:

I was also considering adding new (add|mull)_*_size functions to avoid having
too messy code. I'm not terribly happy with xxx_shm_size(), as not all call to
those functions would require an alignment. Maybe (add|mull)shmemalign_size?
But before modifying dozens of calls, should we really fix those or only
increase a bit the "slop factor", or a mix of it?
For instance, I can see that for instance BackendStatusShmemSize() never had
any padding consideration, while others do.
Maybe only fixing contribs, some macro like PredXactListDataSize that already
do a MAXALIGN, SimpleLruShmemSize and hash_estimate_size() would be a short
patch and should significantly improve the estimation.

The lack of complaints in this area looks to me like a sign that we
may not really need to backpatch something, so I would not be against
a precise chirurgy, with a separate set of {add,mul}_size() routines
that are used where adapted, so as it is easy to track down which size
estimations expect an extra padding. I would be curious to hear more
thoughts from others here.

Saying that, calling a new routine something like add_shmem_align_size
makes it clear what's its purpose.

My limited opinion on the matter, after spending some time yesterday going
through the related code, is that with the current API it is easy to miss
something and underestimate, or just be sloppy. If only from the readability
point of view, adding the proposed add_shmem_align_size will be beneficial.

I hold no opinion on backpatching.

//Georgios


#10Julien Rouhaud
rjuju123@gmail.com
In reply to: Georgios Kokolatos (#9)
Re: Shared memory size computation oversight?

On Thu, Mar 04, 2021 at 08:43:51AM +0000, Georgios wrote:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Thursday, March 4, 2021 9:21 AM, Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Mar 04, 2021 at 03:23:33PM +0800, Julien Rouhaud wrote:

I was also considering adding new (add|mull)_*_size functions to avoid having
too messy code. I'm not terribly happy with xxx_shm_size(), as not all call to
those functions would require an alignment. Maybe (add|mull)shmemalign_size?
But before modifying dozens of calls, should we really fix those or only
increase a bit the "slop factor", or a mix of it?
For instance, I can see that for instance BackendStatusShmemSize() never had
any padding consideration, while others do.
Maybe only fixing contribs, some macro like PredXactListDataSize that already
do a MAXALIGN, SimpleLruShmemSize and hash_estimate_size() would be a short
patch and should significantly improve the estimation.

The lack of complaints in this area looks to me like a sign that we
may not really need to backpatch something, so I would not be against
a precise chirurgy, with a separate set of {add,mul}_size() routines
that are used where adapted, so as it is easy to track down which size
estimations expect an extra padding. I would be curious to hear more
thoughts from others here.

Saying that, calling a new routine something like add_shmem_align_size
makes it clear what's its purpose.

My limited opinion on the matter after spending some time yesterday through
the related code, is that with the current api it is easy to miss something
and underestimate or just be sloppy. If only from the readability point of
view, adding the proposed add_shmem_align_size will be beneficial.

I hold no opinion on backpatching.

So here's a v2 patch which hopefully accounts for all the unaccounted alignment
padding. There's also a significant change in the shared hash table size
estimation. AFAICT, allocation will be done this way:

- num_freelists allocations of init_size / num_freelists entries (on average)
for a partitioned htab, num_freelists being 1 for a non-partitioned table and
NUM_FREELISTS (32) otherwise.

- then the rest of the entries, if any, are allocated in groups of
choose_nelem_alloc() entries.

With this patch applied, I see an extra 16KB of shared memory being requested.

Attachments:

v2-0001-Fix-various-shared-memory-estimates.patch (text/x-diff, +146/-88)
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Georgios Kokolatos (#9)
Re: Shared memory size computation oversight?

I spent a bit of time looking into this. Using current HEAD, I
instrumented CreateSharedMemoryAndSemaphores to log the size estimates
returned by the various estimation subroutines, plus the shared memory
space actually consumed (i.e., the change in ShmemSegHdr->freeoffset)
by the various shared memory initialization functions. There were only
two estimates that were way off: LockShmemSize requested 1651771 more
bytes than InitLocks actually consumed, and PredicateLockShmemSize
requested 167058 more bytes than InitPredicateLocks consumed. I believe
both of those are intentional, reflecting space that may be eaten by the
lock tables later.

Meanwhile, looking at ShmemSegHdr->freeoffset vs ShmemSegHdr->totalsize,
the actual remaining shmem space after postmaster startup is 1919488
bytes. (Running the core regression tests doesn't consume any of that
remaining space, btw.) Subtracting the extra lock-table space, we have
100659 bytes to spare, which is as near as makes no difference to the
intended slop of 100000 bytes.

My conclusion is that we don't need to do anything, indeed the proposed
changes will probably just lead to overestimation.

It's certainly possible that there's something amiss somewhere. These
numbers were all taken with out-of-the-box configuration, so it could be
that changing some postgresql.conf entries would expose that some module
is not scaling its request correctly. Also, I don't have any extensions
loaded, so this proves nothing about the correctness of any of those.
But it appears to me that the general scheme is working perfectly fine,
so we do not need to complicate it.

regards, tom lane

Attachments:

shmem-consumption.txt (text/plain)
#12Julien Rouhaud
rjuju123@gmail.com
In reply to: Tom Lane (#11)
Re: Shared memory size computation oversight?

On Thu, Mar 04, 2021 at 12:53:54PM -0500, Tom Lane wrote:

My conclusion is that we don't need to do anything, indeed the proposed
changes will probably just lead to overestimation.

It's certainly possible that there's something amiss somewhere. These
numbers were all taken with out-of-the-box configuration, so it could be
that changing some postgresql.conf entries would expose that some module
is not scaling its request correctly. Also, I don't have any extensions
loaded, so this proves nothing about the correctness of any of those.
But it appears to me that the general scheme is working perfectly fine,
so we do not need to complicate it.

Thanks for looking at it. I agree that most of the changes aren't worth the
complication and risk to overestimate the shmem size.

But I did some more experiments comparing the current version and the patched
version of the lock and pred lock shared hash table size estimations with some
configuration changes, and I find the following delta in the estimated size:

- original configuration : 15kB
- max_connection = 1000 : 30kB
- max_connection = 1000 and max_locks_per_xact = 256 : 96kB

I don't know if my version is totally wrong or not, but if it's not, it could
be worthwhile to apply the hashtab part, as it would mean that the current
estimate doesn't scale properly.

#13Peter Eisentraut
peter_e@gmx.net
In reply to: Julien Rouhaud (#1)
Re: Shared memory size computation oversight?

On 27.02.21 09:08, Julien Rouhaud wrote:

PFA a patch that fixes pg_prewarm and pg_stat_statements explicit alignment to
CACHELINEALIGN, and also update the alignment in hash_estimate_size() to what I
think ShmemInitHash will actually consume.

For extensions, wouldn't it make things better if
RequestAddinShmemSpace() applied CACHELINEALIGN() to its argument?

If you consider the typical sequence of RequestAddinShmemSpace(mysize())
and later ShmemInitStruct(..., mysize(), ...), then the size gets
rounded up to cache-line size in the second case and not the first. The
premise in your patch is that the extension should figure that out
before calling RequestAddinShmemSpace(), but that seems wrong.

Btw., I think your patch was wrong to apply CACHELINEALIGN() to
intermediate results rather than at the end.

#14Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#13)
Re: Shared memory size computation oversight?

On Thu, Aug 12, 2021 at 9:34 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 27.02.21 09:08, Julien Rouhaud wrote:

PFA a patch that fixes pg_prewarm and pg_stat_statements explicit alignment to
CACHELINEALIGN, and also update the alignment in hash_estimate_size() to what I
think ShmemInitHash will actually consume.

For extensions, wouldn't it make things better if
RequestAddinShmemSpace() applied CACHELINEALIGN() to its argument?

If you consider the typical sequence of RequestAddinShmemSpace(mysize())
and later ShmemInitStruct(..., mysize(), ...),

But changing RequestAddinShmemSpace() to apply CACHELINEALIGN() would
only really work for that specific usage? If an extension does
multiple allocations you can't rely on correct results.

then the size gets
rounded up to cache-line size in the second case and not the first. The
premise in your patch is that the extension should figure that out
before calling RequestAddinShmemSpace(), but that seems wrong.

Btw., I think your patch was wrong to apply CACHELINEALIGN() to
intermediate results rather than at the end.

I'm not sure why you think it's wrong. It's ShmemInitStruct() that
will apply the padding, so if the extension calls it multiple times
(whether directly or indirectly), then the padding will be added
multiple times. Which means that in theory the extension should
account for it multiple times in the amount of memory it's requesting.

But given the later discussion, ISTM that there's an agreement that
any number of CACHELINEALIGN() overheads for any number of allocations
can be ignored, as they should be safely absorbed by the 100kB extra
space. I think that the real problem, as mentioned in my last email,
is that the shared htab size estimation doesn't really scale and can
easily eat the whole extra space if you use some not-that-unrealistic
parameters. I still don't know if I made any mistake trying to
properly account for the htab allocation, but if I didn't, it can be
problematic.

#15Peter Eisentraut
peter_e@gmx.net
In reply to: Julien Rouhaud (#14)
Re: Shared memory size computation oversight?

On 12.08.21 16:18, Julien Rouhaud wrote:

On Thu, Aug 12, 2021 at 9:34 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:

On 27.02.21 09:08, Julien Rouhaud wrote:

PFA a patch that fixes pg_prewarm and pg_stat_statements explicit alignment to
CACHELINEALIGN, and also update the alignment in hash_estimate_size() to what I
think ShmemInitHash will actually consume.

For extensions, wouldn't it make things better if
RequestAddinShmemSpace() applied CACHELINEALIGN() to its argument?

If you consider the typical sequence of RequestAddinShmemSpace(mysize())
and later ShmemInitStruct(..., mysize(), ...),

But changing RequestAddinShmemSpace() to apply CACHELINEALIGN() would
only really work for that specific usage only? If an extension does
multiple allocations you can't rely on correct results.

I think you can do different things here to create inconsistent results,
but I think there should be one common, standard, normal,
straightforward way to get a correct result.

Btw., I think your patch was wrong to apply CACHELINEALIGN() to
intermediate results rather than at the end.

I'm not sure why you think it's wrong. It's ShmemInitStruct() that
will apply the padding, so if the extension calls it multiple times
(whether directly or indirectly), then the padding will be added
multiple times. Which means that in theory the extension should
account for it multiple time in the amount of memory it's requesting.

Yeah, in that case it's probably rather the case that there is one
CACHELINEALIGN() too few, since pg_stat_statements does two separate
shmem allocations.

#16Julien Rouhaud
rjuju123@gmail.com
In reply to: Peter Eisentraut (#15)
Re: Shared memory size computation oversight?

On Fri, Aug 13, 2021 at 10:52:50AM +0200, Peter Eisentraut wrote:

On 12.08.21 16:18, Julien Rouhaud wrote:

But changing RequestAddinShmemSpace() to apply CACHELINEALIGN() would
only really work for that specific usage only? If an extension does
multiple allocations you can't rely on correct results.

I think you can do different things here to create inconsistent results, but
I think there should be one common, standard, normal, straightforward way to
get a correct result.

Unless I'm missing something, the standard and straightforward way to get a
correct result is to account for padding bytes in the C code, as it's currently
done in all contrib modules. The issue is that those modules aren't using the
correct alignment anymore.

Btw., I think your patch was wrong to apply CACHELINEALIGN() to
intermediate results rather than at the end.

I'm not sure why you think it's wrong. It's ShmemInitStruct() that
will apply the padding, so if the extension calls it multiple times
(whether directly or indirectly), then the padding will be added
multiple times. Which means that in theory the extension should
account for it multiple time in the amount of memory it's requesting.

Yeah, in that case it's probably rather the case that there is one
CACHELINEALIGN() too few, since pg_stat_statements does two separate shmem
allocations.

I still don't get it. Aligning the total shmem size is totally different from
properly padding each single allocation size, and is almost never the right
answer.

Using a naive example: if your extension needs to ShmemInitStruct() twice 64B,
postgres will consume 2*128B. But if you only rely on RequestAddinShmemSpace()
to add a CACHELINEALIGN(), then no padding at all will be added, and you'll
then be not one but two CACHELINEALIGN() roundings too few.

But again, the real issue is not the CACHELINEALIGN() roundings, as those have
a more or less stable size and are already accounted for in the extra 100kB,
but the dynahash size estimation which seems to be increasingly off as the
number of entries grows.