Auto-tune shared_buffers to use available huge pages

Started by Anthonin Bonnefoy · 3 months ago · 7 messages · hackers
#1 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com

Hi,

In a normal environment, the instance's number of huge pages can be
adjusted to the size reported by shared_memory_size_in_huge_pages;
Postgres can then be started and the requested shared memory fits in
the available huge pages.

A similar approach is harder to implement in environments like
Kubernetes. If I want to modify the huge pages available to a pod, I need to:
- Modify the host's huge pages
- Restart the host's kubelet so it detects the new amount of huge pages
- Modify the pod's huge page request

Most of those steps are far from practical. An alternative would be to
have a fixed number of huge pages (like 25% of the node's memory), and
to adjust the configuration, like the amount of shared_buffers.
However, adjusting the configuration to fit in a fixed amount of
memory is tricky:
- shared_buffers is used to auto-tune multiple parameters, so there's
no easy formula to get the correct amount. The only way I've found is
to iteratively increase shared_buffers until
shared_memory_size_in_huge_pages matches the desired number of huge
pages
- changing other parameters like max_connections means shared_buffers
has to be adjusted again

To help with that, the attached patch provides a new option,
huge_pages_autotune_buffers, to automatically use leftover huge pages
as shared_buffers. This requires some changes in the auto-tune logic:
- Subsystems that use shared_buffers for auto-tuning will rely
on the configured shared_buffers, not the auto-tuned shared_buffers,
and they should save the auto-tuned value in a GUC. This will be done
in dedicated auto-tune functions.
- Once the auto-tune functions are called, modifying NBuffers won't
change the requested memory, except for the shared buffer pool in
BufferManagerShmemSize
- We can get the leftover memory (free huge pages - requested memory)
and estimate how much shared_buffers we can add
- Increasing shared_buffers also increases the freelist hashmap,
so the auto-tuned shared_buffers needs to be reduced accordingly
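The arithmetic in the last two points can be sketched roughly as follows
(illustrative Python, not the patch's C code; the per_buffer_overhead
constant is a made-up placeholder for the freelist/descriptor cost that
the patch derives from StrategyShmemSize):

```python
BLCKSZ = 8192                       # default PostgreSQL block size
HUGE_PAGE_SIZE = 2 * 1024 * 1024    # assuming 2MB huge pages

def extra_buffers(free_huge_pages, requested_bytes, per_buffer_overhead=100):
    """Estimate how many shared buffers fit in the leftover huge pages.

    per_buffer_overhead is a placeholder: the real patch computes the
    freelist growth as StrategyShmemSize(candidate) - StrategyShmemSize(NBuffers)
    and reduces the auto-tuned shared_buffers accordingly.
    """
    leftover = free_huge_pages * HUGE_PAGE_SIZE - requested_bytes
    if leftover <= 0:
        return 0
    # each extra buffer costs one block plus its bookkeeping overhead
    return leftover // (BLCKSZ + per_buffer_overhead)
```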

The patch is split into the following sub-patches:

0001: Extract the current auto-tune logic in dedicated functions,
making the behaviour more consistent across subsystems.

0002: The checkpointer auto-tunes the request size using NBuffers, but
doesn't save the result in a GUC. This adds a new
checkpoint_request_size GUC with the same auto-tune logic.

0003: Extract HugePages_Free value when /proc/meminfo is parsed in
GetHugePageSize.

0004: Pass NBuffers as parameters to StrategyShmemSize. This is
necessary to get how much memory will be used by the freelist using
'StrategyShmemSize(candidate_nbuffers) - StrategyShmemSize(NBuffers)'.

0005: Add BufferManagerAutotune to auto-tune the amount of shared_buffers.

Regards,
Anthonin Bonnefoy

Attachments:

v1-0001-Create-dedicated-shmem-Autotune-functions.patch (+170 -114)
v1-0002-Add-GUC-for-checkpointer-request-queue-size.patch (+49 -10)
v1-0003-Extract-HugePages_Free-value-in-GetHugePageSize.patch (+22 -12)
v1-0004-Pass-NBuffers-as-parameter-to-StrategyShmemSize.patch (+4 -5)
v1-0005-Auto-tune-shared_buffers-to-use-available-huge-pa.patch (+60 -1)
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Anthonin Bonnefoy (#1)
Re: Auto-tune shared_buffers to use available huge pages

Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com> writes:

To help with that, the attached patch provides a new option,
huge_pages_autotune_buffers, to automatically use leftover huge pages
as shared_buffers. This requires some changes in the auto-tune logic:

Not expressing an opinion on whether we should do this, but
there is a comment on GetHugePageSize() that you seem to have
falsified without bothering to correct:

* Doing the round-up ourselves also lets us make use of the extra memory,
* rather than just wasting it. Currently, we just increase the available
* space recorded in the shmem header, which will make the extra usable for
* purposes such as additional locktable entries. Someday, for very large
* hugepage sizes, we might want to think about more invasive strategies,
* such as increasing shared_buffers to absorb the extra space.

regards, tom lane

#3 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Tom Lane (#2)
Re: Auto-tune shared_buffers to use available huge pages

On Fri, Jan 23, 2026 at 4:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Not expressing an opinion on whether we should do this, but
there is a comment on GetHugePageSize() that you seem to have
falsified without bothering to correct:

Thanks for the review, I've updated the comment.

I've also revised the approach a bit. Using the free huge pages for
auto-tuning was too restrictive: it prevents setups like running
multiple Postgres instances on the same host, which requires splitting
the available huge pages.

I've replaced it with a shared_buffers_autotune_target GUC, which
controls the amount of shared memory to target. If the requested
shared memory is below the target size, shared_buffers will be
increased to (approximately) reach this target.

By setting shared_buffers_autotune_target to the amount of available
huge pages, shared_buffers will be auto-tuned to use the leftover
space.
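The semantics of the new GUC can be sketched like this (illustrative
Python; shmem_size is a stand-in for the server's internal request-size
computation, not a real function):

```python
def tune_to_target(nbuffers, target_bytes, shmem_size):
    """Grow nbuffers until the total shared memory request
    (approximately) reaches target_bytes.

    shmem_size(n) -> bytes requested for n buffers, and is
    monotonically increasing in n.
    """
    if shmem_size(nbuffers) >= target_bytes:
        return nbuffers  # already at or past the target: keep the configured value
    while shmem_size(nbuffers + 1) <= target_bytes:
        nbuffers += 1
    return nbuffers
```

Setting target_bytes to the pod's huge page reservation then makes the
final request land as close to the reservation as the buffer
granularity allows.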

Regards,
Anthonin Bonnefoy

Attachments:

v2-0001-Create-dedicated-shmem-Autotune-functions.patch (+170 -114)
v2-0002-Add-GUC-for-checkpointer-request-queue-size.patch (+66 -10)
v2-0003-Pass-NBuffers-as-parameter-to-StrategyShmemSize.patch (+4 -5)
v2-0004-Add-new-auto-tune-shared_buffers-GUC.patch (+79 -6)
#4 Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Anthonin Bonnefoy (#3)
Re: Auto-tune shared_buffers to use available huge pages

On Mon, Jan 26, 2026 at 6:47 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

On Fri, Jan 23, 2026 at 4:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Not expressing an opinion on whether we should do this, but
there is a comment on GetHugePageSize() that you seem to have
falsified without bothering to correct:

Thanks for the review, I've updated the comment.

I've also revised the approach a bit. Using the free huge pages for
auto-tuning was too restrictive: it prevents setups like running
multiple Postgres instances on the same host, which requires splitting
the available huge pages.

I've replaced it with a shared_buffers_autotune_target GUC, which
controls the amount of shared memory to target. If the requested
shared memory is below the target size, shared_buffers will be
increased to (approximately) reach this target.

By setting shared_buffers_autotune_target to the amount of available
huge pages, shared_buffers will be auto-tuned to use the leftover
space.

The user could achieve the same effect by setting shared_buffers to
the same value as shared_buffers_autotune_target. What's the
difference?

Further, they can use the -C and -c options to postgres to decide which
setting of shared_buffers will consume the desired amount of available
memory. That is quite quick and easy.

I may be missing some use case where such runtime auto-tuning is useful.

--
Best Wishes,
Ashutosh Bapat

#5 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Ashutosh Bapat (#4)
Re: Auto-tune shared_buffers to use available huge pages

On Thu, Jan 29, 2026 at 12:43 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

The user could achieve the same effect by setting shared_buffers to
the same value as shared_buffers_autotune_target. What's the
difference?

Further they can use -C and -c options to postgres to decide which
setting of shared_buffers will consume the desired amount of available
memory. That is quite quick and easy.

I may be missing some usecase where such runtime autotuning is useful.

To give a concrete example, I have a pod with 30GB of huge pages. If I
start postgres with "postgres -cshared_buffers=30GB -chuge_pages=on",
it fails with:

FATAL: could not map anonymous shared memory: Cannot allocate memory
HINT: This error usually means that PostgreSQL's request for a shared
memory segment exceeded available memory, swap space, or huge pages.
To reduce the request size (currently 32988200960 bytes), reduce
PostgreSQL's shared memory usage, perhaps by reducing "shared_buffers"
or "max_connections".

With all the additional memory requests, ~30.7GB is requested. I need
to lower shared_buffers to 29.2GB to be able to start, but this is
imprecise, and with this value I'm off by ~60 huge pages in the final
allocation. The huge pages that are not part of the mmap are
completely wasted since they can't even be used by Linux for the page
cache.
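Putting numbers on this (assuming 2MB huge pages, which is consistent
with the byte counts above):

```python
HUGE_PAGE = 2 * 1024 * 1024              # assuming 2MB huge pages

def pages_needed(request_bytes):
    """Huge pages consumed by the mmap: the request rounded up to a page."""
    return -(-request_bytes // HUGE_PAGE)  # ceiling division

reserved = (30 * 1024**3) // HUGE_PAGE   # 30GB reservation -> 15360 pages
failing = pages_needed(32988200960)      # request from the FATAL above -> 15730
# the request overshoots the reservation by 370 pages; lowering
# shared_buffers by hand undershoots instead and strands pages that
# nothing else on the host can use
```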

So the idea behind the autotune logic is to be able to provide a
specific amount of shared memory to target and adjust shared_buffers
accordingly. The huge pages I have reserved for the pod are only
useful if they are used in the shared memory mmap, so the autotune
makes sure that nothing is wasted.

shared_buffers_autotune_target is probably confusing;
shared_memory_autotune_target might be a more fitting name.

#6 Ashutosh Bapat
ashutosh.bapat@enterprisedb.com
In reply to: Anthonin Bonnefoy (#5)
Re: Auto-tune shared_buffers to use available huge pages

On Thu, Jan 29, 2026 at 7:41 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:

On Thu, Jan 29, 2026 at 12:43 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

The user could achieve the same effect by setting shared_buffers to
the same value as shared_buffers_autotune_target. What's the
difference?

Further they can use -C and -c options to postgres to decide which
setting of shared_buffers will consume the desired amount of available
memory. That is quite quick and easy.

I may be missing some usecase where such runtime autotuning is useful.

To give a concrete example, I have a pod with 30GB of huge pages. If I
start postgres with "postgres -cshared_buffers=30GB -chuge_pages=on",
it fails with:

FATAL: could not map anonymous shared memory: Cannot allocate memory
HINT: This error usually means that PostgreSQL's request for a shared
memory segment exceeded available memory, swap space, or huge pages.
To reduce the request size (currently 32988200960 bytes), reduce
PostgreSQL's shared memory usage, perhaps by reducing "shared_buffers"
or "max_connections".

With all the additional memory requests, ~30.7GB is requested. I need
to lower shared_buffers to 29.2GB to be able to start, but this is
imprecise, and with this value I'm off by ~60 huge pages in the final
allocation. The huge pages that are not part of the mmap are
completely wasted since they can't even be used by Linux for the page
cache.

With -C you can quickly find out how much shared memory and huge pages
are being used without starting the server. E.g.
$ postgres -D $DataDir -c shared_buffers="128kB" -C "shared_memory_size"
9
$ postgres -D $DataDir -c shared_buffers="150MB" -C "shared_memory_size"
173
$ postgres -D $DataDir -c shared_buffers="150MB" -C "shared_memory_size_in_huge_pages"
87

You could write a script to find the value of shared_buffers that
consumes memory optimally, auto-tuning it without starting the
server.

Isn't that sufficient?
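For reference, the core of such a script could look like this
(illustrative Python; pages_for abstracts the
`postgres -C shared_memory_size_in_huge_pages` invocation shown above,
which is monotone in shared_buffers):

```python
def find_shared_buffers(target_pages, pages_for, lo_mb=128, hi_mb=1 << 20):
    """Binary-search the largest shared_buffers (in MB) whose huge page
    footprint stays within target_pages. Never returns less than lo_mb."""
    best = lo_mb
    while lo_mb <= hi_mb:
        mid = (lo_mb + hi_mb) // 2
        if pages_for(mid) <= target_pages:
            best, lo_mb = mid, mid + 1   # fits: try a larger setting
        else:
            hi_mb = mid - 1              # too big: shrink the search range
    return best
```

In practice pages_for would shell out to postgres -C with the candidate
shared_buffers value.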
--
Best Wishes,
Ashutosh Bapat

#7 Anthonin Bonnefoy
anthonin.bonnefoy@datadoghq.com
In reply to: Ashutosh Bapat (#6)
Re: Auto-tune shared_buffers to use available huge pages

On Thu, Jan 29, 2026 at 3:46 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:

You could write a script to find the value of shared_buffers that
consumes memory optimally, auto-tuning it without starting the
server.

Isn't that sufficient?

That was my initial approach. I have such a script and built a mapping
between available memory and the shared_buffers to request, one per
PG version. But that breaks as soon as a user does something that
increases the memory request, like increasing max_connections or
adding an extension.

Thus, I need to run this script before every PG start to account for
those possible changes. With Kubernetes, this means:
- If you use a PG Docker image, you need to clone it and override the
entry point to run the script before starting PG.
- The cluster may be managed by systems like patroni[1], so patroni
would need to be modified to do the auto-tuning.
- You may also use a Kubernetes operator to manage the cluster. AFAIK,
they don't provide a way to override how PG is started, so the
operators would need to implement the auto-tuning logic.

If this auto-tune logic is directly implemented in PG, all Kubernetes
users would be able to benefit from it, without having to hijack how
PG is started.

[1]: https://github.com/patroni/patroni