copy.c allocation constant
While reading copy.c I noticed this line:
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
Doesn't that seem rather odd? If we're adding 1 wouldn't it be better as
65535 so we palloc a power of 2?
I have no idea if this affects performance, but it did strike me as strange.
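For context, here's a rough standalone illustration of the request the
allocator actually sees (plain malloc standing in for palloc; my reading is
that the extra byte is there so the buffer can always be NUL-terminated,
but treat that as an assumption):

#include <stdio.h>
#include <stdlib.h>

#define RAW_BUF_SIZE 65536    /* we palloc RAW_BUF_SIZE+1 bytes */

int
main(void)
{
    /* one byte past a power of two, before any palloc/malloc header */
    char   *raw_buf = malloc(RAW_BUF_SIZE + 1);

    if (raw_buf == NULL)
        return 1;
    raw_buf[RAW_BUF_SIZE] = '\0';   /* the spare byte allows NUL termination */
    printf("requested %d bytes\n", RAW_BUF_SIZE + 1);
    free(raw_buf);
    return 0;
}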
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Nov 28, 2017 at 11:51:28AM -0500, Andrew Dunstan wrote:
While reading copy.c I noticed this line:
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
Doesn't that seem rather odd? If we're adding 1 wouldn't it be better as
65535 so we palloc a power of 2?
I have no idea if this affects performance, but it did strike me as strange.
Coming in late here, but it does seem very odd.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On 01/24/2018 04:14 AM, Bruce Momjian wrote:
On Tue, Nov 28, 2017 at 11:51:28AM -0500, Andrew Dunstan wrote:
While reading copy.c I noticed this line:
#define RAW_BUF_SIZE 65536 /* we palloc RAW_BUF_SIZE+1 bytes */
Doesn't that seem rather odd? If we're adding 1 wouldn't it be better as
65535 so we palloc a power of 2?
I have no idea if this affects performance, but it did strike me as strange.
Coming in late here, but it does seem very odd.
I think there are two things to consider - aset.c and glibc.
In AllocSet this is handled as an over-sized chunk, that is a chunk greater
than ALLOC_CHUNK_LIMIT (which ends up being 8kB). That means it's
allocated as a separate block, and does not go through the freelists. So
the size does not follow the usual "power of 2" pattern and is only
maxalign-ed, and then freed immediately on pfree.
So for AllocSet we're fine, and this should not result in any memory
inefficiency, assuming there are not many such requests (which would
result in many malloc/free calls, which may be somewhat expensive).
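Very roughly, the decision looks like this (just a sketch of the shape of
the logic, not the actual aset.c code; the 8-byte maxalign is an assumption):

#include <stdlib.h>

#define ALLOC_CHUNK_LIMIT   8192                    /* over-sized threshold */
#define MAXALIGN(len)       (((len) + 7) & ~((size_t) 7))

/* Requests above the limit get a dedicated malloc'd block of just the
 * maxaligned size, bypassing the power-of-2 freelists; the block is
 * given back to malloc as soon as the chunk is pfree'd. */
static void *
sketch_alloc(size_t size)
{
    if (size > ALLOC_CHUNK_LIMIT)
        return malloc(MAXALIGN(size));  /* separate block, no rounding up */

    /* ... power-of-2 freelist path elided ... */
    return malloc(MAXALIGN(size));
}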
At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
with similar ideas (freelists, ...) so hopefully it's fine too.
And then there are the systems without glibc, or with other libc
implementations. No idea about those.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 24, 2018 at 06:25:01PM +0100, Tomas Vondra wrote:
I think there are two things to consider - aset.c and glibc.
In AllocSet this is handled as an over-sized chunk, that is a chunk greater
than ALLOC_CHUNK_LIMIT (which ends up being 8kB). That means it's
allocated as a separate block, and does not go through the freelists. So
the size does not follow the usual "power of 2" pattern and is only
maxalign-ed, and then freed immediately on pfree.
So for AllocSet we're fine, and this should not result in any memory
inefficiency, assuming there are not many such requests (which would
result in many malloc/free calls, which may be somewhat expensive).
At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
with similar ideas (freelists, ...) so hopefully it's fine too.
And then there are the systems without glibc, or with other libc
implementations. No idea about those.
Thanks for the analysis.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Wed, Jan 24, 2018 at 12:25 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
with similar ideas (freelists, ...) so hopefully it's fine too.
And then there are the systems without glibc, or with other libc
implementations. No idea about those.
My guess is that a fairly common pattern for larger chunks will be to
round the size up to a multiple of 4kB, the usual memory page size.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018-01-24 13:19:19 -0500, Robert Haas wrote:
On Wed, Jan 24, 2018 at 12:25 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
with similar ideas (freelists, ...) so hopefully it's fine too.
And then there are the systems without glibc, or with other libc
implementations. No idea about those.
My guess is that a fairly common pattern for larger chunks will be to
round the size up to a multiple of 4kB, the usual memory page size.
Indeed. Don't think RAW_BUF_SIZE is quite big enough for that on most
platforms though. From man mallopt:
Balancing these factors leads to a default setting of 128*1024 for the M_MMAP_THRESHOLD parameter.
Additionally, even when malloc() chooses to use mmap() to back an
allocation, it'll still need a header to know the size of the
allocation and such. So using a size that's an exact multiple of 4KB will
still leave you with wasted space. Due to the latter I can't see it
mattering whether or not we add +1 to a power-of-two size.
There's malloc_usable_size() returning the actual size of an
allocation. It'd probably be worthwhile to use that in a few of our own
allocator routines. If not available on a platform we can just have a
stub that returns the requested size...
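Something like this, say (HAVE_MALLOC_USABLE_SIZE is a made-up configure
symbol here, only to show the stub idea):

#include <stdlib.h>
#ifdef HAVE_MALLOC_USABLE_SIZE
#include <malloc.h>
#endif

/* How much space did the allocator really give us for this chunk? */
static size_t
actual_allocation_size(void *ptr, size_t requested)
{
#ifdef HAVE_MALLOC_USABLE_SIZE
    return malloc_usable_size(ptr);
#else
    return requested;           /* stub: assume no over-allocation */
#endif
}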
Greetings,
Andres Freund
On Wed, Jan 24, 2018 at 1:43 PM, Andres Freund <andres@anarazel.de> wrote:
Indeed. Don't think RAW_BUF_SIZE is quite big enough for that on most
platforms though. From man mallopt:
Balancing these factors leads to a default setting of 128*1024 for the M_MMAP_THRESHOLD parameter.
Additionally, even when malloc() chooses to use mmap() to back an
allocation, it'll still need a header to know the size of the
allocation and such. So using a size that's an exact multiple of 4KB will
still leave you with wasted space. Due to the latter I can't see it
mattering whether or not we add +1 to a power-of-two size.
Well, it depends on how it works. dsa_allocate, for example, never
adds a header to the size of the allocation. Allocations < 8kB are
bucketed by size class and stored in superblocks carved up into
equal-sized chunks. Allocations > 8kB are rounded to a multiple of
the 4kB page size and we grab that many consecutive free pages. I
didn't make those behaviors up; I copied them from elsewhere. Some
other allocator I read about did small-medium-large allocations: large
with mmap(), medium with multiples of the page size, small with
closely-spaced size classes.
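As a sketch of that large-request path (illustrative names, not the real
dsa.c identifiers):

#include <stddef.h>

#define DSA_PAGE_SIZE 4096      /* the 4kB page size mentioned above */

/* A request above 8kB is rounded up to whole pages and served from that
 * many consecutive free pages, with no per-chunk header added. */
static size_t
pages_for_large_request(size_t request)
{
    return (request + DSA_PAGE_SIZE - 1) / DSA_PAGE_SIZE;
}

/* e.g. pages_for_large_request(65536 + 1) == 17, i.e. 69632 bytes */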
It doesn't seem like a particularly good idea to take a 64kB+1 byte
allocation, stick a header on it, and pack it tightly up against other
allocations on both sides. Seems like that could lead to
fragmentation problems. Is that really what it does?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2018-01-24 14:25:37 -0500, Robert Haas wrote:
On Wed, Jan 24, 2018 at 1:43 PM, Andres Freund <andres@anarazel.de> wrote:
Indeed. Don't think RAW_BUF_SIZE is quite big enough for that on most
platforms though. From man mallopt:
Balancing these factors leads to a default setting of 128*1024 for the M_MMAP_THRESHOLD parameter.
Additionally, even when malloc() chooses to use mmap() to back an
allocation, it'll still need a header to know the size of the
allocation and such. So using a size that's an exact multiple of 4KB will
still leave you with wasted space. Due to the latter I can't see it
mattering whether or not we add +1 to a power-of-two size.
Well, it depends on how it works. dsa_allocate, for example, never
adds a header to the size of the allocation.
glibc's malloc does add a header. My half-informed suspicion is that
most newer malloc backing allocators will have a header, because
maintaining a shared lookup-by-address table is pretty expensive. A bit
of metadata indicating size and/or source of the
allocation makes using thread-local information a lot easier.
Allocations < 8kB are
bucketed by size class and stored in superblocks carved up into
equal-sized chunks. Allocations > 8kB are rounded to a multiple of
the 4kB page size and we grab that many consecutive free pages. I
didn't make those behaviors up; I copied them from elsewhere. Some
other allocator I read about did small-medium-large allocations: large
with mmap(), medium with multiples of the page size, small with
closely-spaced size classes.
Sure - all I'm trying to say is that it likely won't matter whether we use
power-of-two or power-of-two + 1, because due to overhead considerations
we'll likely not quite fit into a size class anyway.
It doesn't seem like a particularly good idea to take a 64kB+1 byte
allocation, stick a header on it, and pack it tightly up against other
allocations on both sides. Seems like that could lead to
fragmentation problems. Is that really what it does?
No, I'm fairly sure it's not.
Greetings,
Andres Freund
Andres Freund wrote:
On 2018-01-24 14:25:37 -0500, Robert Haas wrote:
On Wed, Jan 24, 2018 at 1:43 PM, Andres Freund <andres@anarazel.de> wrote:
Indeed. Don't think RAW_BUF_SIZE is quite big enough for that on most
platforms though. From man mallopt:
Balancing these factors leads to a default setting of 128*1024 for the M_MMAP_THRESHOLD parameter.
Additionally, even when malloc() chooses to use mmap() to back an
allocation, it'll still need a header to know the size of the
allocation and such. So using a size that's an exact multiple of 4KB will
still leave you with wasted space. Due to the latter I can't see it
mattering whether or not we add +1 to a power-of-two size.
Well, it depends on how it works. dsa_allocate, for example, never
adds a header to the size of the allocation.
glibc's malloc does add a header. My half-informed suspicion is that
most newer malloc backing allocators will have a header, because
maintaining a shared lookup-by-address table is pretty expensive. A bit
of metadata indicating size and/or source of the
allocation makes using thread-local information a lot easier.
Sounds like it'd be smart to allocate something closer to
M_MMAP_THRESHOLD (which with typical values would be about double the
current RAW_BUF_SIZE value), minus a few dozen
bytes to allow for palloc's and malloc's respective headers. That way
we bet on a mmap()ed allocation with minimum space wastage across all
layers. Not sure whether we want to try to minimize wastage through
clever use of malloc_usable_size() on each backend's first allocation --
that'd probably just be overengineering.
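Roughly this kind of arithmetic, i.e. (the header allowance is a guess, not
a measured number):

#define MMAP_THRESHOLD_DEFAULT  (128 * 1024)    /* glibc default, per mallopt(3) */
#define HEADER_ALLOWANCE        64              /* guess for palloc + malloc headers */

/* ~131008 bytes, about double the current 64kB RAW_BUF_SIZE */
#define PROPOSED_RAW_BUF_SIZE   (MMAP_THRESHOLD_DEFAULT - HEADER_ALLOWANCE)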
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi,
On 2018-01-24 17:07:01 -0300, Alvaro Herrera wrote:
Andres Freund wrote:
glibc's malloc does add a header. My half-informed suspicion is that
most newer malloc backing allocators will have a header, because
maintaining a shared lookup-by-address table is pretty expensive. A bit
of metadata indicating size and/or source of the
allocation makes using thread-local information a lot easier.
Sounds like it'd be smart to allocate something closer to
M_MMAP_THRESHOLD (which with typical values would be about double the
current RAW_BUF_SIZE value), minus a few dozen
bytes to allow for palloc's and malloc's respective headers. That way
we bet on a mmap()ed allocation with minimum space wastage across all
layers.
In general there might be cases where that's worthwhile, although
M_MMAP_THRESHOLD IIRC isn't a constant anymore, complicating things. The
specific case this thread is discussing doesn't seem worthy of such
attention though; afaict there's no actual practical problem here.
Not sure whether we want to try to minimize wastage through
clever use of malloc_usable_size() on each backend's first allocation --
that'd probably just be overengineering.
Likely also dangerous; I don't think this is a constant.
Greetings,
Andres Freund
On Thu, Jan 25, 2018 at 7:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 24, 2018 at 12:25 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
with similar ideas (freelists, ...) so hopefully it's fine too.
And then there are the systems without glibc, or with other libc
implementations. No idea about those.
My guess is that a fairly common pattern for larger chunks will be to
round the size up to a multiple of 4kB, the usual memory page size.
See also this discussion:
/messages/by-id/CAEepm=1bRyd+_W9eW-QmP1RGP03ti48zgd=K11Q6o4edQLgkcg@mail.gmail.com
TL;DR glibc doesn't actually round up like that below 128kB, but many
others including FreeBSD, macOS etc round up to various page sizes or
size classes including 8kB (!), 512 bytes. I find this a bit
frustrating because it means that the most popular libc implementation
doesn't have the problem, so this kind of thing probably isn't a high
priority; but on most other Unices (and I have no clue for
Windows), including my current favourite, we probably waste a bunch of memory.
--
Thomas Munro
http://www.enterprisedb.com
On Thu, Jan 25, 2018 at 09:30:54AM +1300, Thomas Munro wrote:
On Thu, Jan 25, 2018 at 7:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 24, 2018 at 12:25 PM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
At the glibc level ... I'm not so sure. AFAIK glibc uses an allocator
with similar ideas (freelists, ...) so hopefully it's fine too.
And then there are the systems without glibc, or with other libc
implementations. No idea about those.
My guess is that a fairly common pattern for larger chunks will be to
round the size up to a multiple of 4kB, the usual memory page size.
See also this discussion:
/messages/by-id/CAEepm=1bRyd+_W9eW-QmP1RGP03ti48zgd=K11Q6o4edQLgkcg@mail.gmail.com
TL;DR glibc doesn't actually round up like that below 128kB, but many
others including FreeBSD, macOS etc round up to various page sizes or
size classes including 8kB (!), 512 bytes. I find this a bit
frustrating because it means that the most popular libc implementation
doesn't have the problem, so this kind of thing probably isn't a high
priority; but on most other Unices (and I have no clue for
Windows), including my current favourite, we probably waste a bunch of memory.
The BSD memory allocator used to allocate in powers of two, and keep the
header in a separate location. They did this so they could combine two
free, identically-sized memory blocks into a single one that was double
the size. I have no idea how it works now.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
On Thu, Jan 25, 2018 at 9:35 AM, Bruce Momjian <bruce@momjian.us> wrote:
The BSD memory allocator used to allocate in powers of two, and keep the
header in a separate location. They did this so they could combine two
free, identically-sized memory blocks into a single one that was double
the size. I have no idea how it works now.
According to "man malloc" on my FreeBSD 11.1 box (which uses jemalloc,
I think NetBSD and OpenBSD use something else), the code Andrew showed
at the top of this thread will waste ~16kB (because it'll round up to
80kB) and nodeHash.c will waste ~8kB for every ~32kB chunk of tuples
as described in that other thread.
+----------------------------------------------------------+
|Category Spacing Size |
|Small lg [8] |
| 16 [16, 32, 48, 64, 80, 96, 112, 128] |
| 32 [160, 192, 224, 256] |
| 64 [320, 384, 448, 512] |
| 128 [640, 768, 896, 1024] |
| 256 [1280, 1536, 1792, 2048] |
| 512 [2560, 3072, 3584, 4096] |
| 1 KiB [5 KiB, 6 KiB, 7 KiB, 8 KiB] |
| 2 KiB [10 KiB, 12 KiB, 14 KiB] |
|Large 2 KiB [16 KiB] |
| 4 KiB [20 KiB, 24 KiB, 28 KiB, 32 KiB] |
| 8 KiB [40 KiB, 48 KiB, 56 KiB, 64 KiB] |
| 16 KiB [80 KiB, 96 KiB, 112 KiB, 128 KiB] |
| 32 KiB [160 KiB, 192 KiB, 224 KiB, 256 KiB] |
| 64 KiB [320 KiB, 384 KiB, 448 KiB, 512 KiB] |
| 128 KiB [640 KiB, 768 KiB, 896 KiB, 1 MiB] |
| 256 KiB [1280 KiB, 1536 KiB, 1792 KiB] |
|Huge 256 KiB [2 MiB] |
| 512 KiB [2560 KiB, 3 MiB, 3584 KiB, 4 MiB] |
| 1 MiB [5 MiB, 6 MiB, 7 MiB, 8 MiB] |
| 2 MiB [10 MiB, 12 MiB, 14 MiB, 16 MiB] |
| 4 MiB [20 MiB, 24 MiB, 28 MiB, 32 MiB] |
| 8 MiB [40 MiB, 48 MiB, 56 MiB, 64 MiB] |
| ... ... |
| 512 PiB [2560 PiB, 3 EiB, 3584 PiB, 4 EiB] |
| 1 EiB [5 EiB, 6 EiB, 7 EiB] |
+----------------------------------------------------------+
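Spelled out (the class boundaries are read off the table above, so take
the exact numbers as approximate):

#include <stdio.h>

int
main(void)
{
    /* copy.c raw buffer: 65536+1 bytes requested; the next Large class
     * with 16 KiB spacing is 80 KiB. */
    printf("copy.c waste: %d bytes\n", 80 * 1024 - (65536 + 1));    /* 16383, ~16kB */

    /* nodeHash.c: a tuple chunk just over 32 KiB (chunk plus header)
     * lands in the 40 KiB class. */
    printf("nodeHash.c waste: %d bytes\n", 40 * 1024 - 32 * 1024);  /* 8192, ~8kB */
    return 0;
}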
--
Thomas Munro
http://www.enterprisedb.com
Bruce Momjian <bruce@momjian.us> writes:
On Thu, Jan 25, 2018 at 09:30:54AM +1300, Thomas Munro wrote:
On Thu, Jan 25, 2018 at 7:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
My guess is that a fairly common pattern for larger chunks will be to
round the size up to a multiple of 4kB, the usual memory page size.
See also this discussion:
/messages/by-id/CAEepm=1bRyd+_W9eW-QmP1RGP03ti48zgd=K11Q6o4edQLgkcg@mail.gmail.com
TL;DR glibc doesn't actually round up like that below 128kB, but many
others including FreeBSD, macOS etc round up to various page sizes or
size classes including 8kB (!), 512 bytes. I find this a bit
frustrating because it means that the most popular libc implementation
doesn't have the problem, so this kind of thing probably isn't a high
priority; but on most other Unices (and I have no clue for
Windows), including my current favourite, we probably waste a bunch of memory.
The BSD memory allocator used to allocate in powers of two, and keep the
header in a separate location. They did this so they could combine two
free, identically-sized memory blocks into a single one that was double
the size. I have no idea how it works now.
It seems like there's fairly good reason to suppose that most versions
of malloc are efficient for power-of-2 request sizes. Our big problem
is the overhead that we add ourselves. That, however, we could compensate
for by adjusting request sizes in cases where the request is flexible.
I'd definitely support some sort of memory context API to allow such
adjustment, as we speculated about in the thread Thomas points to above.
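For illustration, the sort of adjustment such an API might make (the
overhead constant is a placeholder, not aset.c's real value):

#include <stddef.h>

#define ASSUMED_CHUNK_OVERHEAD 16   /* placeholder for palloc's per-chunk header */

/* Round a flexible request up so that, together with our own per-chunk
 * overhead, it exactly fills a power-of-2 malloc request. */
static size_t
adjust_flexible_request(size_t request)
{
    size_t  target = 1024;          /* arbitrary smallest bucket */

    while (target < request + ASSUMED_CHUNK_OVERHEAD)
        target *= 2;
    return target - ASSUMED_CHUNK_OVERHEAD;
}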
regards, tom lane