macOS prefetching support
It seems to me that we could implement prefetching support
(USE_PREFETCH) on macOS using the fcntl() command F_RDADVISE. The man
page description is a bit terse:
F_RDADVISE Issue an advisory read async with no copy to user.
But it seems to be the right idea. Was this looked into before? I
couldn't find anything in the archives.
Attached is a patch to implement this. It seems to work, but of course
it's kind of hard to tell whether it actually does anything useful.
(Even if the performance effects were negligible, this would be useful
to get the prefetch code some exercise on this platform.)
Thoughts?
Attachments:
0001-Add-prefetching-support-on-macOS.patchtext/plain; charset=UTF-8; name=0001-Add-prefetching-support-on-macOS.patchDownload+41-23
On Wed, Aug 14, 2024 at 7:04 PM Peter Eisentraut <peter@eisentraut.org> wrote:
Attached is a patch to implement this. It seems to work, but of course
it's kind of hard to tell whether it actually does anything useful.
Header order problem: pg_config_os.h defines __darwin__, but
pg_config_manual.h is included first, and tests __darwin__. I hacked
my way around that, and then made a table of 40,000,000 integers in a
2GB buffer pool. I used "select count(pg_buffercache_evict(buffered))
from pg_buffer_cache", and "sudo purge", to clear the two layers of
cache for each test, and then measured:
maintenance_io_concurrency=0, ANALYZE: 2311ms
maintenance_io_concurrency=10, ANALYZE: 652ms
maintenance_io_concurrency=25, ANALYZE: 389ms
It works!
On 14.08.24 14:36, Thomas Munro wrote:
On Wed, Aug 14, 2024 at 7:04 PM Peter Eisentraut <peter@eisentraut.org> wrote:
Attached is a patch to implement this. It seems to work, but of course
it's kind of hard to tell whether it actually does anything useful.Header order problem: pg_config_os.h defines __darwin__, but
pg_config_manual.h is included first, and tests __darwin__. I hacked
my way around that, and then made a table of 40,000,000 integers in a
2GB buffer pool. I used "select count(pg_buffercache_evict(buffered))
from pg_buffer_cache", and "sudo purge", to clear the two layers of
cache for each test, and then measured:maintenance_io_concurrency=0, ANALYZE: 2311ms
maintenance_io_concurrency=10, ANALYZE: 652ms
maintenance_io_concurrency=25, ANALYZE: 389msIt works!
Cool! I'll work on a more polished patch.
On 14.08.24 16:39, Peter Eisentraut wrote:
On 14.08.24 14:36, Thomas Munro wrote:
On Wed, Aug 14, 2024 at 7:04 PM Peter Eisentraut
<peter@eisentraut.org> wrote:Attached is a patch to implement this. It seems to work, but of course
it's kind of hard to tell whether it actually does anything useful.Header order problem: pg_config_os.h defines __darwin__, but
pg_config_manual.h is included first, and tests __darwin__. I hacked
my way around that, and then made a table of 40,000,000 integers in a
2GB buffer pool. I used "select count(pg_buffercache_evict(buffered))
from pg_buffer_cache", and "sudo purge", to clear the two layers of
cache for each test, and then measured:maintenance_io_concurrency=0, ANALYZE: 2311ms
maintenance_io_concurrency=10, ANALYZE: 652ms
maintenance_io_concurrency=25, ANALYZE: 389msIt works!
Cool! I'll work on a more polished patch.
Here it is.
Some interesting questions:
What to do about the order of the symbols and include files. I threw
something into src/include/port/darwin.h, but I'm not sure if that's
good. Alternatively, we could not use __darwin__ but instead the more
standard and predefined defined(__APPLE__) && defined(__MACH__).
How to document it. The current documentation makes references mainly
to the availability of posix_fadvise(). That seems quite low-level.
How could a user of a prepared package even find out about that? Should
we just say "requires OS support" (kind of like I did here) and you can
query the effective state by looking at the *_io_concurrency settings?
Or do we need a read-only parameter that shows whether prefetch support
exists (kind of along the lines of huge pages)?
Btw., for context, here is what I gather the prefetch support (with this
patch) is:
cygwin posix_fadvise
darwin fcntl
freebsd posix_fadvise
linux posix_fadvise
netbsd posix_fadvise
openbsd no
solaris fake
win32 no
(There is also the possibility that we provide an implementation of
posix_fadvise() for macOS that wraps the platform-specific code in this
patch. And then Apple could just take that. ;-) )
Attachments:
v2-0001-Add-prefetching-support-on-macOS.patchtext/plain; charset=UTF-8; name=v2-0001-Add-prefetching-support-on-macOS.patchDownload+57-37
On Sat, Aug 17, 2024 at 6:58 AM Peter Eisentraut <peter@eisentraut.org> wrote:
What to do about the order of the symbols and include files. I threw
something into src/include/port/darwin.h, but I'm not sure if that's
good. Alternatively, we could not use __darwin__ but instead the more
standard and predefined defined(__APPLE__) && defined(__MACH__).
Hmm. fd.h and fd.c test for F_NOCACHE, which is pretty closely
related. Now I'm wondering why we actually need this in
pg_config_manual.h at all. Who would turn it off at compile time, and
why would they not be satisfied with setting relevant GUCs to 0? Can
we just teach fd.h to define USE_PREFETCH if
defined(POSIX_FADV_WILLNEED) || defined(F_RDADVISE)?
(I have also thought multiple times about removing the configure
probes for F_FULLFSYNC, and just doing #ifdef. Oh, that's in my patch
for CF #4453.)
How to document it. The current documentation makes references mainly
to the availability of posix_fadvise(). That seems quite low-level.
How could a user of a prepared package even find out about that? Should
we just say "requires OS support" (kind of like I did here) and you can
query the effective state by looking at the *_io_concurrency settings?
Or do we need a read-only parameter that shows whether prefetch support
exists (kind of along the lines of huge pages)?
I think that's fine. I don't really like the word "prefetch", could
mean many different things. What about "requires OS support for
issuing read-ahead advice", which uses a word that appears in both of
those interfaces? And yeah the GUCs being nailed to zero means you
don't have it.
(There is also the possibility that we provide an implementation of
posix_fadvise() for macOS that wraps the platform-specific code in this
patch. And then Apple could just take that. ;-) )
Yeah might be simpler... as long as we make sure that we test for
having the function AND the relevant POSIX_FADV_xxx in places, which I
see that we do.
Thomas Munro <thomas.munro@gmail.com> writes:
Hmm. fd.h and fd.c test for F_NOCACHE, which is pretty closely
related. Now I'm wondering why we actually need this in
pg_config_manual.h at all. Who would turn it off at compile time, and
why would they not be satisfied with setting relevant GUCs to 0?
+1 for not bothering with a pg_config_manual.h knob.
regards, tom lane
On Sat, Aug 17, 2024 at 6:58 AM Peter Eisentraut <peter@eisentraut.org> wrote:
solaris fake
I'm half tempted to suggest that we take this exception out. If it's
there, we call it. It doesn't do anything right now, but it's a cheap
empty user space function, and I heard they are thinking about adding
a real syscall. (In a burst of enthusiasm for standards and stuff, I
talked to people involved in all the OSes using ZFS, to suggest
hooking that up; they added some other stuff I asked about so I think
it's a real threat.) But I haven't noticed any users on this list in
many years, to opine either way.
It doesn't do anything on Cygwin either. Actually, it's worse than
nothing, it looks like it makes two useless system calls adjusting the
"sequential" flag on the file handle. But really, of the 3 ways to
build on Windows, only MSVC has real users, so it makes no useless
system calls at all, so I don't propose to change anything due to this
observation.
win32 no
I guess you could implement this in two ways:
* CreateFileMapping(), MapViewOfFile(), PrefetchVirtualMemory(),
UnmapViewOfFile(), Closehandle(). That's a lot system calls and maybe
expensive VM stuff, who knows.
* ReadFileEx() in OVERLAPPED mode (async), into a dummy buffer, with a
completion callback that frees the buffer and OVERLAPPED object.
That's a lot of extra work allocating memory, copying to user space,
scheduling a callback, and freeing memory for nothing.
Both are terrible, but likely still better than an I/O stall, I dunno.
I think the VMS, NT, Unix-hater view would be: why would you want such
a stupid programming interface anyway, when you could use async I/O
properly? I love Unix, but they'd have a point.
On 17.08.24 00:01, Thomas Munro wrote:
On Sat, Aug 17, 2024 at 6:58 AM Peter Eisentraut <peter@eisentraut.org> wrote:
What to do about the order of the symbols and include files. I threw
something into src/include/port/darwin.h, but I'm not sure if that's
good. Alternatively, we could not use __darwin__ but instead the more
standard and predefined defined(__APPLE__) && defined(__MACH__).Hmm. fd.h and fd.c test for F_NOCACHE, which is pretty closely
related. Now I'm wondering why we actually need this in
pg_config_manual.h at all. Who would turn it off at compile time, and
why would they not be satisfied with setting relevant GUCs to 0? Can
we just teach fd.h to define USE_PREFETCH if
defined(POSIX_FADV_WILLNEED) || defined(F_RDADVISE)?
I thought USE_PREFETCH existed so that we don't have the run-time
overhead for all the bookkeeping code if we don't have any OS-level
prefetch support at the end. But it looks like most of that bookkeeping
code is skipped anyway if the *_io_concurrency settings are at 0. So
yes, getting rid of USE_PREFETCH globally would be useful.
(I have also thought multiple times about removing the configure
probes for F_FULLFSYNC, and just doing #ifdef. Oh, that's in my patch
for CF #4453.)
Understandable, but we should be careful here that we don't create
setups that can cause bugs like
</messages/by-id/48da4a1f-ccd9-4988-9622-24f37b1de2b4@eisentraut.org>.
I think that's fine. I don't really like the word "prefetch", could
mean many different things. What about "requires OS support for
issuing read-ahead advice", which uses a word that appears in both of
those interfaces?
I like that term.
On 18.08.24 15:35, Peter Eisentraut wrote:
On 17.08.24 00:01, Thomas Munro wrote:
Hmm. fd.h and fd.c test for F_NOCACHE, which is pretty closely
related. Now I'm wondering why we actually need this in
pg_config_manual.h at all. Who would turn it off at compile time, and
why would they not be satisfied with setting relevant GUCs to 0? Can
we just teach fd.h to define USE_PREFETCH if
defined(POSIX_FADV_WILLNEED) || defined(F_RDADVISE)?I thought USE_PREFETCH existed so that we don't have the run-time
overhead for all the bookkeeping code if we don't have any OS-level
prefetch support at the end. But it looks like most of that bookkeeping
code is skipped anyway if the *_io_concurrency settings are at 0. So
yes, getting rid of USE_PREFETCH globally would be useful.(I have also thought multiple times about removing the configure
probes for F_FULLFSYNC, and just doing #ifdef. Oh, that's in my patch
for CF #4453.)Understandable, but we should be careful here that we don't create
setups that can cause bugs like
</messages/by-id/48da4a1f-ccd9-4988-9622-24f37b1de2b4@eisentraut.org>.I think that's fine. I don't really like the word "prefetch", could
mean many different things. What about "requires OS support for
issuing read-ahead advice", which uses a word that appears in both of
those interfaces?I like that term.
Here is another patch, with the documentation wording adjusted like
this, and a bit of other tidying.
I opted against pursuing some of the other refactoring ideas as part of
this. Removing USE_PREFETCH seems worthwhile, but has some open
questions. For example, pg_prewarm has a prefetch mode, which currently
errors if there is no prefetch support. So we'd still require some
knowledge outside of fd.c, unless we want to change that behavior. The
idea of creating a src/port/posix_fadvise.c also got a bit too
complicated in terms of how to weave that into configure and meson.
There are apparently various old problems, like old Linux systems with
incompatible declarations, or something like that, and then the special
casing of Solaris (which isn't even in meson.build). If we could get
rid of some of that, then it would be easier to add new behavior there,
otherwise it's just likely to break things. So I'm leaving these as
separate projects.
In terms of $subject, this patch seems sufficient for now.
Attachments:
v3-0001-Add-prefetching-support-on-macOS.patchtext/plain; charset=UTF-8; name=v3-0001-Add-prefetching-support-on-macOS.patchDownload+56-36
On Sat, Aug 24, 2024 at 12:28 AM Peter Eisentraut <peter@eisentraut.org> wrote:
In terms of $subject, this patch seems sufficient for now.
WFM. I noticed you don't have an EINTR retry loop, but the man page
doesn't say you need one, so overall this patch LGTM.
+ * posix_fadvise() is the simplest standardized interface that accomplishes
+ * this. We could add an implementation using libaio in the future; but note
+ * that this API is inappropriate for libaio, which wants to have a buffer
+ * provided to read into.
I would consider just dropping that comment about libaio, if touching
this paragraph. Proposals exist for AIO of course, but not with
libaio, and better predictions with useful discussion of the topic
seem unlikely to fit in this margin.
On 26.08.24 07:54, Thomas Munro wrote:
On Sat, Aug 24, 2024 at 12:28 AM Peter Eisentraut <peter@eisentraut.org> wrote:
In terms of $subject, this patch seems sufficient for now.
WFM. I noticed you don't have an EINTR retry loop, but the man page
doesn't say you need one, so overall this patch LGTM.+ * posix_fadvise() is the simplest standardized interface that accomplishes + * this. We could add an implementation using libaio in the future; but note + * that this API is inappropriate for libaio, which wants to have a buffer + * provided to read into.I would consider just dropping that comment about libaio, if touching
this paragraph. Proposals exist for AIO of course, but not with
libaio, and better predictions with useful discussion of the topic
seem unlikely to fit in this margin.
committed with that change
Oh, I missed something: I think we're missing FileAcces(), in case the
vfd has to be re-opened, no?
On 29.08.24 03:22, Thomas Munro wrote:
Oh, I missed something: I think we're missing FileAcces(), in case the
vfd has to be re-opened, no?
Fixed, thanks.
On Mon, Aug 19, 2024 at 1:35 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 17.08.24 00:01, Thomas Munro wrote:
I think that's fine. I don't really like the word "prefetch", could
mean many different things. What about "requires OS support for
issuing read-ahead advice", which uses a word that appears in both of
those interfaces?I like that term.
A couple of other places still use the old specific terminology. PSA.
Attachments:
0001-Standardize-read-ahead-advice-terminology.patchtext/x-patch; charset=US-ASCII; name=0001-Standardize-read-ahead-advice-terminology.patchDownload+10-10
On 03.09.24 03:47, Thomas Munro wrote:
On Mon, Aug 19, 2024 at 1:35 AM Peter Eisentraut <peter@eisentraut.org> wrote:
On 17.08.24 00:01, Thomas Munro wrote:
I think that's fine. I don't really like the word "prefetch", could
mean many different things. What about "requires OS support for
issuing read-ahead advice", which uses a word that appears in both of
those interfaces?I like that term.
A couple of other places still use the old specific terminology. PSA.
looks good to me