Windows now has fdatasync()
Hi,
While porting some new IO code to lots of OSes I noticed in passing
that there is now a way to do synchronous fdatasync() on Windows.
This mechanism doesn't have an async variant, which is what I was
actually looking for (which turns out to doable with bleeding edge
IoRings, more on that later), but I figured this might be useful
anyway. I see that at least one other open source database has
discovered it and seen speedups. Like some other file API
improvements discussed recently, it's Windows 10+ and NTFS only. I
tried out a quick POC patch and it runs a bit faster than fsync(), as
expected. I'm not sure if it's worth bothering with or not given the
other options, but figured it was worth sharing.
While testing that I also couldn't resist adding an extra output line
to pg_test_fsync to run open_datasync in buffered I/O mode, like
PostgreSQL actually does in real life. I guess I should really change
it to duplicate less code, though...
Attachments:
0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchDownload+79-9
0002-Add-fdatasync-wrapper-for-Windows.patchtext/x-patch; charset=US-ASCII; name=0002-Add-fdatasync-wrapper-for-Windows.patchDownload+76-4
Great. File sync is a Nice extension for me, as I don't know all file
structures.
Thomas Munro <thomas.munro@gmail.com> schrieb am So., 12. Dez. 2021, 03:48:
Show quoted text
Hi,
While porting some new IO code to lots of OSes I noticed in passing
that there is now a way to do synchronous fdatasync() on Windows.
This mechanism doesn't have an async variant, which is what I was
actually looking for (which turns out to doable with bleeding edge
IoRings, more on that later), but I figured this might be useful
anyway. I see that at least one other open source database has
discovered it and seen speedups. Like some other file API
improvements discussed recently, it's Windows 10+ and NTFS only. I
tried out a quick POC patch and it runs a bit faster than fsync(), as
expected. I'm not sure if it's worth bothering with or not given the
other options, but figured it was worth sharing.While testing that I also couldn't resist adding an extra output line
to pg_test_fsync to run open_datasync in buffered I/O mode, like
PostgreSQL actually does in real life. I guess I should really change
it to duplicate less code, though...
On Sun, Dec 12, 2021 at 3:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
[...] I
tried out a quick POC patch and it runs a bit faster than fsync(), as
expected. I'm not sure if it's worth bothering with or not given the
other options, but figured it was worth sharing.
One reason to consider developing this further is the problem
discussed in the aptly named thread "Loaded footgun open_datasync on
Windows"[1]/messages/by-id/1527846213.2475.31.camel@cybertec.at (not the problem that was fixed in pg_test_fsync, but the
problem with cache control, or lack thereof). I saw 10x more TPS with
open_datasync than with this experimental fdatasync on my little test
VM, which was more than a little fishy, so I turned off the device
write caching in "Device Manager" and got about the same number from
open_datasync and fdatasync. Clearly you can lose committed
transactions on power loss[2]https://github.com/MicrosoftDocs/feedback/issues/2747 with the default OS settings and default
PostgreSQL settings, though perhaps only if you're on SATA storage due
to lack of FUA pass-through[3]https://devblogs.microsoft.com/oldnewthing/20170510-00/?p=95505 (?). I didn't try an NVMe stack.
That suggests that fdatasync would actually be a better default ...
except for the problems already mentioned with versions and not
working on non-NTFS (not sure how it fails on non-NTFS, though, maybe
it does a full flush, [4]https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex doesn't say).
(The patch is a little rough; I couldn't figure out the headers to get
that macro. Insert my usual disclaimer that I'm not a Windows guy,
this is stuff I'm just figuring out, all clues welcome...)
[1]: /messages/by-id/1527846213.2475.31.camel@cybertec.at
[2]: https://github.com/MicrosoftDocs/feedback/issues/2747
[3]: https://devblogs.microsoft.com/oldnewthing/20170510-00/?p=95505
[4]: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex
I added a commitfest entry for this to try to attract Windows-hacker
reviews. I wondered about adjusting it to run on older systems, but I
think we're about ready to drop support for Windows < 10 anyway, so
maybe I'll go and propose that separately, instead.
On Sun, Dec 12, 2021 at 4:32 PM Thomas Munro <thomas.munro@gmail.com> wrote:
One reason to consider developing this further is the problem
discussed in the aptly named thread "Loaded footgun open_datasync on
Windows"[1] (not the problem that was fixed in pg_test_fsync, but the
problem with cache control, or lack thereof). I saw 10x more TPS with
open_datasync than with this experimental fdatasync on my little test
VM, which was more than a little fishy, so I turned off the device
write caching in "Device Manager" and got about the same number from
open_datasync and fdatasync. Clearly you can lose committed
transactions on power loss[2] with the default OS settings and default
PostgreSQL settings, though perhaps only if you're on SATA storage due
to lack of FUA pass-through[3] (?). I didn't try an NVMe stack.That suggests that fdatasync would actually be a better default ...
except for the problems already mentioned with versions and not
working on non-NTFS (not sure how it fails on non-NTFS, though, maybe
it does a full flush, [4] doesn't say).
So my impression is that today we ship defaults that are unsafe on
Windows. I don't really understand much of what you are saying here,
but if there's a way we can stop doing that, +1 from me, especially if
it allows us to retain reasonable performance.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Feb 5, 2022 at 2:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
So my impression is that today we ship defaults that are unsafe on
Windows. I don't really understand much of what you are saying here,
but if there's a way we can stop doing that, +1 from me, especially if
it allows us to retain reasonable performance.
The PostgreSQL default in combination with the Windows default is
unsafe on SATA drives, but <get-out-clause>if you read our
documentation carefully you might figure out that you need to disable
write caching on your disk</>.
https://www.postgresql.org/docs/14/wal-reliability.html says:
"Consumer-grade IDE and SATA drives are particularly likely to have
write-back caches that will not survive a power failure. Many
solid-state drives (SSD) also have volatile write-back caches. [...]
On Windows, if wal_sync_method is open_datasync (the default), write
caching can be disabled by unchecking My Computer\Open\disk
drive\Properties\Hardware\Properties\Policies\Enable write caching on
the disk. Alternatively, set wal_sync_method to fsync or
fsync_writethrough, which prevent write caching."
I'm not proposing we change our default to this new level, because it
doesn't work on non-NTFS, an annoying complication. This patch would
just provide something faster to put after "Alternatively".
(Actually that whole page needs a refresh. IDE is gone. The
discussion about "recent" support for flushing caches is a bit out of
date, and in fact the problem with open_datasync on this OS is because
of problems with drivers and
https://en.wikipedia.org/wiki/Disk_buffer#Force_Unit_Access_(FUA), not
FLUSH CACHE EXT.)
Here's an updated patch that fixes some silly problems seen on CI.
There's something a bit clunkly/weird about this HAVE_FDATASYNC stuff,
maybe I can find a tidier way, but it's enough for experimentation:
For Mingw, I unconditionally add src/port/fdatasync.o to LIBOBJS, and
I unconditionally #define HAVE_FDATASYNC in win32_port.h, and I also
changed c.h's declaration of fdatasync() because it comes before
port.h is included (I guess I could move it down instead?).
For MSVC, I unconditionally add fdatasync.o to @pgportfiles, and
HAVE_FDATASYNC is defined in Solution.pm.
It'd be interesting to see pg_test_fsync.exe output on real hardware.
Here's what a little Windows 10 VM with a virtual SATA drive says:
C:\Users\thmunro>c:\pg\bin\pg_test_fsync.exe
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync (direct) 7914.872 ops/sec 126 usecs/op
open_datasync (buffered) 6593.056 ops/sec 152 usecs/op
fdatasync 650.317 ops/sec 1538 usecs/op
fsync 512.423 ops/sec 1952 usecs/op
fsync_writethrough 550.881 ops/sec 1815 usecs/op
open_sync (direct) n/a
open_sync (buffered) n/a
Attachments:
v2-0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchDownload+79-9
v2-0002-Add-wal_sync_method-fdatasync-for-Windows.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Add-wal_sync_method-fdatasync-for-Windows.patchDownload+83-5
On Fri, Feb 4, 2022 at 4:24 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I'm not proposing we change our default to this new level, because it
doesn't work on non-NTFS, an annoying complication. This patch would
just provide something faster to put after "Alternatively".
Hmm. I thought NTFS had kind of won the filesystem war on the Windows
side of things. No?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Feb 5, 2022 at 12:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Feb 4, 2022 at 4:24 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I'm not proposing we change our default to this new level, because it
doesn't work on non-NTFS, an annoying complication. This patch would
just provide something faster to put after "Alternatively".Hmm. I thought NTFS had kind of won the filesystem war on the Windows
side of things. No?
Against FAT, yes, but there are also SMB/CIFS (network) and the new
ReFS (which we recently broke and then unbroke[1]/messages/by-id/16854-905604506e23d5c0@postgresql.org). I haven't tried
those things, lacking general Windows-fu, but I suppose they'd reject
this and we'd panic, because the docs say "file systems supported:
NTFS"[2]https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex.
[1]: /messages/by-id/16854-905604506e23d5c0@postgresql.org
[2]: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex
On Sun, Dec 12, 2021 at 03:48:10PM +1300, Thomas Munro wrote:
I tried out a quick POC patch and it runs a bit faster than fsync(), as
expected.
Good news, as a too high difference would be suspect :)
How much difference does it make in % and are the numbers rather
reproducible? Just wondering..
--
Michael
On Sun, Feb 6, 2022 at 7:20 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Dec 12, 2021 at 03:48:10PM +1300, Thomas Munro wrote:
I tried out a quick POC patch and it runs a bit faster than fsync(), as
expected.Good news, as a too high difference would be suspect :)
How much difference does it make in % and are the numbers rather
reproducible? Just wondering..
I've only tested on a qemu/kvm virtual machine with a virtual SATA
disk device, so take this with a bucket of salt, but I think that's
enough to see the impact of 'slow' SATA commands hitting the device
and being waited for, and what I see is that wal_sync_method=fdatasync
does about 25% more TPS than wal_sync_method=fsync, and
wal_sync_method=open_datasync is a wildly higher number that I don't
believe (ie I don't believe it waited for power loss durability and
the links above support that understanding), but tumbles back to earth
and almost exactly matches the wal_sync_method=fdatasync number when
the write cache is disabled.
I've bumped this to the next cycle, so I can hopefully skip the
missing version detection stuff that I have no way to test (no CI, no
build farm, and I have zero interest in dumpster diving for Windows 7
or whatever installations).
I propose that we drop support for Windows versions older than
10/Server 2016 in the PostgreSQL 16 cycle, because the OS patches for
everything older come to an end in October next year[1]https://en.wikipedia.org/wiki/List_of_Microsoft_Windows_versions, and we have a
lot of patches relating to modern Windows features that stall on
details about old systems that no one actually has.
[1]: https://en.wikipedia.org/wiki/List_of_Microsoft_Windows_versions
On Fri, Apr 08, 2022 at 02:56:15PM +1200, Thomas Munro wrote:
I propose that we drop support for Windows versions older than
10/Server 2016 in the PostgreSQL 16 cycle, because the OS patches for
everything older come to an end in October next year[1], and we have a
lot of patches relating to modern Windows features that stall on
details about old systems that no one actually has.[1] https://en.wikipedia.org/wiki/List_of_Microsoft_Windows_versions
Do you think that we could raise the minimum C standard on WIN32 to
C11, at least for MSVC? There is a patch floating around to add
pg_attribute_aligned() and perhaps pg_attribute_noreturn() for MSVC:
/messages/by-id/Yk6UgCGlZKuxRr4n@paquier.xyz
noreturn() needed at least C11:
https://docs.microsoft.com/en-us/cpp/c-language/noreturn?view=msvc-140
Perhaps we'd better also bump up the minimum version of MSVC
supported..
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Fri, Apr 08, 2022 at 02:56:15PM +1200, Thomas Munro wrote:
I propose that we drop support for Windows versions older than
10/Server 2016 in the PostgreSQL 16 cycle,
Do we have any data on what people are actually using?
Do you think that we could raise the minimum C standard on WIN32 to
C11, at least for MSVC?
As long as the C11-isms are in MSVC-only code, it seems like this is
exactly equivalent to setting a minimum MSVC version. I don't see
an objection-in-principle there, it's just a practical question of
how far back is reasonable to support MSVC versions. (That's very
distinct from how far back we need the built code to run.)
regards, tom lane
On Fri, Apr 08, 2022 at 12:40:55AM -0400, Tom Lane wrote:
As long as the C11-isms are in MSVC-only code, it seems like this is
exactly equivalent to setting a minimum MSVC version. I don't see
an objection-in-principle there, it's just a practical question of
how far back is reasonable to support MSVC versions. (That's very
distinct from how far back we need the built code to run.)
Good question. Older versions of VS are available, so this is not a
problem:
https://visualstudio.microsoft.com/vs/older-downloads/
I think that we should at least drop 2013, as there is a bunch of
stuff related to _MSC_VER < 1900 that could be removed with that,
particularly for locales.
--
Michael
On Fri, 8 Apr 2022 at 05:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael@paquier.xyz> writes:
On Fri, Apr 08, 2022 at 02:56:15PM +1200, Thomas Munro wrote:
I propose that we drop support for Windows versions older than
10/Server 2016 in the PostgreSQL 16 cycle,Do we have any data on what people are actually using?
None that I know of. Anecdotally, we dropped support for pgAdmin on Windows
< 8 (2012 for the server edition), and had a single complaint - and the
user happily acknowledged they were on an old release and expected support
to be dropped sooner or later. Windows 8 was a pretty unpopular release, so
I would expect shifting to 10/2016+ for PG 16 would be unlikely to be a
major problem.
FWIW, Python dropped support for < 8/2012 with v3.9.
Do you think that we could raise the minimum C standard on WIN32 to
C11, at least for MSVC?As long as the C11-isms are in MSVC-only code, it seems like this is
exactly equivalent to setting a minimum MSVC version. I don't see
an objection-in-principle there, it's just a practical question of
how far back is reasonable to support MSVC versions. (That's very
distinct from how far back we need the built code to run.)regards, tom lane
--
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake
On Fri, Apr 8, 2022 at 7:56 PM Dave Page <dpage@pgadmin.org> wrote:
Windows 8 was a pretty unpopular release, so I would expect shifting to 10/2016+ for PG 16 would be unlikely to be a major problem.
Thanks to Michael for making that happen. That removes the main thing
I didn't know how to deal with in this patch. Here's a rebase with
some cleanup.
With my garbage collector hat on, I see that all systems we target
have fdatasync(), except:
1. Windows, but this patch supplies src/port/fdatasync.c.
2. DragonflyBSD before 6.1. We have 6.0 in the build farm.
3. Ancient macOS. Current releases have it, though we have to cope
with a missing declaration.
From a standards point of view, fdatasync() is issue 5 POSIX like
fsync(). Both are optional, but, being a database, we require
fsync(), and they're both covered by the same POSIX option
"Synchronized Input and Output".
My plan now is to commit this patch so that problem #1 is solved, prod
conchuela's owner to upgrade to solve #2, and wait until Tom shuts
down prairiedog to solve #3. Then we could consider removing the
HAVE_FDATASYNC probe and associated #ifdefs when convenient. For that
reason, I'm not too bothered about the slight weirdness of defining
HAVE_FDATASYNC on Windows even though that doesn't come from
configure; it'd hopefully be short-lived. Better ideas welcome,
though. Does that make sense?
Attachments:
v3-0002-Add-wal_sync_method-fdatasync-for-Windows.patchtext/x-patch; charset=US-ASCII; name=v3-0002-Add-wal_sync_method-fdatasync-for-Windows.patchDownload+86-7
v3-0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchDownload+69-7
Thomas Munro <thomas.munro@gmail.com> writes:
With my garbage collector hat on, I see that all systems we target
have fdatasync(), except:
1. Windows, but this patch supplies src/port/fdatasync.c.
2. DragonflyBSD before 6.1. We have 6.0 in the build farm.
3. Ancient macOS. Current releases have it, though we have to cope
with a missing declaration.
Hmmm ... according to [1]/messages/by-id/1673109.1610733352@sss.pgh.pa.us, while current macOS has an undocumented
fdatasync function, it doesn't seem to do anything as useful as,
say, sync data to disk. I'm not sure what direction you're headed
in here, but it probably shouldn't include assuming that fdatasync
is actually useful on macOS. But maybe that's not your point?
My plan now is to commit this patch so that problem #1 is solved, prod
conchuela's owner to upgrade to solve #2, and wait until Tom shuts
down prairiedog to solve #3.
You could force my hand by pushing something that requires this ;-).
I'm not feeling particularly wedded to prairiedog per se. As with
my once-and-future HPPA machine, I'd prefer to wait until NetBSD 10
is a thing before spinning up an official buildfarm animal, but
I suppose that that's not far away.
regards, tom lane
On Mon, Jul 18, 2022 at 3:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
With my garbage collector hat on, I see that all systems we target
have fdatasync(), except:1. Windows, but this patch supplies src/port/fdatasync.c.
2. DragonflyBSD before 6.1. We have 6.0 in the build farm.
3. Ancient macOS. Current releases have it, though we have to cope
with a missing declaration.Hmmm ... according to [1], while current macOS has an undocumented
fdatasync function, it doesn't seem to do anything as useful as,
say, sync data to disk. I'm not sure what direction you're headed
in here, but it probably shouldn't include assuming that fdatasync
is actually useful on macOS. But maybe that's not your point?
Oh, I'm not planning to change the default choice on macOS (or
Windows). I *am* assuming we're not going to take it away as an
option, that cat being out of the bag ever since configure found
Apple's secret fdatasync (note that O_DSYNC, our default, is also
undocumented and also known not to flush caches, but at least it's
present in an Apple header!). I was just noting an upcoming
opportunity to remove the configure/meson probes for fdatasync, which
made me feel better about the slightly kludgy way this patch is
defining HAVE_FDATASYNC explicitly on Windows.
My plan now is to commit this patch so that problem #1 is solved, prod
conchuela's owner to upgrade to solve #2, and wait until Tom shuts
down prairiedog to solve #3.You could force my hand by pushing something that requires this ;-).
Heh. Let me ask about the DragonFlyBSD thing first.
Thomas Munro <thomas.munro@gmail.com> writes:
... I was just noting an upcoming
opportunity to remove the configure/meson probes for fdatasync, which
made me feel better about the slightly kludgy way this patch is
defining HAVE_FDATASYNC explicitly on Windows.
Hm. There is certainly not any harm in the meson infrastructure
skipping that test, because prairiedog is not able to run meson
anyway. Can we do that and still leave it in place on the autoconf
side? Maybe not, because I suppose you want to remove #ifdefs in
the code itself.
I see that fdatasync goes back as far as SUS v2, which we've long
taken as our minimum POSIX infrastructure. So there's not a lot
of room to insist that we should support allegedly-Unix platforms
without it.
regards, tom lane
On Mon, Jul 18, 2022 at 03:26:36PM +1200, Thomas Munro wrote:
My plan now is to commit this patch so that problem #1 is solved, prod
conchuela's owner to upgrade to solve #2, and wait until Tom shuts
down prairiedog to solve #3. Then we could consider removing the
HAVE_FDATASYNC probe and associated #ifdefs when convenient. For that
reason, I'm not too bothered about the slight weirdness of defining
HAVE_FDATASYNC on Windows even though that doesn't come from
configure; it'd hopefully be short-lived. Better ideas welcome,
though. Does that make sense?
Do you still need HAVE_DECL_FDATASYNC?
--
Michael