Windows now has fdatasync()
Hi,
While porting some new IO code to lots of OSes I noticed in passing
that there is now a way to do synchronous fdatasync() on Windows.
This mechanism doesn't have an async variant, which is what I was
actually looking for (which turns out to doable with bleeding edge
IoRings, more on that later), but I figured this might be useful
anyway. I see that at least one other open source database has
discovered it and seen speedups. Like some other file API
improvements discussed recently, it's Windows 10+ and NTFS only. I
tried out a quick POC patch and it runs a bit faster than fsync(), as
expected. I'm not sure if it's worth bothering with or not given the
other options, but figured it was worth sharing.
While testing that I also couldn't resist adding an extra output line
to pg_test_fsync to run open_datasync in buffered I/O mode, like
PostgreSQL actually does in real life. I guess I should really change
it to duplicate less code, though...
Attachments:
0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchtext/x-patch; charset=US-ASCII; name=0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchDownload
From 914b4da18d8249b4c1cb1219bcffd17625b0e2f1 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sun, 12 Dec 2021 14:13:35 +1300
Subject: [PATCH 1/4] Fix treatment of direct I/O in pg_test_fsync.
pg_test_fsync traditionally enabled O_DIRECT for the open_datasync and
open_sync tests. In fact current releases of PostgreSQL only do that if
wal_level=minimal (less commonly used). So, run the test both ways.
---
src/bin/pg_test_fsync/pg_test_fsync.c | 87 ++++++++++++++++++++++++---
1 file changed, 79 insertions(+), 8 deletions(-)
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index ddabf64c58..1be017fcd5 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -303,12 +303,12 @@ test_sync(int writes_per_op)
printf(_("(in wal_sync_method preference order, except fdatasync is Linux's default)\n"));
/*
- * Test open_datasync if available
+ * Test open_datasync direct if available
*/
- printf(LABEL_FORMAT, "open_datasync");
+ printf(LABEL_FORMAT, "open_datasync (direct)");
fflush(stdout);
-#ifdef OPEN_DATASYNC_FLAG
+#if defined(OPEN_DATASYNC_FLAG) && (defined(O_DIRECT) || defined(F_NOCACHE))
if ((tmpfile = open_direct(filename, O_RDWR | O_DSYNC | PG_BINARY, 0)) == -1)
{
printf(NA_FORMAT, _("n/a*"));
@@ -333,6 +333,38 @@ test_sync(int writes_per_op)
printf(NA_FORMAT, _("n/a"));
#endif
+ /*
+ * Test open_datasync buffered if available
+ */
+ printf(LABEL_FORMAT, "open_datasync (buffered)");
+ fflush(stdout);
+
+#ifdef OPEN_DATASYNC_FLAG
+ if ((tmpfile = open(filename, O_RDWR | O_DSYNC | PG_BINARY, 0)) == -1)
+ {
+ printf(NA_FORMAT, _("n/a*"));
+ fs_warning = true;
+ }
+ else
+ {
+ START_TIMER;
+ for (ops = 0; alarm_triggered == false; ops++)
+ {
+ for (writes = 0; writes < writes_per_op; writes++)
+ if (pg_pwrite(tmpfile,
+ buf,
+ XLOG_BLCKSZ,
+ writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
+ die("write failed");
+ }
+ STOP_TIMER;
+ close(tmpfile);
+ }
+#else
+ printf(NA_FORMAT, _("n/a"));
+#endif
+
+
/*
* Test fdatasync if available
*/
@@ -409,13 +441,13 @@ test_sync(int writes_per_op)
printf(NA_FORMAT, _("n/a"));
#endif
-/*
- * Test open_sync if available
- */
- printf(LABEL_FORMAT, "open_sync");
+ /*
+ * Test open_sync if available
+ */
+ printf(LABEL_FORMAT, "open_sync (direct)");
fflush(stdout);
-#ifdef OPEN_SYNC_FLAG
+#if defined(OPEN_SYNC_FLAG) && (defined(O_DIRECT) || defined(F_NOCACHE))
if ((tmpfile = open_direct(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1)
{
printf(NA_FORMAT, _("n/a*"));
@@ -447,6 +479,45 @@ test_sync(int writes_per_op)
printf(NA_FORMAT, _("n/a"));
#endif
+ /*
+ * Test open_sync if available
+ */
+ printf(LABEL_FORMAT, "open_sync (buffered)");
+ fflush(stdout);
+
+#ifdef OPEN_SYNC_FLAG
+ if ((tmpfile = open(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1)
+ {
+ printf(NA_FORMAT, _("n/a*"));
+ fs_warning = true;
+ }
+ else
+ {
+ START_TIMER;
+ for (ops = 0; alarm_triggered == false; ops++)
+ {
+ for (writes = 0; writes < writes_per_op; writes++)
+ if (pg_pwrite(tmpfile,
+ buf,
+ XLOG_BLCKSZ,
+ writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
+
+ /*
+ * This can generate write failures if the filesystem has
+ * a large block size, e.g. 4k, and there is no support
+ * for O_DIRECT writes smaller than the file system block
+ * size, e.g. XFS.
+ */
+ die("write failed");
+ }
+ STOP_TIMER;
+ close(tmpfile);
+ }
+#else
+ printf(NA_FORMAT, _("n/a"));
+#endif
+
+
if (fs_warning)
{
printf(_("* This file system and its mount options do not support direct\n"
--
2.33.1
0002-Add-fdatasync-wrapper-for-Windows.patchtext/x-patch; charset=US-ASCII; name=0002-Add-fdatasync-wrapper-for-Windows.patchDownload
From 8f58ea44762b547288e55037701f3b0043837dd5 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sun, 12 Dec 2021 11:55:54 +1300
Subject: [PATCH 2/4] Add fdatasync() wrapper for Windows.
Windows 10 gained support for flushing a file with fdatasync()
semantics. It only works on NTFS local file systems, but that's
possibly OK because wal_sync_method defaults to open_datasync on that
platform.
XXX If we used pg_fdatasync() for more things in the future, we'd have
to think about how to handle the case that it doesn't work on some file
systems.
XXX This patch blithely assumes that we're on Windows 10+ without any
version checking.
XXX For experimentation only.
---
configure | 6 +++
configure.ac | 1 +
src/bin/pg_test_fsync/pg_test_fsync.c | 2 +-
src/include/port/win32ntdll.h | 8 ++++
src/port/fdatasync.c | 53 +++++++++++++++++++++++++++
src/port/win32ntdll.c | 6 ++-
src/tools/msvc/Mkvcbuild.pm | 3 +-
7 files changed, 76 insertions(+), 3 deletions(-)
create mode 100644 src/port/fdatasync.c
diff --git a/configure b/configure
index 3b19105328..1a06e7e73e 100755
--- a/configure
+++ b/configure
@@ -16708,6 +16708,12 @@ fi
;;
esac
+ case " $LIBOBJS " in
+ *" fdatasync.$ac_objext "* ) ;;
+ *) LIBOBJS="$LIBOBJS fdatasync.$ac_objext"
+ ;;
+esac
+
case " $LIBOBJS " in
*" kill.$ac_objext "* ) ;;
*) LIBOBJS="$LIBOBJS kill.$ac_objext"
diff --git a/configure.ac b/configure.ac
index e77d4dcf2d..aaa0d01a7b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1927,6 +1927,7 @@ if test "$PORTNAME" = "win32"; then
AC_CHECK_FUNCS(_configthreadlocale)
AC_REPLACE_FUNCS(gettimeofday)
AC_LIBOBJ(dirmod)
+ AC_LIBOBJ(fdatasync)
AC_LIBOBJ(kill)
AC_LIBOBJ(open)
AC_LIBOBJ(system)
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index 1be017fcd5..580b960e66 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -371,7 +371,7 @@ test_sync(int writes_per_op)
printf(LABEL_FORMAT, "fdatasync");
fflush(stdout);
-#ifdef HAVE_FDATASYNC
+#if defined(HAVE_FDATASYNC) || defined(WIN32)
if ((tmpfile = open(filename, O_RDWR | PG_BINARY, 0)) == -1)
die("could not open output file");
START_TIMER;
diff --git a/src/include/port/win32ntdll.h b/src/include/port/win32ntdll.h
index 4d8808b3aa..1e15900030 100644
--- a/src/include/port/win32ntdll.h
+++ b/src/include/port/win32ntdll.h
@@ -20,8 +20,16 @@
#include <ntstatus.h>
#include <winternl.h>
+#ifndef FLUSH_FLAGS_FILE_DATA_SYNC_ONLY
+#define FLUSH_FLAGS_FILE_DATA_SYNC_ONLY 0x4
+#endif
+
typedef NTSTATUS (__stdcall *RtlGetLastNtStatus_t) (void);
+typedef ULONG (__stdcall *RtlNtStatusToDosError_t) (NTSTATUS);
+typedef NTSTATUS (__stdcall *NtFlushBuffersFileEx_t) (HANDLE, ULONG, PVOID, ULONG, PIO_STATUS_BLOCK);
extern RtlGetLastNtStatus_t pg_RtlGetLastNtStatus;
+extern RtlNtStatusToDosError_t pg_RtlNtStatusToDosError;
+extern NtFlushBuffersFileEx_t pg_NtFlushBuffersFileEx;
extern int initialize_ntdll(void);
diff --git a/src/port/fdatasync.c b/src/port/fdatasync.c
new file mode 100644
index 0000000000..9a57f87574
--- /dev/null
+++ b/src/port/fdatasync.c
@@ -0,0 +1,53 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdatasync.c
+ * Win32 fdatasync() replacement
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * src/port/fdatasync.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#define UMDF_USING_NTSTATUS
+
+#ifdef FRONTEND
+#include "postgres_fe.h"
+#else
+#include "postgres.h"
+#endif
+
+#include "port/win32ntdll.h"
+
+int
+fdatasync(int fd)
+{
+ IO_STATUS_BLOCK iosb;
+ NTSTATUS status;
+ HANDLE handle;
+
+ handle = (HANDLE) _get_osfhandle(fd);
+ if (handle == INVALID_HANDLE_VALUE)
+ {
+ errno = EBADF;
+ return -1;
+ }
+
+ if (initialize_ntdll() < 0)
+ return -1;
+
+ memset(&iosb, 0, sizeof(&iosb));
+ status = pg_NtFlushBuffersFileEx(handle,
+ FLUSH_FLAGS_FILE_DATA_SYNC_ONLY,
+ NULL,
+ 0,
+ &iosb);
+
+ if (NT_SUCCESS(status))
+ return 0;
+
+ _dosmaperr(pg_RtlNtStatusToDosError(status));
+ return -1;
+}
diff --git a/src/port/win32ntdll.c b/src/port/win32ntdll.c
index aa3d37c50e..f15897084c 100644
--- a/src/port/win32ntdll.c
+++ b/src/port/win32ntdll.c
@@ -20,6 +20,8 @@
#include "port/win32ntdll.h"
RtlGetLastNtStatus_t pg_RtlGetLastNtStatus;
+RtlNtStatusToDosError_t pg_RtlNtStatusToDosError;
+NtFlushBuffersFileEx_t pg_NtFlushBuffersFileEx;
typedef struct NtDllRoutine
{
@@ -28,7 +30,9 @@ typedef struct NtDllRoutine
} NtDllRoutine;
static const NtDllRoutine routines[] = {
- {"RtlGetLastNtStatus", (pg_funcptr_t *) &pg_RtlGetLastNtStatus}
+ {"RtlGetLastNtStatus", (pg_funcptr_t *) &pg_RtlGetLastNtStatus},
+ {"RtlNtStatusToDosError", (pg_funcptr_t *) &pg_RtlNtStatusToDosError},
+ {"NtFlushBuffersFileEx", (pg_funcptr_t *) &pg_NtFlushBuffersFileEx}
};
static bool initialized;
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 404c45a6f3..6e3d775eae 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -99,7 +99,8 @@ sub mkvcbuild
$solution = CreateSolution($vsVersion, $config);
our @pgportfiles = qw(
- chklocale.c explicit_bzero.c fls.c getpeereid.c getrusage.c inet_aton.c
+ chklocale.c explicit_bzero.c fls.c fdatasync.c
+ getpeereid.c getrusage.c inet_aton.c
getaddrinfo.c gettimeofday.c inet_net_ntop.c kill.c open.c
snprintf.c strlcat.c strlcpy.c dirmod.c noblock.c path.c
dirent.c dlopen.c getopt.c getopt_long.c link.c
--
2.33.1
Great. File sync is a Nice extension for me, as I don't know all file
structures.
Thomas Munro <thomas.munro@gmail.com> schrieb am So., 12. Dez. 2021, 03:48:
Show quoted text
Hi,
While porting some new IO code to lots of OSes I noticed in passing
that there is now a way to do synchronous fdatasync() on Windows.
This mechanism doesn't have an async variant, which is what I was
actually looking for (which turns out to doable with bleeding edge
IoRings, more on that later), but I figured this might be useful
anyway. I see that at least one other open source database has
discovered it and seen speedups. Like some other file API
improvements discussed recently, it's Windows 10+ and NTFS only. I
tried out a quick POC patch and it runs a bit faster than fsync(), as
expected. I'm not sure if it's worth bothering with or not given the
other options, but figured it was worth sharing.While testing that I also couldn't resist adding an extra output line
to pg_test_fsync to run open_datasync in buffered I/O mode, like
PostgreSQL actually does in real life. I guess I should really change
it to duplicate less code, though...
On Sun, Dec 12, 2021 at 3:48 PM Thomas Munro <thomas.munro@gmail.com> wrote:
[...] I
tried out a quick POC patch and it runs a bit faster than fsync(), as
expected. I'm not sure if it's worth bothering with or not given the
other options, but figured it was worth sharing.
One reason to consider developing this further is the problem
discussed in the aptly named thread "Loaded footgun open_datasync on
Windows"[1]/messages/by-id/1527846213.2475.31.camel@cybertec.at (not the problem that was fixed in pg_test_fsync, but the
problem with cache control, or lack thereof). I saw 10x more TPS with
open_datasync than with this experimental fdatasync on my little test
VM, which was more than a little fishy, so I turned off the device
write caching in "Device Manager" and got about the same number from
open_datasync and fdatasync. Clearly you can lose committed
transactions on power loss[2]https://github.com/MicrosoftDocs/feedback/issues/2747 with the default OS settings and default
PostgreSQL settings, though perhaps only if you're on SATA storage due
to lack of FUA pass-through[3]https://devblogs.microsoft.com/oldnewthing/20170510-00/?p=95505 (?). I didn't try an NVMe stack.
That suggests that fdatasync would actually be a better default ...
except for the problems already mentioned with versions and not
working on non-NTFS (not sure how it fails on non-NTFS, though, maybe
it does a full flush, [4]https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex doesn't say).
(The patch is a little rough; I couldn't figure out the headers to get
that macro. Insert my usual disclaimer that I'm not a Windows guy,
this is stuff I'm just figuring out, all clues welcome...)
[1]: /messages/by-id/1527846213.2475.31.camel@cybertec.at
[2]: https://github.com/MicrosoftDocs/feedback/issues/2747
[3]: https://devblogs.microsoft.com/oldnewthing/20170510-00/?p=95505
[4]: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex
I added a commitfest entry for this to try to attract Windows-hacker
reviews. I wondered about adjusting it to run on older systems, but I
think we're about ready to drop support for Windows < 10 anyway, so
maybe I'll go and propose that separately, instead.
On Sun, Dec 12, 2021 at 4:32 PM Thomas Munro <thomas.munro@gmail.com> wrote:
One reason to consider developing this further is the problem
discussed in the aptly named thread "Loaded footgun open_datasync on
Windows"[1] (not the problem that was fixed in pg_test_fsync, but the
problem with cache control, or lack thereof). I saw 10x more TPS with
open_datasync than with this experimental fdatasync on my little test
VM, which was more than a little fishy, so I turned off the device
write caching in "Device Manager" and got about the same number from
open_datasync and fdatasync. Clearly you can lose committed
transactions on power loss[2] with the default OS settings and default
PostgreSQL settings, though perhaps only if you're on SATA storage due
to lack of FUA pass-through[3] (?). I didn't try an NVMe stack.That suggests that fdatasync would actually be a better default ...
except for the problems already mentioned with versions and not
working on non-NTFS (not sure how it fails on non-NTFS, though, maybe
it does a full flush, [4] doesn't say).
So my impression is that today we ship defaults that are unsafe on
Windows. I don't really understand much of what you are saying here,
but if there's a way we can stop doing that, +1 from me, especially if
it allows us to retain reasonable performance.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Feb 5, 2022 at 2:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
So my impression is that today we ship defaults that are unsafe on
Windows. I don't really understand much of what you are saying here,
but if there's a way we can stop doing that, +1 from me, especially if
it allows us to retain reasonable performance.
The PostgreSQL default in combination with the Windows default is
unsafe on SATA drives, but <get-out-clause>if you read our
documentation carefully you might figure out that you need to disable
write caching on your disk</>.
https://www.postgresql.org/docs/14/wal-reliability.html says:
"Consumer-grade IDE and SATA drives are particularly likely to have
write-back caches that will not survive a power failure. Many
solid-state drives (SSD) also have volatile write-back caches. [...]
On Windows, if wal_sync_method is open_datasync (the default), write
caching can be disabled by unchecking My Computer\Open\disk
drive\Properties\Hardware\Properties\Policies\Enable write caching on
the disk. Alternatively, set wal_sync_method to fsync or
fsync_writethrough, which prevent write caching."
I'm not proposing we change our default to this new level, because it
doesn't work on non-NTFS, an annoying complication. This patch would
just provide something faster to put after "Alternatively".
(Actually that whole page needs a refresh. IDE is gone. The
discussion about "recent" support for flushing caches is a bit out of
date, and in fact the problem with open_datasync on this OS is because
of problems with drivers and
https://en.wikipedia.org/wiki/Disk_buffer#Force_Unit_Access_(FUA), not
FLUSH CACHE EXT.)
Here's an updated patch that fixes some silly problems seen on CI.
There's something a bit clunkly/weird about this HAVE_FDATASYNC stuff,
maybe I can find a tidier way, but it's enough for experimentation:
For Mingw, I unconditionally add src/port/fdatasync.o to LIBOBJS, and
I unconditionally #define HAVE_FDATASYNC in win32_port.h, and I also
changed c.h's declaration of fdatasync() because it comes before
port.h is included (I guess I could move it down instead?).
For MSVC, I unconditionally add fdatasync.o to @pgportfiles, and
HAVE_FDATASYNC is defined in Solution.pm.
It'd be interesting to see pg_test_fsync.exe output on real hardware.
Here's what a little Windows 10 VM with a virtual SATA drive says:
C:\Users\thmunro>c:\pg\bin\pg_test_fsync.exe
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync is Linux's default)
open_datasync (direct) 7914.872 ops/sec 126 usecs/op
open_datasync (buffered) 6593.056 ops/sec 152 usecs/op
fdatasync 650.317 ops/sec 1538 usecs/op
fsync 512.423 ops/sec 1952 usecs/op
fsync_writethrough 550.881 ops/sec 1815 usecs/op
open_sync (direct) n/a
open_sync (buffered) n/a
Attachments:
v2-0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchDownload
From ea61dab5b8be308bd82e56a5063f7acac70ad291 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sun, 12 Dec 2021 14:13:35 +1300
Subject: [PATCH v2 1/2] Fix treatment of direct I/O in pg_test_fsync.
pg_test_fsync traditionally enabled O_DIRECT for the open_datasync and
open_sync tests. In fact current releases of PostgreSQL only do that if
wal_level=minimal (less commonly used). So, run the test both ways.
Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
---
src/bin/pg_test_fsync/pg_test_fsync.c | 87 ++++++++++++++++++++++++---
1 file changed, 79 insertions(+), 8 deletions(-)
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index ddabf64c58..1be017fcd5 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -303,12 +303,12 @@ test_sync(int writes_per_op)
printf(_("(in wal_sync_method preference order, except fdatasync is Linux's default)\n"));
/*
- * Test open_datasync if available
+ * Test open_datasync direct if available
*/
- printf(LABEL_FORMAT, "open_datasync");
+ printf(LABEL_FORMAT, "open_datasync (direct)");
fflush(stdout);
-#ifdef OPEN_DATASYNC_FLAG
+#if defined(OPEN_DATASYNC_FLAG) && (defined(O_DIRECT) || defined(F_NOCACHE))
if ((tmpfile = open_direct(filename, O_RDWR | O_DSYNC | PG_BINARY, 0)) == -1)
{
printf(NA_FORMAT, _("n/a*"));
@@ -333,6 +333,38 @@ test_sync(int writes_per_op)
printf(NA_FORMAT, _("n/a"));
#endif
+ /*
+ * Test open_datasync buffered if available
+ */
+ printf(LABEL_FORMAT, "open_datasync (buffered)");
+ fflush(stdout);
+
+#ifdef OPEN_DATASYNC_FLAG
+ if ((tmpfile = open(filename, O_RDWR | O_DSYNC | PG_BINARY, 0)) == -1)
+ {
+ printf(NA_FORMAT, _("n/a*"));
+ fs_warning = true;
+ }
+ else
+ {
+ START_TIMER;
+ for (ops = 0; alarm_triggered == false; ops++)
+ {
+ for (writes = 0; writes < writes_per_op; writes++)
+ if (pg_pwrite(tmpfile,
+ buf,
+ XLOG_BLCKSZ,
+ writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
+ die("write failed");
+ }
+ STOP_TIMER;
+ close(tmpfile);
+ }
+#else
+ printf(NA_FORMAT, _("n/a"));
+#endif
+
+
/*
* Test fdatasync if available
*/
@@ -409,13 +441,13 @@ test_sync(int writes_per_op)
printf(NA_FORMAT, _("n/a"));
#endif
-/*
- * Test open_sync if available
- */
- printf(LABEL_FORMAT, "open_sync");
+ /*
+ * Test open_sync if available
+ */
+ printf(LABEL_FORMAT, "open_sync (direct)");
fflush(stdout);
-#ifdef OPEN_SYNC_FLAG
+#if defined(OPEN_SYNC_FLAG) && (defined(O_DIRECT) || defined(F_NOCACHE))
if ((tmpfile = open_direct(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1)
{
printf(NA_FORMAT, _("n/a*"));
@@ -447,6 +479,45 @@ test_sync(int writes_per_op)
printf(NA_FORMAT, _("n/a"));
#endif
+ /*
+ * Test open_sync if available
+ */
+ printf(LABEL_FORMAT, "open_sync (buffered)");
+ fflush(stdout);
+
+#ifdef OPEN_SYNC_FLAG
+ if ((tmpfile = open(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1)
+ {
+ printf(NA_FORMAT, _("n/a*"));
+ fs_warning = true;
+ }
+ else
+ {
+ START_TIMER;
+ for (ops = 0; alarm_triggered == false; ops++)
+ {
+ for (writes = 0; writes < writes_per_op; writes++)
+ if (pg_pwrite(tmpfile,
+ buf,
+ XLOG_BLCKSZ,
+ writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
+
+ /*
+ * This can generate write failures if the filesystem has
+ * a large block size, e.g. 4k, and there is no support
+ * for O_DIRECT writes smaller than the file system block
+ * size, e.g. XFS.
+ */
+ die("write failed");
+ }
+ STOP_TIMER;
+ close(tmpfile);
+ }
+#else
+ printf(NA_FORMAT, _("n/a"));
+#endif
+
+
if (fs_warning)
{
printf(_("* This file system and its mount options do not support direct\n"
--
2.30.2
v2-0002-Add-wal_sync_method-fdatasync-for-Windows.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Add-wal_sync_method-fdatasync-for-Windows.patchDownload
From 4e80c19fd29454c0f8843488381dd24f928588fc Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 5 Feb 2022 09:15:47 +1300
Subject: [PATCH v2 2/2] Add wal_sync_method=fdatasync for Windows.
Windows 10 gained support for flushing NTFS files with fdatasync()
semantics. Add support for it, though it is not the default.
XXX If we used pg_fdatasync() for more things in the future, we'd have
to think about how to handle the case that it doesn't work on some file
systems.
XXX This patch blithely assumes that we're on Windows 10+ without any
version checking.
XXX For experimentation only.
Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
---
configure | 6 ++++
configure.ac | 1 +
src/include/c.h | 2 +-
src/include/port/win32_port.h | 6 ++++
src/include/port/win32ntdll.h | 8 ++++++
src/port/fdatasync.c | 53 +++++++++++++++++++++++++++++++++++
src/port/win32ntdll.c | 6 +++-
src/tools/msvc/Mkvcbuild.pm | 3 +-
src/tools/msvc/Solution.pm | 2 +-
9 files changed, 83 insertions(+), 4 deletions(-)
create mode 100644 src/port/fdatasync.c
diff --git a/configure b/configure
index 879f92202f..3b9aa5ead6 100755
--- a/configure
+++ b/configure
@@ -16701,6 +16701,12 @@ fi
;;
esac
+ case " $LIBOBJS " in
+ *" fdatasync.$ac_objext "* ) ;;
+ *) LIBOBJS="$LIBOBJS fdatasync.$ac_objext"
+ ;;
+esac
+
case " $LIBOBJS " in
*" kill.$ac_objext "* ) ;;
*) LIBOBJS="$LIBOBJS kill.$ac_objext"
diff --git a/configure.ac b/configure.ac
index 95287705f6..a226f492e2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1928,6 +1928,7 @@ if test "$PORTNAME" = "win32"; then
AC_CHECK_FUNCS(_configthreadlocale)
AC_REPLACE_FUNCS(gettimeofday)
AC_LIBOBJ(dirmod)
+ AC_LIBOBJ(fdatasync)
AC_LIBOBJ(kill)
AC_LIBOBJ(open)
AC_LIBOBJ(system)
diff --git a/src/include/c.h b/src/include/c.h
index 4f16e589b3..3a7b79998a 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -1279,7 +1279,7 @@ typedef union PGAlignedXLogBlock
* standard C library.
*/
-#if defined(HAVE_FDATASYNC) && !HAVE_DECL_FDATASYNC
+#if !HAVE_DECL_FDATASYNC
extern int fdatasync(int fildes);
#endif
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index d3cb765976..5a2f33eca0 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -83,6 +83,12 @@
#define HAVE_FSYNC_WRITETHROUGH
#define FSYNC_WRITETHROUGH_IS_FSYNC
+/*
+ * We have a replacement for fdatasync() in src/port/fdatasync.c, which is
+ * unconditionally used by MSVC and Mingw builds.
+ */
+#define HAVE_FDATASYNC
+
#define USES_WINSOCK
/*
diff --git a/src/include/port/win32ntdll.h b/src/include/port/win32ntdll.h
index 663b9754bd..3a9f2761b5 100644
--- a/src/include/port/win32ntdll.h
+++ b/src/include/port/win32ntdll.h
@@ -23,9 +23,17 @@
#include <ntstatus.h>
#include <winternl.h>
+#ifndef FLUSH_FLAGS_FILE_DATA_SYNC_ONLY
+#define FLUSH_FLAGS_FILE_DATA_SYNC_ONLY 0x4
+#endif
+
typedef NTSTATUS (__stdcall *RtlGetLastNtStatus_t) (void);
+typedef ULONG (__stdcall *RtlNtStatusToDosError_t) (NTSTATUS);
+typedef NTSTATUS (__stdcall *NtFlushBuffersFileEx_t) (HANDLE, ULONG, PVOID, ULONG, PIO_STATUS_BLOCK);
extern RtlGetLastNtStatus_t pg_RtlGetLastNtStatus;
+extern RtlNtStatusToDosError_t pg_RtlNtStatusToDosError;
+extern NtFlushBuffersFileEx_t pg_NtFlushBuffersFileEx;
extern int initialize_ntdll(void);
diff --git a/src/port/fdatasync.c b/src/port/fdatasync.c
new file mode 100644
index 0000000000..afef853aa3
--- /dev/null
+++ b/src/port/fdatasync.c
@@ -0,0 +1,53 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdatasync.c
+ * Win32 fdatasync() replacement
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * src/port/fdatasync.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#define UMDF_USING_NTSTATUS
+
+#ifdef FRONTEND
+#include "postgres_fe.h"
+#else
+#include "postgres.h"
+#endif
+
+#include "port/win32ntdll.h"
+
+int
+fdatasync(int fd)
+{
+ IO_STATUS_BLOCK iosb;
+ NTSTATUS status;
+ HANDLE handle;
+
+ handle = (HANDLE) _get_osfhandle(fd);
+ if (handle == INVALID_HANDLE_VALUE)
+ {
+ errno = EBADF;
+ return -1;
+ }
+
+ if (initialize_ntdll() < 0)
+ return -1;
+
+ memset(&iosb, 0, sizeof(iosb));
+ status = pg_NtFlushBuffersFileEx(handle,
+ FLUSH_FLAGS_FILE_DATA_SYNC_ONLY,
+ NULL,
+ 0,
+ &iosb);
+
+ if (NT_SUCCESS(status))
+ return 0;
+
+ _dosmaperr(pg_RtlNtStatusToDosError(status));
+ return -1;
+}
diff --git a/src/port/win32ntdll.c b/src/port/win32ntdll.c
index 10c33c6a01..eb61407754 100644
--- a/src/port/win32ntdll.c
+++ b/src/port/win32ntdll.c
@@ -20,6 +20,8 @@
#include "port/win32ntdll.h"
RtlGetLastNtStatus_t pg_RtlGetLastNtStatus;
+RtlNtStatusToDosError_t pg_RtlNtStatusToDosError;
+NtFlushBuffersFileEx_t pg_NtFlushBuffersFileEx;
typedef struct NtDllRoutine
{
@@ -28,7 +30,9 @@ typedef struct NtDllRoutine
} NtDllRoutine;
static const NtDllRoutine routines[] = {
- {"RtlGetLastNtStatus", (pg_funcptr_t *) &pg_RtlGetLastNtStatus}
+ {"RtlGetLastNtStatus", (pg_funcptr_t *) &pg_RtlGetLastNtStatus},
+ {"RtlNtStatusToDosError", (pg_funcptr_t *) &pg_RtlNtStatusToDosError},
+ {"NtFlushBuffersFileEx", (pg_funcptr_t *) &pg_NtFlushBuffersFileEx}
};
static bool initialized;
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index a310bcb28c..8cc39d4f8d 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -99,7 +99,8 @@ sub mkvcbuild
$solution = CreateSolution($vsVersion, $config);
our @pgportfiles = qw(
- chklocale.c explicit_bzero.c fls.c getpeereid.c getrusage.c inet_aton.c
+ chklocale.c explicit_bzero.c fls.c fdatasync.c
+ getpeereid.c getrusage.c inet_aton.c
getaddrinfo.c gettimeofday.c inet_net_ntop.c kill.c open.c
snprintf.c strlcat.c strlcpy.c dirmod.c noblock.c path.c
dirent.c dlopen.c getopt.c getopt_long.c link.c
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index ed1c53000f..4f75c07fcf 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -256,7 +256,7 @@ sub GenerateFiles
HAVE_EDITLINE_READLINE_H => undef,
HAVE_EXECINFO_H => undef,
HAVE_EXPLICIT_BZERO => undef,
- HAVE_FDATASYNC => undef,
+ HAVE_FDATASYNC => 1,
HAVE_FLS => undef,
HAVE_FSEEKO => 1,
HAVE_FUNCNAME__FUNC => undef,
--
2.30.2
On Fri, Feb 4, 2022 at 4:24 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I'm not proposing we change our default to this new level, because it
doesn't work on non-NTFS, an annoying complication. This patch would
just provide something faster to put after "Alternatively".
Hmm. I thought NTFS had kind of won the filesystem war on the Windows
side of things. No?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Feb 5, 2022 at 12:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Feb 4, 2022 at 4:24 PM Thomas Munro <thomas.munro@gmail.com> wrote:
I'm not proposing we change our default to this new level, because it
doesn't work on non-NTFS, an annoying complication. This patch would
just provide something faster to put after "Alternatively".Hmm. I thought NTFS had kind of won the filesystem war on the Windows
side of things. No?
Against FAT, yes, but there are also SMB/CIFS (network) and the new
ReFS (which we recently broke and then unbroke[1]/messages/by-id/16854-905604506e23d5c0@postgresql.org). I haven't tried
those things, lacking general Windows-fu, but I suppose they'd reject
this and we'd panic, because the docs say "file systems supported:
NTFS"[2]https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex.
[1]: /messages/by-id/16854-905604506e23d5c0@postgresql.org
[2]: https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/ntifs/nf-ntifs-ntflushbuffersfileex
On Sun, Dec 12, 2021 at 03:48:10PM +1300, Thomas Munro wrote:
I tried out a quick POC patch and it runs a bit faster than fsync(), as
expected.
Good news, as a too high difference would be suspect :)
How much difference does it make in % and are the numbers rather
reproducible? Just wondering..
--
Michael
On Sun, Feb 6, 2022 at 7:20 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Dec 12, 2021 at 03:48:10PM +1300, Thomas Munro wrote:
I tried out a quick POC patch and it runs a bit faster than fsync(), as
expected.Good news, as a too high difference would be suspect :)
How much difference does it make in % and are the numbers rather
reproducible? Just wondering..
I've only tested on a qemu/kvm virtual machine with a virtual SATA
disk device, so take this with a bucket of salt, but I think that's
enough to see the impact of 'slow' SATA commands hitting the device
and being waited for, and what I see is that wal_sync_method=fdatasync
does about 25% more TPS than wal_sync_method=fsync, and
wal_sync_method=open_datasync is a wildly higher number that I don't
believe (ie I don't believe it waited for power loss durability and
the links above support that understanding), but tumbles back to earth
and almost exactly matches the wal_sync_method=fdatasync number when
the write cache is disabled.
I've bumped this to the next cycle, so I can hopefully skip the
missing version detection stuff that I have no way to test (no CI, no
build farm, and I have zero interest in dumpster diving for Windows 7
or whatever installations).
I propose that we drop support for Windows versions older than
10/Server 2016 in the PostgreSQL 16 cycle, because the OS patches for
everything older come to an end in October next year[1]https://en.wikipedia.org/wiki/List_of_Microsoft_Windows_versions, and we have a
lot of patches relating to modern Windows features that stall on
details about old systems that no one actually has.
[1]: https://en.wikipedia.org/wiki/List_of_Microsoft_Windows_versions
On Fri, Apr 08, 2022 at 02:56:15PM +1200, Thomas Munro wrote:
I propose that we drop support for Windows versions older than
10/Server 2016 in the PostgreSQL 16 cycle, because the OS patches for
everything older come to an end in October next year[1], and we have a
lot of patches relating to modern Windows features that stall on
details about old systems that no one actually has.[1] https://en.wikipedia.org/wiki/List_of_Microsoft_Windows_versions
Do you think that we could raise the minimum C standard on WIN32 to
C11, at least for MSVC? There is a patch floating around to add
pg_attribute_aligned() and perhaps pg_attribute_noreturn() for MSVC:
/messages/by-id/Yk6UgCGlZKuxRr4n@paquier.xyz
noreturn() needed at least C11:
https://docs.microsoft.com/en-us/cpp/c-language/noreturn?view=msvc-140
Perhaps we'd better also bump up the minimum version of MSVC
supported..
--
Michael
Michael Paquier <michael@paquier.xyz> writes:
On Fri, Apr 08, 2022 at 02:56:15PM +1200, Thomas Munro wrote:
I propose that we drop support for Windows versions older than
10/Server 2016 in the PostgreSQL 16 cycle,
Do we have any data on what people are actually using?
Do you think that we could raise the minimum C standard on WIN32 to
C11, at least for MSVC?
As long as the C11-isms are in MSVC-only code, it seems like this is
exactly equivalent to setting a minimum MSVC version. I don't see
an objection-in-principle there, it's just a practical question of
how far back is reasonable to support MSVC versions. (That's very
distinct from how far back we need the built code to run.)
regards, tom lane
On Fri, Apr 08, 2022 at 12:40:55AM -0400, Tom Lane wrote:
As long as the C11-isms are in MSVC-only code, it seems like this is
exactly equivalent to setting a minimum MSVC version. I don't see
an objection-in-principle there, it's just a practical question of
how far back is reasonable to support MSVC versions. (That's very
distinct from how far back we need the built code to run.)
Good question. Older versions of VS are available, so this is not a
problem:
https://visualstudio.microsoft.com/vs/older-downloads/
I think that we should at least drop 2013, as there is a bunch of
stuff related to _MSC_VER < 1900 that could be removed with that,
particularly for locales.
--
Michael
On Fri, 8 Apr 2022 at 05:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael@paquier.xyz> writes:
On Fri, Apr 08, 2022 at 02:56:15PM +1200, Thomas Munro wrote:
I propose that we drop support for Windows versions older than
10/Server 2016 in the PostgreSQL 16 cycle,Do we have any data on what people are actually using?
None that I know of. Anecdotally, we dropped support for pgAdmin on Windows
< 8 (2012 for the server edition), and had a single complaint - and the
user happily acknowledged they were on an old release and expected support
to be dropped sooner or later. Windows 8 was a pretty unpopular release, so
I would expect shifting to 10/2016+ for PG 16 would be unlikely to be a
major problem.
FWIW, Python dropped support for < 8/2012 with v3.9.
Do you think that we could raise the minimum C standard on WIN32 to
C11, at least for MSVC?As long as the C11-isms are in MSVC-only code, it seems like this is
exactly equivalent to setting a minimum MSVC version. I don't see
an objection-in-principle there, it's just a practical question of
how far back is reasonable to support MSVC versions. (That's very
distinct from how far back we need the built code to run.)regards, tom lane
--
Dave Page
Blog: https://pgsnake.blogspot.com
Twitter: @pgsnake
On Fri, Apr 8, 2022 at 7:56 PM Dave Page <dpage@pgadmin.org> wrote:
Windows 8 was a pretty unpopular release, so I would expect shifting to 10/2016+ for PG 16 would be unlikely to be a major problem.
Thanks to Michael for making that happen. That removes the main thing
I didn't know how to deal with in this patch. Here's a rebase with
some cleanup.
With my garbage collector hat on, I see that all systems we target
have fdatasync(), except:
1. Windows, but this patch supplies src/port/fdatasync.c.
2. DragonflyBSD before 6.1. We have 6.0 in the build farm.
3. Ancient macOS. Current releases have it, though we have to cope
with a missing declaration.
From a standards point of view, fdatasync() is issue 5 POSIX like
fsync(). Both are optional, but, being a database, we require
fsync(), and they're both covered by the same POSIX option
"Synchronized Input and Output".
My plan now is to commit this patch so that problem #1 is solved, prod
conchuela's owner to upgrade to solve #2, and wait until Tom shuts
down prairiedog to solve #3. Then we could consider removing the
HAVE_FDATASYNC probe and associated #ifdefs when convenient. For that
reason, I'm not too bothered about the slight weirdness of defining
HAVE_FDATASYNC on Windows even though that doesn't come from
configure; it'd hopefully be short-lived. Better ideas welcome,
though. Does that make sense?
Attachments:
v3-0002-Add-wal_sync_method-fdatasync-for-Windows.patchtext/x-patch; charset=US-ASCII; name=v3-0002-Add-wal_sync_method-fdatasync-for-Windows.patchDownload
From 56882f910e58f72987433c72c32fc6eedc1cbac9 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sat, 5 Feb 2022 09:15:47 +1300
Subject: [PATCH v3 2/2] Add wal_sync_method=fdatasync for Windows.
Windows 10 gained support for flushing files with fdatasync() semantics.
The main advantage over wal_sync_method=open_datasync is that the latter
does not flush SATA drive caches.
According to OS documentation FLUSH_FLAGS_FILE_DATA_SYNC_ONLY works
only on NTFS filesystems, so we can't consider making it the default.
Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
---
configure | 6 ++++
configure.ac | 1 +
doc/src/sgml/wal.sgml | 3 +-
src/include/c.h | 2 +-
src/include/port/win32_port.h | 6 ++++
src/include/port/win32ntdll.h | 10 ++++++-
src/port/fdatasync.c | 53 +++++++++++++++++++++++++++++++++++
src/port/win32ntdll.c | 6 +++-
src/tools/msvc/Mkvcbuild.pm | 3 +-
src/tools/msvc/Solution.pm | 2 +-
10 files changed, 86 insertions(+), 6 deletions(-)
create mode 100644 src/port/fdatasync.c
diff --git a/configure b/configure
index 1e63c6862b..a03339301d 100755
--- a/configure
+++ b/configure
@@ -16980,6 +16980,12 @@ fi
;;
esac
+ case " $LIBOBJS " in
+ *" fdatasync.$ac_objext "* ) ;;
+ *) LIBOBJS="$LIBOBJS fdatasync.$ac_objext"
+ ;;
+esac
+
case " $LIBOBJS " in
*" kill.$ac_objext "* ) ;;
*) LIBOBJS="$LIBOBJS kill.$ac_objext"
diff --git a/configure.ac b/configure.ac
index 71191f14ad..9528f7fb42 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1975,6 +1975,7 @@ if test "$PORTNAME" = "win32"; then
AC_CHECK_FUNCS(_configthreadlocale)
AC_REPLACE_FUNCS(gettimeofday)
AC_LIBOBJ(dirmod)
+ AC_LIBOBJ(fdatasync)
AC_LIBOBJ(kill)
AC_LIBOBJ(open)
AC_LIBOBJ(system)
diff --git a/doc/src/sgml/wal.sgml b/doc/src/sgml/wal.sgml
index 4b6ef283c1..01f7379ebb 100644
--- a/doc/src/sgml/wal.sgml
+++ b/doc/src/sgml/wal.sgml
@@ -108,7 +108,8 @@
<literal>open_datasync</literal> (the default), write caching can be disabled
by unchecking <literal>My Computer\Open\<replaceable>disk drive</replaceable>\Properties\Hardware\Properties\Policies\Enable write caching on the disk</literal>.
Alternatively, set <varname>wal_sync_method</varname> to
- <literal>fsync</literal> or <literal>fsync_writethrough</literal>, which prevent
+ <literal>fdatasync</literal> (NTFS only), <literal>fsync</literal> or
+ <literal>fsync_writethrough</literal>, which prevent
write caching.
</para>
</listitem>
diff --git a/src/include/c.h b/src/include/c.h
index 863a16c6a6..fdf89e2742 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -1290,7 +1290,7 @@ typedef union PGAlignedXLogBlock
* standard C library.
*/
-#if defined(HAVE_FDATASYNC) && !HAVE_DECL_FDATASYNC
+#if !HAVE_DECL_FDATASYNC
extern int fdatasync(int fildes);
#endif
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 5121c0c626..5ea66528fa 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -83,6 +83,12 @@
#define HAVE_FSYNC_WRITETHROUGH
#define FSYNC_WRITETHROUGH_IS_FSYNC
+/*
+ * We have a replacement for fdatasync() in src/port/fdatasync.c, which is
+ * unconditionally used by MSVC and Mingw builds.
+ */
+#define HAVE_FDATASYNC
+
#define USES_WINSOCK
/*
diff --git a/src/include/port/win32ntdll.h b/src/include/port/win32ntdll.h
index 291b067ea4..34cebddd54 100644
--- a/src/include/port/win32ntdll.h
+++ b/src/include/port/win32ntdll.h
@@ -23,9 +23,17 @@
#include <ntstatus.h>
#include <winternl.h>
-typedef NTSTATUS (__stdcall * RtlGetLastNtStatus_t) (void);
+#ifndef FLUSH_FLAGS_FILE_DATA_SYNC_ONLY
+#define FLUSH_FLAGS_FILE_DATA_SYNC_ONLY 0x4
+#endif
+
+typedef NTSTATUS (__stdcall *RtlGetLastNtStatus_t) (void);
+typedef ULONG (__stdcall *RtlNtStatusToDosError_t) (NTSTATUS);
+typedef NTSTATUS (__stdcall *NtFlushBuffersFileEx_t) (HANDLE, ULONG, PVOID, ULONG, PIO_STATUS_BLOCK);
extern PGDLLIMPORT RtlGetLastNtStatus_t pg_RtlGetLastNtStatus;
+extern PGDLLIMPORT RtlNtStatusToDosError_t pg_RtlNtStatusToDosError;
+extern PGDLLIMPORT NtFlushBuffersFileEx_t pg_NtFlushBuffersFileEx;
extern int initialize_ntdll(void);
diff --git a/src/port/fdatasync.c b/src/port/fdatasync.c
new file mode 100644
index 0000000000..afef853aa3
--- /dev/null
+++ b/src/port/fdatasync.c
@@ -0,0 +1,53 @@
+/*-------------------------------------------------------------------------
+ *
+ * fdatasync.c
+ * Win32 fdatasync() replacement
+ *
+ *
+ * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
+ *
+ * src/port/fdatasync.c
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#define UMDF_USING_NTSTATUS
+
+#ifdef FRONTEND
+#include "postgres_fe.h"
+#else
+#include "postgres.h"
+#endif
+
+#include "port/win32ntdll.h"
+
+int
+fdatasync(int fd)
+{
+ IO_STATUS_BLOCK iosb;
+ NTSTATUS status;
+ HANDLE handle;
+
+ handle = (HANDLE) _get_osfhandle(fd);
+ if (handle == INVALID_HANDLE_VALUE)
+ {
+ errno = EBADF;
+ return -1;
+ }
+
+ if (initialize_ntdll() < 0)
+ return -1;
+
+ memset(&iosb, 0, sizeof(iosb));
+ status = pg_NtFlushBuffersFileEx(handle,
+ FLUSH_FLAGS_FILE_DATA_SYNC_ONLY,
+ NULL,
+ 0,
+ &iosb);
+
+ if (NT_SUCCESS(status))
+ return 0;
+
+ _dosmaperr(pg_RtlNtStatusToDosError(status));
+ return -1;
+}
diff --git a/src/port/win32ntdll.c b/src/port/win32ntdll.c
index 10c33c6a01..eb61407754 100644
--- a/src/port/win32ntdll.c
+++ b/src/port/win32ntdll.c
@@ -20,6 +20,8 @@
#include "port/win32ntdll.h"
RtlGetLastNtStatus_t pg_RtlGetLastNtStatus;
+RtlNtStatusToDosError_t pg_RtlNtStatusToDosError;
+NtFlushBuffersFileEx_t pg_NtFlushBuffersFileEx;
typedef struct NtDllRoutine
{
@@ -28,7 +30,9 @@ typedef struct NtDllRoutine
} NtDllRoutine;
static const NtDllRoutine routines[] = {
- {"RtlGetLastNtStatus", (pg_funcptr_t *) &pg_RtlGetLastNtStatus}
+ {"RtlGetLastNtStatus", (pg_funcptr_t *) &pg_RtlGetLastNtStatus},
+ {"RtlNtStatusToDosError", (pg_funcptr_t *) &pg_RtlNtStatusToDosError},
+ {"NtFlushBuffersFileEx", (pg_funcptr_t *) &pg_NtFlushBuffersFileEx}
};
static bool initialized;
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index e4feda10fd..cc7a908d10 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -99,7 +99,8 @@ sub mkvcbuild
$solution = CreateSolution($vsVersion, $config);
our @pgportfiles = qw(
- chklocale.c explicit_bzero.c fls.c getpeereid.c getrusage.c inet_aton.c
+ chklocale.c explicit_bzero.c fls.c fdatasync.c
+ getpeereid.c getrusage.c inet_aton.c
getaddrinfo.c gettimeofday.c inet_net_ntop.c kill.c open.c
snprintf.c strlcat.c strlcpy.c dirmod.c noblock.c path.c
dirent.c dlopen.c getopt.c getopt_long.c link.c
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index fa32dc371d..c87bca8e6a 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -257,7 +257,7 @@ sub GenerateFiles
HAVE_EDITLINE_READLINE_H => undef,
HAVE_EXECINFO_H => undef,
HAVE_EXPLICIT_BZERO => undef,
- HAVE_FDATASYNC => undef,
+ HAVE_FDATASYNC => 1,
HAVE_FLS => undef,
HAVE_FSEEKO => 1,
HAVE_FUNCNAME__FUNC => undef,
--
2.35.1
v3-0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Fix-treatment-of-direct-I-O-in-pg_test_fsync.patchDownload
From a985d52d1870ccfb92fda3316158242ce2ba3fe8 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Sun, 12 Dec 2021 14:13:35 +1300
Subject: [PATCH v3 1/2] Fix treatment of direct I/O in pg_test_fsync.
pg_test_fsync traditionally enabled O_DIRECT for the open_datasync and
open_sync tests. In fact current releases of PostgreSQL only do that if
wal_level=minimal (less commonly used). So, run the test both ways.
Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
---
src/bin/pg_test_fsync/pg_test_fsync.c | 75 ++++++++++++++++++++++++---
1 file changed, 69 insertions(+), 6 deletions(-)
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index f7bc199a30..109ccba819 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -294,14 +294,45 @@ test_sync(int writes_per_op)
printf(_("\nCompare file sync methods using two %dkB writes:\n"), XLOG_BLCKSZ_K);
printf(_("(in wal_sync_method preference order, except fdatasync is Linux's default)\n"));
+/*
+ * Test open_datasync with O_DIRECT if available
+ */
+ printf(LABEL_FORMAT, "open_datasync (direct)");
+ fflush(stdout);
+
+#if defined(OPEN_DATASYNC_FLAG) && (defined(O_DIRECT) || defined(F_NOCACHE))
+ if ((tmpfile = open_direct(filename, O_RDWR | O_DSYNC | PG_BINARY, 0)) == -1)
+ {
+ printf(NA_FORMAT, _("n/a*"));
+ fs_warning = true;
+ }
+ else
+ {
+ START_TIMER;
+ for (ops = 0; alarm_triggered == false; ops++)
+ {
+ for (writes = 0; writes < writes_per_op; writes++)
+ if (pg_pwrite(tmpfile,
+ buf,
+ XLOG_BLCKSZ,
+ writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
+ die("write failed");
+ }
+ STOP_TIMER;
+ close(tmpfile);
+ }
+#else
+ printf(NA_FORMAT, _("n/a"));
+#endif
+
/*
- * Test open_datasync if available
+ * Test open_datasync buffered if available
*/
- printf(LABEL_FORMAT, "open_datasync");
+ printf(LABEL_FORMAT, "open_datasync (buffered)");
fflush(stdout);
#ifdef OPEN_DATASYNC_FLAG
- if ((tmpfile = open_direct(filename, O_RDWR | O_DSYNC | PG_BINARY, 0)) == -1)
+ if ((tmpfile = open(filename, O_RDWR | O_DSYNC | PG_BINARY, 0)) == -1)
{
printf(NA_FORMAT, _("n/a*"));
fs_warning = true;
@@ -402,12 +433,12 @@ test_sync(int writes_per_op)
#endif
/*
- * Test open_sync if available
+ * Test open_sync with O_DIRECT if available
*/
- printf(LABEL_FORMAT, "open_sync");
+ printf(LABEL_FORMAT, "open_sync (direct)");
fflush(stdout);
-#ifdef OPEN_SYNC_FLAG
+#if defined(OPEN_SYNC_FLAG) && (defined(O_DIRECT) || defined(F_NOCACHE))
if ((tmpfile = open_direct(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1)
{
printf(NA_FORMAT, _("n/a*"));
@@ -439,6 +470,38 @@ test_sync(int writes_per_op)
printf(NA_FORMAT, _("n/a"));
#endif
+/*
+ * Test open_sync if available
+ */
+ printf(LABEL_FORMAT, "open_sync (buffered)");
+ fflush(stdout);
+
+#ifdef OPEN_SYNC_FLAG
+ if ((tmpfile = open(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1)
+ {
+ printf(NA_FORMAT, _("n/a*"));
+ fs_warning = true;
+ }
+ else
+ {
+ START_TIMER;
+ for (ops = 0; alarm_triggered == false; ops++)
+ {
+ for (writes = 0; writes < writes_per_op; writes++)
+ if (pg_pwrite(tmpfile,
+ buf,
+ XLOG_BLCKSZ,
+ writes * XLOG_BLCKSZ) != XLOG_BLCKSZ)
+ die("write failed");
+ }
+ STOP_TIMER;
+ close(tmpfile);
+ }
+#else
+ printf(NA_FORMAT, _("n/a"));
+#endif
+
+
if (fs_warning)
{
printf(_("* This file system and its mount options do not support direct\n"
--
2.35.1
Thomas Munro <thomas.munro@gmail.com> writes:
With my garbage collector hat on, I see that all systems we target
have fdatasync(), except:
1. Windows, but this patch supplies src/port/fdatasync.c.
2. DragonflyBSD before 6.1. We have 6.0 in the build farm.
3. Ancient macOS. Current releases have it, though we have to cope
with a missing declaration.
Hmmm ... according to [1]/messages/by-id/1673109.1610733352@sss.pgh.pa.us, while current macOS has an undocumented
fdatasync function, it doesn't seem to do anything as useful as,
say, sync data to disk. I'm not sure what direction you're headed
in here, but it probably shouldn't include assuming that fdatasync
is actually useful on macOS. But maybe that's not your point?
My plan now is to commit this patch so that problem #1 is solved, prod
conchuela's owner to upgrade to solve #2, and wait until Tom shuts
down prairiedog to solve #3.
You could force my hand by pushing something that requires this ;-).
I'm not feeling particularly wedded to prairiedog per se. As with
my once-and-future HPPA machine, I'd prefer to wait until NetBSD 10
is a thing before spinning up an official buildfarm animal, but
I suppose that that's not far away.
regards, tom lane
On Mon, Jul 18, 2022 at 3:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thomas Munro <thomas.munro@gmail.com> writes:
With my garbage collector hat on, I see that all systems we target
have fdatasync(), except:1. Windows, but this patch supplies src/port/fdatasync.c.
2. DragonflyBSD before 6.1. We have 6.0 in the build farm.
3. Ancient macOS. Current releases have it, though we have to cope
with a missing declaration.Hmmm ... according to [1], while current macOS has an undocumented
fdatasync function, it doesn't seem to do anything as useful as,
say, sync data to disk. I'm not sure what direction you're headed
in here, but it probably shouldn't include assuming that fdatasync
is actually useful on macOS. But maybe that's not your point?
Oh, I'm not planning to change the default choice on macOS (or
Windows). I *am* assuming we're not going to take it away as an
option, that cat being out of the bag ever since configure found
Apple's secret fdatasync (note that O_DSYNC, our default, is also
undocumented and also known not to flush caches, but at least it's
present in an Apple header!). I was just noting an upcoming
opportunity to remove the configure/meson probes for fdatasync, which
made me feel better about the slightly kludgy way this patch is
defining HAVE_FDATASYNC explicitly on Windows.
My plan now is to commit this patch so that problem #1 is solved, prod
conchuela's owner to upgrade to solve #2, and wait until Tom shuts
down prairiedog to solve #3.You could force my hand by pushing something that requires this ;-).
Heh. Let me ask about the DragonFlyBSD thing first.
Thomas Munro <thomas.munro@gmail.com> writes:
... I was just noting an upcoming
opportunity to remove the configure/meson probes for fdatasync, which
made me feel better about the slightly kludgy way this patch is
defining HAVE_FDATASYNC explicitly on Windows.
Hm. There is certainly not any harm in the meson infrastructure
skipping that test, because prairiedog is not able to run meson
anyway. Can we do that and still leave it in place on the autoconf
side? Maybe not, because I suppose you want to remove #ifdefs in
the code itself.
I see that fdatasync goes back as far as SUS v2, which we've long
taken as our minimum POSIX infrastructure. So there's not a lot
of room to insist that we should support allegedly-Unix platforms
without it.
regards, tom lane
On Mon, Jul 18, 2022 at 03:26:36PM +1200, Thomas Munro wrote:
My plan now is to commit this patch so that problem #1 is solved, prod
conchuela's owner to upgrade to solve #2, and wait until Tom shuts
down prairiedog to solve #3. Then we could consider removing the
HAVE_FDATASYNC probe and associated #ifdefs when convenient. For that
reason, I'm not too bothered about the slight weirdness of defining
HAVE_FDATASYNC on Windows even though that doesn't come from
configure; it'd hopefully be short-lived. Better ideas welcome,
though. Does that make sense?
Do you still need HAVE_DECL_FDATASYNC?
--
Michael
On Tue, Jul 19, 2022 at 4:54 PM Michael Paquier <michael@paquier.xyz> wrote:
Do you still need HAVE_DECL_FDATASYNC?
I guess so, because that is currently used for macOS, and with this
patch would also be used to control the declaration for Windows. The
alternative would be to explicitly test for WIN32 or __darwin__.
The reason we need it for macOS is that they have had fdatasync
function for many years now, and configure detects it, but they
haven't ever declared it in a header, so we (accidentally?) do it in
c.h. We didn't set that up for Apple! The commit that added it was
33cc5d8a, which was about a month before Apple shipped the first
version of OS X (and long before they defined the function). So there
must have been another Unix with that problem, lost in the mists of
time.
Thomas Munro <thomas.munro@gmail.com> writes:
The reason we need it for macOS is that they have had fdatasync
function for many years now, and configure detects it, but they
haven't ever declared it in a header, so we (accidentally?) do it in
c.h. We didn't set that up for Apple! The commit that added it was
33cc5d8a, which was about a month before Apple shipped the first
version of OS X (and long before they defined the function). So there
must have been another Unix with that problem, lost in the mists of
time.
It might have just been paranoia, but I doubt it. Back then we
were still dealing with lots of systems that didn't have every
function described in SUS v2.
If you poked around in the mail archives you could likely find some
associated discussion, but I'm too lazy for that ...
regards, tom lane
Ok, I've pushed the Windows patch. I'll watch the build farm to see
if I've broken any of the frankentoolchain Windows animals.
Mikael kindly upgraded conchuela, so that leaves just prairiedog
without fdatasync. I've attached a patch to drop the configure probe
for that once prairiedog's host is reassigned to new duties, if we're
agreed on that.
While in this part of the code I noticed another anachronism that
could be cleaned up: our handling of the old pre-standard BSD O_FSYNC
flag. Pulling on that I noticed I could remove a bunch of associated
macrology.
Attachments:
0001-Remove-O_FSYNC-and-associated-macros.patchtext/x-patch; charset=US-ASCII; name=0001-Remove-O_FSYNC-and-associated-macros.patchDownload
From b78c07a573c9f7e634eb9d33f57b07e9c37bfb47 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 20 Jul 2022 12:32:26 +1200
Subject: [PATCH 1/2] Remove O_FSYNC and associated macros.
O_FSYNC was a pre-standard way of spelling O_SYNC. It's not needed on
any modern system. We can just use O_SYNC directly, if it exists, and
get rid of our OPEN_SYNC_FLAG macro.
Likewise for O_DSYNC, we can just use it directly, if it exists, and get
rid of our OPEN_DATASYNC_FLAG macro. The only complication is that we
still want to avoid choosing open_datasync as a default if O_DSYNC has
the same value as O_SYNC, so there is no change in behavior.
---
src/backend/access/transam/xlog.c | 20 ++++++++++----------
src/bin/pg_test_fsync/pg_test_fsync.c | 12 ++++++------
src/include/access/xlogdefs.h | 19 +------------------
3 files changed, 17 insertions(+), 34 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9854b51c6a..1da3b8eb2e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -171,10 +171,10 @@ const struct config_enum_entry sync_method_options[] = {
#ifdef HAVE_FDATASYNC
{"fdatasync", SYNC_METHOD_FDATASYNC, false},
#endif
-#ifdef OPEN_SYNC_FLAG
+#ifdef O_SYNC
{"open_sync", SYNC_METHOD_OPEN, false},
#endif
-#ifdef OPEN_DATASYNC_FLAG
+#ifdef O_DSYNC
{"open_datasync", SYNC_METHOD_OPEN_DSYNC, false},
#endif
{NULL, 0, false}
@@ -7894,10 +7894,10 @@ get_sync_bit(int method)
/*
* Optimize writes by bypassing kernel cache with O_DIRECT when using
- * O_SYNC/O_FSYNC and O_DSYNC. But only if archiving and streaming are
- * disabled, otherwise the archive command or walsender process will read
- * the WAL soon after writing it, which is guaranteed to cause a physical
- * read if we bypassed the kernel cache. We also skip the
+ * O_SYNC and O_DSYNC. But only if archiving and streaming are disabled,
+ * otherwise the archive command or walsender process will read the WAL
+ * soon after writing it, which is guaranteed to cause a physical read if
+ * we bypassed the kernel cache. We also skip the
* posix_fadvise(POSIX_FADV_DONTNEED) call in XLogFileClose() for the same
* reason.
*
@@ -7921,13 +7921,13 @@ get_sync_bit(int method)
case SYNC_METHOD_FSYNC_WRITETHROUGH:
case SYNC_METHOD_FDATASYNC:
return 0;
-#ifdef OPEN_SYNC_FLAG
+#ifdef O_SYNC
case SYNC_METHOD_OPEN:
- return OPEN_SYNC_FLAG | o_direct_flag;
+ return O_SYNC | o_direct_flag;
#endif
-#ifdef OPEN_DATASYNC_FLAG
+#ifdef O_DSYNC
case SYNC_METHOD_OPEN_DSYNC:
- return OPEN_DATASYNC_FLAG | o_direct_flag;
+ return O_DSYNC | o_direct_flag;
#endif
default:
/* can't happen (unless we are out of sync with option array) */
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index f7bc199a30..6739214eb8 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -300,7 +300,7 @@ test_sync(int writes_per_op)
printf(LABEL_FORMAT, "open_datasync");
fflush(stdout);
-#ifdef OPEN_DATASYNC_FLAG
+#ifdef O_DSYNC
if ((tmpfile = open_direct(filename, O_RDWR | O_DSYNC | PG_BINARY, 0)) == -1)
{
printf(NA_FORMAT, _("n/a*"));
@@ -407,8 +407,8 @@ test_sync(int writes_per_op)
printf(LABEL_FORMAT, "open_sync");
fflush(stdout);
-#ifdef OPEN_SYNC_FLAG
- if ((tmpfile = open_direct(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1)
+#ifdef O_SYNC
+ if ((tmpfile = open_direct(filename, O_RDWR | O_SYNC | PG_BINARY, 0)) == -1)
{
printf(NA_FORMAT, _("n/a*"));
fs_warning = true;
@@ -466,7 +466,7 @@ test_open_syncs(void)
static void
test_open_sync(const char *msg, int writes_size)
{
-#ifdef OPEN_SYNC_FLAG
+#ifdef O_SYNC
int tmpfile,
ops,
writes;
@@ -475,8 +475,8 @@ test_open_sync(const char *msg, int writes_size)
printf(LABEL_FORMAT, msg);
fflush(stdout);
-#ifdef OPEN_SYNC_FLAG
- if ((tmpfile = open_direct(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1)
+#ifdef O_SYNC
+ if ((tmpfile = open_direct(filename, O_RDWR | O_SYNC | PG_BINARY, 0)) == -1)
printf(NA_FORMAT, _("n/a*"));
else
{
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index a47e3eeb1f..a720169b17 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -70,27 +70,10 @@ typedef uint16 RepOriginId;
* default method. We assume that fsync() is always available, and that
* configure determined whether fdatasync() is.
*/
-#if defined(O_SYNC)
-#define OPEN_SYNC_FLAG O_SYNC
-#elif defined(O_FSYNC)
-#define OPEN_SYNC_FLAG O_FSYNC
-#endif
-
-#if defined(O_DSYNC)
-#if defined(OPEN_SYNC_FLAG)
-/* O_DSYNC is distinct? */
-#if O_DSYNC != OPEN_SYNC_FLAG
-#define OPEN_DATASYNC_FLAG O_DSYNC
-#endif
-#else /* !defined(OPEN_SYNC_FLAG) */
-/* Win32 only has O_DSYNC */
-#define OPEN_DATASYNC_FLAG O_DSYNC
-#endif
-#endif
#if defined(PLATFORM_DEFAULT_SYNC_METHOD)
#define DEFAULT_SYNC_METHOD PLATFORM_DEFAULT_SYNC_METHOD
-#elif defined(OPEN_DATASYNC_FLAG)
+#elif defined(O_DSYNC) && O_DSYNC != O_SYNC
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
#elif defined(HAVE_FDATASYNC)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
--
2.36.1
0002-Remove-fdatasync-configure-probe.patchtext/x-patch; charset=US-ASCII; name=0002-Remove-fdatasync-configure-probe.patchDownload
From 461e3591096933562f70112897e2a59dc5765190 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 20 Jul 2022 14:10:37 +1200
Subject: [PATCH 2/2] Remove fdatasync configure probe.
Every system we target has POSIX fdatasync, except Windows where we
provide a replacement in src/port/fdatasync.c. We can remove the
configure probe and associated #ifdefs.
We retain the probe for the function declaration, which allows us to
supply the missing declaration for macOS and Windows.
Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
---
configure | 59 +--------------------------
configure.ac | 3 --
src/backend/access/transam/xlog.c | 4 --
src/backend/storage/file/fd.c | 8 ----
src/bin/pg_test_fsync/pg_test_fsync.c | 4 --
src/include/access/xlogdefs.h | 4 +-
src/include/pg_config.h.in | 3 --
src/include/port/freebsd.h | 2 -
src/include/port/win32_port.h | 6 ---
src/tools/msvc/Solution.pm | 1 -
10 files changed, 2 insertions(+), 92 deletions(-)
diff --git a/configure b/configure
index 59fa82b8d7..6edee4af91 100755
--- a/configure
+++ b/configure
@@ -12315,63 +12315,6 @@ if test "$ac_res" != no; then :
fi
-# Solaris:
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing fdatasync" >&5
-$as_echo_n "checking for library containing fdatasync... " >&6; }
-if ${ac_cv_search_fdatasync+:} false; then :
- $as_echo_n "(cached) " >&6
-else
- ac_func_search_save_LIBS=$LIBS
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h. */
-
-/* Override any GCC internal prototype to avoid an error.
- Use char because int might match the return type of a GCC
- builtin and then its argument prototype would still apply. */
-#ifdef __cplusplus
-extern "C"
-#endif
-char fdatasync ();
-int
-main ()
-{
-return fdatasync ();
- ;
- return 0;
-}
-_ACEOF
-for ac_lib in '' rt posix4; do
- if test -z "$ac_lib"; then
- ac_res="none required"
- else
- ac_res=-l$ac_lib
- LIBS="-l$ac_lib $ac_func_search_save_LIBS"
- fi
- if ac_fn_c_try_link "$LINENO"; then :
- ac_cv_search_fdatasync=$ac_res
-fi
-rm -f core conftest.err conftest.$ac_objext \
- conftest$ac_exeext
- if ${ac_cv_search_fdatasync+:} false; then :
- break
-fi
-done
-if ${ac_cv_search_fdatasync+:} false; then :
-
-else
- ac_cv_search_fdatasync=no
-fi
-rm conftest.$ac_ext
-LIBS=$ac_func_search_save_LIBS
-fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_fdatasync" >&5
-$as_echo "$ac_cv_search_fdatasync" >&6; }
-ac_res=$ac_cv_search_fdatasync
-if test "$ac_res" != no; then :
- test "$ac_res" = "none required" || LIBS="$ac_res $LIBS"
-
-fi
-
# Cygwin:
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing shmget" >&5
$as_echo_n "checking for library containing shmget... " >&6; }
@@ -16039,7 +15982,7 @@ fi
LIBS_including_readline="$LIBS"
LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
-for ac_func in backtrace_symbols clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit inet_pton kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pthread_is_threaded_np readlink readv setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink syncfs sync_file_range uselocale wcstombs_l writev
+for ac_func in backtrace_symbols clock_gettime copyfile getifaddrs getpeerucred getrlimit inet_pton kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pthread_is_threaded_np readlink readv setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink syncfs sync_file_range uselocale wcstombs_l writev
do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index 612dabf698..177bee2507 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1253,8 +1253,6 @@ AC_SEARCH_LIBS(getopt_long, [getopt gnugetopt])
AC_SEARCH_LIBS(shm_open, rt)
AC_SEARCH_LIBS(shm_unlink, rt)
AC_SEARCH_LIBS(clock_gettime, [rt posix4])
-# Solaris:
-AC_SEARCH_LIBS(fdatasync, [rt posix4])
# Cygwin:
AC_SEARCH_LIBS(shmget, cygipc)
# *BSD:
@@ -1796,7 +1794,6 @@ AC_CHECK_FUNCS(m4_normalize([
backtrace_symbols
clock_gettime
copyfile
- fdatasync
getifaddrs
getpeerucred
getrlimit
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1da3b8eb2e..fc972873c8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -168,9 +168,7 @@ const struct config_enum_entry sync_method_options[] = {
#ifdef HAVE_FSYNC_WRITETHROUGH
{"fsync_writethrough", SYNC_METHOD_FSYNC_WRITETHROUGH, false},
#endif
-#ifdef HAVE_FDATASYNC
{"fdatasync", SYNC_METHOD_FDATASYNC, false},
-#endif
#ifdef O_SYNC
{"open_sync", SYNC_METHOD_OPEN, false},
#endif
@@ -8015,12 +8013,10 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
msg = _("could not fsync write-through file \"%s\": %m");
break;
#endif
-#ifdef HAVE_FDATASYNC
case SYNC_METHOD_FDATASYNC:
if (pg_fdatasync(fd) != 0)
msg = _("could not fdatasync file \"%s\": %m");
break;
-#endif
case SYNC_METHOD_OPEN:
case SYNC_METHOD_OPEN_DSYNC:
/* not reachable */
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index f904f60c08..c3169d14e6 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -442,20 +442,12 @@ pg_fsync_writethrough(int fd)
/*
* pg_fdatasync --- same as fdatasync except does nothing if enableFsync is off
- *
- * Not all platforms have fdatasync; treat as fsync if not available.
*/
int
pg_fdatasync(int fd)
{
if (enableFsync)
- {
-#ifdef HAVE_FDATASYNC
return fdatasync(fd);
-#else
- return fsync(fd);
-#endif
- }
else
return 0;
}
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index 6739214eb8..93785bba0a 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -331,7 +331,6 @@ test_sync(int writes_per_op)
printf(LABEL_FORMAT, "fdatasync");
fflush(stdout);
-#ifdef HAVE_FDATASYNC
if ((tmpfile = open(filename, O_RDWR | PG_BINARY, 0)) == -1)
die("could not open output file");
START_TIMER;
@@ -347,9 +346,6 @@ test_sync(int writes_per_op)
}
STOP_TIMER;
close(tmpfile);
-#else
- printf(NA_FORMAT, _("n/a"));
-#endif
/*
* Test fsync
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index a720169b17..d2901e5270 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -75,10 +75,8 @@ typedef uint16 RepOriginId;
#define DEFAULT_SYNC_METHOD PLATFORM_DEFAULT_SYNC_METHOD
#elif defined(O_DSYNC) && O_DSYNC != O_SYNC
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
-#elif defined(HAVE_FDATASYNC)
-#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
#else
-#define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC
+#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
#endif
#endif /* XLOG_DEFS_H */
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 529fb84a86..7da7fb22e4 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -188,9 +188,6 @@
/* Define to 1 if you have the `explicit_bzero' function. */
#undef HAVE_EXPLICIT_BZERO
-/* Define to 1 if you have the `fdatasync' function. */
-#undef HAVE_FDATASYNC
-
/* Define to 1 if you have the `fls' function. */
#undef HAVE_FLS
diff --git a/src/include/port/freebsd.h b/src/include/port/freebsd.h
index 2e2e749a6b..0e3fde55d6 100644
--- a/src/include/port/freebsd.h
+++ b/src/include/port/freebsd.h
@@ -5,6 +5,4 @@
* would prefer open_datasync on FreeBSD 13+, but that is not a good choice on
* many systems.
*/
-#ifdef HAVE_FDATASYNC
#define PLATFORM_DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
-#endif
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 5ea66528fa..5121c0c626 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -83,12 +83,6 @@
#define HAVE_FSYNC_WRITETHROUGH
#define FSYNC_WRITETHROUGH_IS_FSYNC
-/*
- * We have a replacement for fdatasync() in src/port/fdatasync.c, which is
- * unconditionally used by MSVC and Mingw builds.
- */
-#define HAVE_FDATASYNC
-
#define USES_WINSOCK
/*
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 3ddcd024a7..49e2825974 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -257,7 +257,6 @@ sub GenerateFiles
HAVE_EDITLINE_READLINE_H => undef,
HAVE_EXECINFO_H => undef,
HAVE_EXPLICIT_BZERO => undef,
- HAVE_FDATASYNC => 1,
HAVE_FLS => undef,
HAVE_FSEEKO => 1,
HAVE_FUNCNAME__FUNC => undef,
--
2.36.1
On Wed, 20 Jul 2022 at 15:22, Thomas Munro <thomas.munro@gmail.com> wrote:
Ok, I've pushed the Windows patch. I'll watch the build farm to see
if I've broken any of the frankentoolchain Windows animals.
Just to get in there before the farm does... I just got a boatload of
redefinition of HAVE_FDATASYNC warnings. I see it already gets
defined in pg_config.h
All compiles cleanly with the attached.
David
Attachments:
ifndef_HAVE_FDATASYNC.patchtext/plain; charset=US-ASCII; name=ifndef_HAVE_FDATASYNC.patchDownload
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 5ea66528fa..4de5bf3bf6 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -87,7 +87,9 @@
* We have a replacement for fdatasync() in src/port/fdatasync.c, which is
* unconditionally used by MSVC and Mingw builds.
*/
+#ifndef HAVE_FDATASYNC
#define HAVE_FDATASYNC
+#endif
#define USES_WINSOCK
On Wed, Jul 20, 2022 at 4:08 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 20 Jul 2022 at 15:22, Thomas Munro <thomas.munro@gmail.com> wrote:
Ok, I've pushed the Windows patch. I'll watch the build farm to see
if I've broken any of the frankentoolchain Windows animals.Just to get in there before the farm does... I just got a boatload of
redefinition of HAVE_FDATASYNC warnings. I see it already gets
defined in pg_config.hAll compiles cleanly with the attached.
Oops. Thanks, pushed.
On Wed, Jul 20, 2022 at 4:14 PM Thomas Munro <thomas.munro@gmail.com> wrote:
On Wed, Jul 20, 2022 at 4:08 PM David Rowley <dgrowleyml@gmail.com> wrote:
On Wed, 20 Jul 2022 at 15:22, Thomas Munro <thomas.munro@gmail.com> wrote:
Ok, I've pushed the Windows patch. I'll watch the build farm to see
if I've broken any of the frankentoolchain Windows animals.Just to get in there before the farm does... I just got a boatload of
redefinition of HAVE_FDATASYNC warnings. I see it already gets
defined in pg_config.hAll compiles cleanly with the attached.
Oops. Thanks, pushed.
... and here's a rebase of the patch that removes that macro stuff,
since cfbot is watching this thread.
Attachments:
v2-0001-Remove-O_FSYNC-and-associated-macros.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Remove-O_FSYNC-and-associated-macros.patchDownload
From c0c3e67637abf67d9b946a08ddc5c597b37b40a1 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 20 Jul 2022 12:32:26 +1200
Subject: [PATCH v2 1/2] Remove O_FSYNC and associated macros.
O_FSYNC was a pre-standard way of spelling O_SYNC. It's not needed on
any modern system. We can just use O_SYNC directly, if it exists, and
get rid of our OPEN_SYNC_FLAG macro.
Likewise for O_DSYNC, we can just use it directly, if it exists, and get
rid of our OPEN_DATASYNC_FLAG macro. The only complication is that we
still want to avoid choosing open_datasync as a default if O_DSYNC has
the same value as O_SYNC, so there is no change in behavior.
---
src/backend/access/transam/xlog.c | 20 ++++++++++----------
src/bin/pg_test_fsync/pg_test_fsync.c | 12 ++++++------
src/include/access/xlogdefs.h | 19 +------------------
3 files changed, 17 insertions(+), 34 deletions(-)
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9854b51c6a..1da3b8eb2e 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -171,10 +171,10 @@ const struct config_enum_entry sync_method_options[] = {
#ifdef HAVE_FDATASYNC
{"fdatasync", SYNC_METHOD_FDATASYNC, false},
#endif
-#ifdef OPEN_SYNC_FLAG
+#ifdef O_SYNC
{"open_sync", SYNC_METHOD_OPEN, false},
#endif
-#ifdef OPEN_DATASYNC_FLAG
+#ifdef O_DSYNC
{"open_datasync", SYNC_METHOD_OPEN_DSYNC, false},
#endif
{NULL, 0, false}
@@ -7894,10 +7894,10 @@ get_sync_bit(int method)
/*
* Optimize writes by bypassing kernel cache with O_DIRECT when using
- * O_SYNC/O_FSYNC and O_DSYNC. But only if archiving and streaming are
- * disabled, otherwise the archive command or walsender process will read
- * the WAL soon after writing it, which is guaranteed to cause a physical
- * read if we bypassed the kernel cache. We also skip the
+ * O_SYNC and O_DSYNC. But only if archiving and streaming are disabled,
+ * otherwise the archive command or walsender process will read the WAL
+ * soon after writing it, which is guaranteed to cause a physical read if
+ * we bypassed the kernel cache. We also skip the
* posix_fadvise(POSIX_FADV_DONTNEED) call in XLogFileClose() for the same
* reason.
*
@@ -7921,13 +7921,13 @@ get_sync_bit(int method)
case SYNC_METHOD_FSYNC_WRITETHROUGH:
case SYNC_METHOD_FDATASYNC:
return 0;
-#ifdef OPEN_SYNC_FLAG
+#ifdef O_SYNC
case SYNC_METHOD_OPEN:
- return OPEN_SYNC_FLAG | o_direct_flag;
+ return O_SYNC | o_direct_flag;
#endif
-#ifdef OPEN_DATASYNC_FLAG
+#ifdef O_DSYNC
case SYNC_METHOD_OPEN_DSYNC:
- return OPEN_DATASYNC_FLAG | o_direct_flag;
+ return O_DSYNC | o_direct_flag;
#endif
default:
/* can't happen (unless we are out of sync with option array) */
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index f7bc199a30..6739214eb8 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -300,7 +300,7 @@ test_sync(int writes_per_op)
printf(LABEL_FORMAT, "open_datasync");
fflush(stdout);
-#ifdef OPEN_DATASYNC_FLAG
+#ifdef O_DSYNC
if ((tmpfile = open_direct(filename, O_RDWR | O_DSYNC | PG_BINARY, 0)) == -1)
{
printf(NA_FORMAT, _("n/a*"));
@@ -407,8 +407,8 @@ test_sync(int writes_per_op)
printf(LABEL_FORMAT, "open_sync");
fflush(stdout);
-#ifdef OPEN_SYNC_FLAG
- if ((tmpfile = open_direct(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1)
+#ifdef O_SYNC
+ if ((tmpfile = open_direct(filename, O_RDWR | O_SYNC | PG_BINARY, 0)) == -1)
{
printf(NA_FORMAT, _("n/a*"));
fs_warning = true;
@@ -466,7 +466,7 @@ test_open_syncs(void)
static void
test_open_sync(const char *msg, int writes_size)
{
-#ifdef OPEN_SYNC_FLAG
+#ifdef O_SYNC
int tmpfile,
ops,
writes;
@@ -475,8 +475,8 @@ test_open_sync(const char *msg, int writes_size)
printf(LABEL_FORMAT, msg);
fflush(stdout);
-#ifdef OPEN_SYNC_FLAG
- if ((tmpfile = open_direct(filename, O_RDWR | OPEN_SYNC_FLAG | PG_BINARY, 0)) == -1)
+#ifdef O_SYNC
+ if ((tmpfile = open_direct(filename, O_RDWR | O_SYNC | PG_BINARY, 0)) == -1)
printf(NA_FORMAT, _("n/a*"));
else
{
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index a47e3eeb1f..a720169b17 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -70,27 +70,10 @@ typedef uint16 RepOriginId;
* default method. We assume that fsync() is always available, and that
* configure determined whether fdatasync() is.
*/
-#if defined(O_SYNC)
-#define OPEN_SYNC_FLAG O_SYNC
-#elif defined(O_FSYNC)
-#define OPEN_SYNC_FLAG O_FSYNC
-#endif
-
-#if defined(O_DSYNC)
-#if defined(OPEN_SYNC_FLAG)
-/* O_DSYNC is distinct? */
-#if O_DSYNC != OPEN_SYNC_FLAG
-#define OPEN_DATASYNC_FLAG O_DSYNC
-#endif
-#else /* !defined(OPEN_SYNC_FLAG) */
-/* Win32 only has O_DSYNC */
-#define OPEN_DATASYNC_FLAG O_DSYNC
-#endif
-#endif
#if defined(PLATFORM_DEFAULT_SYNC_METHOD)
#define DEFAULT_SYNC_METHOD PLATFORM_DEFAULT_SYNC_METHOD
-#elif defined(OPEN_DATASYNC_FLAG)
+#elif defined(O_DSYNC) && O_DSYNC != O_SYNC
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
#elif defined(HAVE_FDATASYNC)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
--
2.36.1
v2-0002-Remove-fdatasync-configure-probe.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Remove-fdatasync-configure-probe.patchDownload
From b9067bfe81678376acd1b7779fbc68fbff85ed05 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 20 Jul 2022 14:10:37 +1200
Subject: [PATCH v2 2/2] Remove fdatasync configure probe.
Every system we target has POSIX fdatasync, except Windows where we
provide a replacement in src/port/fdatasync.c. We can remove the
configure probe and associated #ifdefs.
We retain the probe for the function declaration, which allows us to
supply the missing declaration for macOS and Windows.
Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
---
configure | 59 +--------------------------
configure.ac | 3 --
src/backend/access/transam/xlog.c | 4 --
src/backend/storage/file/fd.c | 8 ----
src/bin/pg_test_fsync/pg_test_fsync.c | 4 --
src/include/access/xlogdefs.h | 4 +-
src/include/pg_config.h.in | 3 --
src/include/port/freebsd.h | 2 -
src/include/port/win32_port.h | 8 ----
src/tools/msvc/Solution.pm | 1 -
10 files changed, 2 insertions(+), 94 deletions(-)
diff --git a/configure b/configure
index 59fa82b8d7..6edee4af91 100755
--- a/configure
+++ b/configure
@@ -12315,63 +12315,6 @@ if test "$ac_res" != no; then :
fi
-# Solaris:
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing fdatasync" >&5
-$as_echo_n "checking for library containing fdatasync... " >&6; }
-if ${ac_cv_search_fdatasync+:} false; then :
- $as_echo_n "(cached) " >&6
-else
- ac_func_search_save_LIBS=$LIBS
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h. */
-
-/* Override any GCC internal prototype to avoid an error.
- Use char because int might match the return type of a GCC
- builtin and then its argument prototype would still apply. */
-#ifdef __cplusplus
-extern "C"
-#endif
-char fdatasync ();
-int
-main ()
-{
-return fdatasync ();
- ;
- return 0;
-}
-_ACEOF
-for ac_lib in '' rt posix4; do
- if test -z "$ac_lib"; then
- ac_res="none required"
- else
- ac_res=-l$ac_lib
- LIBS="-l$ac_lib $ac_func_search_save_LIBS"
- fi
- if ac_fn_c_try_link "$LINENO"; then :
- ac_cv_search_fdatasync=$ac_res
-fi
-rm -f core conftest.err conftest.$ac_objext \
- conftest$ac_exeext
- if ${ac_cv_search_fdatasync+:} false; then :
- break
-fi
-done
-if ${ac_cv_search_fdatasync+:} false; then :
-
-else
- ac_cv_search_fdatasync=no
-fi
-rm conftest.$ac_ext
-LIBS=$ac_func_search_save_LIBS
-fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_fdatasync" >&5
-$as_echo "$ac_cv_search_fdatasync" >&6; }
-ac_res=$ac_cv_search_fdatasync
-if test "$ac_res" != no; then :
- test "$ac_res" = "none required" || LIBS="$ac_res $LIBS"
-
-fi
-
# Cygwin:
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing shmget" >&5
$as_echo_n "checking for library containing shmget... " >&6; }
@@ -16039,7 +15982,7 @@ fi
LIBS_including_readline="$LIBS"
LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
-for ac_func in backtrace_symbols clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit inet_pton kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pthread_is_threaded_np readlink readv setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink syncfs sync_file_range uselocale wcstombs_l writev
+for ac_func in backtrace_symbols clock_gettime copyfile getifaddrs getpeerucred getrlimit inet_pton kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pthread_is_threaded_np readlink readv setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink syncfs sync_file_range uselocale wcstombs_l writev
do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index 612dabf698..177bee2507 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1253,8 +1253,6 @@ AC_SEARCH_LIBS(getopt_long, [getopt gnugetopt])
AC_SEARCH_LIBS(shm_open, rt)
AC_SEARCH_LIBS(shm_unlink, rt)
AC_SEARCH_LIBS(clock_gettime, [rt posix4])
-# Solaris:
-AC_SEARCH_LIBS(fdatasync, [rt posix4])
# Cygwin:
AC_SEARCH_LIBS(shmget, cygipc)
# *BSD:
@@ -1796,7 +1794,6 @@ AC_CHECK_FUNCS(m4_normalize([
backtrace_symbols
clock_gettime
copyfile
- fdatasync
getifaddrs
getpeerucred
getrlimit
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1da3b8eb2e..fc972873c8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -168,9 +168,7 @@ const struct config_enum_entry sync_method_options[] = {
#ifdef HAVE_FSYNC_WRITETHROUGH
{"fsync_writethrough", SYNC_METHOD_FSYNC_WRITETHROUGH, false},
#endif
-#ifdef HAVE_FDATASYNC
{"fdatasync", SYNC_METHOD_FDATASYNC, false},
-#endif
#ifdef O_SYNC
{"open_sync", SYNC_METHOD_OPEN, false},
#endif
@@ -8015,12 +8013,10 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
msg = _("could not fsync write-through file \"%s\": %m");
break;
#endif
-#ifdef HAVE_FDATASYNC
case SYNC_METHOD_FDATASYNC:
if (pg_fdatasync(fd) != 0)
msg = _("could not fdatasync file \"%s\": %m");
break;
-#endif
case SYNC_METHOD_OPEN:
case SYNC_METHOD_OPEN_DSYNC:
/* not reachable */
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index f904f60c08..c3169d14e6 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -442,20 +442,12 @@ pg_fsync_writethrough(int fd)
/*
* pg_fdatasync --- same as fdatasync except does nothing if enableFsync is off
- *
- * Not all platforms have fdatasync; treat as fsync if not available.
*/
int
pg_fdatasync(int fd)
{
if (enableFsync)
- {
-#ifdef HAVE_FDATASYNC
return fdatasync(fd);
-#else
- return fsync(fd);
-#endif
- }
else
return 0;
}
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index 6739214eb8..93785bba0a 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -331,7 +331,6 @@ test_sync(int writes_per_op)
printf(LABEL_FORMAT, "fdatasync");
fflush(stdout);
-#ifdef HAVE_FDATASYNC
if ((tmpfile = open(filename, O_RDWR | PG_BINARY, 0)) == -1)
die("could not open output file");
START_TIMER;
@@ -347,9 +346,6 @@ test_sync(int writes_per_op)
}
STOP_TIMER;
close(tmpfile);
-#else
- printf(NA_FORMAT, _("n/a"));
-#endif
/*
* Test fsync
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index a720169b17..d2901e5270 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -75,10 +75,8 @@ typedef uint16 RepOriginId;
#define DEFAULT_SYNC_METHOD PLATFORM_DEFAULT_SYNC_METHOD
#elif defined(O_DSYNC) && O_DSYNC != O_SYNC
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
-#elif defined(HAVE_FDATASYNC)
-#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
#else
-#define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC
+#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
#endif
#endif /* XLOG_DEFS_H */
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 529fb84a86..7da7fb22e4 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -188,9 +188,6 @@
/* Define to 1 if you have the `explicit_bzero' function. */
#undef HAVE_EXPLICIT_BZERO
-/* Define to 1 if you have the `fdatasync' function. */
-#undef HAVE_FDATASYNC
-
/* Define to 1 if you have the `fls' function. */
#undef HAVE_FLS
diff --git a/src/include/port/freebsd.h b/src/include/port/freebsd.h
index 2e2e749a6b..0e3fde55d6 100644
--- a/src/include/port/freebsd.h
+++ b/src/include/port/freebsd.h
@@ -5,6 +5,4 @@
* would prefer open_datasync on FreeBSD 13+, but that is not a good choice on
* many systems.
*/
-#ifdef HAVE_FDATASYNC
#define PLATFORM_DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
-#endif
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 4de5bf3bf6..5121c0c626 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -83,14 +83,6 @@
#define HAVE_FSYNC_WRITETHROUGH
#define FSYNC_WRITETHROUGH_IS_FSYNC
-/*
- * We have a replacement for fdatasync() in src/port/fdatasync.c, which is
- * unconditionally used by MSVC and Mingw builds.
- */
-#ifndef HAVE_FDATASYNC
-#define HAVE_FDATASYNC
-#endif
-
#define USES_WINSOCK
/*
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 3ddcd024a7..49e2825974 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -257,7 +257,6 @@ sub GenerateFiles
HAVE_EDITLINE_READLINE_H => undef,
HAVE_EXECINFO_H => undef,
HAVE_EXPLICIT_BZERO => undef,
- HAVE_FDATASYNC => 1,
HAVE_FLS => undef,
HAVE_FSEEKO => 1,
HAVE_FUNCNAME__FUNC => undef,
--
2.36.1
Hearing no objection, I committed the patch to remove O_FSYNC. The
next cleanup one I'll just leave here for now.
Attachments:
v3-0001-Remove-fdatasync-configure-probe.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Remove-fdatasync-configure-probe.patchDownload
From a50957d4c079a3703c9f8074287c7edb5654022c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.munro@gmail.com>
Date: Wed, 20 Jul 2022 14:10:37 +1200
Subject: [PATCH v3] Remove fdatasync configure probe.
Every system we target has POSIX fdatasync, except Windows where we
provide a replacement in src/port/fdatasync.c. We can remove the
configure probe and associated #ifdefs.
We retain the probe for the function declaration, which allows us to
supply the missing declaration for macOS and Windows.
Discussion: https://postgr.es/m/CA%2BhUKGJZJVO%3DiX%2Beb-PXi2_XS9ZRqnn_4URh0NUQOwt6-_51xQ%40mail.gmail.com
---
configure | 59 +--------------------------
configure.ac | 3 --
src/backend/access/transam/xlog.c | 4 --
src/backend/storage/file/fd.c | 8 ----
src/bin/pg_test_fsync/pg_test_fsync.c | 4 --
src/include/access/xlogdefs.h | 7 +---
src/include/pg_config.h.in | 3 --
src/include/port/freebsd.h | 2 -
src/include/port/win32_port.h | 8 ----
src/tools/msvc/Solution.pm | 1 -
10 files changed, 3 insertions(+), 96 deletions(-)
diff --git a/configure b/configure
index 33a425562f..f13822fb3d 100755
--- a/configure
+++ b/configure
@@ -12315,63 +12315,6 @@ if test "$ac_res" != no; then :
fi
-# Solaris:
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing fdatasync" >&5
-$as_echo_n "checking for library containing fdatasync... " >&6; }
-if ${ac_cv_search_fdatasync+:} false; then :
- $as_echo_n "(cached) " >&6
-else
- ac_func_search_save_LIBS=$LIBS
-cat confdefs.h - <<_ACEOF >conftest.$ac_ext
-/* end confdefs.h. */
-
-/* Override any GCC internal prototype to avoid an error.
- Use char because int might match the return type of a GCC
- builtin and then its argument prototype would still apply. */
-#ifdef __cplusplus
-extern "C"
-#endif
-char fdatasync ();
-int
-main ()
-{
-return fdatasync ();
- ;
- return 0;
-}
-_ACEOF
-for ac_lib in '' rt posix4; do
- if test -z "$ac_lib"; then
- ac_res="none required"
- else
- ac_res=-l$ac_lib
- LIBS="-l$ac_lib $ac_func_search_save_LIBS"
- fi
- if ac_fn_c_try_link "$LINENO"; then :
- ac_cv_search_fdatasync=$ac_res
-fi
-rm -f core conftest.err conftest.$ac_objext \
- conftest$ac_exeext
- if ${ac_cv_search_fdatasync+:} false; then :
- break
-fi
-done
-if ${ac_cv_search_fdatasync+:} false; then :
-
-else
- ac_cv_search_fdatasync=no
-fi
-rm conftest.$ac_ext
-LIBS=$ac_func_search_save_LIBS
-fi
-{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_search_fdatasync" >&5
-$as_echo "$ac_cv_search_fdatasync" >&6; }
-ac_res=$ac_cv_search_fdatasync
-if test "$ac_res" != no; then :
- test "$ac_res" = "none required" || LIBS="$ac_res $LIBS"
-
-fi
-
# Cygwin:
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for library containing shmget" >&5
$as_echo_n "checking for library containing shmget... " >&6; }
@@ -16039,7 +15982,7 @@ fi
LIBS_including_readline="$LIBS"
LIBS=`echo "$LIBS" | sed -e 's/-ledit//g' -e 's/-lreadline//g'`
-for ac_func in backtrace_symbols clock_gettime copyfile fdatasync getifaddrs getpeerucred getrlimit inet_pton kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pthread_is_threaded_np readlink readv setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink syncfs sync_file_range uselocale wcstombs_l writev
+for ac_func in backtrace_symbols clock_gettime copyfile getifaddrs getpeerucred getrlimit inet_pton kqueue mbstowcs_l memset_s poll posix_fallocate ppoll pthread_is_threaded_np readlink readv setproctitle setproctitle_fast setsid shm_open strchrnul strsignal symlink syncfs sync_file_range uselocale wcstombs_l writev
do :
as_ac_var=`$as_echo "ac_cv_func_$ac_func" | $as_tr_sh`
ac_fn_c_check_func "$LINENO" "$ac_func" "$as_ac_var"
diff --git a/configure.ac b/configure.ac
index 5295cb5806..0b64b35b39 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1253,8 +1253,6 @@ AC_SEARCH_LIBS(getopt_long, [getopt gnugetopt])
AC_SEARCH_LIBS(shm_open, rt)
AC_SEARCH_LIBS(shm_unlink, rt)
AC_SEARCH_LIBS(clock_gettime, [rt posix4])
-# Solaris:
-AC_SEARCH_LIBS(fdatasync, [rt posix4])
# Cygwin:
AC_SEARCH_LIBS(shmget, cygipc)
# *BSD:
@@ -1796,7 +1794,6 @@ AC_CHECK_FUNCS(m4_normalize([
backtrace_symbols
clock_gettime
copyfile
- fdatasync
getifaddrs
getpeerucred
getrlimit
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 1da3b8eb2e..fc972873c8 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -168,9 +168,7 @@ const struct config_enum_entry sync_method_options[] = {
#ifdef HAVE_FSYNC_WRITETHROUGH
{"fsync_writethrough", SYNC_METHOD_FSYNC_WRITETHROUGH, false},
#endif
-#ifdef HAVE_FDATASYNC
{"fdatasync", SYNC_METHOD_FDATASYNC, false},
-#endif
#ifdef O_SYNC
{"open_sync", SYNC_METHOD_OPEN, false},
#endif
@@ -8015,12 +8013,10 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli)
msg = _("could not fsync write-through file \"%s\": %m");
break;
#endif
-#ifdef HAVE_FDATASYNC
case SYNC_METHOD_FDATASYNC:
if (pg_fdatasync(fd) != 0)
msg = _("could not fdatasync file \"%s\": %m");
break;
-#endif
case SYNC_METHOD_OPEN:
case SYNC_METHOD_OPEN_DSYNC:
/* not reachable */
diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index f904f60c08..c3169d14e6 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -442,20 +442,12 @@ pg_fsync_writethrough(int fd)
/*
* pg_fdatasync --- same as fdatasync except does nothing if enableFsync is off
- *
- * Not all platforms have fdatasync; treat as fsync if not available.
*/
int
pg_fdatasync(int fd)
{
if (enableFsync)
- {
-#ifdef HAVE_FDATASYNC
return fdatasync(fd);
-#else
- return fsync(fd);
-#endif
- }
else
return 0;
}
diff --git a/src/bin/pg_test_fsync/pg_test_fsync.c b/src/bin/pg_test_fsync/pg_test_fsync.c
index 6739214eb8..93785bba0a 100644
--- a/src/bin/pg_test_fsync/pg_test_fsync.c
+++ b/src/bin/pg_test_fsync/pg_test_fsync.c
@@ -331,7 +331,6 @@ test_sync(int writes_per_op)
printf(LABEL_FORMAT, "fdatasync");
fflush(stdout);
-#ifdef HAVE_FDATASYNC
if ((tmpfile = open(filename, O_RDWR | PG_BINARY, 0)) == -1)
die("could not open output file");
START_TIMER;
@@ -347,9 +346,6 @@ test_sync(int writes_per_op)
}
STOP_TIMER;
close(tmpfile);
-#else
- printf(NA_FORMAT, _("n/a"));
-#endif
/*
* Test fsync
diff --git a/src/include/access/xlogdefs.h b/src/include/access/xlogdefs.h
index 810cd1fd86..49e581925c 100644
--- a/src/include/access/xlogdefs.h
+++ b/src/include/access/xlogdefs.h
@@ -67,8 +67,7 @@ typedef uint16 RepOriginId;
/*
* This chunk of hackery attempts to determine which file sync methods
* are available on the current platform, and to choose an appropriate
- * default method. We assume that fsync() is always available, and that
- * configure determined whether fdatasync() is.
+ * default method.
*
* Note that we define our own O_DSYNC on Windows, but not O_SYNC.
*/
@@ -76,10 +75,8 @@ typedef uint16 RepOriginId;
#define DEFAULT_SYNC_METHOD PLATFORM_DEFAULT_SYNC_METHOD
#elif defined(O_DSYNC) && (!defined(O_SYNC) || O_DSYNC != O_SYNC)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
-#elif defined(HAVE_FDATASYNC)
-#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
#else
-#define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC
+#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
#endif
#endif /* XLOG_DEFS_H */
diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index 86d0b1941b..e72a181a73 100644
--- a/src/include/pg_config.h.in
+++ b/src/include/pg_config.h.in
@@ -188,9 +188,6 @@
/* Define to 1 if you have the `explicit_bzero' function. */
#undef HAVE_EXPLICIT_BZERO
-/* Define to 1 if you have the `fdatasync' function. */
-#undef HAVE_FDATASYNC
-
/* Define to 1 if fseeko (and presumably ftello) exists and is declared. */
#undef HAVE_FSEEKO
diff --git a/src/include/port/freebsd.h b/src/include/port/freebsd.h
index 2e2e749a6b..0e3fde55d6 100644
--- a/src/include/port/freebsd.h
+++ b/src/include/port/freebsd.h
@@ -5,6 +5,4 @@
* would prefer open_datasync on FreeBSD 13+, but that is not a good choice on
* many systems.
*/
-#ifdef HAVE_FDATASYNC
#define PLATFORM_DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
-#endif
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index 4de5bf3bf6..5121c0c626 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -83,14 +83,6 @@
#define HAVE_FSYNC_WRITETHROUGH
#define FSYNC_WRITETHROUGH_IS_FSYNC
-/*
- * We have a replacement for fdatasync() in src/port/fdatasync.c, which is
- * unconditionally used by MSVC and Mingw builds.
- */
-#ifndef HAVE_FDATASYNC
-#define HAVE_FDATASYNC
-#endif
-
#define USES_WINSOCK
/*
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 23296527ae..207cf5b7b0 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -257,7 +257,6 @@ sub GenerateFiles
HAVE_EDITLINE_READLINE_H => undef,
HAVE_EXECINFO_H => undef,
HAVE_EXPLICIT_BZERO => undef,
- HAVE_FDATASYNC => 1,
HAVE_FSEEKO => 1,
HAVE_FUNCNAME__FUNC => undef,
HAVE_FUNCNAME__FUNCTION => 1,
--
2.36.1
David kindly ran some tests of this thing on real hardware. The
results were mostly in line with expectations, but we learned some new
things.
TL;DR We probably should consider this as a safer default, but it'd
be good for someone more hands-on with this OS and knowledgeable about
storage to investigate and propose that. My original goal here was
primarily Unix/Windows harmonisation and cleanup since I'm doing a
bunch of hacking on I/O, but I can't unsee an
unsafe-at-least-on-consumer-gear default now that I've seen it. The
main thing I'm aware of that we don't know yet is what happens if you
try it on a non-NTFS file system (ReFS? SMB?) -- hopefully it falls
back to fsync behaviour.
Observations from an old Windows 8.1 system with a SATA drive:
1. So far you can apparently still actually compile and run on 8.1,
despite recent commits to de-support it.
2. You can use the new wal_sync_method=fdatasync, without error, and
timings are consistent with falling back to full fsync behaviour.
That makes sense, I guess, because the function existed. It's just a
new flag bit, and the default behaviour for flags == 0 was already
their fsync. That seems like a good outcome even though 8.1 isn't a
target anymore.
Observations from a current Windows 11 system with an NVMe drive:
1. fdatasync is faster than fsync, as expected. Twice as fast with
write cache disabled, a bit faster with write cache enabled.
2. Timings seem to suggest that open_datasync (the current default)
is not really writing through the drive cache. I'd previously thought
that was a SATA-only problem based on [1]https://devblogs.microsoft.com/oldnewthing/20170510-00/?p=95505, which said that EIDE/SATA
drivers did not pass through the FUA flag that NTFS sends for
FILE_FLAG_WRITE_THROUGH (= O_DSYNC) on the basis that many drives
ignored it anyway, but these numbers seem to suggest that David's
recent-ish NVMe system has the same problem as the old SATA system.
Generally, Windows' approach seems to be that NTFS
FILE_FLAG_WRITE_THROUGH fires an FUA flag into the storage stack, and
either the driver or the drive is free to fling it out the window, and
it's the user's problem to worry about that, whereas Linux at least
asks nicely if the drive understands FUA and falls back to flushing
the whole cache if not[2]https://techcommunity.microsoft.com/t5/sql-server-blog/sql-server-on-linux-forced-unit-access-fua-internals/ba-p/3199102. I also know that Linux has been flaky
around this in the past too, especially on consumer storage, and macOS
and at least some of the older BSD/UFS systems just don't do this
stuff at all for user data (yet) so it's not like there is anything
universal about this topic. Note that drive caches are enabled by
default in Windows, and our manual does already tell you about this
problem[3]https://www.postgresql.org/docs/devel/wal-reliability.html.
One thing to note about the numbers below: pg_test_fsync.c's
open_datasync test is also using FILE_FLAG_NO_BUFFERING (= O_DIRECT),
unlike PostgreSQL, which muddies the waters slightly. (There was a
patch upthread to fix that and report both numbers, I may come back to
that.)
Windows 11, NVMe, write cache enabled:
open_datasync 27306.286 ops/sec 37 usecs/op
fdatasync 3065.428 ops/sec 326 usecs/op
fsync 2577.498 ops/sec 388 usecs/op
Windows 11, NVMe, write cache disabled:
open_datasync 3477.258 ops/sec 288 usecs/op
fdatasync 3263.418 ops/sec 306 usecs/op
fsync 1641.502 ops/sec 609 usecs/op
Windows 8.1, SATA:
open_datasync 19934.532 ops/sec 50 usecs/op
fdatasync 231.429 ops/sec 4321 usecs/op
fsync 240.050 ops/sec 4166 usecs/op
(We couldn't figure out how to disable the write cache on the 8.1
machine -- the usual checkbox had no effect -- but we didn't waste
time investigating that old system beyond the curiosity of checking if
it'd work at all.)
[1]: https://devblogs.microsoft.com/oldnewthing/20170510-00/?p=95505
[2]: https://techcommunity.microsoft.com/t5/sql-server-blog/sql-server-on-linux-forced-unit-access-fua-internals/ba-p/3199102
[3]: https://www.postgresql.org/docs/devel/wal-reliability.html