Direct I/O issues
I've been trying to optimize a Linux system where benchmarking suggests
large performance differences between the various wal_sync_method options
(with o_sync being the big winner). I started that by using
src/tools/fsync/test_fsync to get an idea what I was dealing with (and to
spot which drives had write caching turned on). Since those results
didn't match what I was seeing in the benchmarks, I've been browsing the
backend source to figure out why. I noticed test_fsync appears to be,
ahem, out of sync with what the engine is doing.
It looks like V8.1 introduced O_DIRECT writes to the WAL, determined at
compile time by a series of preprocessor tests in
src/backend/access/transam/xlog.c. When O_DIRECT is available,
O_SYNC/O_FSYNC/O_DSYNC writes use it. test_fsync doesn't do that.
I moved the new code (in 8.2 beta 3, lines 61-92 in xlog.c) into
test_fsync; all the flags had the same name so it dropped right in. You
can get the version I made at http://www.westnet.com/~gsmith/test_fsync.c
(fixed a compiler warning, too)
The results I get now look fishy. I'm not sure if I screwed up a step, or
if I'm seeing a real problem. The system here is running RedHat Linux,
RHEL ES 4.0 kernel 2.6.9, and the disk I'm writing to is a standard
7200RPM IDE drive. I turned off write caching with hdparm -W 0.

Here's an excerpt from the stock test_fsync:

Compare one o_sync write to two:
    one 16k o_sync write     8.717944
    two 8k o_sync writes    17.501980

Compare file sync methods with 2 8k writes:
    (o_dsync unavailable)
    open o_sync, write      17.018495
    write, fdatasync         8.842473
    write, fsync,            8.809117

And here's the version I tried to modify to include O_DIRECT support:

Compare one o_sync write to two:
    one 16k o_sync write     0.004995
    two 8k o_sync writes     0.003027

Compare file sync methods with 2 8k writes:
    (o_dsync unavailable)
    open o_sync, write       0.004978
    write, fdatasync         8.845498
    write, fsync,            8.834037

Obviously the o_sync writes aren't waiting for the disk. Is this a
problem with O_DIRECT under Linux? Or is my code just not correctly
testing this behavior?
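
As a rough plausibility check on the timings above, the sketch below estimates a lower bound for a run of synchronous writes to a spinning disk. It assumes each synchronous write waits for about one full platter rotation and ignores seek time and controller overhead, so it is a back-of-the-envelope model, not a description of what test_fsync does internally. The names min_sync_seconds and report are made up for this illustration.

```c
#include <stdio.h>

/*
 * Rough lower bound, in seconds, for `syncs` synchronous writes to a
 * spinning disk, ASSUMING each one waits for about one full platter
 * rotation. Seek time and controller overhead are ignored.
 */
static double
min_sync_seconds(double rpm, long syncs)
{
    double seconds_per_rotation = 60.0 / rpm;

    return seconds_per_rotation * (double) syncs;
}

/* Print the estimate next to a measured test_fsync time. */
static void
report(const char *label, double rpm, long syncs, double measured)
{
    printf("%-24s expected >= %6.2f s, measured %10.6f s\n",
           label, min_sync_seconds(rpm, syncs), measured);
}
```

With the default of 1000 loops in test_fsync, calling report("one 16k o_sync write", 7200.0, 1000, 8.717944) gives an estimate of about 8.33 seconds, and 2000 synchronous 8k writes come to about 16.67 seconds. The stock numbers fit that model; the 0.005 second results do not.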
Just as a sanity check, I did try this on another system, running SuSE
with drives connected to a cciss SCSI device, and I got exactly the same
results. I'm concerned that Linux users who use O_SYNC because they
notice it's faster will be losing their WAL integrity without being aware
of the problem, especially as the whole O_DIRECT business isn't even
mentioned in the WAL documentation--it really deserves to be brought up in
the wal_sync_method notes at
http://developer.postgresql.org/pgdocs/postgres/runtime-config-wal.html
And while I'm mentioning improvements to that particular documentation
page...the wal_buffers notes there are so sparse they misled me initially.
They suggest only bumping it up for situations with very large
transactions; since I was testing with small ones I left it woefully
undersized initially. I would suggest copying the text from
http://developer.postgresql.org/pgdocs/postgres/wal-configuration.html to
here: "When full_page_writes is set and the system is very busy, setting
this value higher will help smooth response times during the period
immediately following each checkpoint." That seems to match what I found
in testing.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
I have applied your test_fsync patch for 8.2. Thanks.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachments:
/rtmp/diff (text/x-diff)
*** /pg/tools/fsync/test_fsync.c Fri Oct 13 10:18:33 2006
--- test_fsync.c Thu Nov 23 00:24:49 2006
***************
*** 14,19 ****
--- 14,20 ----
#include <time.h>
#include <sys/time.h>
#include <unistd.h>
+ #include <string.h>
#ifdef WIN32
#define FSYNC_FILENAME "./test_fsync.out"
***************
*** 21,40 ****
#define FSYNC_FILENAME "/var/tmp/test_fsync.out"
#endif
! /* O_SYNC and O_FSYNC are the same */
#if defined(O_SYNC)
! #define OPEN_SYNC_FLAG O_SYNC
#elif defined(O_FSYNC)
! #define OPEN_SYNC_FLAG O_FSYNC
! #elif defined(O_DSYNC)
! #define OPEN_DATASYNC_FLAG O_DSYNC
#endif
#if defined(OPEN_SYNC_FLAG)
! #if defined(O_DSYNC) && (O_DSYNC != OPEN_SYNC_FLAG)
! #define OPEN_DATASYNC_FLAG O_DSYNC
#endif
#endif
#define WAL_FILE_SIZE (16 * 1024 * 1024)
--- 22,54 ----
#define FSYNC_FILENAME "/var/tmp/test_fsync.out"
#endif
! /* This logic comes from src/backend/access/transam/xlog.c where it's
! better documented */
! #ifdef O_DIRECT
! #define PG_O_DIRECT O_DIRECT
! #else
! #define PG_O_DIRECT 0
! #endif
!
#if defined(O_SYNC)
! #define BARE_OPEN_SYNC_FLAG O_SYNC
#elif defined(O_FSYNC)
! #define BARE_OPEN_SYNC_FLAG O_FSYNC
! #endif
! #ifdef BARE_OPEN_SYNC_FLAG
! #define OPEN_SYNC_FLAG (BARE_OPEN_SYNC_FLAG | PG_O_DIRECT)
#endif
+ #if defined(O_DSYNC)
#if defined(OPEN_SYNC_FLAG)
! #if O_DSYNC != BARE_OPEN_SYNC_FLAG
! #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
! #endif
! #else
! #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
#endif
#endif
+
#define WAL_FILE_SIZE (16 * 1024 * 1024)
Greg Smith <gsmith@gregsmith.com> writes:
The results I get now look fishy.
There are at least two things wrong with this program:
* It does not respect the alignment requirement for O_DIRECT buffers
(reportedly either 512 or 4096 bytes depending on filesystem).
* It does not check for errors (if it had, you might have realized the
other problem).
regards, tom lane
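
The two points above can be illustrated with a minimal sketch, assuming Linux with glibc. It allocates the write buffer with posix_memalign() and checks the result of every write(). The 4096-byte alignment is an assumption taken from the larger of the two figures quoted above, and the DEMO_* macros and function names are hypothetical, not taken from test_fsync.c or xlog.c. On Linux an unaligned buffer passed to write() on an O_DIRECT descriptor typically makes the call fail (commonly with EINVAL), and a failing write returns at once. That would be consistent with near-zero timings when the result is never checked.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* exposes O_DIRECT on glibc; must precede all includes */
#endif
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#ifdef O_DIRECT
#define DEMO_O_DIRECT O_DIRECT
#else
#define DEMO_O_DIRECT 0         /* platform (or feature macros) lack O_DIRECT */
#endif

#define DEMO_ALIGN 4096         /* ASSUMED worst-case alignment requirement */
#define DEMO_WRITE_SIZE 8192

/* Return a buffer aligned to DEMO_ALIGN and filled with 'a', or NULL. */
static char *
alloc_aligned(size_t size)
{
    void *p = NULL;

    if (posix_memalign(&p, DEMO_ALIGN, size) != 0)
        return NULL;
    memset(p, 'a', size);
    return p;
}

/* Write len bytes; return 0 on success, -1 on error or short write. */
static int
checked_write(int fd, const char *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0)
    {
        perror("write");
        return -1;
    }
    if ((size_t) n != len)
    {
        fprintf(stderr, "short write: %zd of %zu bytes\n", n, len);
        return -1;
    }
    return 0;
}

/* Open path with O_SYNC (plus O_DIRECT if available) and write one block. */
static int
direct_sync_write_once(const char *path)
{
    char *buf = alloc_aligned(DEMO_WRITE_SIZE);
    int fd, rc;

    if (buf == NULL)
        return -1;
    fd = open(path, O_RDWR | O_CREAT | O_SYNC | DEMO_O_DIRECT,
              S_IRUSR | S_IWUSR);
    if (fd < 0)
    {
        perror("open");
        free(buf);
        return -1;
    }
    rc = checked_write(fd, buf, DEMO_WRITE_SIZE);
    if (close(fd) != 0)
        rc = -1;
    free(buf);
    return rc;
}
```

A call such as direct_sync_write_once("/var/tmp/test_fsync.out") returns 0 only if the open, the write and the close all succeed. Some filesystems reject O_DIRECT at open time, and in that case the error is reported rather than silently ignored.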
Bruce Momjian <bruce@momjian.us> writes:
I have applied your test_fsync patch for 8.2. Thanks.
... which means test_fsync is now broken. Why did you apply a patch
when the author pointed out that the program isn't working?
regards, tom lane
I thought his code was OK, but the OS had issues. Clearly we need to
update test_fsync.c because it doesn't match the code. I have reverted
the patch but some day we need a fixed version.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Thu, 23 Nov 2006, Tom Lane wrote:
* It does not check for errors (if it had, you might have realized the
other problem).
All the test_fsync code needs to check for errors better; there have been
multiple occasions where I've run it with questionable input and it
didn't complain. It just happily ran and reported times that were almost
0.
Thanks for the note about alignment. I had seen something about that in
xlog.c but wasn't sure whether it was important in this case.
It's very important to the project I'm working on that I get this cleared
up, and I think I'm in a good position to fix it myself now. I just
wanted to report the issue and get some initial feedback on what's wrong.
I'll try to rewrite that code with an eye toward the "Determine optimal
fdatasync/fsync, O_SYNC/O_DSYNC options" to-do item, which is what I'd
really like to have.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
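
One way to approach the "Determine optimal fdatasync/fsync, O_SYNC/O_DSYNC options" item mentioned above is a small timing helper that takes the sync call as a parameter and treats any failure as fatal to the measurement. This is only a sketch under stated assumptions: time_write_sync and sync_fn are hypothetical names, and nothing here is taken from the eventual test_fsync rewrite.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

typedef int (*sync_fn) (int fd);    /* fsync and fdatasync both fit */

/*
 * Time `loops` rounds of write() followed by do_sync(fd) on path.
 * Returns elapsed seconds, or -1.0 if any call fails, so that a
 * failed run can never be mistaken for a very fast one.
 */
static double
time_write_sync(const char *path, sync_fn do_sync, const char *buf,
                size_t len, int loops)
{
    struct timeval start, stop;
    int fd, i;

    fd = open(path, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd < 0)
        return -1.0;

    gettimeofday(&start, NULL);
    for (i = 0; i < loops; i++)
    {
        if (write(fd, buf, len) != (ssize_t) len || do_sync(fd) != 0)
        {
            close(fd);
            return -1.0;
        }
    }
    gettimeofday(&stop, NULL);

    if (close(fd) != 0)
        return -1.0;
    return (double) (stop.tv_sec - start.tv_sec) +
           (double) (stop.tv_usec - start.tv_usec) / 1e6;
}
```

Calling it once with fsync and once with fdatasync on the same file and buffer gives directly comparable numbers. A negative result means the run failed and its timing should be discarded.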
Please send an updated patch for test_fsync.c so we can get it working
for 8.2.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
I have developed a patch that moves the defines into an include file
where they can be used by both the backend and test_fsync.c. I have also
set things up so there is proper alignment for O_DIRECT, and added error
checking.
Not sure if people want this for 8.2. I think we can modify
test_fsync.c anytime but the movement of the defines into an include
file is a backend code change.
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachments:
/pgpatches/fsync (text/x-diff)
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.257
diff -c -c -r1.257 xlog.c
*** src/backend/access/transam/xlog.c 21 Nov 2006 20:59:52 -0000 1.257
--- src/backend/access/transam/xlog.c 24 Nov 2006 18:57:39 -0000
***************
*** 49,127 ****
#include "utils/pg_locale.h"
- /*
- * Because O_DIRECT bypasses the kernel buffers, and because we never
- * read those buffers except during crash recovery, it is a win to use
- * it in all cases where we sync on each write(). We could allow O_DIRECT
- * with fsync(), but because skipping the kernel buffer forces writes out
- * quickly, it seems best just to use it for O_SYNC. It is hard to imagine
- * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
- * Also, O_DIRECT is never enough to force data to the drives, it merely
- * tries to bypass the kernel cache, so we still need O_SYNC or fsync().
- */
- #ifdef O_DIRECT
- #define PG_O_DIRECT O_DIRECT
- #else
- #define PG_O_DIRECT 0
- #endif
-
- /*
- * This chunk of hackery attempts to determine which file sync methods
- * are available on the current platform, and to choose an appropriate
- * default method. We assume that fsync() is always available, and that
- * configure determined whether fdatasync() is.
- */
- #if defined(O_SYNC)
- #define BARE_OPEN_SYNC_FLAG O_SYNC
- #elif defined(O_FSYNC)
- #define BARE_OPEN_SYNC_FLAG O_FSYNC
- #endif
- #ifdef BARE_OPEN_SYNC_FLAG
- #define OPEN_SYNC_FLAG (BARE_OPEN_SYNC_FLAG | PG_O_DIRECT)
- #endif
-
- #if defined(O_DSYNC)
- #if defined(OPEN_SYNC_FLAG)
- /* O_DSYNC is distinct? */
- #if O_DSYNC != BARE_OPEN_SYNC_FLAG
- #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
- #endif
- #else /* !defined(OPEN_SYNC_FLAG) */
- /* Win32 only has O_DSYNC */
- #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
- #endif
- #endif
-
- #if defined(OPEN_DATASYNC_FLAG)
- #define DEFAULT_SYNC_METHOD_STR "open_datasync"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN
- #define DEFAULT_SYNC_FLAGBIT OPEN_DATASYNC_FLAG
- #elif defined(HAVE_FDATASYNC)
- #define DEFAULT_SYNC_METHOD_STR "fdatasync"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
- #define DEFAULT_SYNC_FLAGBIT 0
- #elif defined(HAVE_FSYNC_WRITETHROUGH_ONLY)
- #define DEFAULT_SYNC_METHOD_STR "fsync_writethrough"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC_WRITETHROUGH
- #define DEFAULT_SYNC_FLAGBIT 0
- #else
- #define DEFAULT_SYNC_METHOD_STR "fsync"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC
- #define DEFAULT_SYNC_FLAGBIT 0
- #endif
-
-
- /*
- * Limitation of buffer-alignment for direct IO depends on OS and filesystem,
- * but XLOG_BLCKSZ is assumed to be enough for it.
- */
- #ifdef O_DIRECT
- #define ALIGNOF_XLOG_BUFFER XLOG_BLCKSZ
- #else
- #define ALIGNOF_XLOG_BUFFER ALIGNOF_BUFFER
- #endif
-
-
/* File path names (all relative to $PGDATA) */
#define BACKUP_LABEL_FILE "backup_label"
#define BACKUP_LABEL_OLD "backup_label.old"
--- 49,54 ----
Index: src/include/access/xlog.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/access/xlog.h,v
retrieving revision 1.75
diff -c -c -r1.75 xlog.h
*** src/include/access/xlog.h 5 Nov 2006 22:42:10 -0000 1.75
--- src/include/access/xlog.h 24 Nov 2006 18:57:42 -0000
***************
*** 90,95 ****
--- 90,167 ----
extern int sync_method;
/*
+ * Because O_DIRECT bypasses the kernel buffers, and because we never
+ * read those buffers except during crash recovery, it is a win to use
+ * it in all cases where we sync on each write(). We could allow O_DIRECT
+ * with fsync(), but because skipping the kernel buffer forces writes out
+ * quickly, it seems best just to use it for O_SYNC. It is hard to imagine
+ * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
+ * Also, O_DIRECT is never enough to force data to the drives, it merely
+ * tries to bypass the kernel cache, so we still need O_SYNC or fsync().
+ */
+ #ifdef O_DIRECT
+ #define PG_O_DIRECT O_DIRECT
+ #else
+ #define PG_O_DIRECT 0
+ #endif
+
+ /*
+ * This chunk of hackery attempts to determine which file sync methods
+ * are available on the current platform, and to choose an appropriate
+ * default method. We assume that fsync() is always available, and that
+ * configure determined whether fdatasync() is.
+ */
+ #if defined(O_SYNC)
+ #define BARE_OPEN_SYNC_FLAG O_SYNC
+ #elif defined(O_FSYNC)
+ #define BARE_OPEN_SYNC_FLAG O_FSYNC
+ #endif
+ #ifdef BARE_OPEN_SYNC_FLAG
+ #define OPEN_SYNC_FLAG (BARE_OPEN_SYNC_FLAG | PG_O_DIRECT)
+ #endif
+
+ #if defined(O_DSYNC)
+ #if defined(OPEN_SYNC_FLAG)
+ /* O_DSYNC is distinct? */
+ #if O_DSYNC != BARE_OPEN_SYNC_FLAG
+ #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
+ #endif
+ #else /* !defined(OPEN_SYNC_FLAG) */
+ /* Win32 only has O_DSYNC */
+ #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
+ #endif
+ #endif
+
+ #if defined(OPEN_DATASYNC_FLAG)
+ #define DEFAULT_SYNC_METHOD_STR "open_datasync"
+ #define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN
+ #define DEFAULT_SYNC_FLAGBIT OPEN_DATASYNC_FLAG
+ #elif defined(HAVE_FDATASYNC)
+ #define DEFAULT_SYNC_METHOD_STR "fdatasync"
+ #define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
+ #define DEFAULT_SYNC_FLAGBIT 0
+ #elif defined(HAVE_FSYNC_WRITETHROUGH_ONLY)
+ #define DEFAULT_SYNC_METHOD_STR "fsync_writethrough"
+ #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC_WRITETHROUGH
+ #define DEFAULT_SYNC_FLAGBIT 0
+ #else
+ #define DEFAULT_SYNC_METHOD_STR "fsync"
+ #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC
+ #define DEFAULT_SYNC_FLAGBIT 0
+ #endif
+
+
+ /*
+ * Limitation of buffer-alignment for direct IO depends on OS and filesystem,
+ * but XLOG_BLCKSZ is assumed to be enough for it.
+ */
+ #ifdef O_DIRECT
+ #define ALIGNOF_XLOG_BUFFER XLOG_BLCKSZ
+ #else
+ #define ALIGNOF_XLOG_BUFFER ALIGNOF_BUFFER
+ #endif
+
+ /*
* The rmgr data to be written by XLogInsert() is defined by a chain of
* one or more XLogRecData structs. (Multiple structs would be used when
* parts of the source data aren't physically adjacent in memory, or when
Index: src/tools/fsync/test_fsync.c
===================================================================
RCS file: /cvsroot/pgsql/src/tools/fsync/test_fsync.c,v
retrieving revision 1.16
diff -c -c -r1.16 test_fsync.c
*** src/tools/fsync/test_fsync.c 23 Nov 2006 17:20:47 -0000 1.16
--- src/tools/fsync/test_fsync.c 24 Nov 2006 18:57:43 -0000
***************
*** 1,10 ****
/*
* test_fsync.c
! * tests if fsync can be done from another process than the original write
*/
! #include "../../include/pg_config.h"
! #include "../../include/pg_config_os.h"
#include <sys/types.h>
#include <sys/stat.h>
--- 1,12 ----
/*
* test_fsync.c
! * test various fsync() methods
*/
! #include "postgres.h"
!
! #include "access/xlog_internal.h"
! #include "access/xlog.h"
#include <sys/types.h>
#include <sys/stat.h>
***************
*** 14,42 ****
#include <time.h>
#include <sys/time.h>
#include <unistd.h>
#ifdef WIN32
#define FSYNC_FILENAME "./test_fsync.out"
#else
#define FSYNC_FILENAME "/var/tmp/test_fsync.out"
#endif
! /* O_SYNC and O_FSYNC are the same */
! #if defined(O_SYNC)
! #define OPEN_SYNC_FLAG O_SYNC
! #elif defined(O_FSYNC)
! #define OPEN_SYNC_FLAG O_FSYNC
! #elif defined(O_DSYNC)
! #define OPEN_DATASYNC_FLAG O_DSYNC
! #endif
!
! #if defined(OPEN_SYNC_FLAG)
! #if defined(O_DSYNC) && (O_DSYNC != OPEN_SYNC_FLAG)
! #define OPEN_DATASYNC_FLAG O_DSYNC
! #endif
! #endif
! #define WAL_FILE_SIZE (16 * 1024 * 1024)
void die(char *str);
void print_elapse(struct timeval start_t, struct timeval elapse_t);
--- 16,34 ----
#include <time.h>
#include <sys/time.h>
#include <unistd.h>
+ #include <string.h>
#ifdef WIN32
#define FSYNC_FILENAME "./test_fsync.out"
#else
+ /* /tmp might be a memory file system */
#define FSYNC_FILENAME "/var/tmp/test_fsync.out"
#endif
! #define WRITE_SIZE (16 * 1024)
! /* We allocate extra to guarantee ALIGNOF_XLOG_BUFFER alignment */
! #define ALLOCATE_WAL_FILE_SIZE (WRITE_SIZE + ALIGNOF_XLOG_BUFFER)
void die(char *str);
void print_elapse(struct timeval start_t, struct timeval elapse_t);
***************
*** 49,55 ****
int tmpfile,
i,
loops = 1000;
! char *strout = (char *) malloc(WAL_FILE_SIZE);
char *filename = FSYNC_FILENAME;
if (argc > 2 && strcmp(argv[1], "-f") == 0)
--- 41,47 ----
int tmpfile,
i,
loops = 1000;
! char *full_buf = (char *) malloc(ALLOCATE_WAL_FILE_SIZE), *buf;
char *filename = FSYNC_FILENAME;
if (argc > 2 && strcmp(argv[1], "-f") == 0)
***************
*** 62,74 ****
if (argc > 1)
loops = atoi(argv[1]);
! for (i = 0; i < WAL_FILE_SIZE; i++)
! strout[i] = 'a';
if ((tmpfile = open(filename, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR)) == -1)
die("Cannot open output file.");
! write(tmpfile, strout, WAL_FILE_SIZE);
! fsync(tmpfile); /* fsync so later fsync's don't have to do it */
close(tmpfile);
printf("Simple write timing:\n");
--- 54,70 ----
if (argc > 1)
loops = atoi(argv[1]);
! buf = (char *)TYPEALIGN(ALIGNOF_XLOG_BUFFER, full_buf);
! for (i = 0; i < WRITE_SIZE; i++)
! buf[i] = 'a';
if ((tmpfile = open(filename, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR)) == -1)
die("Cannot open output file.");
! if (write(tmpfile, buf, WRITE_SIZE) != WRITE_SIZE)
! die("write failed");
! /* fsync so later fsync's don't have to do it */
! if (fsync(tmpfile) != 0)
! die("fsync failed");
close(tmpfile);
printf("Simple write timing:\n");
***************
*** 78,84 ****
{
if ((tmpfile = open(filename, O_RDWR)) == -1)
die("Cannot open output file.");
! write(tmpfile, strout, 8192);
close(tmpfile);
}
gettimeofday(&elapse_t, NULL);
--- 74,81 ----
{
if ((tmpfile = open(filename, O_RDWR)) == -1)
die("Cannot open output file.");
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
close(tmpfile);
}
gettimeofday(&elapse_t, NULL);
***************
*** 95,102 ****
{
if ((tmpfile = open(filename, O_RDWR)) == -1)
die("Cannot open output file.");
! write(tmpfile, strout, 8192);
! fsync(tmpfile);
close(tmpfile);
if ((tmpfile = open(filename, O_RDWR)) == -1)
die("Cannot open output file.");
--- 92,101 ----
{
if ((tmpfile = open(filename, O_RDWR)) == -1)
die("Cannot open output file.");
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
! if (fsync(tmpfile) != 0)
! die("fsync failed");
close(tmpfile);
if ((tmpfile = open(filename, O_RDWR)) == -1)
die("Cannot open output file.");
***************
*** 114,125 ****
{
if ((tmpfile = open(filename, O_RDWR)) == -1)
die("Cannot open output file.");
! write(tmpfile, strout, 8192);
close(tmpfile);
/* reopen file */
if ((tmpfile = open(filename, O_RDWR)) == -1)
die("Cannot open output file.");
! fsync(tmpfile);
close(tmpfile);
}
gettimeofday(&elapse_t, NULL);
--- 113,126 ----
{
if ((tmpfile = open(filename, O_RDWR)) == -1)
die("Cannot open output file.");
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
close(tmpfile);
/* reopen file */
if ((tmpfile = open(filename, O_RDWR)) == -1)
die("Cannot open output file.");
! if (fsync(tmpfile) != 0)
! die("fsync failed");
close(tmpfile);
}
gettimeofday(&elapse_t, NULL);
***************
*** 135,141 ****
die("Cannot open output file.");
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
! write(tmpfile, strout, 16384);
gettimeofday(&elapse_t, NULL);
close(tmpfile);
printf("\tone 16k o_sync write ");
--- 136,143 ----
die("Cannot open output file.");
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
! if (write(tmpfile, buf, WRITE_SIZE) != WRITE_SIZE)
! die("write failed");
gettimeofday(&elapse_t, NULL);
close(tmpfile);
printf("\tone 16k o_sync write ");
***************
*** 148,155 ****
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! write(tmpfile, strout, 8192);
! write(tmpfile, strout, 8192);
}
gettimeofday(&elapse_t, NULL);
close(tmpfile);
--- 150,159 ----
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
}
gettimeofday(&elapse_t, NULL);
close(tmpfile);
***************
*** 169,175 ****
die("Cannot open output file.");
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
! write(tmpfile, strout, 8192);
gettimeofday(&elapse_t, NULL);
close(tmpfile);
printf("\topen o_dsync, write ");
--- 173,180 ----
die("Cannot open output file.");
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
gettimeofday(&elapse_t, NULL);
close(tmpfile);
printf("\topen o_dsync, write ");
***************
*** 181,187 ****
die("Cannot open output file.");
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
! write(tmpfile, strout, 8192);
gettimeofday(&elapse_t, NULL);
close(tmpfile);
printf("\topen o_sync, write ");
--- 186,193 ----
die("Cannot open output file.");
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
gettimeofday(&elapse_t, NULL);
close(tmpfile);
printf("\topen o_sync, write ");
***************
*** 199,205 ****
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! write(tmpfile, strout, 8192);
fdatasync(tmpfile);
}
gettimeofday(&elapse_t, NULL);
--- 205,212 ----
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
fdatasync(tmpfile);
}
gettimeofday(&elapse_t, NULL);
***************
*** 217,224 ****
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! write(tmpfile, strout, 8192);
! fsync(tmpfile);
}
gettimeofday(&elapse_t, NULL);
close(tmpfile);
--- 224,233 ----
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
! if (fsync(tmpfile) != 0)
! die("fsync failed");
}
gettimeofday(&elapse_t, NULL);
close(tmpfile);
***************
*** 235,242 ****
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! write(tmpfile, strout, 8192);
! write(tmpfile, strout, 8192);
}
gettimeofday(&elapse_t, NULL);
close(tmpfile);
--- 244,253 ----
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
}
gettimeofday(&elapse_t, NULL);
close(tmpfile);
***************
*** 254,261 ****
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! write(tmpfile, strout, 8192);
! write(tmpfile, strout, 8192);
}
gettimeofday(&elapse_t, NULL);
close(tmpfile);
--- 265,274 ----
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
}
gettimeofday(&elapse_t, NULL);
close(tmpfile);
***************
*** 271,278 ****
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! write(tmpfile, strout, 8192);
! write(tmpfile, strout, 8192);
fdatasync(tmpfile);
}
gettimeofday(&elapse_t, NULL);
--- 284,293 ----
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
fdatasync(tmpfile);
}
gettimeofday(&elapse_t, NULL);
***************
*** 290,298 ****
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! write(tmpfile, strout, 8192);
! write(tmpfile, strout, 8192);
! fsync(tmpfile);
}
gettimeofday(&elapse_t, NULL);
close(tmpfile);
--- 305,316 ----
gettimeofday(&start_t, NULL);
for (i = 0; i < loops; i++)
{
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
! if (write(tmpfile, buf, WRITE_SIZE/2) != WRITE_SIZE/2)
! die("write failed");
! if (fsync(tmpfile) != 0)
! die("fsync failed");
}
gettimeofday(&elapse_t, NULL);
close(tmpfile);
***************
*** 300,305 ****
--- 318,324 ----
print_elapse(start_t, elapse_t);
printf("\n");
+ free(full_buf);
unlink(filename);
return 0;
Bruce Momjian <bruce@momjian.us> writes:
Not sure if people want this for 8.2. I think we can modify
test_fsync.c anytime but the movement of the defines into an include
file is a backend code change.
I think fooling with this on the day before RC1 is an unreasonable risk ...
and I disapprove of moving this code into a widely-used include file
like xlog.h, too.
regards, tom lane
OK, you want a separate include or xlog_internal.h? And should I put in
just the test_fsync changes next week so at least we are closer to
having it work for 8.2?
--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Fri, 24 Nov 2006, Bruce Momjian wrote:
OK, I modified test_fsync.c by copying the defines from xlog.c, and
fixed the O_DIRECT alignment and check write()/fsync().
I just tested your new test_fsync as included in the 8.2rc1, and it's
working perfectly for me now on Linux. All the O_SYNC writes using
O_DIRECT are reporting realistic timings. I'm happy that this code is
working as it should and appreciate the quick response. I still think the
wal_sync_method documentation deserves an update noting that O_DIRECT is
used when available with the sync write methods.
--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
O_DIRECT mention added, and backpatched to 8.2.X.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachments:
/rtmp/diff (text/x-diff)
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.108
diff -c -c -r1.108 config.sgml
*** doc/src/sgml/config.sgml 1 Feb 2007 00:28:16 -0000 1.108
--- doc/src/sgml/config.sgml 8 Feb 2007 03:52:01 -0000
***************
*** 1385,1390 ****
--- 1385,1391 ----
Not all of these choices are available on all platforms.
The default is the first method in the above list that is supported
by the platform.
+ The <literal>open_</>* options also use <literal>O_DIRECT</> if available.
This parameter can only be set in the <filename>postgresql.conf</>
file or on the server command line.
</para>
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Not sure if people want this for 8.2. I think we can modify
test_fsync.c anytime but the movement of the defines into an include
file is a backend code change.
I think fooling with this on the day before RC1 is an unreasonable risk ...
and I disapprove of moving this code into a widely-used include file
like xlog.h, too.
The fsync method defines have been moved to src/include/access/xlogdefs.h
so that test_fsync.c can use them.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachments:
/rtmp/diff (text/x-diff)
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.263
diff -c -c -r1.263 xlog.c
*** src/backend/access/transam/xlog.c 8 Feb 2007 11:10:27 -0000 1.263
--- src/backend/access/transam/xlog.c 14 Feb 2007 04:51:23 -0000
***************
*** 31,36 ****
--- 31,37 ----
#include "access/twophase.h"
#include "access/xact.h"
#include "access/xlog_internal.h"
+ #include "access/xlogdefs.h"
#include "access/xlogutils.h"
#include "catalog/catversion.h"
#include "catalog/pg_control.h"
***************
*** 49,126 ****
#include "utils/pg_locale.h"
- /*
- * Because O_DIRECT bypasses the kernel buffers, and because we never
- * read those buffers except during crash recovery, it is a win to use
- * it in all cases where we sync on each write(). We could allow O_DIRECT
- * with fsync(), but because skipping the kernel buffer forces writes out
- * quickly, it seems best just to use it for O_SYNC. It is hard to imagine
- * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
- * Also, O_DIRECT is never enough to force data to the drives, it merely
- * tries to bypass the kernel cache, so we still need O_SYNC or fsync().
- */
- #ifdef O_DIRECT
- #define PG_O_DIRECT O_DIRECT
- #else
- #define PG_O_DIRECT 0
- #endif
-
- /*
- * This chunk of hackery attempts to determine which file sync methods
- * are available on the current platform, and to choose an appropriate
- * default method. We assume that fsync() is always available, and that
- * configure determined whether fdatasync() is.
- */
- #if defined(O_SYNC)
- #define BARE_OPEN_SYNC_FLAG O_SYNC
- #elif defined(O_FSYNC)
- #define BARE_OPEN_SYNC_FLAG O_FSYNC
- #endif
- #ifdef BARE_OPEN_SYNC_FLAG
- #define OPEN_SYNC_FLAG (BARE_OPEN_SYNC_FLAG | PG_O_DIRECT)
- #endif
-
- #if defined(O_DSYNC)
- #if defined(OPEN_SYNC_FLAG)
- /* O_DSYNC is distinct? */
- #if O_DSYNC != BARE_OPEN_SYNC_FLAG
- #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
- #endif
- #else /* !defined(OPEN_SYNC_FLAG) */
- /* Win32 only has O_DSYNC */
- #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
- #endif
- #endif
-
- #if defined(OPEN_DATASYNC_FLAG)
- #define DEFAULT_SYNC_METHOD_STR "open_datasync"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN
- #define DEFAULT_SYNC_FLAGBIT OPEN_DATASYNC_FLAG
- #elif defined(HAVE_FDATASYNC)
- #define DEFAULT_SYNC_METHOD_STR "fdatasync"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
- #define DEFAULT_SYNC_FLAGBIT 0
- #elif defined(HAVE_FSYNC_WRITETHROUGH_ONLY)
- #define DEFAULT_SYNC_METHOD_STR "fsync_writethrough"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC_WRITETHROUGH
- #define DEFAULT_SYNC_FLAGBIT 0
- #else
- #define DEFAULT_SYNC_METHOD_STR "fsync"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC
- #define DEFAULT_SYNC_FLAGBIT 0
- #endif
-
-
- /*
- * Limitation of buffer-alignment for direct IO depends on OS and filesystem,
- * but XLOG_BLCKSZ is assumed to be enough for it.
- */
- #ifdef O_DIRECT
- #define ALIGNOF_XLOG_BUFFER XLOG_BLCKSZ
- #else
- #define ALIGNOF_XLOG_BUFFER ALIGNOF_BUFFER
- #endif
-
/* File path names (all relative to $PGDATA) */
#define BACKUP_LABEL_FILE "backup_label"
--- 50,55 ----
Index: src/include/access/xlogdefs.h
===================================================================
RCS file: /cvsroot/pgsql/src/include/access/xlogdefs.h,v
retrieving revision 1.16
diff -c -c -r1.16 xlogdefs.h
*** src/include/access/xlogdefs.h 5 Jan 2007 22:19:51 -0000 1.16
--- src/include/access/xlogdefs.h 14 Feb 2007 04:51:24 -0000
***************
*** 63,66 ****
--- 63,137 ----
*/
typedef uint32 TimeLineID;
+ /*
+ * Because O_DIRECT bypasses the kernel buffers, and because we never
+ * read those buffers except during crash recovery, it is a win to use
+ * it in all cases where we sync on each write(). We could allow O_DIRECT
+ * with fsync(), but because skipping the kernel buffer forces writes out
+ * quickly, it seems best just to use it for O_SYNC. It is hard to imagine
+ * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
+ * Also, O_DIRECT is never enough to force data to the drives, it merely
+ * tries to bypass the kernel cache, so we still need O_SYNC or fsync().
+ */
+ #ifdef O_DIRECT
+ #define PG_O_DIRECT O_DIRECT
+ #else
+ #define PG_O_DIRECT 0
+ #endif
+
+ /*
+ * This chunk of hackery attempts to determine which file sync methods
+ * are available on the current platform, and to choose an appropriate
+ * default method. We assume that fsync() is always available, and that
+ * configure determined whether fdatasync() is.
+ */
+ #if defined(O_SYNC)
+ #define BARE_OPEN_SYNC_FLAG O_SYNC
+ #elif defined(O_FSYNC)
+ #define BARE_OPEN_SYNC_FLAG O_FSYNC
+ #endif
+ #ifdef BARE_OPEN_SYNC_FLAG
+ #define OPEN_SYNC_FLAG (BARE_OPEN_SYNC_FLAG | PG_O_DIRECT)
+ #endif
+
+ #if defined(O_DSYNC)
+ #if defined(OPEN_SYNC_FLAG)
+ /* O_DSYNC is distinct? */
+ #if O_DSYNC != BARE_OPEN_SYNC_FLAG
+ #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
+ #endif
+ #else /* !defined(OPEN_SYNC_FLAG) */
+ /* Win32 only has O_DSYNC */
+ #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
+ #endif
+ #endif
+
+ #if defined(OPEN_DATASYNC_FLAG)
+ #define DEFAULT_SYNC_METHOD_STR "open_datasync"
+ #define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN
+ #define DEFAULT_SYNC_FLAGBIT OPEN_DATASYNC_FLAG
+ #elif defined(HAVE_FDATASYNC)
+ #define DEFAULT_SYNC_METHOD_STR "fdatasync"
+ #define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
+ #define DEFAULT_SYNC_FLAGBIT 0
+ #elif defined(HAVE_FSYNC_WRITETHROUGH_ONLY)
+ #define DEFAULT_SYNC_METHOD_STR "fsync_writethrough"
+ #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC_WRITETHROUGH
+ #define DEFAULT_SYNC_FLAGBIT 0
+ #else
+ #define DEFAULT_SYNC_METHOD_STR "fsync"
+ #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC
+ #define DEFAULT_SYNC_FLAGBIT 0
+ #endif
+
+ /*
+ * Limitation of buffer-alignment for direct IO depends on OS and filesystem,
+ * but XLOG_BLCKSZ is assumed to be enough for it.
+ */
+ #ifdef O_DIRECT
+ #define ALIGNOF_XLOG_BUFFER XLOG_BLCKSZ
+ #else
+ #define ALIGNOF_XLOG_BUFFER ALIGNOF_BUFFER
+ #endif
+
#endif /* XLOG_DEFS_H */
Index: src/tools/fsync/test_fsync.c
===================================================================
RCS file: /cvsroot/pgsql/src/tools/fsync/test_fsync.c,v
retrieving revision 1.17
diff -c -c -r1.17 test_fsync.c
*** src/tools/fsync/test_fsync.c 25 Nov 2006 01:22:28 -0000 1.17
--- src/tools/fsync/test_fsync.c 14 Feb 2007 04:51:26 -0000
***************
*** 7,12 ****
--- 7,13 ----
#include "access/xlog_internal.h"
#include "access/xlog.h"
+ #include "access/xlogdefs.h"
#include <sys/types.h>
#include <sys/stat.h>
***************
*** 18,100 ****
#include <unistd.h>
#include <string.h>
- /* ---------------------------------------------------------------
- * Copied from xlog.c. Some day this should be moved an include file.
- */
-
- /*
- * Because O_DIRECT bypasses the kernel buffers, and because we never
- * read those buffers except during crash recovery, it is a win to use
- * it in all cases where we sync on each write(). We could allow O_DIRECT
- * with fsync(), but because skipping the kernel buffer forces writes out
- * quickly, it seems best just to use it for O_SYNC. It is hard to imagine
- * how fsync() could be a win for O_DIRECT compared to O_SYNC and O_DIRECT.
- * Also, O_DIRECT is never enough to force data to the drives, it merely
- * tries to bypass the kernel cache, so we still need O_SYNC or fsync().
- */
- #ifdef O_DIRECT
- #define PG_O_DIRECT O_DIRECT
- #else
- #define PG_O_DIRECT 0
- #endif
-
- /*
- * This chunk of hackery attempts to determine which file sync methods
- * are available on the current platform, and to choose an appropriate
- * default method. We assume that fsync() is always available, and that
- * configure determined whether fdatasync() is.
- */
- #if defined(O_SYNC)
- #define BARE_OPEN_SYNC_FLAG O_SYNC
- #elif defined(O_FSYNC)
- #define BARE_OPEN_SYNC_FLAG O_FSYNC
- #endif
- #ifdef BARE_OPEN_SYNC_FLAG
- #define OPEN_SYNC_FLAG (BARE_OPEN_SYNC_FLAG | PG_O_DIRECT)
- #endif
-
- #if defined(O_DSYNC)
- #if defined(OPEN_SYNC_FLAG)
- /* O_DSYNC is distinct? */
- #if O_DSYNC != BARE_OPEN_SYNC_FLAG
- #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
- #endif
- #else /* !defined(OPEN_SYNC_FLAG) */
- /* Win32 only has O_DSYNC */
- #define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
- #endif
- #endif
-
- #if defined(OPEN_DATASYNC_FLAG)
- #define DEFAULT_SYNC_METHOD_STR "open_datasync"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN
- #define DEFAULT_SYNC_FLAGBIT OPEN_DATASYNC_FLAG
- #elif defined(HAVE_FDATASYNC)
- #define DEFAULT_SYNC_METHOD_STR "fdatasync"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
- #define DEFAULT_SYNC_FLAGBIT 0
- #elif defined(HAVE_FSYNC_WRITETHROUGH_ONLY)
- #define DEFAULT_SYNC_METHOD_STR "fsync_writethrough"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC_WRITETHROUGH
- #define DEFAULT_SYNC_FLAGBIT 0
- #else
- #define DEFAULT_SYNC_METHOD_STR "fsync"
- #define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC
- #define DEFAULT_SYNC_FLAGBIT 0
- #endif
-
-
- /*
- * Limitation of buffer-alignment for direct IO depends on OS and filesystem,
- * but XLOG_BLCKSZ is assumed to be enough for it.
- */
- #ifdef O_DIRECT
- #define ALIGNOF_XLOG_BUFFER XLOG_BLCKSZ
- #else
- #define ALIGNOF_XLOG_BUFFER ALIGNOF_BUFFER
- #endif
-
- /* ------------ from xlog.c --------------- */
#ifdef WIN32
#define FSYNC_FILENAME "./test_fsync.out"
--- 19,24 ----
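To see what the preprocessor tests in the patch above resolve to on a given platform, something like the following standalone sketch may help. The flag macros are copied from the diff. The demo_* helpers are hypothetical and not part of PostgreSQL. The configure-dependent branches (HAVE_FDATASYNC and friends) are left out, so this reports only the open-flag side of the selection.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <fcntl.h>
#include <stdio.h>

/* Flag selection as in the patch (xlogdefs.h). */
#ifdef O_DIRECT
#define PG_O_DIRECT O_DIRECT
#else
#define PG_O_DIRECT 0
#endif

#if defined(O_SYNC)
#define BARE_OPEN_SYNC_FLAG O_SYNC
#elif defined(O_FSYNC)
#define BARE_OPEN_SYNC_FLAG O_FSYNC
#endif
#ifdef BARE_OPEN_SYNC_FLAG
#define OPEN_SYNC_FLAG (BARE_OPEN_SYNC_FLAG | PG_O_DIRECT)
#endif

#if defined(O_DSYNC)
#if defined(OPEN_SYNC_FLAG)
#if O_DSYNC != BARE_OPEN_SYNC_FLAG
#define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
#endif
#else
#define OPEN_DATASYNC_FLAG (O_DSYNC | PG_O_DIRECT)
#endif
#endif

/* Hypothetical helper: 1 if open_sync carries O_DIRECT, 0 if not,
 * -1 if open_sync is not available at all. */
static int
demo_open_sync_uses_direct(void)
{
#ifdef OPEN_SYNC_FLAG
    return (OPEN_SYNC_FLAG & PG_O_DIRECT) != 0;
#else
    return -1;
#endif
}

/* Hypothetical helper: 1 if a distinct open_datasync flag exists. */
static int
demo_open_datasync_available(void)
{
#ifdef OPEN_DATASYNC_FLAG
    return 1;
#else
    return 0;
#endif
}

/* Print a one-line summary of what the macros resolved to. */
static void
demo_report(FILE *out)
{
    fprintf(out, "open_sync uses O_DIRECT: %d, open_datasync available: %d\n",
            demo_open_sync_uses_direct(), demo_open_datasync_available());
}
```

Calling demo_report(stdout) prints the result for the build platform. The answer varies: where O_DSYNC has the same value as O_SYNC, no separate open_datasync flag is defined, which would match the "(o_dsync unavailable)" line in the test_fsync output earlier in the thread.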