Final(?) proposal for wal_sync_method changes
After reviewing the two ongoing threads about fixing the wal_sync_method
fiasco, I think there is general agreement on these two points:
1. open_datasync shouldn't be the default choice
2. O_DIRECT shouldn't be forcibly bundled in with O_DSYNC/O_SYNC
What I suggest we do about the latter is to invent two new
wal_sync_method values,
open_datasync_direct
open_sync_direct
which are defined only on platforms that define O_DIRECT (and O_DSYNC
or O_SYNC respectively). That puts it in the hands of the DBA whether
we try to use O_DIRECT or not. We'll still keep the hard-wired
optimization of disabling O_DIRECT when archiving or walreceiver are
active.
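For illustration, here is a minimal sketch of how the proposed values might map to open() flags. The enum and `wal_open_flags()` are hypothetical names, not the real xlog.c code. In this sketch, O_DIRECT is requested only by the two new `_direct` settings. Each of those is compiled in only where the needed flags exist, and O_DIRECT is still dropped while archiving or walreceiver is active.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc only exposes O_DIRECT with this */
#endif
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical names for illustration; not PostgreSQL's actual enum. */
enum wal_sync_method_kind {
    WAL_FSYNC,
    WAL_FDATASYNC,
    WAL_OPEN_SYNC,
    WAL_OPEN_DATASYNC,
    WAL_OPEN_SYNC_DIRECT,     /* proposed: only if O_DIRECT and O_SYNC exist */
    WAL_OPEN_DATASYNC_DIRECT  /* proposed: only if O_DIRECT and O_DSYNC exist */
};

#ifdef O_DIRECT
/* Keep the hard-wired optimization: skip O_DIRECT while WAL is also read
 * back by the archiver or walreceiver, so those reads can hit the cache. */
static int direct_flag(bool archiving_or_walreceiver)
{
    return archiving_or_walreceiver ? 0 : O_DIRECT;
}
#endif

/* Extra open() flags for the WAL file, or -1 if the method isn't available. */
int wal_open_flags(enum wal_sync_method_kind m, bool archiving_or_walreceiver)
{
    (void) archiving_or_walreceiver; /* unused when O_DIRECT is missing */

    switch (m) {
    case WAL_FSYNC:
    case WAL_FDATASYNC:
        return 0; /* synced by an explicit call after write(), no open flag */
#ifdef O_SYNC
    case WAL_OPEN_SYNC:
        return O_SYNC; /* no longer silently bundles O_DIRECT */
#endif
#ifdef O_DSYNC
    case WAL_OPEN_DATASYNC:
        return O_DSYNC;
#endif
#if defined(O_DIRECT) && defined(O_SYNC)
    case WAL_OPEN_SYNC_DIRECT:
        return O_SYNC | direct_flag(archiving_or_walreceiver);
#endif
#if defined(O_DIRECT) && defined(O_DSYNC)
    case WAL_OPEN_DATASYNC_DIRECT:
        return O_DSYNC | direct_flag(archiving_or_walreceiver);
#endif
    default:
        return -1; /* method not defined on this platform */
    }
}
```

Under this scheme, someone who explicitly set open_sync gets plain O_SYNC. Anyone who wants the old bundled behavior has to ask for open_sync_direct by name.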
Dropping open_datasync as the first-choice default is something we have
to back-patch, but I'm less sure about it being a good idea to
back-patch the rearrangement of O_DIRECT management. Somebody who'd
explicitly specified open_sync or open_datasync as wal_sync_method
would find its behavior changing under him, which might be bad.
Comments?
regards, tom lane
On Tuesday 07 December 2010 17:24:14 Tom Lane wrote:
After reviewing the two ongoing threads about fixing the wal_sync_method
fiasco, I think there is general agreement on these two points:
1. open_datasync shouldn't be the default choice
2. O_DIRECT shouldn't be forcibly bundled in with O_DSYNC/O_SYNC
What I suggest we do about the latter is to invent two new
wal_sync_method values,
open_datasync_direct
open_sync_direct
which are defined only on platforms that define O_DIRECT (and O_DSYNC
or O_SYNC respectively). That puts it in the hands of the DBA whether
we try to use O_DIRECT or not. We'll still keep the hard-wired
optimization of disabling O_DIRECT when archiving or walreceiver are
active.
Dropping open_datasync as the first-choice default is something we have
to back-patch, but I'm less sure about it being a good idea to
back-patch the rearrangement of O_DIRECT management. Somebody who'd
explicitly specified open_sync or open_datasync as wal_sync_method
would find its behavior changing under him, which might be bad.
I vote for changing the order but not doing the O_DIRECT stuff on the
backbranches.
Since I don't see myself or my clients ever using any O_*SYNC variant, I
am not strongly opinionated about what to do there. But I guess adding those two
variants is not really much work.
Thanks,
Andres
On 12/07/2010 08:24 AM, Tom Lane wrote:
Dropping open_datasync as the first-choice default is something we have
to back-patch, but I'm less sure about it being a good idea to
back-patch the rearrangement of O_DIRECT management. Somebody who'd
explicitly specified open_sync or open_datasync as wal_sync_method
would find its behavior changing under him, which might be bad.
I agree for the backpatch that we should just swap to fdatasync as
default, and should not attempt to add the extra options.
In addition to the concerns above, adding new GUC values in an update
release is something we should only do if required for a critical
security or data-loss bug. And this is neither.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
I agree for the backpatch that we should just swap to fdatasync as
default, and should not attempt to add the extra options.
I noticed while updating the documentation for this that the current
documentation is a flat-out lie. It claims that the preference order
for wal_sync_method is
open_datasync
fdatasync
fsync_writethrough
fsync
open_sync
ie you get the first-listed method that is supported on a given
platform. But this is not so: actually, fsync_writethrough will
be selected as default ONLY on Windows. There are other platforms
where the option exists, OS X being the one I have at hand. The
misstatement is masked on OS X because it also has open_datasync
and fdatasync. But since we are about to delete open_datasync from
the list, it's possible there are platforms where it will be exposed.
Oh, and just to add insult to injury, the above is what config.sgml
says, but postgresql.conf.sample says something different.
So I'm wondering whether we should correct the code to match the docs,
or vice versa. The former would just be a matter of saying
#elif defined(HAVE_FSYNC_WRITETHROUGH)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC_WRITETHROUGH
in place of
#elif defined(HAVE_FSYNC_WRITETHROUGH_ONLY)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC_WRITETHROUGH
To do the latter we'd have to say something like "The default is
fdatasync if it exists, else fsync, except on Windows where it is
fsync_writethrough".
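To make the discrepancy concrete, here is a sketch of the rule config.sgml describes: take the first method the platform supports. The `platform_caps` struct and `documented_default()` are hypothetical. On a platform that has fsync_writethrough but neither open_datasync nor fdatasync, this rule picks fsync_writethrough. The current #elif chain does so only when HAVE_FSYNC_WRITETHROUGH_ONLY is set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical method codes and capability flags, for illustration only. */
enum sync_method {
    SM_NONE,
    SM_OPEN_DATASYNC,
    SM_FDATASYNC,
    SM_FSYNC_WRITETHROUGH,
    SM_FSYNC,
    SM_OPEN_SYNC
};

struct platform_caps {
    bool open_datasync;
    bool fdatasync;
    bool fsync_writethrough;
    bool fsync;
    bool open_sync;
};

/* The order config.sgml documents: the first supported method wins. */
enum sync_method documented_default(const struct platform_caps *c)
{
    const struct { enum sync_method m; bool ok; } order[] = {
        { SM_OPEN_DATASYNC,      c->open_datasync },
        { SM_FDATASYNC,          c->fdatasync },
        { SM_FSYNC_WRITETHROUGH, c->fsync_writethrough },
        { SM_FSYNC,              c->fsync },
        { SM_OPEN_SYNC,          c->open_sync },
    };

    for (size_t i = 0; i < sizeof order / sizeof order[0]; i++)
        if (order[i].ok)
            return order[i].m;
    return SM_NONE; /* nothing supported */
}
```

For OS X, which has all five methods, the documented rule and the code agree. They diverge only on a platform like the hypothetical one described above.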
Changing the code would result in a sudden, massive performance change
if there are any platforms for which fsync_writethrough exists but not
fdatasync. But I'm not sure if there are any.
Another point here is that it's not clear why we're selecting a
known-to-be-insecure default on OS X (where in fact all methods except
fsync_writethrough fail to push data to disk). We've been around on
that before, of course, and maybe now is not the time to change it.
Thoughts?
regards, tom lane
On 12/7/10 2:28 PM, Tom Lane wrote:
Another point here is that it's not clear why we're selecting a
known-to-be-insecure default on OS X (where in fact all methods except
fsync_writethrough fail to push data to disk). We've been around on
that before, of course, and maybe now is not the time to change it.
Because nobody sane uses OSX on the server?
--
-- Josh Berkus
Josh Berkus <josh@agliodbs.com> writes:
On 12/7/10 2:28 PM, Tom Lane wrote:
Another point here is that it's not clear why we're selecting a
known-to-be-insecure default on OS X (where in fact all methods except
fsync_writethrough fail to push data to disk). We've been around on
that before, of course, and maybe now is not the time to change it.
Because nobody sane uses OSX on the server?
Some of us would make the same remark about Windows. But we go out of
our way to provide a safe default on that platform anyhow.
regards, tom lane
On 12/07/2010 06:11 PM, Tom Lane wrote:
Josh Berkus <josh@agliodbs.com> writes:
On 12/7/10 2:28 PM, Tom Lane wrote:
Another point here is that it's not clear why we're selecting a
known-to-be-insecure default on OS X (where in fact all methods except
fsync_writethrough fail to push data to disk). We've been around on
that before, of course, and maybe now is not the time to change it.
Because nobody sane uses OSX on the server?
Some of us would make the same remark about Windows. But we go out of
our way to provide a safe default on that platform anyhow.
In practice, though, Windows is used a lot on servers and OSX isn't.
That means we are probably going to have lots less push on this sort of
thing from the OSX community, which is not to say that we shouldn't try
to be just as safe on OSX as we try to be everywhere else.
cheers
andrew
On Tue, 2010-12-07 at 18:11 -0500, Tom Lane wrote:
Josh Berkus <josh@agliodbs.com> writes:
On 12/7/10 2:28 PM, Tom Lane wrote:
Another point here is that it's not clear why we're selecting a
known-to-be-insecure default on OS X (where in fact all methods except
fsync_writethrough fail to push data to disk). We've been around on
that before, of course, and maybe now is not the time to change it.
Because nobody sane uses OSX on the server?
Some of us would make the same remark about Windows. But we go out of
our way to provide a safe default on that platform anyhow.
Not to mention the assertion that people don't use OSX on a server is
patently false. They don't use it on big servers, but it is very popular
for the SMB (big time capital S).
JD
--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
I wrote:
Some of us would make the same remark about Windows. But we go out of
our way to provide a safe default on that platform anyhow.
Oh, wait: that's not the case at all. As of recent releases, we support
open_datasync on Windows, and that's the default despite being unsafe.
You have to go and choose some nondefault drive setting to make it safe:
http://developer.postgresql.org/pgdocs/postgres/wal-reliability.html
So if we drop open_datasync from the preference list then Windows users
*will* see a sudden huge change in the default behavior.
Because open_datasync does now exist on Windows, this code in
xlogdefs.h:
#elif defined(HAVE_FSYNC_WRITETHROUGH_ONLY)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC_WRITETHROUGH
is actually dead code at the moment --- it can never be reached on any
platform.
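Here is a standalone sketch of why that branch is dead, reconstructed from the snippets quoted in this thread. HAVE_OPEN_DATASYNC is a hypothetical stand-in (the real header may test a different macro), and the numeric SYNC_METHOD_* values are arbitrary. The Windows build is simulated by defining open_datasync support together with HAVE_FSYNC_WRITETHROUGH_ONLY, while leaving HAVE_FDATASYNC undefined.

```c
/* Simulated Windows build (assumed values, for illustration only). */
#define HAVE_OPEN_DATASYNC 1           /* hypothetical: open_datasync supported */
#define HAVE_FSYNC_WRITETHROUGH_ONLY 1 /* as on Windows */
/* HAVE_FDATASYNC deliberately left undefined: Windows has no fdatasync(). */

enum {
    SYNC_METHOD_FSYNC,
    SYNC_METHOD_FDATASYNC,
    SYNC_METHOD_OPEN_DSYNC,
    SYNC_METHOD_FSYNC_WRITETHROUGH
};

#if defined(HAVE_OPEN_DATASYNC)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_OPEN_DSYNC
#elif defined(HAVE_FDATASYNC)
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FDATASYNC
#elif defined(HAVE_FSYNC_WRITETHROUGH_ONLY)
/* Unreachable while open_datasync is available: the first branch wins. */
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC_WRITETHROUGH
#else
#define DEFAULT_SYNC_METHOD SYNC_METHOD_FSYNC
#endif

/* Report which default the chain above selected. */
int default_sync_method(void)
{
    return DEFAULT_SYNC_METHOD;
}
```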
I am unclear as to the reason why there is a test for
HAVE_FSYNC_WRITETHROUGH_ONLY in pg_fsync(). Perhaps that is also
leftover from a previous vision of how this all works? Or does an
fsync() call actually fail on Windows?
I am now tempted to suggest that HAVE_FSYNC_WRITETHROUGH_ONLY should go
away altogether. The documented and implemented behavior ought to be
that the default is "fdatasync if it exists, else fsync", full stop,
on every platform. On both Windows and OS X, you would need to switch
to fsync_writethrough or change OS-level options to get safe behavior;
which is the same as it is today.
regards, tom lane
"Joshua D. Drake" <jd@commandprompt.com> writes:
On Tue, 2010-12-07 at 18:11 -0500, Tom Lane wrote:
Josh Berkus <josh@agliodbs.com> writes:
Because nobody sane uses OSX on the server?
Some of us would make the same remark about Windows. But we go out of
our way to provide a safe default on that platform anyhow.
Not to mention the assertion that people don't use OSX on a server is
patently false. They don't use it on big servers, but it is very popular
for the SMB (big time capital S).
Actually the previous discussions about this are coming back to me now.
With the current code, we don't actually guarantee safe flush behavior
by default on ANY of the common consumer platforms. In the Linux case
we can't, because we can't monkey with hdparm settings. (I think the
same is true on BSDen ... anybody know?) On Windows and OS X we default
to open_datasync despite its not being safe on either platform. We
previously debated switching those to fsync_writethrough which would
make them safe by default, but decided not to, partly on grounds of the
inevitable ZOMG ITS SLOW backlash and partly on grounds of keeping
cross-platform consistency.
I don't think it's a good idea to reopen the fsync_writethrough debate
right now, certainly not for something we're contemplating
back-patching. I think what we'd better do is ensure that that is
*not* selected as the default, on either Windows or OS X. So we need to
get rid of the HAVE_FSYNC_WRITETHROUGH_ONLY hack.
regards, tom lane
I am unclear as to the reason why there is a test for
HAVE_FSYNC_WRITETHROUGH_ONLY in pg_fsync(). Perhaps that is also
leftover from a previous vision of how this all works? Or does an
fsync() call actually fail on Windows?
No, fsync responds fine. It just doesn't actually sync to disk.
--
-- Josh Berkus
On Dec 7, 2010, at 2:43 PM, Josh Berkus wrote:
Because nobody sane uses OSX on the server?
The XServe running 10.5 server and 9.0.1 at the other end of the office takes your remark personally. :)
--
-- Christophe Pettus
xof@thebuild.com
Josh Berkus <josh@agliodbs.com> writes:
I am unclear as to the reason why there is a test for
HAVE_FSYNC_WRITETHROUGH_ONLY in pg_fsync(). Perhaps that is also
leftover from a previous vision of how this all works? Or does an
fsync() call actually fail on Windows?
No, fsync responds fine. It just doesn't actually sync to disk.
Right, which is also an accurate description of its behavior on OS X,
as well as Linux (if you didn't change hdparm settings). So the real
question here is what's the point of treating Windows differently.
regards, tom lane
xof@thebuild.com (Christophe Pettus) writes:
On Dec 7, 2010, at 2:43 PM, Josh Berkus wrote:
Because nobody sane uses OSX on the server?
The XServe running 10.5 server and 9.0.1 at the other end of the
office takes your remark personally. :)
I'd heard that Apple had cancelled XServe. [Poking back at that...]
Yep, they won't be carrying anything for sale that's particularly
rack-mountable after next January. Not precisely "dead," but definitely
moving on smelling funny...
--
(format nil "~S@~S" "cbbrowne" "gmail.com")
http://www3.sympatico.ca/cbbrowne/emacs.html
Photons have mass? I didn't know they were catholic!
Right, which is also an accurate description of its behavior on OS X,
as well as Linux (if you didn't change hdparm settings). So the real
question here is what's the point of treating Windows differently.
So, sounds like we should continue treating fsync_writethrough the same
as we have been, and maybe add a doc patch covering some of the above?
--
-- Josh Berkus
Josh Berkus <josh@agliodbs.com> writes:
Right, which is also an accurate description of its behavior on OS X,
as well as Linux (if you didn't change hdparm settings). So the real
question here is what's the point of treating Windows differently.
So, sounds like we should continue treating fsync_writethrough the same
as we have been, and maybe add a doc patch covering some of the above?
Yeah, this patch is shaping up to be about five lines of code change
and a hundred of docs ...
regards, tom lane
On Tue, 2010-12-07 at 19:00 -0500, Chris Browne wrote:
xof@thebuild.com (Christophe Pettus) writes:
On Dec 7, 2010, at 2:43 PM, Josh Berkus wrote:
Because nobody sane uses OSX on the server?
The XServe running 10.5 server and 9.0.1 at the other end of the
office takes your remark personally. :)
I'd heard that Apple had cancelled XServe. [Poking back at that...]
Yep, they won't be carrying anything for sale that's particularly
rack-mountable after next January. Not precisely "dead," but definitely
moving on smelling funny...
A bit off topic but Apple is actually marketing OSX Server on the mini
(with RAID 1). Which honestly for 95% of the businesses out there would
make a very nice, reasonably performant database server for say....
PostBooks, or Drupal.
JD
On 12/07/2010 07:32 PM, Joshua D. Drake wrote:
On Tue, 2010-12-07 at 19:00 -0500, Chris Browne wrote:
xof@thebuild.com (Christophe Pettus) writes:
On Dec 7, 2010, at 2:43 PM, Josh Berkus wrote:
Because nobody sane uses OSX on the server?
The XServe running 10.5 server and 9.0.1 at the other end of the
office takes your remark personally. :)
I'd heard that Apple had cancelled XServe. [Poking back at that...]
Yep, they won't be carrying anything for sale that's particularly
rack-mountable after next January. Not precisely "dead," but definitely
moving on smelling funny...
A bit off topic but Apple is actually marketing OSX Server on the mini
(with RAID 1). Which honestly for 95% of the businesses out there would
make a very nice, reasonably performant database server for say....
PostBooks, or Drupal.
Given the constant overheating I have had on my mini, I wouldn't use it
to host anything much ;-) But I guess YMMV.
cheers
andrew
Josh Berkus <josh@agliodbs.com> writes:
I am unclear as to the reason why there is a test for
HAVE_FSYNC_WRITETHROUGH_ONLY in pg_fsync(). Perhaps that is also
leftover from a previous vision of how this all works? Or does an
fsync() call actually fail on Windows?
No, fsync responds fine. It just doesn't actually sync to disk.
Sigh ... The closer I look at the Windows code path here, the more of an
inconsistent, badly documented spaghetti-heap it appears to be. So far
as a quick Google search unearths, there is no fsync() primitive on
Windows. What we have actually got is this gem in port/win32.h:
/*
* Even though we don't support 'fsync' as a wal_sync_method,
* we do fsync() a few other places where _commit() is just fine.
*/
#define fsync(fd) _commit(fd)
So actually, there is no difference between selecting fsync and
fsync_writethrough on Windows, this comment and the SGML documentation
to the contrary. Both settings result in invoking _commit() and
presumably are safe. One wonders why we bothered to invent a separate
fsync_writethrough setting on Windows.
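The effect of that #define can be demonstrated on any platform by using a stand-in for _commit(). In this sketch, `fake_commit()` and the two wrapper functions are hypothetical. The wrapper bodies assume, per the analysis above, that both the fsync and fsync_writethrough paths end up calling fsync() on Windows.

```c
/* Stand-in for the Windows CRT's _commit(); counts calls for the demo. */
static int commit_calls = 0;

static int fake_commit(int fd)
{
    (void) fd;
    commit_calls++;
    return 0;
}

/* Same trick as port/win32.h, redirected to the stand-in. */
#define fsync(fd) fake_commit(fd)

/* Hypothetical wrappers for the two wal_sync_method settings on Windows. */
int sync_for_fsync(int fd)
{
    return fsync(fd);
}

int sync_for_fsync_writethrough(int fd)
{
    return fsync(fd); /* assumed WIN32 path: also plain fsync() */
}

int commit_call_count(void)
{
    return commit_calls;
}
```

Both settings land in the same primitive, which is why there is no behavioral difference between them.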
What this means is that switching to a simple preference order
"fdatasync, then fsync" will result in choosing fsync on Windows (since
it hasn't got fdatasync), meaning _commit, meaning Windows users see
a behavioral change after all.
I'm afraid that if we don't want a major behavioral change, there's
no option except having a Windows-specific rule for the choice of
default. It'll have to be "fdatasync, then fsync, except on Windows
where open_datasync is the default". Grumble. But it's not like
Windows hasn't got a hundred other special cases already.
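The rule above can be expressed as a minimal sketch. `proposed_default()` and its enum are hypothetical. The choice is made at run time here only so that every branch can be exercised; the real selection would presumably remain a compile-time #if chain.

```c
#include <stdbool.h>

/* Hypothetical method codes for illustration. */
enum sync_choice { CHOICE_FSYNC, CHOICE_FDATASYNC, CHOICE_OPEN_DATASYNC };

/*
 * "fdatasync, then fsync, except on Windows where open_datasync is the
 * default": Windows keeps open_datasync so its users see no behavior change.
 */
enum sync_choice proposed_default(bool is_windows, bool have_fdatasync)
{
    if (is_windows)
        return CHOICE_OPEN_DATASYNC;
    return have_fdatasync ? CHOICE_FDATASYNC : CHOICE_FSYNC;
}
```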
Would someone verify via pgbench or similar test (*not* test_fsync) that
on Windows, wal_sync_method = fsync or fsync_writethrough perform the
same (ie tps ~= disk rotation rate) while open_datasync is too fast to
be real? I'm losing confidence that I've found all the spaghetti ends
here, and I don't have a Windows setup to try it myself.
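The "tps ~= disk rotation rate" check is simple arithmetic. A drive that honestly flushes can complete roughly one commit per platter rotation for a single client, so a 7200 RPM disk tops out near 7200 / 60 = 120 commits per second. Below is a sketch that assumes a single pgbench client (concurrent clients can share one flush, so the ceiling doesn't apply to them) and a hypothetical slack factor. The function names are illustrative only.

```c
#include <stdbool.h>

/* Rough single-client ceiling: at most one honest commit per rotation. */
double max_honest_tps(double rpm)
{
    return rpm / 60.0;
}

/* True if a measured single-client tps beats the ceiling by more than
 * the given slack factor (e.g. 1.5), i.e. "too fast to be real". */
bool flush_looks_fake(double measured_tps, double rpm, double slack)
{
    return measured_tps > max_honest_tps(rpm) * slack;
}
```

For example, a single-client run reporting 1500 tps on a 7200 RPM drive would suggest that a write cache is absorbing the flushes.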
regards, tom lane
Tom Lane wrote:
So actually, there is no difference between selecting fsync and
fsync_writethrough on Windows, this comment and the SGML documentation
to the contrary. Both settings result in invoking _commit() and
presumably are safe. One wonders why we bothered to invent a separate
fsync_writethrough setting on Windows.
Quite; I documented some of the details about mapping to _commit() long ago
at http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm but
forgot to suggest fixing the mistakes in the docs afterwards (Windows is
not exactly my favorite platform).
http://archives.postgresql.org/pgsql-hackers/2005-08/msg00227.php
explains some of the history I think you're looking for here.
Would someone verify via pgbench or similar test (*not* test_fsync) that
on Windows, wal_sync_method = fsync or fsync_writethrough perform the
same (ie tps ~= disk rotation rate) while open_datasync is too fast to
be real? I'm losing confidence that I've found all the spaghetti ends
here, and I don't have a Windows setup to try it myself.
I can look into this tomorrow. The laptop I posted Ubuntu/RHEL6
test_fsync numbers from before also boots into Vista, so I can compare
all those platforms on the same hardware. I just need to be aware of
the slightly different sequential speeds on each partition of the drive.
As far as your major battle plan goes, I think what we should do is find
the simplest possible patch that just fixes the newer Linux kernel
problem, preferably without changing any other platform, then commit
that to HEAD and appropriate backports. Then the larger O_DIRECT
remapping can proceed forward after that, along with cleanup to the
writethrough choices and unifying test_fsync against the server. I
wrote a patch that shuffled around a lot of this code last night, but
the first thing I coded was junk because of some mistaken assumptions.
I have been coming to the same realizations you have about how messy
this really is.
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books