win32 performance - fsync question
Hi,
Looking for a way to improve performance on a Windows XP box, I found
the parameters
#fsync = true # turns forced synchronization on or off
#wal_sync_method = fsync # the default varies across platforms:
# fsync, fdatasync, open_sync, or open_datasync
I have no idea how these work on win32. May I try fsync = false, or is it
dangerous? Which wal_sync_method values may I try on WinXP?
Regards,
E.R.
_________________________________________________________________________
Evgeny Rodichev Sternberg Astronomical Institute
email: er@sai.msu.su Moscow State University
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er
> Looking for a way to improve performance on a Windows XP box, I found
> the parameters
> #fsync = true # turns forced synchronization on or off
> #wal_sync_method = fsync # the default varies across platforms:
> # fsync, fdatasync, open_sync, or open_datasync
> I have no idea how these work on win32. May I try fsync = false, or is it
> dangerous? Which wal_sync_method values may I try on WinXP?
wal_sync_method does nothing on XP. Turning fsync off will tremendously
increase write performance, at the cost of possible data corruption
in the event of an unexpected server power-down.
The main performance difference between win32 and the various Unix systems
is that fsync() takes much longer on win32 than on Linux.
Merlin
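A rough way to see the cost Merlin describes is to time small writes with and without a forced flush. This is only a sketch: the helper name, the 8 kB payload, and the write count are arbitrary assumptions, and the numbers depend heavily on the disk, its write cache, and the OS.

```python
import os
import tempfile
import time


def time_synced_writes(n_writes=20, payload=b"x" * 8192, sync=True):
    """Return average seconds per write, optionally forcing each write to disk."""
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        for _ in range(n_writes):
            os.write(fd, payload)
            if sync:
                os.fsync(fd)  # block until the OS reports the data is on disk
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return elapsed / n_writes


if __name__ == "__main__":
    print("with fsync:    %.6f s/write" % time_synced_writes(sync=True))
    print("without fsync: %.6f s/write" % time_synced_writes(sync=False))
```

Running it on both platforms against the same kind of disk gives a quick feel for how much of a tps gap could come from the flush alone.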
> Hi,
> Looking for a way to improve performance on a Windows XP box, I found
> the parameters
> #fsync = true # turns forced synchronization on or off
> #wal_sync_method = fsync # the default varies across platforms:
> # fsync, fdatasync, open_sync, or open_datasync
> I have no idea how these work on win32. May I try fsync = false, or is it
> dangerous? Which wal_sync_method values may I try on WinXP?
You can try it, but it is dangerous.
fsync is the correct wal_sync_method.
For some reason the syncing is quite a lot slower on win32. One reason
might be that it flushes the file's metadata as well, which I
believe at least Linux doesn't.
If it wasn't clear already, if you're running antivirus, try
uninstalling it. Note that you may need to uninstall it to get all
performance back, just disabling is often *not* enough as the kernel
driver is still loaded.
Things worth experimenting with (these are all untested, so please
report any successes):
1) Try reformatting with a cluster size of 8 kB (the pg page size), if
you can.
2) Disable the last access time (like noatime on linux). "fsutil
behavior set disablelastaccess 1"
3) Disable 8.3 filenames "fsutil behavior set disable8dot3 1"
2 and 3 may require a reboot.
(2 and 3 can be done on earlier windows through registry settings only,
in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem)
//Magnus
> Things worth experimenting with (these are all untested, so please
> report any successes):
> 1) Try reformatting with a cluster size of 8 kB (the pg page size), if
> you can.
What about recompiling pg with a 4k block size? Win32 file cluster
sizes and memory allocation units are both on 4k boundaries.
Merlin
In <4214B68C.8000901@dunslane.net>, on 02/17/05
at 10:21 AM, Andrew Dunstan <andrew@dunslane.net> said:
> E.Rodichev wrote:
>> This problem is addressed by the file system (fsck, journalling, etc.).
>> Is it reasonable to handle it directly within the application?
> In the words of the Duke of Wellington, "If you believe that you'll
> believe anything."
> Please review past discussions on the mailing lists on this point.
> BTW, most journalling file systems do not guarantee file integrity, only
> file metadata integrity. In particular, I believe this is true of NTFS
> (and whether it even does that has been debated).
> So by all means turn off fsync if you want the performance gain *and*
> you accept the risk. But if you do, don't come crying later that your
> data has been lost or corrupted.
> (the results are interesting, though - with fsync off Windows and Linux
> are in the same performance ballpark.)
> cheers
> andrew
In anything I've done, Windows is very slow when you use fsync or the
Windows API equivalent.
If you need the performance, you had better have the machine hooked up to
a UPS (probably a good idea in any case) and set up something that is
triggered by the UPS running down to signal PostgreSQL to do an immediate
shutdown.
--
-----------------------------------------------------------
lsunley@mb.sympatico.ca
-----------------------------------------------------------
On Thu, 17 Feb 2005, Magnus Hagander wrote:
>> Hi,
>> Looking for a way to improve performance on a Windows XP box, I found
>> the parameters
>> #fsync = true # turns forced synchronization on or off
>> #wal_sync_method = fsync # the default varies across platforms:
>> # fsync, fdatasync, open_sync, or open_datasync
>> I have no idea how these work on win32. May I try fsync = false, or is it
>> dangerous? Which wal_sync_method values may I try on WinXP?
> You can try it, but it is dangerous.
> fsync is the correct wal_sync_method.
> For some reason the syncing is quite a lot slower on win32. One reason
> might be that it flushes the file's metadata as well, which I
> believe at least Linux doesn't.
> If it wasn't clear already, if you're running antivirus, try
> uninstalling it. Note that you may need to uninstall it to get all
> performance back, just disabling is often *not* enough as the kernel
> driver is still loaded.
No, I don't have any resident disk-related stuff.
> Things worth experimenting with (these are all untested, so please
> report any successes):
> 1) Try reformatting with a cluster size of 8 kB (the pg page size), if
> you can.
> 2) Disable the last access time (like noatime on linux). "fsutil
> behavior set disablelastaccess 1"
> 3) Disable 8.3 filenames "fsutil behavior set disable8dot3 1"
> 2 and 3 may require a reboot.
> (2 and 3 can be done on earlier windows through registry settings only,
> in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem)
I've repeated the test under 2 and 3 - no noticeable difference. With
disablelastaccess I got about 10% - 15% better results, but it is not
too significant.
Finally I tried
fsync = false
and got 580-620 tps. So, the short summary:
WinXP  fsync = true    20-28 tps
WinXP  fsync = false   600 tps
Linux                  800 tps
The general question is: does PostgreSQL really need fsync? I suppose it
is a design question, not a platform-specific one. It seems that the only
scenario in which fsync is useful is interprocess communication via an
open file. But PostgreSQL uses IPC for this, so is fsync really
required?
E.R.
> The general question is: does PostgreSQL really need fsync? I suppose it
> is a design question, not a platform-specific one. It seems that the only
> scenario in which fsync is useful is interprocess communication via an
> open file. But PostgreSQL uses IPC for this, so is fsync really
> required?
NO!
Fsync is so that when your computer loses power without warning, you
will have no data loss.
If you turn it off, you run the risk of losing data if you lose power.
Chris
On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:
>> The general question is: does PostgreSQL really need fsync? I suppose it
>> is a design question, not a platform-specific one. It seems that the only
>> scenario in which fsync is useful is interprocess communication via an
>> open file. But PostgreSQL uses IPC for this, so is fsync really
>> required?
> NO!
> Fsync is so that when your computer loses power without warning, you will
> have no data loss.
> If you turn it off, you run the risk of losing data if you lose power.
> Chris
This problem is addressed by the file system (fsck, journalling, etc.).
Is it reasonable to handle it directly within the application?
Regards,
E.R.
On Thu, 17 Feb 2005 17:54:38 +0300 (MSK)
"E.Rodichev" <er@sai.msu.su> wrote:
> On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:
>>> The general question is: does PostgreSQL really need fsync? I suppose it
>>> is a design question, not a platform-specific one. It seems that the only
>>> scenario in which fsync is useful is interprocess communication via an
>>> open file. But PostgreSQL uses IPC for this, so is fsync really
>>> required?
>> NO!
>> Fsync is so that when your computer loses power without warning, you
>> will have no data loss.
>> If you turn it off, you run the risk of losing data if you lose power.
>> Chris
> This problem is addressed by the file system (fsck, journalling, etc.).
> Is it reasonable to handle it directly within the application?
NO again!
Fsck only fixes up file system pointers after a crash. If the data did
not make it to the disk, no amount of fscking will put it there.
I'm not positive but I think that journalled file systems also need
fsync to guarantee that the information gets journalled but in any case,
journalling only helps if you have a journalled file system. Not
everyone does.
This is not to say that fsync is always required, just that it solves a
different problem than all those other tools.
--
D'Arcy J.M. Cain <darcy@druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"E.Rodichev" <er@sai.msu.su> writes:
> On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:
>> Fsync is so that when your computer loses power without warning, you
>> will have no data loss.
>> If you turn it off, you run the risk of losing data if you lose power.
>> Chris
> This problem is addressed by the file system (fsck, journalling, etc.).
> Is it reasonable to handle it directly within the application?
No, it's not addressed by the file system. fsync() tells the OS to
make sure the data is on disk. Without that, the OS is free to just
keep the WAL data in memory cache, and a power failure could cause
data from committed transactions to be lost (we don't report commit
success until fsync() tells us the file data is on disk).
-Doug
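The rule Doug describes (no commit acknowledgement until the flush returns) can be sketched in a few lines. This is an illustration only, not PostgreSQL's code: `commit`, `demo`, and the temporary ".wal" file are hypothetical stand-ins.

```python
import os
import tempfile


def commit(wal_fd, record):
    """Append one record to the log and return True only once it is flushed."""
    os.write(wal_fd, record)
    os.fsync(wal_fd)   # without this, the record may sit only in the OS cache
    return True        # only now is it safe to report success to the client


def demo():
    # Hypothetical stand-in for a WAL segment file.
    fd, path = tempfile.mkstemp(suffix=".wal")
    try:
        ok = commit(fd, b"txn 1: commit\n")
        size = os.fstat(fd).st_size
    finally:
        os.close(fd)
        os.remove(path)
    return ok, size
```

With fsync = false the equivalent of the `os.fsync` line is skipped, so success is reported while the record may still exist only in memory.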
E.Rodichev wrote:
> This problem is addressed by the file system (fsck, journalling, etc.).
> Is it reasonable to handle it directly within the application?
In the words of the Duke of Wellington, "If you believe that you'll
believe anything."
Please review past discussions on the mailing lists on this point.
BTW, most journalling file systems do not guarantee file integrity, only
file metadata integrity. In particular, I believe this is true of NTFS
(and whether it even does that has been debated).
So by all means turn off fsync if you want the performance gain *and*
you accept the risk. But if you do, don't come crying later that your
data has been lost or corrupted.
(the results are interesting, though - with fsync off Windows and Linux
are in the same performance ballpark.)
cheers
andrew
> So by all means turn off fsync if you want the performance gain *and*
> you accept the risk. But if you do, don't come crying later that your
> data has been lost or corrupted.
> (the results are interesting, though - with fsync off Windows and Linux
> are in the same performance ballpark.)
Yes, this is definitely interesting. It confirms Merlin's signs that I/O
is what kills the win32 version. IPC etc. is a bit slower, but not
significantly.
> In anything I've done, Windows is very slow when you use fsync or the
> Windows API equivalent.
This is what we have discovered. AFAIK, all other major databases and
similar apps (like Exchange or AD) open files with
FILE_FLAG_WRITE_THROUGH and do *not* use fsync. O_DIRECT-style WAL logging,
at least, might give noticeably better performance. But I'm
unsure if the current code for O_DIRECT works on win32 - I think it
needs some fixing for that. Which might be worth looking at for 8.1.
Not much to do about the bgwriter; the way it is designed it *has* to
fsync during checkpoint. The Other Databases implement their own cache
and write data files directly as well, but pg is designed to have the OS
cache helping out. Bypassing it would not be good for performance.
> If you need the performance, you had better have the machine hooked up to
> a UPS (probably a good idea in any case) and set up something that is
> triggered by the UPS running down to signal PostgreSQL to do an immediate
> shutdown.
A UPS will not help you. A UPS does not help you if the OS crashes (hey,
you're on Windows, this *does* happen). A UPS does not help you if
somebody accidentally pulls the plug between the UPS and the server. A UPS
does not help you if your server overheats and shuts down.
Bottom line, there are lots of cases where a UPS does not help. Having
a UPS (preferably redundant UPSes feeding redundant power supplies -
this is not at all expensive today) is certainly a good thing, but it is
*not* a replacement for fsync. On *any* platform.
//Magnus
"Magnus Hagander" <mha@sollentuna.net> writes:
> This is what we have discovered. AFAIK, all other major databases and
> similar apps (like Exchange or AD) open files with
> FILE_FLAG_WRITE_THROUGH and do *not* use fsync. O_DIRECT-style WAL logging,
> at least, might give noticeably better performance. But I'm
> unsure if the current code for O_DIRECT works on win32 - I think it
> needs some fixing for that. Which might be worth looking at for 8.1.
Doesn't Windows support O_SYNC (or even better O_DSYNC) flag to open()?
That should be the Posixy spelling of FILE_FLAG_WRITE_THROUGH, if the
latter means what I suppose it does.
> Not much to do about the bgwriter, the way it is designed it *has* to
> fsync during checkpoint.
Theoretically at least, the fsync during checkpoints should not be a
performance killer. The issue that's at hand here is fsyncing the WAL,
and the reason we need that is (a) to be sure a transaction is committed
when we say it is, and (b) to be sure that WAL writes hit disk before
associated data file updates do (it's write AHEAD log remember). Direct
writes of WAL should be fine.
So: try O_SYNC instead of fsync for WAL, ie, wal_sync_method =
open_sync or open_datasync.
regards, tom lane
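The difference between the two WAL sync styles Tom mentions can be sketched as follows. This is a simplified illustration, not PostgreSQL's implementation: the function names and the "wal_segment" file are hypothetical, and `os.O_SYNC` is only present on POSIX builds of Python, so the sketch falls back to 0 where it is missing.

```python
import os
import tempfile

# os.O_SYNC exists on POSIX builds of Python; fall back to 0 where it is missing.
O_SYNC = getattr(os, "O_SYNC", 0)


def write_then_fsync(path, data):
    """fsync style: buffered write, then an explicit flush."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)
    finally:
        os.close(fd)


def write_open_sync(path, data):
    """open_sync style: the file is opened so that write() itself is synchronous."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND | O_SYNC, 0o600)
    try:
        os.write(fd, data)  # no separate fsync call when O_SYNC is honoured
    finally:
        os.close(fd)


def demo():
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "wal_segment")  # hypothetical file name
    try:
        write_then_fsync(path, b"record-1\n")
        write_open_sync(path, b"record-2\n")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
        os.rmdir(directory)
```

Both paths end with the record on disk before the call returns; what differs is whether the flush is a second system call or part of the write.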
>> This is what we have discovered. AFAIK, all other major databases and
>> similar apps (like Exchange or AD) open files with
>> FILE_FLAG_WRITE_THROUGH and do *not* use fsync. O_DIRECT-style WAL logging,
>> at least, might give noticeably better performance. But I'm
>> unsure if the current code for O_DIRECT works on win32 - I think it
>> needs some fixing for that. Which might be worth looking at for 8.1.
> Doesn't Windows support O_SYNC (or even better O_DSYNC) flag to open()?
> That should be the Posixy spelling of FILE_FLAG_WRITE_THROUGH, if the
> latter means what I suppose it does.
They should, but someone said it didn't work. I haven't followed up on
it, though, so it is quite possible it works. If so, it is definitely
worth trying.
>> Not much to do about the bgwriter, the way it is designed it *has* to
>> fsync during checkpoint.
> Theoretically at least, the fsync during checkpoints should not be a
> performance killer.
If you run a tight benchmark past a checkpoint, it will have an effect
if the fsync takes twice as long as it does on Unix. If the checkpoint
happens when other I/O is fairly low then it should not have an effect.
Merlin, was that by any chance you? We've been talking about these
things quite a lot :-)
> So: try O_SYNC instead of fsync for WAL, ie, wal_sync_method =
> open_sync or open_datasync.
Definitely worth checking out.
//Magnus
>> Things worth experimenting with (these are all untested, so please
>> report any successes):
>> 1) Try reformatting with a cluster size of 8 kB (the pg page size), if
>> you can.
>> 2) Disable the last access time (like noatime on linux). "fsutil
>> behavior set disablelastaccess 1"
>> 3) Disable 8.3 filenames "fsutil behavior set disable8dot3 1"
>> 2 and 3 may require a reboot.
>> (2 and 3 can be done on earlier windows through registry settings only,
>> in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem)
> I've repeated the test under 2 and 3 - no noticeable difference. With
> disablelastaccess I got about 10% - 15% better results, but it is not
> too significant.
Actually, that's enough to care about in a real world deployment.
> Finally I tried
> fsync = false
> and got 580-620 tps. So, the short summary:
> WinXP  fsync = true    20-28 tps
> WinXP  fsync = false   600 tps
> Linux                  800 tps
This Linux figure should really be compared with the WinXP fsync=false
figure, since you have write caching on. The interesting one to compare
with is the other one you did:
Linux w/o write cache 80-90 tps
Which is still faster than Windows, but not by as much.
> The general question is: does PostgreSQL really need fsync? I suppose it
> is a design question, not a platform-specific one. It seems that the only
> scenario in which fsync is useful is interprocess communication via an
> open file. But PostgreSQL uses IPC for this, so is fsync really
> required?
No, fsync is used to make sure your data is committed to disk once you
commit a transaction. IPC is handled through shared memory and named
pipes.
//Magnus
On Thu, 17 Feb 2005, Andrew Dunstan wrote:
> (the results are interesting, though - with fsync off Windows and Linux are
> in the same performance ballpark.)
Some additional results:
WinXP  fsync = true    20-28 tps
WinXP  fsync = false   600 tps
Linux  fsync = true    800 tps
Linux  fsync = false   980 tps
Regards,
E.R.
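For a sense of scale, the four figures above can be turned into per-platform speedups. This is plain arithmetic on the numbers reported in the thread; the only assumption is using 24 tps, the midpoint of the reported 20-28 range, for WinXP with fsync = true. Note that Magnus points out the Linux fsync = true figure was taken with disk write caching on, so the Linux ratio probably understates the real cost of fsync there.

```python
# Figures as reported in the thread (transactions per second).
# The WinXP fsync=true value uses 24, the midpoint of the reported 20-28 range
# (an assumption made only for this arithmetic).
tps = {
    ("WinXP", True): 24,
    ("WinXP", False): 600,
    ("Linux", True): 800,
    ("Linux", False): 980,
}


def fsync_off_speedup(platform):
    """How many times faster the platform ran with fsync = false."""
    return tps[(platform, False)] / tps[(platform, True)]


if __name__ == "__main__":
    for platform in ("WinXP", "Linux"):
        print(platform, round(fsync_off_speedup(platform), 3))
```

Under that assumption, disabling fsync gives roughly a 25x gain on WinXP against about 1.2x on Linux.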
>> Doesn't Windows support O_SYNC (or even better O_DSYNC) flag
>> to open()?
>> That should be the Posixy spelling of FILE_FLAG_WRITE_THROUGH, if the
>> latter means what I suppose it does.
> They should, but someone said it didn't work. I haven't
> followed up on it, though, so it is quite possible it works.
> If so, it is definitely worth trying.
Update on that. There is no O_SYNC nor O_DSYNC. They just aren't there.
However, we already have win32_open (in port/open.c) which is used to
open these files. We could probably add code there to check for O_SYNC
and map it to the correct win32 flags for CreateFile (because the
support certainly is there).
To make this happen, is it enough to define O_DSYNC in the win32 port
include file, and then implement it in the open call? Or do I need to
hack xlog.c? The comment claims it's hackery ;-), so I figured I should
verify that before actually testing things.
Oh, and finally. The win32 commands have the following options:
FILE_FLAG_NO_BUFFERING. This disables the cache completely. It also has
lots of limits, like every read and write has to be on a sector boundary
etc. It gives great performance with async I/O, because it bypasses the
memory manager. It appears to be like O_DIRECT on linux?
FILE_FLAG_WRITE_THROUGH:
"
Instructs the system to write through any intermediate cache and go
directly to disk.
If FILE_FLAG_NO_BUFFERING is not also specified, so that system caching
is in effect, then the data is written to the system cache, but is
flushed to disk without delay.
If FILE_FLAG_NO_BUFFERING is also specified, so that system caching is
not in effect, then the data is immediately flushed to disk without
going through the system cache. The operating system also requests a
write-through the hard disk cache to persistent media. However, not all
hardware supports this write-through capability.
"
It seems to me FILE_FLAG_NO_BUFFERING is the same as O_DSYNC. (A
different place in the docs says "Also, the file metadata may still be
cached. To flush the metadata to disk, use the FlushFileBuffers
function.", so it seems it's more DSYNC than SYNC)
//Magnus
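The sector-boundary restriction Magnus mentions for FILE_FLAG_NO_BUFFERING comes down to simple rounding. The sketch below assumes a 512-byte sector purely for illustration (real code would query the volume), and the helper names are hypothetical. Unbuffered I/O on Windows also expects the buffer's memory address to be aligned, which is not shown here.

```python
SECTOR_SIZE = 512  # assumed sector size; real code should query the volume


def align_up(n, boundary=SECTOR_SIZE):
    """Round n up to the next multiple of boundary."""
    return (n + boundary - 1) // boundary * boundary


def pad_to_sector(data, boundary=SECTOR_SIZE):
    """Zero-pad a buffer so its length is a whole number of sectors."""
    return data + b"\x00" * (align_up(len(data), boundary) - len(data))


def is_aligned(offset, length, boundary=SECTOR_SIZE):
    """True if both the file offset and the transfer length are sector multiples."""
    return offset % boundary == 0 and length % boundary == 0
```

An 8 kB PostgreSQL page written at a page-multiple offset satisfies `is_aligned` for 512-byte and 4 kB sectors alike; arbitrary-length WAL records would need padding.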
>> Doesn't Windows support O_SYNC (or even better O_DSYNC) flag to
>> open()?
>> That should be the Posixy spelling of FILE_FLAG_WRITE_THROUGH, if the
>> latter means what I suppose it does.
> They should, but someone said it didn't work. I haven't followed up on
> it, though, so it is quite possible it works. If so, it is definitely
> worth trying.
Yes, and the other issue is that FlushFileBuffers() does not play nice
with RAID controllers: it actually overrides their write caching, so
you cannot get around the fsync performance issue using RAID + BBU on
most configurations.
>>> Not much to do about the bgwriter, the way it is designed it *has*
>>> to fsync during checkpoint.
>> Theoretically at least, the fsync during checkpoints should not be a
>> performance killer.
I agree: it's the WAL sync that is the problem. I don't mind a slower
sync during checkpoint because that is controllable. However, there is
also the RAID issue.
> If you run a tight benchmark past a checkpoint, it will have an effect
> if the fsync takes twice as long as it does on Unix. If the checkpoint
> happens when other I/O is fairly low then it should not have an
> effect.
> Merlin, was that by any chance you? We've been talking about these
> things quite a lot :-)
>> So: try O_SYNC instead of fsync for WAL, ie, wal_sync_method =
>> open_sync or open_datasync.
> Definitely worth checking out.
Yeah.
Merlin
"Magnus Hagander" <mha@sollentuna.net> writes:
> Oh, and finally. The win32 commands have the following options:
> FILE_FLAG_NO_BUFFERING. This disables the cache completely. It also has
> lots of limits, like every read and write has to be on a sector boundary
> etc. It gives great performance with async I/O, because it bypasses the
> memory manager. It appears to be like O_DIRECT on linux?
> FILE_FLAG_WRITE_THROUGH:
> "
> Instructs the system to write through any intermediate cache and go
> directly to disk.
> If FILE_FLAG_NO_BUFFERING is not also specified, so that system caching
> is in effect, then the data is written to the system cache, but is
> flushed to disk without delay.
> If FILE_FLAG_NO_BUFFERING is also specified, so that system caching is
> not in effect, then the data is immediately flushed to disk without
> going through the system cache. The operating system also requests a
> write-through the hard disk cache to persistent media. However, not all
> hardware supports this write-through capability.
> "
AFAICS it would make sense for us to specify both of those flags for WAL
writes.
We could either hack win32_open() to translate O_SYNC to those flags,
or make xlog.c aware of the Windows spellings of the flags. Probably
the former is less painful given that open.c already does wholesale
translations of open() flags.
One point that I no longer recall the reasoning behind is that xlog.c
doesn't think O_SYNC is a preferable default over fsync. We'd certainly
want to hack xlog.c to change its mind about that, at least on Windows;
assuming that the FILE_FLAG way is indeed faster.
regards, tom lane
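The translation Tom proposes for win32_open() amounts to a small bit-mapping. The sketch below shows only the idea, in Python rather than C: the two FILE_FLAG values are the real Win32 constants, but the O_SYNC/O_DSYNC bit values and the `createfile_flags` helper are hypothetical, since Windows does not define those flags and the port header would have to choose them.

```python
# Real Win32 CreateFile flag values (from winbase.h).
FILE_FLAG_WRITE_THROUGH = 0x80000000
FILE_FLAG_NO_BUFFERING = 0x20000000

# Hypothetical bit values: Windows does not define O_SYNC/O_DSYNC, so a port
# header would have to pick unused bits. These particular numbers are assumptions.
O_SYNC = 0x0800
O_DSYNC = 0x1000


def createfile_flags(open_flags, bypass_cache=False):
    """Translate POSIX-style sync flags into CreateFile flag bits."""
    flags = 0
    if open_flags & (O_SYNC | O_DSYNC):
        flags |= FILE_FLAG_WRITE_THROUGH
        if bypass_cache:
            # Specifying both flags, as suggested for WAL writes.
            flags |= FILE_FLAG_NO_BUFFERING
    return flags
```

Keeping the cache bypass behind a separate switch reflects that FILE_FLAG_NO_BUFFERING carries alignment restrictions that write-through alone does not.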