fsync = true beneficial on ext3?

Started by Ed L. about 22 years ago · 16 messages · general
#1Ed L.
pgsql@bluepolka.net

I'm curious what the consensus is, if any, on use of fsync on ext3
filesystems with postgresql 7.3.4 or later. I did some recent performance
tests demonstrating a 45%-70% performance improvement for simple inserts
with fsync off on one particular system. Does fsync = true buy me any
additional recoverability beyond ext3's journal recovery?

If we write something without sync'ing, presumably it's immediately
journaled? So even if the DB crashes prior to fsync'ing, are we fully
recoverable? I've been running a few pgsql clusters on ext3 with fsync =
false, suffered numerous OS crashes, and have yet to lose any data or see
any corruption from any of those crashes. Have I just been lucky?

TIA.

Ed
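
A rough way to see where a gap like this comes from, outside PostgreSQL entirely, is to time small appends with and without an fsync after each one. This is only a sketch: `timed_appends`, the 128-byte record, and the record count are made up for illustration, and the numbers it prints say nothing about the 45%-70% figure above, which came from real inserts on one particular system.

```python
import os
import tempfile
import time


def timed_appends(path, n_records, sync_each):
    """Append n_records small records, optionally calling fsync after each."""
    record = b"x" * 128  # stand-in for one small inserted row (assumption)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(n_records):
            os.write(fd, record)  # lands in the kernel cache
            if sync_each:
                os.fsync(fd)  # ask the kernel to push it to the storage device
        return time.perf_counter() - start
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for sync_each in (False, True):
            path = os.path.join(d, f"log_sync_{sync_each}")
            elapsed = timed_appends(path, 500, sync_each)
            print(f"fsync per record={sync_each}: {elapsed:.4f}s")
```

The run without fsync is faster precisely because nothing forces the data out of memory, which is the durability question the rest of the thread is about.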

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ed L. (#1)
Re: fsync = true beneficial on ext3?

"Ed L." <pgsql@bluepolka.net> writes:

If we write something without sync'ing, presumably it's immediately
journaled?

I was under the impression that ext3 journals only filesystem metadata,
not the contents of files.

I've been running a few pgsql clusters on ext3 with fsync =
false, suffered numerous OS crashes, and have yet to lose any data or see
any corruption from any of those crashes. Have I just been lucky?

Doesn't sound very safe to me.

regards, tom lane

#3Richard Welty
rwelty@averillpark.net
In reply to: Tom Lane (#2)
Re: fsync = true beneficial on ext3?

On Sun, 08 Feb 2004 14:02:26 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Ed L." <pgsql@bluepolka.net> writes:

If we write something without sync'ing, presumably it's immediately
journaled?

I was under the impression that ext3 journals only filesystem metadata,
not the contents of files.

by default, it journals everything, but you can set it to journal metadata
only, i think with the mount option data=writeback. do a "man mount"
and look for ext3 options for details on the data= option.

richard
--
Richard Welty rwelty@averillpark.net
Averill Park Networking 518-573-7592
Java, PHP, PostgreSQL, Unix, Linux, IP Network Engineering, Security

#4Ed L.
pgsql@bluepolka.net
In reply to: Tom Lane (#2)
Re: fsync = true beneficial on ext3?

On Sunday February 8 2004 12:02, Tom Lane wrote:

"Ed L." <pgsql@bluepolka.net> writes:

If we write something without sync'ing, presumably it's immediately
journaled?

I was under the impression that ext3 journals only filesystem metadata,
not the contents of files.

Ah, didn't know how that worked. So I gather there is really no
kernel-level substitute for fsync = true when it comes to guaranteeing data
is flushed to disk at commit time, I guess?

In linux, does pgsql's fsync call at commit time obviate the need for
bdflush to do any flushing for that particular data? I'm wondering if
there are bdflush adjustments to be made to improve disk write efficiency
given we can count on fsync = true to guarantee that.

Also, with fsync = true and wal using fdatasync, and assuming all is on the
same disk (which I know is not optimal), is there a particular ext3 mode
(data=writeback?) that gives better performance while maintaining best
recoverability?

#5Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Richard Welty (#3)
Re: fsync = true beneficial on ext3?

FYI - Ext3 has 3 modes :

data=ordered(default) : metadata is journaled (at write time data is
written before metadata - i.e ordered)
data=journal: data and metadata are journaled
data=writeback: metadata journaled (no ordering at write time)

The default will not help to protect database integrity if fsync is
false (as only metadata is journaled)
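
To check which of the three modes a given mount is using, one option is to look at its mount options. The sketch below parses a line in the usual `/proc/mounts` layout (device, mount point, type, options). The helper name `ext3_data_mode` and the sample lines are hypothetical, and it assumes that a missing `data=` option means the default, `ordered`; whether the kernel always lists the option there is not something this thread establishes.

```python
def ext3_data_mode(mounts_line):
    """Return the data= journaling mode from one /proc/mounts-style line.

    Returns None if the line is not an ext3 mount. Falls back to "ordered",
    the default named above, when no data= option is listed (assumption).
    """
    fields = mounts_line.split()
    if len(fields) < 4 or fields[2] != "ext3":
        return None
    for option in fields[3].split(","):
        if option.startswith("data="):
            return option[len("data="):]
    return "ordered"


# Hypothetical sample lines; real ones would come from reading /proc/mounts.
samples = [
    "/dev/sda1 / ext3 rw,data=journal 0 0",
    "/dev/sdb1 /pgdata ext3 rw,noatime 0 0",
    "tmpfs /tmp tmpfs rw 0 0",
]
for line in samples:
    print(line.split()[1], ext3_data_mode(line))
```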

Will data=journal mode help? I am uncertain. A casual reading of these
definitions suggests that it *might* - anyone know for sure?

regards

Mark

Richard Welty wrote:

by default, it journals everything, but you can set it to journal metadata
only, i think with the mount option data=writeback. do a "man mount"
and look for ext3 options for details on the data= option.

#6Martijn van Oosterhout
kleptog@svana.org
In reply to: Mark Kirkwood (#5)
Re: fsync = true beneficial on ext3?

On Mon, Feb 09, 2004 at 03:13:08PM +1300, Mark Kirkwood wrote:

FYI - Ext3 has 3 modes :

data=ordered(default) : metadata is journaled (at write time data is
written before metadata - i.e ordered)
data=journal: data and metadata are journaled
data=writeback: metadata journaled (no ordering at write time)

Thanks for that.

The default will not help to protect database integrity if fsync is
false (as only metadata is journaled)

Will data=journal mode help? I am uncertain. A casual reading of these
definitions suggests that it *might* - anyone know for sure?

My problem is that journalling works on a per-file basis, i.e., the data for a
file is written before that file's metadata. However, the fsync is used for
the WAL segments and if you can't guarantee the WAL will hit the disk before
the data segments (different files), you're stuffed I think.

Or maybe WAL is not that sensitive to that kind of reordering. Maybe it only
depends on the WAL being consistent.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

(... have gone from d-i being barely usable even by its developers
anywhere, to being about 20% done. Sweet. And the last 80% usually takes
20% of the time, too, right?) -- Anthony Towns, debian-devel-announce

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martijn van Oosterhout (#6)
Re: fsync = true beneficial on ext3?

Martijn van Oosterhout <kleptog@svana.org> writes:

My problem is that journalling works on a per-file basis, i.e., the data for a
file is written before that file's metadata. However, the fsync is used for
the WAL segments and if you can't guarantee the WAL will hit the disk before
the data segments (different files), you're stuffed I think.

Or maybe WAL is not that sensitive to that kind of reordering. Maybe it only
depends on the WAL being consistent.

The entire *point* of WAL is that WAL entries must hit disk before any
of the data-file changes they describe (that's why it's called write
AHEAD log). Without this you can't use WAL replay to ensure the data
files are brought to a fully consistent state. So yes, we do have to
have cross-file write ordering guarantees. fsync is a pretty blunt tool
for enforcing cross-file write ordering, but it's the only one
available...

regards, tom lane
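
The ordering rule described above can be shown with a toy two-file example. This is a minimal sketch and not PostgreSQL's WAL code: the record format, `apply_change`, and `replay` are invented for illustration, it ignores torn or partial log records, and it relies on `os.pwrite`/`os.pread`, which are available on Unix-like systems.

```python
import os
import tempfile


def apply_change(wal_fd, data_fd, offset, payload):
    """Log a change durably, then apply it to the data file."""
    # 1. Describe the change in the log first.
    record = f"{offset}:{payload.hex()}\n".encode()
    os.write(wal_fd, record)
    # 2. Force the log record to disk before touching the data file.
    #    This fsync is what enforces the cross-file ordering.
    os.fsync(wal_fd)
    # 3. Only now modify the data file. This write may linger in the
    #    kernel cache, because the log can redo it after a crash.
    os.pwrite(data_fd, payload, offset)


def replay(wal_path, data_fd):
    """Re-apply every logged change; repeating a write is harmless here."""
    with open(wal_path, "rb") as wal:
        for line in wal:
            offset, hex_payload = line.decode().strip().split(":")
            os.pwrite(data_fd, bytes.fromhex(hex_payload), int(offset))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        wal_path = os.path.join(d, "wal")
        wal_fd = os.open(wal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        data_fd = os.open(os.path.join(d, "data"), os.O_RDWR | os.O_CREAT, 0o600)
        apply_change(wal_fd, data_fd, 0, b"hello")
        os.ftruncate(data_fd, 0)  # pretend the data-file write never reached disk
        replay(wal_path, data_fd)
        print(os.pread(data_fd, 5, 0))
        os.close(wal_fd)
        os.close(data_fd)
```

If step 2 is skipped, nothing stops the data-file write from reaching disk before the log record that describes it, and replay then has nothing to work from.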

#8Bruce Momjian
bruce@momjian.us
In reply to: Ed L. (#1)
Re: fsync = true beneficial on ext3?

Ed L. wrote:

I'm curious what the consensus is, if any, on use of fsync on ext3
filesystems with postgresql 7.3.4 or later. I did some recent performance
tests demonstrating a 45%-70% performance improvement for simple inserts
with fsync off on one particular system. Does fsync = true buy me any
additional recoverability beyond ext3's journal recovery?

Yes, it does. Without fsync, you can't be sure the data has been pushed
to the disk drive in case of an OS crash or power failure.

If we write something without sync'ing, presumably it's immediately
journaled? So even if the DB crashes prior to fsync'ing, are we fully
recoverable? I've been running a few pgsql clusters on ext3 with fsync =
false, suffered numerous OS crashes, and have yet to lose any data or see
any corruption from any of those crashes. Have I just been lucky?

The fsync makes sure it hits the drive, rather than staying in the
kernel cache during an OS failure.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#9scott.marlowe
scott.marlowe@ihs.com
In reply to: Ed L. (#1)
Re: fsync = true beneficial on ext3?

On Sun, 8 Feb 2004, Ed L. wrote:

I'm curious what the consensus is, if any, on use of fsync on ext3
filesystems with postgresql 7.3.4 or later. I did some recent performance
tests demonstrating a 45%-70% performance improvement for simple inserts
with fsync off on one particular system. Does fsync = true buy me any
additional recoverability beyond ext3's journal recovery?

If we write something without sync'ing, presumably it's immediately
journaled? So even if the DB crashes prior to fsync'ing, are we fully
recoverable? I've been running a few pgsql clusters on ext3 with fsync =
false, suffered numerous OS crashes, and have yet to lose any data or see
any corruption from any of those crashes. Have I just been lucky?

With all the other posts on this topic, I just want to point out that it's
all theory until you build your machine, set it up, initiate a hundred or
so parallel transactions, and pull the plug in the middle.

Without pulling the plug, you just don't know for sure. And you need to
do it a few times, in case your machine "got lucky" once and might fail on
subsequent power fails.
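
The bookkeeping side of such a test can be sketched as follows, assuming a client on a separate machine records the id of every transaction whose commit was acknowledged, and that after reboot you can list the ids actually present in the database. The function name and the sample ids are hypothetical.

```python
def check_after_crash(acknowledged_ids, recovered_ids):
    """Compare commits the client saw acknowledged with rows found after recovery."""
    acknowledged = set(acknowledged_ids)
    recovered = set(recovered_ids)
    return {
        # Acknowledged but missing after restart: a durability failure.
        "lost": sorted(acknowledged - recovered),
        # Present but never acknowledged: acceptable, the commit may have
        # completed just before the plug was pulled and the reply was lost.
        "unacknowledged": sorted(recovered - acknowledged),
    }


# Hypothetical run: ids 1-5 were acknowledged, id 4 is missing after reboot.
result = check_after_crash([1, 2, 3, 4, 5], [1, 2, 3, 5, 6])
print(result)  # {'lost': [4], 'unacknowledged': [6]}
```

Any entry under "lost" means a commit was reported as done but did not survive the power failure.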

#10Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: scott.marlowe (#9)
Re: fsync = true beneficial on ext3?

Actually, I don't think even that is a valid test. The absence of a
failure doesn't mean one can't occur in this case. Doesn't matter if you
try the test 1 or 10,000 times; the test will only be conclusive if you
actually see a failure.
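
To put rough numbers on this: if every one of n plug pulls comes back clean, and the pulls are assumed to be independent with the same per-pull failure probability p (a simplifying assumption, not something the thread establishes), then the largest p still consistent with that result at 95% confidence solves (1 - p)^n = 0.05. The helper below is a sketch of that arithmetic; `max_failure_rate` is a made-up name.

```python
def max_failure_rate(clean_trials, confidence=0.95):
    """Upper bound on per-trial failure probability after only clean trials.

    Solves (1 - p) ** clean_trials = 1 - confidence for p.
    """
    if clean_trials < 1:
        raise ValueError("need at least one trial")
    return 1.0 - (1.0 - confidence) ** (1.0 / clean_trials)


for n in (1, 10, 100, 10_000):
    bound = max_failure_rate(n)
    print(f"{n:>6} clean pulls -> failure rate could still be up to {bound:.4%}")
```

The bound shrinks as clean runs accumulate but never reaches zero, so repeated clean tests narrow the risk without proving the setup safe.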

On Mon, Feb 09, 2004 at 10:19:15AM -0700, scott.marlowe wrote:

On Sun, 8 Feb 2004, Ed L. wrote:

I'm curious what the consensus is, if any, on use of fsync on ext3
filesystems with postgresql 7.3.4 or later. I did some recent performance
tests demonstrating a 45%-70% performance improvement for simple inserts
with fsync off on one particular system. Does fsync = true buy me any
additional recoverability beyond ext3's journal recovery?

If we write something without sync'ing, presumably it's immediately
journaled? So even if the DB crashes prior to fsync'ing, are we fully
recoverable? I've been running a few pgsql clusters on ext3 with fsync =
false, suffered numerous OS crashes, and have yet to lose any data or see
any corruption from any of those crashes. Have I just been lucky?

With all the other posts on this topic, I just want to point out that it's
all theory until you build your machine, set it up, initiate a hundred or
so parallel transactions, and pull the plug in the middle.

Without pulling the plug, you just don't know for sure. And you need to
do it a few times, in case your machine "got lucky" once and might fail on
subsequent power fails.

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

--
Jim C. Nasby, Database Consultant jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

#11Ed L.
pgsql@bluepolka.net
In reply to: Jim Nasby (#10)
Re: fsync = true beneficial on ext3?

Sounds like "fsync = true" is the consensus for any circumstances where data
loss is intolerable.

Thx.

#12jerome
jerome@gmanmi.tv
In reply to: Bruce Momjian (#8)
Re: fsync = true beneficial on ext3?

Would a battery backed Card do the trick?

On Tuesday 10 February 2004 00:42, Bruce Momjian wrote:

Ed L. wrote:

I'm curious what the consensus is, if any, on use of fsync on ext3
filesystems with postgresql 7.3.4 or later. I did some recent
performance tests demonstrating a 45%-70% performance improvement for
simple inserts with fsync off on one particular system. Does fsync =
true buy me any additional recoverability beyond ext3's journal recovery?

Yes, it does. Without fsync, you can't be sure the data has been pushed
to the disk drive in case of an OS crash or power failure.

If we write something without sync'ing, presumably it's immediately
journaled? So even if the DB crashes prior to fsync'ing, are we fully
recoverable? I've been running a few pgsql clusters on ext3 with fsync =
false, suffered numerous OS crashes, and have yet to lose any data or see
any corruption from any of those crashes. Have I just been lucky?

The fsync makes sure it hits the drive, rather than staying in the
kernel cache during an OS failure.

#13Bruce Momjian
bruce@momjian.us
In reply to: jerome (#12)
Re: fsync = true beneficial on ext3?

JM wrote:

Would a battery backed Card do the trick?

No, because the fsync causes the data to hit the card. Without the
fsync, the data could remain only in the kernel cache.

---------------------------------------------------------------------------

On Tuesday 10 February 2004 00:42, Bruce Momjian wrote:

Ed L. wrote:

I'm curious what the consensus is, if any, on use of fsync on ext3
filesystems with postgresql 7.3.4 or later. I did some recent
performance tests demonstrating a 45%-70% performance improvement for
simple inserts with fsync off on one particular system. Does fsync =
true buy me any additional recoverability beyond ext3's journal recovery?

Yes, it does. Without fsync, you can't be sure the data has been pushed
to the disk drive in case of an OS crash or power failure.

If we write something without sync'ing, presumably it's immediately
journaled? So even if the DB crashes prior to fsync'ing, are we fully
recoverable? I've been running a few pgsql clusters on ext3 with fsync =
false, suffered numerous OS crashes, and have yet to lose any data or see
any corruption from any of those crashes. Have I just been lucky?

The fsync makes sure it hits the drive, rather than staying in the
kernel cache during an OS failure.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

#14scott.marlowe
scott.marlowe@ihs.com
In reply to: Jim Nasby (#10)
Re: fsync = true beneficial on ext3?

I can see you never took statistics...

On Mon, 9 Feb 2004, Jim C. Nasby wrote:

Actually, I don't think even that is a valid test. The absence of a
failure doesn't mean one can't occur in this case. Doesn't matter if you
try the test 1 or 10,000 times; the test will only be conclusive if you
actually see a failure.

On Mon, Feb 09, 2004 at 10:19:15AM -0700, scott.marlowe wrote:

On Sun, 8 Feb 2004, Ed L. wrote:

I'm curious what the consensus is, if any, on use of fsync on ext3
filesystems with postgresql 7.3.4 or later. I did some recent performance
tests demonstrating a 45%-70% performance improvement for simple inserts
with fsync off on one particular system. Does fsync = true buy me any
additional recoverability beyond ext3's journal recovery?

If we write something without sync'ing, presumably it's immediately
journaled? So even if the DB crashes prior to fsync'ing, are we fully
recoverable? I've been running a few pgsql clusters on ext3 with fsync =
false, suffered numerous OS crashes, and have yet to lose any data or see
any corruption from any of those crashes. Have I just been lucky?

With all the other posts on this topic, I just want to point out that it's
all theory until you build your machine, set it up, initiate a hundred or
so parallel transactions, and pull the plug in the middle.

Without pulling the plug, you just don't know for sure. And you need to
do it a few times, in case your machine "got lucky" once and might fail on
subsequent power fails.

#15scott.marlowe
scott.marlowe@ihs.com
In reply to: jerome (#12)
Re: fsync = true beneficial on ext3?

Yep, it does. We use the LSI MegaRAID in our PostgreSQL box, and it
has passed all the power plug pull tests we've thrown at it.

On Tue, 10 Feb 2004, JM wrote:

Would a battery backed Card do the trick?

On Tuesday 10 February 2004 00:42, Bruce Momjian wrote:

Ed L. wrote:

I'm curious what the consensus is, if any, on use of fsync on ext3
filesystems with postgresql 7.3.4 or later. I did some recent
performance tests demonstrating a 45%-70% performance improvement for
simple inserts with fsync off on one particular system. Does fsync =
true buy me any additional recoverability beyond ext3's journal recovery?

Yes, it does. Without fsync, you can't be sure the data has been pushed
to the disk drive in case of an OS crash or power failure.

If we write something without sync'ing, presumably it's immediately
journaled? So even if the DB crashes prior to fsync'ing, are we fully
recoverable? I've been running a few pgsql clusters on ext3 with fsync =
false, suffered numerous OS crashes, and have yet to lose any data or see
any corruption from any of those crashes. Have I just been lucky?

The fsync makes sure it hits the drive, rather than staying in the
kernel cache during an OS failure.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

#16Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#13)
Re: fsync = true beneficial on ext3?

Bruce Momjian <pgman@candle.pha.pa.us> writes:

JM wrote:

Would a battery backed Card do the trick?

No, because the fsync causes the data to hit the card. Without the
fsync, the data could remain only in the kernel cache.

A battery backed card for the transaction logs wouldn't make it safe to run
without fsync, but it would make the fsyncs basically free.

--
greg