Can we trust fsync?
I'm really concerned by this post on Linux's fsync and disk flush behaviour:
http://milek.blogspot.com.au/2010/12/linux-osync-and-write-barriers.html
and seeking opinions from folks here who've been deeply involved in
write reliability work.
The amount of change in write reliability behaviour in Linux across
kernel versions, file systems and storage abstraction layers is worrying
- different results for LVM vs !LVM, md vs !md, ext3 vs other, etc.
If this isn't something that's already been seen and dealt with then
I'll see if I can take a look into it once the RLS work is dealt with.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 11/21/2013 07:45 AM, Craig Ringer wrote:
I'm really concerned by this post on Linux's fsync and disk flush behaviour:
http://milek.blogspot.com.au/2010/12/linux-osync-and-write-barriers.html
... and yes, I realise that's partly why we have the "fsync" param to
control different sync modes. Just concerned it's even more variable
than I thought.
--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 11/21/2013 07:45 AM, Craig Ringer wrote:
I'm really concerned by this post on Linux's fsync and disk flush behaviour:
http://milek.blogspot.com.au/2010/12/linux-osync-and-write-barriers.html
... and yes, I realise that's partly why we have the "fsync" param to
control different sync modes. Just concerned it's even more variable
than I thought.
So on linux, we don't have any safe option for wal_sync_method?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
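
One rough way to see how the sync calls behind wal_sync_method behave on a particular kernel and storage stack is to time them. The sketch below is only an illustration, not anything proposed in this thread: the function names, the 8 kB block size and the write count are assumptions. It overwrites one block repeatedly and measures the average cost of each write-plus-sync. A latency far below what the device could physically deliver hints that some layer is acknowledging flushes early, but a plausible number proves nothing; only a power-loss test does. PostgreSQL ships pg_test_fsync for a more thorough comparison.

```python
import os
import tempfile
import time


def time_sync_method(sync, directory, writes=50, block=b"x" * 8192):
    """Return the average seconds per write+sync for the given sync callable."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.lseek(fd, 0, os.SEEK_SET)  # overwrite the same block every time
            os.write(fd, block)
            sync(fd)                      # os.fsync or os.fdatasync
        return (time.perf_counter() - start) / writes
    finally:
        os.close(fd)
        os.unlink(path)


def compare_sync_methods(directory="."):
    """Time each sync call available on this platform in `directory`."""
    methods = {"fsync": os.fsync}
    if hasattr(os, "fdatasync"):          # not available on every platform
        methods["fdatasync"] = os.fdatasync
    return {name: time_sync_method(fn, directory) for name, fn in methods.items()}
```

Run it against the file system that will hold the WAL, for example `compare_sync_methods("/path/to/pg_wal_volume")` (a placeholder path), rather than against a tmpfs mount.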
Craig Ringer <craig@2ndquadrant.com> writes:
The amount of change in write reliability behaviour in Linux across
kernel versions, file systems and storage abstraction layers is worrying
- different results for LVM vs !LVM, md vs !md, ext3 vs other, etc.
Well, we pretty much *have to* trust fsync --- there's not a lot we can
do if the kernel doesn't get this right. My takeaway is that you don't
want to be running a production database on bleeding-edge kernels or
filesystem stacks. If you want to use Linux, use a distro from a vendor
with a track record for caring about stability. (I'll omit the commercial
for my former employers, but ...)
Also, it's not that hard to do plug-pull testing to verify that your
system is telling the truth about fsync. This really ought to be part
of acceptance testing for any new DB server.
regards, tom lane
On 11/20/2013 03:45 PM, Craig Ringer wrote:
I'm really concerned by this post on Linux's fsync and disk flush behaviour:
http://milek.blogspot.com.au/2010/12/linux-osync-and-write-barriers.html
and seeking opinions from folks here who've been deeply involved in
write reliability work.
The amount of change in write reliability behaviour in Linux across
kernel versions, file systems and storage abstraction layers is worrying
- different results for LVM vs !LVM, md vs !md, ext3 vs other, etc.
If this isn't something that's already been seen and dealt with then
I'll see if I can take a look into it once the RLS work is dealt with.
I thought Greg did some testing on this a while back and determined
which versions were safe... (/me looks for post)
JD
--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
a rose in the deeps of my heart. - W.B. Yeats
On 11/21/2013 12:45 AM, Craig Ringer wrote:
I'm really concerned by this post on Linux's fsync and disk flush behaviour:
http://milek.blogspot.com.au/2010/12/linux-osync-and-write-barriers.html
and seeking opinions from folks here who've been deeply involved in
write reliability work.
With ext4 and XFS on plain/LVM/md block devices, this issue should
really be a thing of the past. I think the kernel folks would treat
this as bugs nowadays, too.
--
Florian Weimer / Red Hat Product Security Team
On Thu, Nov 21, 2013 at 1:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, it's not that hard to do plug-pull testing to verify that your
system is telling the truth about fsync. This really ought to be part
of acceptance testing for any new DB server.
I've never tried it but I always wondered how easy it was to do. How would
you ever know you had tested it enough?
The original mail was referencing a problem with syncing *meta* data
though. The semantics around meta data syncs are much less clearly
specified, in part because file systems traditionally made nearly all meta
data operations synchronous. Doing plug-pull testing on Postgres would not
test meta data syncing very well since Postgres specifically avoids doing
many meta data operations by overwriting existing files and blocks as much
as possible. You would have to test doing table extensions or pulling the
plug immediately after switching xlog files repeatedly to have any coverage
at all there.
--
greg
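
A workload that does exercise metadata has to create new directory entries and sync them, which is what the table-extension and xlog-switch cases amount to. The following is a minimal sketch under stated assumptions: `durable_create`, `metadata_workload` and the `seg_` file names are hypothetical, and fsync on a directory file descriptor is assumed to behave as it does on Linux. Each file is only counted as acknowledged after both the file and its parent directory have been fsynced, so any acknowledged name missing after a plug-pull points at a metadata sync problem.

```python
import os


def durable_create(path, data):
    """Create `path` with `data`, then sync the file and its directory entry."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, data)
        os.fsync(fd)       # file contents and inode
    finally:
        os.close(fd)
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)   # directory entry for the new name
    finally:
        os.close(dir_fd)


def metadata_workload(directory, count):
    """Durably create `count` new files; return the names that were acknowledged."""
    acked = []
    for i in range(count):
        name = os.path.join(directory, f"seg_{i:06d}")
        durable_create(name, i.to_bytes(8, "big") * 1024)  # 8 kB per file
        acked.append(name)  # recorded only after both fsync calls returned
    return acked
```

For a real test the acknowledged names would have to be recorded on another machine, since the local record is exactly what a power cut can destroy.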
Greg Stark <stark@mit.edu> writes:
On Thu, Nov 21, 2013 at 1:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, it's not that hard to do plug-pull testing to verify that your
system is telling the truth about fsync. This really ought to be part
of acceptance testing for any new DB server.
I've never tried it but I always wondered how easy it was to do. How would
you ever know you had tested it enough?
I used the program Greg Smith recommends on our wiki (can't remember the
name offhand) when I got a new house server this spring. With the RAID
card configured for writethrough and no battery, it failed all over the
place. Fixed those configuration bugs, it was okay three or four times
in a row, which was good enough for me.
The original mail was referencing a problem with syncing *meta* data
though. The semantics around meta data syncs are much less clearly
specified, in part because file systems traditionally made nearly all meta
data operations synchronous. Doing plug-pull testing on Postgres would not
test meta data syncing very well since Postgres specifically avoids doing
many meta data operations by overwriting existing files and blocks as much
as possible.
True. You're better off with a specialized testing program. (Though
now you mention it, I wonder whether that program was stressing metadata
or not.)
regards, tom lane
On Fri, Nov 22, 2013 at 1:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The original mail was referencing a problem with syncing *meta* data
though. The semantics around meta data syncs are much less clearly
specified, in part because file systems traditionally made nearly all meta
data operations synchronous. Doing plug-pull testing on Postgres would not
test meta data syncing very well since Postgres specifically avoids doing
many meta data operations by overwriting existing files and blocks as much
as possible.
True. You're better off with a specialized testing program. (Though
now you mention it, I wonder whether that program was stressing metadata
or not.)
You can always stress metadata by leaving atime updates in their full
setting (whatever it is for that filesystem).
On Fri, Nov 22, 2013 at 11:16:06AM -0500, Tom Lane wrote:
Greg Stark <stark@mit.edu> writes:
On Thu, Nov 21, 2013 at 1:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Also, it's not that hard to do plug-pull testing to verify that your
system is telling the truth about fsync. This really ought to be part
of acceptance testing for any new DB server.
I've never tried it but I always wondered how easy it was to do. How would
you ever know you had tested it enough?
I used the program Greg Smith recommends on our wiki (can't remember the
name offhand) when I got a new house server this spring. With the RAID
card configured for writethrough and no battery, it failed all over the
place. Fixed those configuration bugs, it was okay three or four times
in a row, which was good enough for me.
The original mail was referencing a problem with syncing *meta* data
though. The semantics around meta data syncs are much less clearly
specified, in part because file systems traditionally made nearly all meta
data operations synchronous. Doing plug-pull testing on Postgres would not
test meta data syncing very well since Postgres specifically avoids doing
many meta data operations by overwriting existing files and blocks as much
as possible.
True. You're better off with a specialized testing program. (Though
now you mention it, I wonder whether that program was stressing metadata
or not.)
The program is diskchecker:
http://brad.livejournal.com/2116715.html
I got the author to re-host the source code on github a few years ago.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
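
The idea behind this kind of plug-pull checker can be sketched in a few lines. This is not diskchecker.pl's code, only an illustration of the general technique; the record layout, function names and the `on_ack` callback are assumptions made for the example. The writer appends checksummed records, calls fsync, and only then reports the sequence number as acknowledged. After the power is cut and the machine rebooted, the verifier lists every acknowledged record that is missing or corrupt; a non-empty list means something in the stack lied about fsync.

```python
import os
import struct
import zlib

RECORD = struct.Struct(">QI")  # sequence number, CRC32 of the payload
PAYLOAD_SIZE = 512


def payload_for(seq):
    """Deterministic payload so a record can be checked after a crash."""
    return seq.to_bytes(8, "big") * (PAYLOAD_SIZE // 8)


def write_records(path, count, on_ack):
    """Append `count` records, calling on_ack(seq) only after fsync returns."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for seq in range(count):
            body = payload_for(seq)
            os.write(fd, RECORD.pack(seq, zlib.crc32(body)) + body)
            os.fsync(fd)
            on_ack(seq)  # in a real test, send this to another machine
    finally:
        os.close(fd)


def verify_records(path, acked):
    """Return the acknowledged sequence numbers that are missing or corrupt."""
    found = set()
    with open(path, "rb") as f:
        while True:
            header = f.read(RECORD.size)
            if len(header) < RECORD.size:
                break
            seq, crc = RECORD.unpack(header)
            body = f.read(PAYLOAD_SIZE)
            if len(body) == PAYLOAD_SIZE and zlib.crc32(body) == crc:
                found.add(seq)
    return sorted(set(acked) - found)
```

The acknowledgments must live somewhere that survives the power cut, which is why the callback stands in for a network send to a second host. A few clean runs do not prove the stack is safe; they only fail to show that it is unsafe.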
On Fri, Nov 22, 2013 at 2:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
The program is diskchecker:
http://brad.livejournal.com/2116715.html
I got the author to re-host the source code on github a few years ago.
It might be worth re-implementing this for -contrib. The fact that we
mention diskchecker.pl in the docs, and it is a pretty obscure Perl
script on some guy's personal website doesn't inspire much confidence.
--
Peter Geoghegan
On Fri, Nov 22, 2013 at 03:06:31PM -0800, Peter Geoghegan wrote:
On Fri, Nov 22, 2013 at 2:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
The program is diskchecker:
http://brad.livejournal.com/2116715.html
I got the author to re-host the source code on github a few years ago.
It might be worth re-implementing this for -contrib. The fact that we
mention diskchecker.pl in the docs, and it is a pretty obscure Perl
script on some guy's personal website doesn't inspire much confidence.
Well, it was his idea, and quite a good one. I guess we could
reimplement this in C if someone wants to do the legwork.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
On 11/22/2013 03:23 PM, Bruce Momjian wrote:
On Fri, Nov 22, 2013 at 03:06:31PM -0800, Peter Geoghegan wrote:
On Fri, Nov 22, 2013 at 2:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
The program is diskchecker:
http://brad.livejournal.com/2116715.html
I got the author to re-host the source code on github a few years ago.
It might be worth re-implementing this for -contrib. The fact that we
mention diskchecker.pl in the docs, and it is a pretty obscure Perl
script on some guy's personal website doesn't inspire much confidence.
Well, it was his idea, and quite a good one. I guess we could
reimplement this in C if someone wants to do the legwork.
Yeah, too bad Brad didn't post a license for it.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Fri, Nov 22, 2013 at 03:27:29PM -0800, Josh Berkus wrote:
On 11/22/2013 03:23 PM, Bruce Momjian wrote:
On Fri, Nov 22, 2013 at 03:06:31PM -0800, Peter Geoghegan wrote:
On Fri, Nov 22, 2013 at 2:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
The program is diskchecker:
http://brad.livejournal.com/2116715.html
I got the author to re-host the source code on github a few years ago.
It might be worth re-implementing this for -contrib. The fact that we
mention diskchecker.pl in the docs, and it is a pretty obscure Perl
script on some guy's personal website doesn't inspire much confidence.
Well, it was his idea, and quite a good one. I guess we could
reimplement this in C if someone wants to do the legwork.
Yeah, too bad Brad didn't post a license for it.
We can ask him.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
On Sat, Nov 23, 2013 at 8:06 AM, Peter Geoghegan <pg@heroku.com> wrote:
On Fri, Nov 22, 2013 at 2:57 PM, Bruce Momjian <bruce@momjian.us> wrote:
The program is diskchecker:
http://brad.livejournal.com/2116715.html
I got the author to re-host the source code on github a few years ago.
It might be worth re-implementing this for -contrib. The fact that we
mention diskchecker.pl in the docs, and it is a pretty obscure Perl
script on some guy's personal website doesn't inspire much confidence.
Yes, having that in contrib would be useful. It would be a plus
when testing disks for Postgres.
--
Michael