WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

Started by Bruce Klein, about 7 years ago, 35 messages, general
#1Bruce Klein
brucek@gmail.com

Just in case this helps the next person who can't figure out why their
postgres server won't start today:

If you are running Postgres inside Microsoft WSL (at least on Ubuntu, maybe
on others too), and just picked up a software update to version 11.2, you
will need to go into your /etc/postgresql.conf file and set fsync=off.

This took me a while to fix because the error message you get if you
don't is the generic:

terminating connection because of crash of another server process
2015-07-15 20:18:37 UTC The postmaster has commanded this server process to
roll back the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.

I spent a long time trying to completely uninstall and reinstall, etc. to
recover from the "crash" although I don't think there ever was one and the
message appears on first use of the create database command even on a
completely clean install.

I don't know if this is possible/reasonable, but if the database code could
automatically turn fsync off on WSL it might save the next users some
trouble.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Klein (#1)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

Bruce Klein <brucek@gmail.com> writes:

If you are running Postgres inside Microsoft WSL (at least on Ubuntu, maybe
on others too), and just picked up a software update to version 11.2, you
will need to go into your /etc/postgresql.conf file and set fsync=off.

Hm. Probably this is some unexpected problem with the
panic-on-fsync-failure change; although that still leaves some things
unexplained, because if fsync is failing for you now, why didn't it fail
before? Anyway, you might try experimenting with data_sync_retry,
instead of running with scissors by turning off fsync altogether.
See first item in the release notes:

https://www.postgresql.org/docs/11/release-11-2.html

Also, we'd quite like to hear more details; can you find any PANIC
messages in the server log?

regards, tom lane

#3Bruce Klein
brucek@gmail.com
In reply to: Tom Lane (#2)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

Thanks Tom, I feel like I'm in a little over my head here but I'll try to
help as I can.

With fsync off, everything appears to run as it did before on 11.1.

With fsync default/on, the problem is easily reproducible by trying to
create a database. I believe the very first time I saw it it was with a
routine query but I'm not 100% sure.

psql-11.2=> create database testdb;
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
SSL SYSCALL error: EOF detected
The connection to the server was lost. Attempting reset: Failed.
!>

Here are the entries from the log:
1527 2019-02-14 15:06:08.218 DST [8398] PANIC: could not flush dirty data:
Function not implemented
1528 2019-02-14 15:06:08.218 DST [8396] LOG: checkpointer process (PID
8398) was terminated by signal 6: Aborted
1529 2019-02-14 15:06:08.218 DST [8396] LOG: terminating any other active
server processes
1530 2019-02-14 15:06:08.218 DST [8422] homestead@homestead WARNING:
terminating connection because of crash of another server process
1531 2019-02-14 15:06:08.218 DST [8422] homestead@homestead DETAIL: The
postmaster has commanded this server process to roll back the current
transaction and exit, because another server process exited abnormally
and possibly corrupted shared memory.
1532 2019-02-14 15:06:08.218 DST [8422] homestead@homestead HINT: In a
moment you should be able to reconnect to the database and repeat your
command.
1533 2019-02-14 15:06:08.218 DST [8401] WARNING: terminating connection
because of crash of another server process
1534 2019-02-14 15:06:08.218 DST [8401] DETAIL: The postmaster has
commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
1535 2019-02-14 15:06:08.218 DST [8401] HINT: In a moment you should be
able to reconnect to the database and repeat your command.
1536 2019-02-14 15:06:08.241 DST [8396] LOG: all server processes
terminated; reinitializing
1537 2019-02-14 15:06:08.259 DST [8433] LOG: database system was
interrupted; last known up at 2019-02-14 15:05:30 DST
1538 2019-02-14 15:06:08.259 DST [8433] PANIC: could not flush dirty data:
Function not implemented
1539 2019-02-14 15:06:08.264 DST [8396] LOG: startup process (PID 8433)
was terminated by signal 6: Aborted
1540 2019-02-14 15:06:08.264 DST [8396] LOG: aborting startup due to
startup process failure
1541 2019-02-14 15:06:08.266 DST [8434] homestead@homestead FATAL: the
database system is in recovery mode
1542 2019-02-14 15:06:08.268 DST [8396] LOG: database system is shut down

As to why it worked before, I don't think fsync() ever worked on WSL, and
there were places where you'd see warnings about it in 11.1; they just
wouldn't crash the server.

As to the "running with scissors" risk, I'm going to guess the most common
use case for WSL is as a personal dev box where all the data is disposable
anyway. That's the case for me at least.

Best,
Bruce

#4Bruce Klein
brucek@gmail.com
In reply to: Bruce Klein (#1)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

In 11.1 did you see the message "WARNING: could not flush dirty data:
Function not implemented"

Yes

re: the discussions of O/S and filesystem in that thread:
I am not qualified to describe the implementation of WSL but I believe it
is neither pure Ubuntu running on metal, nor a virtual machine hosted on
Windows. I believe what the Microsoft folks have done is implement
something around the driver/kernel layer that fools Ubuntu into thinking it
is connected to hardware it expects, while it is ultimately still running
on top of a Windows kernel and Windows filesystem. That includes stubbing
out or otherwise presenting an appearance of implementing some functions
like perhaps fsync() that it really doesn't. Note I believe this is
fundamentally different from the old Cygwin and similar projects' approach,
i.e. WSL does not involve recompiling on top of Windows-specific libraries
etc. If any of these details are important to anyone you should verify them
from a more credible source.

If it matters, the Ubuntu version I am running on WSL now is 16.04.5.

On Thu, Feb 14, 2019 at 3:44 PM Ravi Krishna <srkrishna100@aol.com> wrote:

Hi Bruce,

Check my earlier thread on PG 10.5 on Ubuntu Bash with WSL.

/messages/by-id/1301077575.68539.1535929075959@mail.yahoo.com

In 11.1 did you see the message "WARNING: could not flush dirty data:
Function not implemented"

regards

#5Thomas Munro
thomas.munro@gmail.com
In reply to: Bruce Klein (#4)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

On Fri, Feb 15, 2019 at 2:56 PM Bruce Klein <brucek@gmail.com> wrote:

In 11.1 did you see the message "WARNING: could not flush dirty data: Function not implemented"

Yes

I wonder if this is coming from sync_file_range(), which is not
implemented on WSL according to random intergoogling, but probably
appears as present to our configure script. I find it harder to
believe they didn't implement fsync().

--
Thomas Munro
http://www.enterprisedb.com

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#5)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

On Fri, Feb 15, 2019 at 3:56 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Fri, Feb 15, 2019 at 2:56 PM Bruce Klein <brucek@gmail.com> wrote:

In 11.1 did you see the message "WARNING: could not flush dirty data: Function not implemented"

Yes

I wonder if this is coming from sync_file_range(), which is not
implemented on WSL according to random intergoogling, but probably
appears as present to our configure script. I find it harder to
believe they didn't implement fsync().

Here is a place where people go to complain about that:

https://github.com/Microsoft/WSL/issues/645

I suppose we could tolerate ENOSYS.

--
Thomas Munro
http://www.enterprisedb.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#6)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Fri, Feb 15, 2019 at 2:56 PM Bruce Klein <brucek@gmail.com> wrote:

In 11.1 did you see the message "WARNING: could not flush dirty data: Function not implemented"

Yes

Here is a place where people go to complain about that:
https://github.com/Microsoft/WSL/issues/645
I suppose we could tolerate ENOSYS.

What I'm not grasping here is why you considered that sync_file_range
failure should be treated as a reason to PANIC in the first place?
Surely it is not fsync(), nor some facsimile thereof. In fact, if
any of the branches in pg_flush_data really need the data_sync_elevel
treatment, somebody's mental model of that operation needs adjustment.
Maybe it's mine.

regards, tom lane

#8Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#7)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

On Fri, Feb 15, 2019 at 5:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@enterprisedb.com> writes:

On Fri, Feb 15, 2019 at 2:56 PM Bruce Klein <brucek@gmail.com> wrote:

In 11.1 did you see the message "WARNING: could not flush dirty data: Function not implemented"

Yes

Here is a place where people go to complain about that:
https://github.com/Microsoft/WSL/issues/645
I suppose we could tolerate ENOSYS.

What I'm not grasping here is why you considered that sync_file_range
failure should be treated as a reason to PANIC in the first place?
Surely it is not fsync(), nor some facsimile thereof. In fact, if
any of the branches in pg_flush_data really need the data_sync_elevel
treatment, somebody's mental model of that operation needs adjustment.
Maybe it's mine.

My thinking was that sync_file_range() might in its current, future or
alternative (WSL, ...) implementation eat an error that would
otherwise reach fsync(), due to the single-flag error state treatment
we see in several OSes (older Linux, also recent Linux via the 'seen'
flag that we rely on to receive errors that happened before we opened
the fd). Should we be inspecting the Linux source or asking
assurances from Linux hackers that that can't happen? Perhaps it
behaves more like fdatasync() with the SYNC_FILE_RANGE_WAIT_* flags (=
can clear seen flag), but more like fadvise() without (can't touch
it)? I don't know, and I didn't want to base my choice on what it
looks like it currently does in the Linux tree. Without guarantees
from standards (not relevant here) or man pages (which note only that
EIO is possible), I made what I thought was an appropriately
pessimistic choice.

--
Thomas Munro
http://www.enterprisedb.com

#9Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#2)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

Hi,

On 2019-02-14 19:48:05 -0500, Tom Lane wrote:

Bruce Klein <brucek@gmail.com> writes:

If you are running Postgres inside Microsoft WSL (at least on Ubuntu, maybe
on others too), and just picked up a software update to version 11.2, you
will need to go into your /etc/postgresql.conf file and set fsync=off.

Hm. Probably this is some unexpected problem with the
panic-on-fsync-failure change; although that still leaves some things
unexplained, because if fsync is failing for you now, why didn't it fail
before? Anyway, you might try experimenting with data_sync_retry,
instead of running with scissors by turning off fsync altogether.
See first item in the release notes:

https://www.postgresql.org/docs/11/release-11-2.html

Also, we'd quite like to hear more details; can you find any PANIC
messages in the server log?

I suspect that's because WSL has an empty implementation of
sync_file_range(), i.e. it unconditionally returns ENOSYS. But as
configure detects it, we still emit calls for it. I guess we ought to
except ENOSYS for the cases where we do panic-on-fsync-failure?

You temporarily can work around it, mostly, by setting
checkpoint_flush_after = 0 and bgwriter_flush_after = 0.
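
Spelled out as a configuration fragment, that workaround might look like the following. The two parameter names are the ones given above; putting them in postgresql.conf (rather than setting them some other way) is an assumption.

```ini
# Stop the checkpointer and background writer from requesting early
# writeback; 0 disables the flush hints for each process.
checkpoint_flush_after = 0
bgwriter_flush_after = 0
```

Unlike fsync = off, this leaves fsync() itself enabled.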

Greetings,

Andres Freund

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#9)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

Andres Freund <andres@anarazel.de> writes:

I suspect that's because WSL has an empty implementation of
sync_file_range(), i.e. it unconditionally returns ENOSYS. But as
configure detects it, we still emit calls for it. I guess we ought to
except ENOSYS for the cases where we do panic-on-fsync-failure?

I'm of the opinion that we shouldn't be panicking for sync_file_range
failure, period.

regards, tom lane

#11Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#10)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

On February 15, 2019 9:13:10 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

I suspect that's because WSL has an empty implementation of
sync_file_range(), i.e. it unconditionally returns ENOSYS. But as
configure detects it, we still emit calls for it. I guess we ought to
except ENOSYS for the cases where we do panic-on-fsync-failure?

I'm of the opinion that we shouldn't be panicking for sync_file_range
failure, period.

With some flags it's strictly required, it does "eat" errors depending on the flags. So I'm not sure I understand?

Access
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#11)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

Andres Freund <andres@anarazel.de> writes:

On February 15, 2019 9:13:10 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm of the opinion that we shouldn't be panicking for sync_file_range
failure, period.

With some flags it's strictly required, it does "eat" errors depending on the flags. So I'm not sure I understand?

Really? The specification says that it starts I/O, not that it waits
around for any to finish.

The bigger picture here is that this set of patches seems to have moved
us too far in the direction of defending against hypothetical kernel
bugs, and too far away from real-world usability. I am not happy with
the tradeoff.

regards, tom lane

#13Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#12)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

On February 15, 2019 9:44:50 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On February 15, 2019 9:13:10 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm of the opinion that we shouldn't be panicking for sync_file_range
failure, period.

With some flags it's strictly required, it does "eat" errors depending
on the flags. So I'm not sure I understand?

Really? The specification says that it starts I/O, not that it waits
around for any to finish.

That depends on the flags you pass in. From memory, I don't think it eats an error with our flags in recent kernels, but I'm not sure.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

#14Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#13)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

On Sat, Feb 16, 2019 at 6:50 AM Andres Freund <andres@anarazel.de> wrote:

On February 15, 2019 9:44:50 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

On February 15, 2019 9:13:10 AM PST, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm of the opinion that we shouldn't be panicking for sync_file_range
failure, period.

With some flags it's strictly required, it does "eat" errors depending
on the flags. So I'm not sure I understand?

Really? The specification says that it starts I/O, not that it waits
around for any to finish.

That depends on the flags you pass in. From memory, I don't think it eats an error with our flags in recent kernels, but I'm not sure.

Right, there was some discussion of that, and I didn't (and still
don't) think it'd be wise to rely on undocumented knowledge about
which flags can eat errors based on a drive-by reading of a particular
snapshot of the Linux tree. The man page says it can return EIO; I
think we should assume that it might actually do that.

BTW I had a report from someone on IRC that PostgreSQL breaks in other
ways (not yet understood) if you build it directly on WSL/Ubuntu. I
guess the OP is reporting about a .deb that was built on a real Linux
system. I'm vaguely familiar with these types of problems from other
platforms (Linux syscall emulation on FreeBSD and Sun-ish systems, and
also I'm old enough to remember people doing SCO SysV syscall
emulation on Linux systems back before certain valuable software was
available natively); it's possible that you get ENOSYS on other
emulators too, considering that other kernels don't seem to have a
sync_file_range()-like facility, but probably no one cares, since
there is no reason to run PostgreSQL on a syscall emulator when you
can run it natively. This is a bit different though: I guess people
want to be able to develop Linux-stack stuff on company-issued Windows
computers for later deployment on Linux servers; someone interested in
this would ideally make it work and set up a build farm animal to tell
us when we break it. It would probably require only minimal changes,
but considering that no one bothered to complain about PostgreSQL
spewing scary looking warnings on WSL for years, it's not too
surprising that we didn't consider this case before. A bit like the
nightjar case, the PANIC patch revealed a pre-existing problem that
had gone unreported and needs some work, but it doesn't seem like a
very good reason to roll back that part of the change completely IMHO.

--
Thomas Munro
http://www.enterprisedb.com

#15Bruce Klein
brucek@gmail.com
In reply to: Thomas Munro (#14)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

I guess the OP is reporting about a .deb that was built on a real Linux
system

Yes, I (OP) installed via:
% wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
% sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -sc)-pgdg main" > /etc/apt/sources.list.d/PostgreSQL.list'
% sudo apt update
% sudo apt-get install postgresql-11

no one bothered to complain about PostgreSQL spewing scary looking
warnings on WSL for years

At least you weren't spamming a once-per-second(!) log entry about a
missing function call like one of my other packages did (can't remember,
maybe it was nginx?)

WSL still feels early and if you're going to try it, you get used to
annoyances like that. I'm glad Microsoft is trying though and I hope with
time and support they get all the way there because developers who have
enterprise or other reasons to be on Windows instead of Mac desktops
deserve to have decent unix tools too. Warts and all I still find it
overall more convenient and fluid than my previous VirtualBox / vagrant
solution.

#16Ron
ronljohnsonjr@gmail.com
In reply to: Bruce Klein (#15)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

On 2/15/19 4:04 PM, Bruce Klein wrote:
[snip]

I'm glad Microsoft is trying though

If Steve "Linux is a cancer" Ballmer were dead, he'd be spinning in his grave...

--
Angular momentum makes the world go 'round.

#17Hans Schou
hans.schou@gmail.com
In reply to: Bruce Klein (#1)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

On Fri, Feb 15, 2019 at 1:34 AM Bruce Klein <brucek@gmail.com> wrote:

If you are running Postgres inside Microsoft WSL

https://docs.microsoft.com/en-us/windows/wsl/faq
Who is WSL for?
This is primarily a tool for developers ...
-----------------------

One problem with WSL is that the I/O performance is not good and it might
never be solved. So using WSL for production is not what it was meant for.

WSL is called a "compatibility layer". When running WSL there is no Linux
kernel, despite what "uname" says. It is like WINE, where one can run Windows
binaries on Linux but there is no Windows OS.
https://en.wikipedia.org/wiki/Compatibility_layer

That said, WSL is a great tool for developers. Better than Cygwin.

./hans

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#14)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Really? The specification says that it starts I/O, not that it waits
around for any to finish.

Right, there was some discussion of that, and I didn't (and still
don't) think it'd be wise to rely on undocumented knowledge about
which flags can eat errors based on a drive-by reading of a particular
snapshot of the Linux tree. The man page says it can return EIO; I
think we should assume that it might actually do that.

I had a thought about this: maybe we should restrict the scope of this
behavior to be "panic on EIO", not "panic on anything within hailing
distance of fsync".

The direction you and Andres seem to want to go in is to add a pile of
unprincipled exception cases, which seems like a recipe for constant
pain to me. I think we might be better off with a whitelist of errnos
that mean trouble, instead of a blacklist of some that don't. I'm
especially troubled by the idea that blacklisting some errnos might
reduce to ignoring them completely, which would be a step backwards
from our pre-PANIC behavior.

regards, tom lane

#19Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#18)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

On Sun, Feb 17, 2019 at 4:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@enterprisedb.com> writes:

Really? The specification says that it starts I/O, not that it waits
around for any to finish.

Right, there was some discussion of that, and I didn't (and still
don't) think it'd be wise to rely on undocumented knowledge about
which flags can eat errors based on a drive-by reading of a particular
snapshot of the Linux tree. The man page says it can return EIO; I
think we should assume that it might actually do that.

I had a thought about this: maybe we should restrict the scope of this
behavior to be "panic on EIO", not "panic on anything within hailing
distance of fsync".

The direction you and Andres seem to want to go in is to add a pile of
unprincipled exception cases, which seems like a recipe for constant
pain to me. I think we might be better off with a whitelist of errnos
that mean trouble, instead of a blacklist of some that don't. I'm
especially troubled by the idea that blacklisting some errnos might
reduce to ignoring them completely, which would be a step backwards
from our pre-PANIC behavior.

Hmm. Well, at least ENOSPC should be treated the same way as EIO.
Here's an experiment that seems to confirm some speculations about NFS
on Linux from the earlier threads:

$ uname -a
Linux debian 4.18.0-3-amd64 #1 SMP Debian 4.18.20-2 (2018-11-23)
x86_64 GNU/Linux
$ dpkg -l nfs-kernel-server | tail -1
ii nfs-kernel-server 1:1.3.4-2.4 amd64 support for NFS kernel server

First, set up a 10MB loop-back filesystem:

$ dd if=/dev/zero of=/tmp/10mb.loopback bs=1024 count=10000
$ sudo losetup /dev/loop0 /tmp/10mb.loopback
$ sudo mkfs -t ext3 -m 1 -v /dev/loop0
...
$ sudo mkdir /mnt/test_loopback
$ sudo mount -t ext3 /dev/loop0 /mnt/test_loopback

Then, export that via NFS:

$ tail -1 /etc/exports
/mnt/test_loopback localhost(rw,sync,no_subtree_check)
$ sudo exportfs -av
exporting localhost:/mnt/test_loopback

Next, mount that over NFS:

$ sudo mkdir /mnt/test_loopback_remote
$ sudo mount localhost:/mnt/test_loopback /mnt/test_loopback_remote

Now, fill up the whole disk with a file full of newlines:

$ sudo mkdir /mnt/test_loopback/dir
$ sudo chown $USER:$USER /mnt/test_loopback/dir
$ tr "\000" "\n" < /dev/zero > /mnt/test_loopback_remote/dir/file
tr: write error: No space left on device
tr: write error
$ df -h /mnt/test_loopback*
Filesystem                    Size  Used Avail Use% Mounted on
/dev/loop0                    8.5M  8.4M     0 100% /mnt/test_loopback
localhost:/mnt/test_loopback  8.5M  8.4M     0 100% /mnt/test_loopback_remote

Now, run a program that appends a greeting and then calls fsync() twice:

$ cat test.c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int fd, rc;

    fd = open("/mnt/test_loopback_remote/dir/file", O_RDWR | O_APPEND);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    rc = write(fd, "hello world\n", 12);
    if (rc < 0)
        perror("write");
    else if (rc < 12)
        fprintf(stderr, "only managed to write %d bytes\n", rc);
    rc = fsync(fd);
    if (rc < 0)
        perror("fsync 1");
    rc = fsync(fd);
    if (rc < 0)
        perror("fsync 2");
    rc = close(fd);
    if (rc < 0)
        perror("close");

    return 0;
}
$ cc test.c
$ ./a.out
fsync 1: No space left on device
$

The write() and the second fsync() reported success. Great, let's go
and look at our precious data, both through NFS and locally:

$ tail -3 /mnt/test_loopback_remote/dir/file

$ tail -3 /mnt/test_loopback/dir/file

$

It's gone. If you try it again with a file containing just a few
newlines so there is free space, it works correctly and you see the
appended greeting. Perhaps the same sort of thing might happen with
remote EDQUOT, but I haven't tried that. Perhaps there are some
things that could be tuned that would avoid that?

(Some speculation about NFS: To avoid data-loss from running out of
disk space, I think PostgreSQL requires either a filesystem that
reserves space when we're extending a file, so that we can exclude the
possibility of ENOSPC before we evict data from our own shared
buffers, or a page cache that doesn't drop dirty flags or whole
buffers on failure so we can meaningfully retry once space becomes
available. As far as I know, the former would be theoretically
possible with NFS, if the client and server are using NFSv4.2+ with
ALLOCATE support and glibc and kernel versions both support true
fallocate() and pass it all the way through, but current versions
either don't support fallocate() on NFS files at all (this 4.18 kernel
doesn't) or sometimes emulate it by writing zeroes, which is useless
for remote space reservation purposes and (according to some sources I
found) there is currently no reliable way to find out about that
through libc. If that situation improves, we'd still need to do
explicit fallocate() on our side to reserve space, and it'd probably
be slow so we might have to work in bigger chunks to amortise the
latency.)

Returning to your question of how to decide whether to have an errno
include-list or an exclude-list for our new panic behaviour, I think
we should tolerate ENOSYS as a special case for sync_file_range()
only, because:

1. We don't actually need sync_file_range() at all for correct
operation (unlike fsync()).
2. ENOSYS is the only errno that very explicitly says "this didn't go
anywhere near Linux filesystem code". Therefore, it definitely didn't
eat any errors relating to jettisoned data. (Ok, perhaps EBADF and
EINVAL tell you something similar, but they also imply a serious bug
in the calling code, having lost track of file descriptors or
something.)

So far I still think that we should panic if fsync() returns any error
number at all. For sync_file_range(), it sounds like maybe you think
we should leave the warning-spewing in there for ENOSYS, to do exactly
what we did before on principle since that's what back-branches are
all about? Something like:

ereport(errno == ENOSYS ? WARNING : data_sync_elevel(WARNING),

Perhaps for master we could skip it completely, or somehow warn just
once, and/or switch to one of our other implementations at runtime? I
don't really have a strong view on that, not being a user of that
system. Will they ever implement it? Are there other systems we care
about that don't implement it? (Android?)

--
Thomas Munro
http://www.enterprisedb.com

#20Ravi Krishna
tml.rkrishna@gmail.com
In reply to: Tom Lane (#18)
Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

If this one appears in the list, then it means the problem is with AOL.

#21Andres Freund
andres@anarazel.de
In reply to: Thomas Munro (#19)
#22Michael Paquier
michael@paquier.xyz
In reply to: Andres Freund (#21)
#23Thomas Munro
thomas.munro@gmail.com
In reply to: Michael Paquier (#22)
#24Ravi Krishna
srkrishna@yahoo.com
In reply to: Michael Paquier (#22)
#25Andres Freund
andres@anarazel.de
In reply to: Ravi Krishna (#24)
#26Thomas Munro
thomas.munro@gmail.com
In reply to: Andres Freund (#25)
#27James Sewell
james.sewell@jirotech.com
In reply to: Thomas Munro (#26)
#28Andres Freund
andres@anarazel.de
In reply to: James Sewell (#27)
#29James Sewell
james.sewell@jirotech.com
In reply to: Andres Freund (#28)
#30James Sewell
james.sewell@jirotech.com
In reply to: Thomas Munro (#26)
#31Thomas Munro
thomas.munro@gmail.com
In reply to: James Sewell (#30)
#32Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#31)
#33Bruce Klein
brucek@gmail.com
In reply to: Thomas Munro (#32)
#34Thomas Munro
thomas.munro@gmail.com
In reply to: Bruce Klein (#33)
#35Pablo Hendrickx
pablo.hendrickx@exitas.be
In reply to: Thomas Munro (#31)