EINTR error in SunOS

Started by Qingqing Zhou, over 20 years ago, 33 messages, hackers

zhouqq@cs.toronto.edu

over 20 years ago

I encountered an error today (I can't reproduce it) on SunOS 5.8:

  --test that we read consecutive LFs properly
  CREATE TEMP TABLE testnl (a int, b text, c int);
+ ERROR:  could not open relation 1663/16384/37713: Interrupted system call

The reason, I guess, is that the open() call was interrupted by a signal
(which signal, BTW?). This error may be specific to SunOS/Solaris, but POSIX
does say that EINTR is possible on open(), close(), read(), write() and also
the fopen() family:

http://www.opengroup.org/onlinepubs/007908799/xsh/open.html

We have already patched read()/write(); shall we do the same for
open()/close() and the fopen() family? Patching files other than fd.c seems
unnecessary for two reasons: (1) they are not frequently exercised; (2) they
don't have the basic errno-check code there.
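
A minimal sketch of what such a retry could look like for open(). The helper name open_retry_eintr is hypothetical and is not an existing fd.c function; it only illustrates the retry-on-EINTR idea under the assumption that retrying is always acceptable to the caller.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of PostgreSQL): call open() and retry
 * for as long as it fails with EINTR.  Any other failure is returned
 * to the caller unchanged, with errno still set by open().
 */
static int
open_retry_eintr(const char *path, int flags, mode_t mode)
{
    int         fd;

    do
    {
        fd = open(path, flags, mode);
    } while (fd < 0 && errno == EINTR);

    return fd;
}
```

A caller would use it exactly like open() and keep its existing error check for the remaining errno values.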

Regards,
Qingqing

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Qingqing Zhou (#1)

Re: EINTR error in SunOS

Qingqing Zhou <zhouqq@cs.toronto.edu> writes:

+ ERROR: could not open relation 1663/16384/37713: Interrupted system call

The reason, I guess, is that the open() call was interrupted by a signal
(which signal, BTW?).

I've heard of this in connection with NFS ... is your DB on an NFS
filesystem by any chance?

regards, tom lane

Qingqing Zhou

zhouqq@cs.toronto.edu

over 20 years ago

In reply to: Qingqing Zhou (#1)

Re: EINTR error in SunOS

"Tom Lane" <tgl@sss.pgh.pa.us> wrote

Qingqing Zhou <zhouqq@cs.toronto.edu> writes:

+ ERROR: could not open relation 1663/16384/37713: Interrupted system call

The reason, I guess, is that the open() call was interrupted by a signal
(which signal, BTW?).

I've heard of this in connection with NFS ... is your DB on an NFS
filesystem by any chance?

Exactly. I guess school machines love NFS.

Regards,
Qingqing

Qingqing Zhou

zhouqq@cs.toronto.edu

over 20 years ago

In reply to: Tom Lane (#2)

Re: EINTR error in SunOS

On Fri, 30 Dec 2005, Tom Lane wrote:

I've heard of this in connection with NFS ... is your DB on an NFS
filesystem by any chance?

I have patched the IO routines in backend/storage for which POSIX says EINTR
is possible, that is, all except unlink(). Though POSIX says EINTR is not
possible for unlink(), during many regression runs I found that it sometimes
sets this errno on NFS (I still don't know where the smoking gun is):

TRUNCATE TABLE trunc_c,trunc_d,trunc_e; -- ok
+ WARNING: could not remove relation 1663/16384/37822: Interrupted system call

There are many other unlink() calls scattered around the backend, some even
without an error check. Shall we add a pg_unlink() for this situation and
replace them, like this:

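#include <errno.h>
#include <unistd.h>

int
pg_unlink(const char *path, int elevel)
{
    int         returnCode;

retry:
    returnCode = unlink(path);
    if (returnCode < 0 && errno == EINTR)
        goto retry;             /* interrupted by a signal: try again */

    if (returnCode < 0)         /* any other error */
        elog(elevel, ...);

    return returnCode;
}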

pg_unlink(const char* path)
{
/* no elog -- but we still have to do error check */
}

let it be ...

If we decide to do something for unlink(), then we'd better do something
for other EINTR-possible IO routines for fairness :-)

By the way, POSIX does not seem very consistent about EINTR. For example,
closedir() can set EINTR, but opendir()/readdir() can't. Is there any magic
in it?

Regards,
Qingqing

Bruce Momjian

bruce@momjian.us

over 20 years ago

In reply to: Qingqing Zhou (#4)

Re: EINTR error in SunOS

Qingqing Zhou <zhouqq@cs.toronto.edu> writes:

On Fri, 30 Dec 2005, Tom Lane wrote:

I've heard of this in connection with NFS ... is your DB on an NFS
filesystem by any chance?

I have patched the IO routines in backend/storage for which POSIX says EINTR
is possible, that is, all except unlink(). Though POSIX says EINTR is not
possible for unlink(), during many regression runs I found that it sometimes
sets this errno on NFS (I still don't know where the smoking gun is):

Well there is a reason intr is not the default for NFS mounts. It's precisely
because it breaks the traditional unix filesystem interface. Syscalls that
historically are not interruptible become interruptible and not all programs
behave properly when that occurs.

In any case POSIX explicitly allows functions to return other errors aside
from those specified as long as it's for error conditions not listed.

[Chapter 2 Section 3, paragraph 6]

Implementations may support additional errors not included in this list, may
generate errors included in this list under circumstances other than those
described here, or may contain extensions or limitations that prevent some
errors from occurring. The ERRORS section on each reference page specifies
whether an error shall be returned, or whether it may be returned.
Implementations shall not generate a different error number from the ones
described here for error conditions described in this volume of IEEE Std
1003.1-2001, but may generate additional errors unless explicitly disallowed
for a particular function

Ironically EINTR *is* singled out to be specifically forbidden to be returned
from some system calls but only those in the Threads option which are mostly
pthread* functions. unlink isn't covered by that prohibition.

--
greg

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Bruce Momjian (#5)

Re: EINTR error in SunOS

Greg Stark <gsstark@mit.edu> writes:

Qingqing Zhou <zhouqq@cs.toronto.edu> writes:

I have patched the IO routines in backend/storage for which POSIX says EINTR
is possible, that is, all except unlink(). Though POSIX says EINTR is not
possible for unlink(), during many regression runs I found that it sometimes
sets this errno on NFS (I still don't know where the smoking gun is):

Well there is a reason intr is not the default for NFS mounts. It's precisely
because it breaks the traditional unix filesystem interface.

Yeah. We have looked at this before and decided that trying to defend
against it is too invasive and too fragile (how will you ever be sure
you've fixed everyplace, or keep other places from sneaking in later?)

What I'd rather do is document prominently that running a DB over NFS
isn't recommended, and running it over NFS with interrupts allowed is
just not going to work.

regards, tom lane

Qingqing Zhou

zhouqq@cs.toronto.edu

over 20 years ago

In reply to: Tom Lane (#6)

Re: EINTR error in SunOS

On Sat, 31 Dec 2005, Tom Lane wrote:

What I'd rather do is document prominently that running a DB over NFS
isn't recommended, and running it over NFS with interrupts allowed is
just not going to work.

Agreed. IO syscalls are not the only problem with NFS -- if we can't fix
them all in one go, then let's not do it.

Regards,
Qingqing

Rod Taylor

rbt@rbt.ca

over 20 years ago

In reply to: Tom Lane (#6)

Re: EINTR error in SunOS

On Sat, 2005-12-31 at 14:40 -0500, Tom Lane wrote:

Greg Stark <gsstark@mit.edu> writes:

Qingqing Zhou <zhouqq@cs.toronto.edu> writes:

I have patched the IO routines in backend/storage for which POSIX says EINTR
is possible, that is, all except unlink(). Though POSIX says EINTR is not
possible for unlink(), during many regression runs I found that it sometimes
sets this errno on NFS (I still don't know where the smoking gun is):

Well there is a reason intr is not the default for NFS mounts. It's precisely
because it breaks the traditional unix filesystem interface.

What I'd rather do is document prominently that running a DB over NFS
isn't recommended, and running it over NFS with interrupts allowed is
just not going to work.

Are there issues with having an archive_command which does things with
NFS based filesystems?

Bruce Momjian

bruce@momjian.us

over 20 years ago

In reply to: Qingqing Zhou (#7)

Re: EINTR error in SunOS

Qingqing Zhou <zhouqq@cs.toronto.edu> writes:

On Sat, 31 Dec 2005, Tom Lane wrote:

What I'd rather do is document prominently that running a DB over NFS
isn't recommended, and running it over NFS with interrupts allowed is
just not going to work.

Agreed. IO syscalls are not the only problem with NFS -- if we can't fix
them all in one go, then let's not do it.

I don't think that's reasonable. The NFS intr option breaks the traditional
unix filesystem semantics which breaks a lot of older or naive programs. But
that's no reason to decide that Postgres can't handle the new semantics.

Handling EINTR after all file system calls doesn't sound like it would be
terribly hard. And Postgres of all systems has the infrastructure necessary to
handle error conditions, abort and roll back the transaction when a file
system error occurs. I think mainly this means it would be possible to hit C-c
or shut down postgres (uncleanly) when there's a network outage.

--
greg

#10

Qingqing Zhou

zhouqq@cs.toronto.edu

over 20 years ago

In reply to: Bruce Momjian (#9)

Re: EINTR error in SunOS

On Sat, 31 Dec 2005, Greg Stark wrote:

I don't think that's reasonable. The NFS intr option breaks the traditional
unix filesystem semantics which breaks a lot of older or naive programs. But
that's no reason to decide that Postgres can't handle the new semantics.

Is EINTR turned off by default in NFS? If so, I don't see that it will be a
problem. Sorry for my limited knowledge, but are there any
requirements/benefits that make people turn on EINTR?

Handling EINTR after all file system calls doesn't sound like it would be
terribly hard.

The problem is not restricted to the file system. Actually my patched
version (only backend/storage) passed the regression tests hundreds of times
without any problem, but EINTR can hurt other syscalls as well. Finding *all*
the EINTR situations may take a big effort AFAICS.

Regards,
Qingqing

#11

Martijn van Oosterhout

kleptog@svana.org

over 20 years ago

In reply to: Qingqing Zhou (#10)

Re: EINTR error in SunOS

On Sat, Dec 31, 2005 at 04:46:02PM -0500, Qingqing Zhou wrote:

Is EINTR turned off by default in NFS? If so, I don't see that it will be a
problem. Sorry for my limited knowledge, but are there any
requirements/benefits that make people turn on EINTR?

I won't speak for anyone else, but the reason I set intr on for NFS
mounts is so that if I turn off the file server I don't get unkillable
processes on the client. Messy sure, and maybe a better solution has
been made since, but I really don't like processes stuck in D state
(ie kill -9 won't work). Better the program die in some weird way than
that...

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
tool for doing 5% of the work and then sitting around waiting for someone
else to do the other 95% so you can sue them.

#12

Bruce Momjian

bruce@momjian.us

over 20 years ago

In reply to: Qingqing Zhou (#10)

Re: EINTR error in SunOS

Qingqing Zhou <zhouqq@cs.toronto.edu> writes:

On Sat, 31 Dec 2005, Greg Stark wrote:

I don't think that's reasonable. The NFS intr option breaks the traditional
unix filesystem semantics which breaks a lot of older or naive programs. But
that's no reason to decide that Postgres can't handle the new semantics.

Is EINTR turned off by default in NFS? If so, I don't see that it will be a
problem. Sorry for my limited knowledge, but are there any
requirements/benefits that make people turn on EINTR?

That's why the "intr" option (and the "soft" option) has traditionally not
been enabled by default in NFS implementations. But many people don't like
that when their NFS server disappears their client applications become
unkillable. They like to be able to hit C-c and stop whatever is running.

In the case of Postgres having "intr" off on the NFS mount point would mean
you couldn't C-c a query stuck because the database is on NFS. Of course it's
not like you would be able to run any more queries after that, but you might
want your terminal back.

You wouldn't even be able to shut down Postgres, even with kill -9. If your
NFS server is unrecoverable and you want to bring up a Postgres instance using
a backup restored some other place you would have to bring it up on another
port or reboot your machine.

That's the kind of thing that leads lots of sysadmins to use the "intr" and
"soft" options. And those sysadmins generally aren't aware of these kinds of
consequences since it's more of a programming level issue.

Handling EINTR after all file system calls doesn't sound like it would be
terribly hard.

The problem is not restricted to the file system. Actually my patched
version (only backend/storage) passed the regression tests hundreds of times
without any problem, but EINTR can hurt other syscalls as well. Finding *all*
the EINTR situations may take a big effort AFAICS.

Well NFS is only going to affect filesystem calls. If there are other syscalls
that can signal EINTR on some obscure platform where Postgres isn't handling
it then that's just a run-of-the-mill porting issue.

But like I mentioned in the other thread POSIX is of no help here. With the
exception of the pthreads syscalls POSIX doesn't prohibit functions from
signalling errors other than the ones documented in the specification. So in
other words, just about any function can signal just about any error including
errors that are proprietary additions any time. Good luck :)

--
greg

#13

Qingqing Zhou

zhouqq@cs.toronto.edu

over 20 years ago

In reply to: Bruce Momjian (#12)

Re: EINTR error in SunOS

On Sat, 31 Dec 2005, Greg Stark wrote:

Qingqing Zhou <zhouqq@cs.toronto.edu> writes:

Is EINTR turned off by default in NFS? If so, I don't see that it will be a
problem. Sorry for my limited knowledge, but are there any
requirements/benefits that make people turn on EINTR?

That's why the "intr" option (and the "soft" option) has traditionally not
been enabled by default in NFS implementations. But many people don't like
that when their NFS server disappears their client applications become
unkillable. They like to be able to hit C-c and stop whatever is running.

Thanks Greg and Martijn, I now understand intr better :-) So whether we can
kill Postgres or not depends on our signal handler. The Query Cancel signal
won't work because "ImmediateInterruptOK" forbids it and the retry-style
code in read/write will put the Postgres process into uninterruptible
sleep again. But the die signal will work, I think.

Regards,
Qingqing

#14

Bruce Momjian

bruce@momjian.us

over 20 years ago

In reply to: Rod Taylor (#8)

Re: EINTR error in SunOS

Rod Taylor <pg@rbt.ca> writes:

Are there issues with having an archive_command which does things with
NFS based filesystems?

Well, whatever command you use for archive_command -- probably just "cp" if
you're using NFS -- would hang if the NFS server went away. What would happen
then might be interesting. If Postgres finds the archive_command hanging
indefinitely will it correctly avoid recycling the WAL log indefinitely? I
assume so.

What's nonoptimal here is that I don't think there would be any warning that
anything was wrong until the WAL logs eventually filled up their filesystem
and then postgres stopped running. In the meantime your archived WAL logs
would be getting older and older and you would have no indication that
anything was failing.

This was the intention with the NFS error handling. The theory being that
eventually the server comes back up and things resume functioning exactly
where they left off with no lost operations. The upside is you don't have
things failing, then resuming later and unhandled errors in the meantime
leading to data corruption. The downside is there's no way for "cp" and
ultimately Postgres to know anything's wrong except to have a timeout itself
and an arbitrary maximum amount of time to expect operations to take.
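
To make the "have a timeout itself" idea concrete, here is a rough sketch of running a shell command with an upper bound on how long the caller waits. The helper run_with_timeout and the CMD_TIMED_OUT value are hypothetical names invented for this illustration; this is not how Postgres invokes archive_command. It also assumes the child can actually be killed, which, as discussed in this thread, may not hold for a process stuck on a hard NFS mount without intr.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define CMD_TIMED_OUT (-2)      /* hypothetical return code for this sketch */

/*
 * Hypothetical helper: run "cmd" through /bin/sh and wait roughly
 * timeout_ms milliseconds for it.  Returns the command's exit status,
 * CMD_TIMED_OUT if it had to be killed, or -1 on any other failure.
 */
static int
run_with_timeout(const char *cmd, int timeout_ms)
{
    const struct timespec tick = {0, 10 * 1000 * 1000};    /* 10 ms */
    int         waited_ms = 0;  /* approximate elapsed time */
    int         status;
    pid_t       pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
        _exit(127);             /* only reached if exec failed */
    }

    for (;;)
    {
        pid_t       r = waitpid(pid, &status, WNOHANG);

        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0 && errno != EINTR)
            return -1;
        if (waited_ms >= timeout_ms)
        {
            kill(pid, SIGKILL);
            waitpid(pid, &status, 0);   /* reap; may block if unkillable */
            return CMD_TIMED_OUT;
        }
        nanosleep(&tick, NULL);
        waited_ms += 10;
    }
}
```

The polling loop keeps the sketch free of signal handlers; the price is that the timeout is only accurate to about one tick, and the "arbitrary maximum" still has to be chosen by whoever calls it.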

--
greg

#15

Doug Royer

Doug@Royer.com

over 20 years ago

In reply to: Bruce Momjian (#12)

Re: EINTR error in SunOS

EINTR on read() or write() is not unique to NFS.
It can happen on many file systems - it is just seen
less frequently on most of them.

The code should be able to handle ANY valid read()
and write() errno. And EINTR is documented on Linux, BSD,
Solaris (1 and 2), and POSIX.

Even the Linux man pages say that read() and write() can
return EINTR. This can happen on soft-mirrors, SCSI disks, and SOME
other disk drivers when they have errors.

The 'intr' option to NFS is not the same as EINTR. It
means 'if the server does not respond for a while,
then return an EINTR', just like any other disk read()
or write() does when it fails to reply.

I have seen lots of open source code that assumes that all
disk reads and writes work 100% or fail 100%. Many do not
check the return value to see if all data was written or
read from disk. And many do not look at errno at all.
I have NOT looked to see how postgres does it.
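
As an illustration of the kind of checking meant here, a minimal sketch of a write loop that looks at both the return value and errno. The helper name write_all is hypothetical and does not describe what the storage/*.c code does; it only shows one way to handle short writes and EINTR together.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write exactly "len" bytes to "fd".  A short
 * write is continued from where it stopped, and a write() that fails
 * with EINTR is retried.  Returns 0 on success, or -1 with errno set
 * by write() for any other error.
 */
static int
write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0)
    {
        ssize_t     n = write(fd, p, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before anything was written */
            return -1;
        }
        p += n;                 /* skip over the part already written */
        len -= (size_t) n;
    }
    return 0;
}
```

Code that instead assumes a single write() either transfers everything or fails outright would treat a short write as success and silently lose the tail of the buffer.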

If storage/*.c is where the reads occur, it does
very LITTLE when checking for errors.

Handling EINTR after all file system calls doesn't sound like it would be
terribly hard.

The problem is not restricted to the file system. Actually my patched
version (only backend/storage) passed the regression tests hundreds of times
without any problem, but EINTR can hurt other syscalls as well. Finding *all*
the EINTR situations may take a big effort AFAICS.

Well NFS is only going to affect filesystem calls. If there are other syscalls
that can signal EINTR on some obscure platform where Postgres isn't handling
it then that's just a run-of-the-mill porting issue.

But like I mentioned in the other thread POSIX is of no help here. With the
exception of the pthreads syscalls POSIX doesn't prohibit functions from
signalling errors other than the ones documented in the specification. So in
other words, just about any function can signal just about any error including
errors that are proprietary additions any time. Good luck :)

Doug Royer | http://INET-Consulting.com
-------------------------------|-----------------------------

We Do Standards - You Need Standards

#16

Doug McNaught

doug@mcnaught.org

over 20 years ago

In reply to: Doug Royer (#15)

Re: EINTR error in SunOS

Doug Royer <Doug@Royer.com> writes:

The 'intr' option to NFS is not the same as EINTR. It
means 'if the server does not respond for a while,
then return an EINTR', just like any other disk read()
or write() does when it fails to reply.

No, you're thinking of 'soft'. 'intr' (which is actually a modifier
to the 'hard' setting) causes the I/O to hang until the server comes
back or the process gets a signal (in which case EINTR is returned).

-Doug

#17

Qingqing Zhou

zhouqq@cs.toronto.edu

over 20 years ago

In reply to: Qingqing Zhou (#1)

Re: EINTR error in SunOS

"Greg Stark" <gsstark@mit.edu> wrote

Well NFS is only going to affect filesystem calls. If there are other syscalls
that can signal EINTR on some obscure platform where Postgres isn't handling
it then that's just a run-of-the-mill porting issue.

OK, NFS just affects filesystem calls (I mixed it up with another problem). If
possible, I hope we can draw some conclusions / sketch a fix plan here for
future developers who want to come up with a patch. The question is:

Where and how exactly should we fix things in order to support intr NFS on
the server side?

The more details we write down here, the better we can judge whether a plan
is feasible. I could think of these places:

+ direct file system calls
    - open() family, fopen() family in backend/storage
    - scattered open() etc. in the whole backend (unlink seems to have the
      biggest problem)

The problem with the above is that if a signal sneaks in, these syscalls will
fail. With a retry, we can fix it.

+ indirect file system calls
    - system("xxx") calls, xxx = cp, etc.

If intr NFS is enabled, what's the problem exactly?

Any others?

Regards,
Qingqing

#18

Bruce Momjian

bruce@momjian.us

over 20 years ago

In reply to: Qingqing Zhou (#17)

Re: EINTR error in SunOS

"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:

The problem with the above is that if a signal sneaks in, these syscalls will
fail. With a retry, we can fix it.

It's a bit stickier than that but only a bit. If you just retry then you're
saying users have to use kill -9 to get away from the situation. For some
filesystem operations that may be the best we can do. But for most it ought to
be possible to CHECK_FOR_INTERRUPTS() and handle the regular signals like C-c
or kill -1 normally. Even having the single backend exit (to avoid file
resource leaks) is nicer than having to restart the entire instance.
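
A rough sketch of the difference between retrying blindly and retrying while still honoring a pending cancel. The flag cancel_requested, the handler on_cancel and the helper unlink_retry_cancellable are hypothetical stand-ins for the backend's interrupt state and for CHECK_FOR_INTERRUPTS(); the real macro does considerably more than test a flag.

```c
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical stand-in for the backend's "interrupt pending" state. */
static volatile sig_atomic_t cancel_requested = 0;

static void
on_cancel(int signo)
{
    (void) signo;
    cancel_requested = 1;       /* only record the request here */
}

/*
 * Hypothetical helper: unlink() with retry on EINTR, except that the
 * retry stops once a cancel has been requested.  Returns 0 on success
 * and -1 otherwise; errno is EINTR when the loop gave up because of a
 * cancel, or whatever unlink() set for a real error.
 */
static int
unlink_retry_cancellable(const char *path)
{
    for (;;)
    {
        if (unlink(path) == 0)
            return 0;
        if (errno != EINTR || cancel_requested)
            return -1;          /* real error, or interrupted and told to stop */
    }
}
```

With a plain retry loop the signal that caused the EINTR is effectively swallowed; here the caller gets control back and can decide to abort the operation or exit.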

--
greg

#19

Qingqing Zhou

zhouqq@cs.toronto.edu

over 20 years ago

In reply to: Bruce Momjian (#18)

Re: EINTR error in SunOS

On Sun, 1 Jan 2006, Greg Stark wrote:

"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:

The problem with the above is that if a signal sneaks in, these syscalls will
fail. With a retry, we can fix it.

It's a bit stickier than that but only a bit. If you just retry then you're
saying users have to use kill -9 to get away from the situation. For some
filesystem operations that may be the best we can do. But for most it ought to
be possible to CHECK_FOR_INTERRUPTS() and handle the regular signals like C-c
or kill -1 normally. Even having the single backend exit (to avoid file
resource leaks) is nicer than having to restart the entire instance.

I understand that putting a CHECK_FOR_INTERRUPTS() in the retry loop may make
for a more graceful stop, but it won't work in some cases -- notice that the
IO routines we will patch can be used before the signal mechanism is set up.

Regards,
Qingqing

#20

Tom Lane

tgl@sss.pgh.pa.us

over 20 years ago

In reply to: Qingqing Zhou (#19)

Re: EINTR error in SunOS

Qingqing Zhou <zhouqq@cs.toronto.edu> writes:

I understand that putting a CHECK_FOR_INTERRUPTS() in the retry loop may make
for a more graceful stop, but it won't work in some cases -- notice that the
IO routines we will patch can be used before the signal mechanism is set up.

I don't think it will help much at all: too many of the operations in
question are invoked in places where CHECK_FOR_INTERRUPTS is a no-op.
Examples:
* disk writes are mostly done by the bgwriter and not backends at all
* unlinks are generally done during xact commit/rollback

Qingqing's point about failures in system()-invoked commands (think
archive_command for PITR) is a mighty good one too. That puts a
serious crimp into any illusion that we can really fix this in any
reliable way.

regards, tom lane

#21

Qingqing Zhou

zhouqq@cs.toronto.edu

over 20 years ago

In reply to: Tom Lane (#20)

#22

Doug Royer

Doug@Royer.com

over 20 years ago

In reply to: Doug McNaught (#16)
