dup(0) fails on Ubuntu 20.04 and macOS 10.15 with 13.0

Started by Mario Emmenlauer over 5 years ago | 10 messages | hackers, general
#1 Mario Emmenlauer
mario@emmenlauer.de

Dear All,

I've used PostgreSQL since version 9.x successfully on Linux, macOS
and Windows. Today I've upgraded from 12.3 to 13.0 and suddenly I can
not start the server any more on Ubuntu 20.04 (inside Docker on Ubuntu
18.04) and on macOS 10.15.

I get reproducibly the error:
2020-10-05 11:48:19.720 CEST [84731] WARNING: dup(0) failed after 0 successes: Bad file descriptor
2020-10-05 11:48:19.720 CEST [84731] FATAL: insufficient file descriptors available to start server process
2020-10-05 11:48:19.720 CEST [84731] DETAIL: System allows 0, we need at least 58.
2020-10-05 11:48:19.720 CEST [84731] LOG: database system is shut down

What makes this quite curious is that everything continues to work on
Ubuntu 18.04, and Windows with Visual Studio 2019.

I compile PostgreSQL myself from source, but there are no patches or
tweaks involved (none that I would think relevant for this problem).

I've searched for the particular error and understand that it is usually
caused by system limits on open file descriptors(?). But in my case, the
failing setups are relatively modern, powerful CI test machines, with
virtually no load at the time of the test.

On Ubuntu 20.04, `ulimit -n` shows `1048576`, which also seems
relatively high (but must be the default, not changed by me).

Does this problem mean anything to anyone? I'm completely lost where
to go from here, or what to try next :-( Help would be greatly
appreciated!

All the best,

Mario Emmenlauer

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mario Emmenlauer (#1)

Mario Emmenlauer <mario@emmenlauer.de> writes:

> I get reproducibly the error:
> 2020-10-05 11:48:19.720 CEST [84731] WARNING: dup(0) failed after 0 successes: Bad file descriptor

Hmph. That code loop assumes that stdin exists to be duplicated,
but maybe if it had been closed, you'd get this error.

However, that logic hasn't changed in decades, and we've not heard
complaints about it before. Are you starting the server in some
unusual way?

regards, tom lane
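
To illustrate the failure mode described above, here is a minimal C sketch. It is not PostgreSQL code, and the function name `dup_errno_after_close` is made up for this example. It closes a descriptor and then tries to `dup()` it; on POSIX systems this should fail with EBADF, the errno whose usual message text is the "Bad file descriptor" seen in the log.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: close fd, then try to duplicate it.
 * Returns 0 if dup() unexpectedly succeeds, otherwise the errno
 * value that dup() reported (EBADF for a closed descriptor).
 */
int dup_errno_after_close(int fd)
{
    int copy;

    close(fd);          /* simulate a launcher that closed this descriptor */

    errno = 0;
    copy = dup(fd);     /* the same call the warning message refers to */
    if (copy >= 0)
    {
        close(copy);
        return 0;
    }
    return errno;
}
```

Calling `dup_errno_after_close(STDIN_FILENO)` early in a program mimics a parent process that closed stdin before starting the server.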

#3 Mario Emmenlauer
mario@emmenlauer.de
In reply to: Mario Emmenlauer (#1)

I've just regression-tested 12.4 for this problem, and I can now
confirm that 12.3 and 12.4 do _not_ show the problem, whereas 13.0
_does_. I also tried to isolate commits that may be related, but
could not find anything useful :-(

Help would be greatly appreciated!

All the best,

Mario Emmenlauer

#4 Matthias Apitz
guru@unixarea.de
In reply to: Mario Emmenlauer (#3)

On Monday, October 05, 2020 at 04:49:27 PM +0200, Mario Emmenlauer wrote:

> On 05.10.20 13:22, Mario Emmenlauer wrote:
>
>> I've used PostgreSQL since version 9.x successfully on Linux, macOS
>> and Windows. Today I've upgraded from 12.3 to 13.0 and suddenly I can
>> not start the server any more on Ubuntu 20.04 (inside Docker on Ubuntu
>> 18.04) and on macOS 10.15.
>>
>> I get reproducibly the error:
>> 2020-10-05 11:48:19.720 CEST [84731] WARNING: dup(0) failed after 0 successes: Bad file descriptor
>> 2020-10-05 11:48:19.720 CEST [84731] FATAL: insufficient file descriptors available to start server process
>> 2020-10-05 11:48:19.720 CEST [84731] DETAIL: System allows 0, we need at least 58.
>> 2020-10-05 11:48:19.720 CEST [84731] LOG: database system is shut down

...

Can you try to catch the situation by starting the server under strace,
perhaps with '-f' to follow child processes, and look into the output
('-o outfile') for the failing syscall?

matthias

--
Matthias Apitz, ✉ guru@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub
Без книги нет знания, без знания нет коммунизма (Влaдимир Ильич Ленин)
Without books no knowledge - without knowledge no communism (Vladimir Ilyich Lenin)
Sin libros no hay saber - sin saber no hay comunismo. (Vladimir Ilich Lenin)

#5 Mario Emmenlauer
mario@emmenlauer.de
In reply to: Mario Emmenlauer (#1)

On 05.10.20 14:35, Tom Lane wrote:

> Mario Emmenlauer <mario@emmenlauer.de> writes:
>
>> I get reproducibly the error:
>> 2020-10-05 11:48:19.720 CEST [84731] WARNING: dup(0) failed after 0 successes: Bad file descriptor
>
> Hmph. That code loop assumes that stdin exists to be duplicated,
> but maybe if it had been closed, you'd get this error.
>
> However, that logic hasn't changed in decades, and we've not heard
> complaints about it before. Are you starting the server in some
> unusual way?

Replying to a very old thread here: I could indeed trace the problem of
the failing `dup(0)` back to how we start the server! We start the
server from an executable that closes stdin very early on, and this
seems to lead to the problem.

We solved it by not closing stdin. But for future reference, if other
people may be affected by this, the code could probably be modified to
revert to stdout or stderr or a temporary file in case stdin is not
available (guessing here...).

All the best,

Mario Emmenlauer
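
For a launcher that does not want to leave its own stdin attached to the server, one common alternative to closing descriptor 0 is to point it at /dev/null before starting the child. The sketch below is only an illustration under that assumption; it is not what was done in the setup described above (which simply stopped closing stdin), and the function name `detach_stdin` is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: make fd 0 refer to /dev/null instead of
 * closing it, so that a later dup(0) in a child process still works.
 * Returns 0 on success, -1 on error.
 */
int detach_stdin(void)
{
    int fd = open("/dev/null", O_RDONLY);

    if (fd < 0)
        return -1;

    /* If fd 0 was already closed, open() may have returned 0 itself. */
    if (fd != STDIN_FILENO)
    {
        if (dup2(fd, STDIN_FILENO) < 0)
        {
            close(fd);
            return -1;
        }
        close(fd);
    }
    return 0;
}
```

A launcher would call `detach_stdin()` before its fork/exec step, so the child inherits a valid (if empty) stdin.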

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mario Emmenlauer (#5)

[ redirecting to -hackers ]

Mario Emmenlauer <mario@emmenlauer.de> writes:

> On 05.10.20 14:35, Tom Lane wrote:
>
>> Mario Emmenlauer <mario@emmenlauer.de> writes:
>>
>>> I get reproducibly the error:
>>> 2020-10-05 11:48:19.720 CEST [84731] WARNING: dup(0) failed after 0 successes: Bad file descriptor
>>
>> Hmph. That code loop assumes that stdin exists to be duplicated,
>> but maybe if it had been closed, you'd get this error.
>>
>> However, that logic hasn't changed in decades, and we've not heard
>> complaints about it before. Are you starting the server in some
>> unusual way?
>
> Replying to a very old thread here: I could indeed trace the problem of
> the failing `dup(0)` back to how we start the server! We start the
> server from an executable that closes stdin very early on, and this
> seems to lead to the problem.
> We solved it by not closing stdin. But for future reference, if other
> people may be affected by this, the code could probably be modified to
> revert to stdout or stderr or a temporary file in case stdin is not
> available (guessing here...).

Hm. I'm tempted to propose that we simply change that from dup(0) to
dup(2). Formally, that's just moving the problem around. In practice
though, there are good reasons not to close the server's stderr, ie you
will lose error messages that might be important. OTOH there does not
seem to be any obvious reason why the server should need valid stdin,
so if we can get rid of an implementation artifact that makes it require
that, that seems like an improvement.

regards, tom lane
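
As a rough illustration of why the choice of source descriptor matters, the following simplified C sketch counts how many times a given descriptor can be duplicated. It is not the code in fd.c, which differs in detail; the name `count_dups` and its parameters are hypothetical. If the source descriptor is closed, the very first `dup()` fails and the count is 0, which corresponds to the "failed after 0 successes" warning and the "System allows 0, we need at least 58" detail in the log.

```c
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical probe: duplicate src_fd until dup() fails or
 * max_to_probe copies exist, then close all copies again.
 * Returns the number of successful dup() calls, or -1 if the
 * bookkeeping array cannot be allocated.
 */
int count_dups(int src_fd, int max_to_probe)
{
    int *fds;
    int  used = 0;

    if (max_to_probe <= 0)
        return 0;

    fds = malloc(sizeof(int) * (size_t) max_to_probe);
    if (fds == NULL)
        return -1;

    while (used < max_to_probe)
    {
        int fd = dup(src_fd);

        if (fd < 0)
            break;              /* e.g. EBADF (source closed) or EMFILE */
        fds[used++] = fd;
    }

    /* Release everything that was opened during the probe. */
    for (int i = 0; i < used; i++)
        close(fds[i]);
    free(fds);

    return used;
}
```

With stdin closed, `count_dups(0, 58)` would return 0, while `count_dups(2, 58)` can still succeed as long as stderr is open and the process descriptor limit allows it.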

#7 Mario Emmenlauer
mario@emmenlauer.de
In reply to: Tom Lane (#6)

On 01.09.21 15:54, Tom Lane wrote:

> Mario Emmenlauer <mario@emmenlauer.de> writes:
>
>> On 05.10.20 14:35, Tom Lane wrote:
>>
>>> Mario Emmenlauer <mario@emmenlauer.de> writes:
>>>
>>>> I get reproducibly the error:
>>>> 2020-10-05 11:48:19.720 CEST [84731] WARNING: dup(0) failed after 0 successes: Bad file descriptor
>>>
>>> Hmph. That code loop assumes that stdin exists to be duplicated,
>>> but maybe if it had been closed, you'd get this error.
>>
>> Replying to a very old thread here: I could indeed trace the problem of
>> the failing `dup(0)` back to how we start the server! We start the
>> server from an executable that closes stdin very early on, and this
>> seems to lead to the problem.
>
> Hm. I'm tempted to propose that we simply change that from dup(0) to
> dup(2). Formally, that's just moving the problem around. In practice
> though, there are good reasons not to close the server's stderr, ie you
> will lose error messages that might be important. OTOH there does not
> seem to be any obvious reason why the server should need valid stdin,
> so if we can get rid of an implementation artifact that makes it require
> that, that seems like an improvement.

The idea to switch to dup(2) sounds very good to me. Also, while at it,
maybe the error message could be improved? The kids nowadays don't learn
so much about system I/O any more, and if someone does not know `dup()`,
then the error message is not very telling. It took me a while to
understand what the code was supposed to do. So it may be helpful to add
to the error message something like "possibly the stderr stream is closed,
this is not supported". What do you think?

All the best,

Mario Emmenlauer

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mario Emmenlauer (#7)

Mario Emmenlauer <mario@emmenlauer.de> writes:

> The idea to switch to dup(2) sounds very good to me. Also, while at it,
> maybe the error message could be improved? The kids nowadays don't learn
> so much about system I/O any more, and if someone does not know `dup()`,
> then the error message is not very telling. It took me a while to
> understand what the code was supposed to do. So it may be helpful to add
> to the error message something like "possibly the stderr stream is closed,
> this is not supported". What do you think?

Meh ... it's been like that for ~20 years and you're the first one
to complain, so I'm not inclined to make our translators spend effort
on a HINT message. However, we could reword it to the extent of, say,

    elog(WARNING, "duplicating stderr failed after %d successes: %m", used);

which at least reduces the jargon level to something that Unix users
should have heard of.

regards, tom lane

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mario Emmenlauer (#7)

Mario Emmenlauer <mario@emmenlauer.de> writes:

> The idea to switch to dup(2) sounds very good to me.

I poked at this some more, and verified that adding "fclose(stdin);"
at the head of PostmasterMain is enough to trigger the reported
failure. However, after changing fd.c to dup stderr not stdin,
we can pass check-world even with that in place. So that confirms
that there is no very good reason for the postmaster to require
stdin to be available.

Hence, I pushed the fix to make fd.c use stderr here. I only
back-patched to v14, because given the lack of other complaints,
I couldn't quite justify touching stable branches for this.

regards, tom lane

#10 Mario Emmenlauer
mario@emmenlauer.de
In reply to: Tom Lane (#9)

On 03.09.21 01:00, Tom Lane wrote:

> Mario Emmenlauer <mario@emmenlauer.de> writes:
>
>> The idea to switch to dup(2) sounds very good to me.
>
> I poked at this some more, and verified that adding "fclose(stdin);"
> at the head of PostmasterMain is enough to trigger the reported
> failure. However, after changing fd.c to dup stderr not stdin,
> we can pass check-world even with that in place. So that confirms
> that there is no very good reason for the postmaster to require
> stdin to be available.
>
> Hence, I pushed the fix to make fd.c use stderr here. I only
> back-patched to v14, because given the lack of other complaints,
> I couldn't quite justify touching stable branches for this.

Thanks a lot, Tom, it's appreciated!

All the best,

Mario Emmenlauer