BUG #14199: The pg_ctl status check on server start is not compatible with the silent_mode=on

Started by Maxim Sobolev almost 10 years ago, 4 messages (lists: hackers, bugs)
#1 Maxim Sobolev
sobomax@freebsd.org

The following bug has been logged on the website:

Bug reference: 14199
Logged by: Maksym Sobolyev
Email address: sobomax@freebsd.org
PostgreSQL version: 9.1.22
Operating system: FreeBSD 10.3-RELEASE amd64
Description:

There is a problem with pg_ctl when it tries to start the server with the
"silent_mode=on" option enabled in postgresql.conf. Specifically, this
option causes postgres to fork once more after starting. There are two
problems caused by that:

1. The pm_pid recorded by pg_ctl when doing fork+execve no longer
matches the PID in the postmaster.pid file. This causes pg_ctl to bail out
immediately.

2. The method that pg_ctl uses to check whether postgres exited prematurely
no longer works. In POSIX, "a child of my child is not my child", so it is
impossible to waitpid() on that process even if we use the correct PID from
the postmaster.pid file. Calling waitpid() on the original process instead
would cause a race condition: that process just does fork() and exit(), so
by the time the real postgres has had a chance to fully populate
postmaster.pid, the original process might already be gone.
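A minimal C sketch of point 2 (an illustration, not part of the original report; the function name wait_on_grandchild_errno is hypothetical): the process forks a child, the child forks once more and exits immediately, roughly as the postmaster did with silent_mode=on, and the original parent then calls waitpid() on the grandchild's PID. POSIX only lets a process wait for its own children, so the call is expected to fail with ECHILD even though the PID is correct and the grandchild is still running.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo. Returns the errno reported by waitpid() on the
   grandchild (expected: ECHILD), 0 if waitpid() unexpectedly succeeded,
   or -1 if the setup itself failed. */
static int wait_on_grandchild_errno(void)
{
    int fds[2];
    pid_t child, grandchild = 0;

    if (pipe(fds) != 0)
        return -1;
    child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        pid_t gc = fork();
        if (gc == 0) {
            sleep(1);           /* the "real" daemon keeps running for a while */
            _exit(0);
        }
        /* report the grandchild's PID, then exit at once (silent_mode-style) */
        _exit(write(fds[1], &gc, sizeof gc) == (ssize_t) sizeof gc ? 0 : 1);
    }

    close(fds[1]);
    if (read(fds[0], &grandchild, sizeof grandchild) != (ssize_t) sizeof grandchild)
        return -1;
    close(fds[0]);
    waitpid(child, NULL, 0);    /* the direct child can still be reaped */

    /* The grandchild is alive and its PID is right, but it is not our child. */
    if (waitpid(grandchild, NULL, WNOHANG) == -1)
        return errno;
    return 0;
}
```

Calling wait_on_grandchild_errno() should yield ECHILD, which is why knowing the PID from postmaster.pid does not help pg_ctl wait on the daemonized server.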

The attached patch fixes this issue by changing the way pg_ctl polls the
child status. Instead of using waitpid(), which as described above cannot
work even in principle for "grandchild" processes, we create a socketpair
(i.e. a pipe), one end of which is passed to the forked pg_ctl process and
hence inherited by postgres itself after execve().

In the unlikely event of postgres exiting prematurely, that end of the pipe
gets closed by the kernel, so pg_ctl gets EOF when doing a non-blocking read
on its own end and can bail out quickly instead of waiting for the timeout
to expire. This should work no matter how many times the child forks after
execve().
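The pipe-based liveness check described above can be sketched in C as follows. This is an illustration under stated assumptions, not the attached patch: the helpers spawn_daemonizing_child, lifeline_closed and sleep_ms are hypothetical, the child only simulates silent_mode by forking and exiting instead of calling execve(), and the grandchild's "premature exit" is a fixed sleep. Two details matter: the parent must close its own copy of the child's end, and the child's end must not be close-on-exec if it is to survive execve() (descriptors from socketpair() are not close-on-exec by default).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Sleep for ms milliseconds, resuming if interrupted by a signal. */
static void sleep_ms(long ms)
{
    struct timespec req, rem;

    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem;
}

/* Non-blocking check of the parent's end of the lifeline. Returns 1 on EOF
   (every copy of the other end is closed), 0 if some process still holds
   it, -1 on a read error. */
static int lifeline_closed(int fd)
{
    char buf;
    ssize_t n = read(fd, &buf, 1);

    if (n == 0)
        return 1;
    if (n > 0 || errno == EAGAIN || errno == EWOULDBLOCK)
        return 0;               /* stray data, or nothing to read yet */
    return -1;
}

/* Hypothetical stand-in for pg_ctl's fork+exec: the child forks once more
   and exits immediately (as silent_mode=on did); the grandchild inherits
   the lifeline and exits after lifetime_ms. Returns the parent's end. */
static int spawn_daemonizing_child(long lifetime_ms)
{
    int sv[2];
    pid_t child;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    child = fork();
    if (child < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (child == 0) {
        close(sv[0]);           /* child keeps only its end, sv[1] */
        if (fork() == 0) {      /* grandchild inherits sv[1] */
            sleep_ms(lifetime_ms);
            _exit(0);           /* kernel closes the last copy of sv[1] here */
        }
        _exit(0);               /* intermediate process exits at once */
    }

    close(sv[1]);               /* without this, EOF would never arrive */
    waitpid(child, NULL, 0);    /* reaping the direct child still works */
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);
    return sv[0];
}
```

A caller polls lifeline_closed() between its other startup checks: 0 means the server (or some descendant) is still alive, 1 means it is gone. Because EOF is reported only once every copy of the other end is closed, the check keeps working however many times the process forks, provided no descendant closes the inherited descriptor.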

This problem is exposed because the FreeBSD port sets that option in its
default server configuration file for version 9.1:

https://svnweb.freebsd.org/ports/head/databases/postgresql91-server/files/patch-src%3Abackend%3Autils%3Amisc%3Apostgresql.conf.sample?annotate=340725

For some reason it was not a problem until recently. I think it may have
been brought to light by some internal change in PG's handling of that
option.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Maxim Sobolev (#1)
Re: [BUGS] BUG #14199: The pg_ctl status check on server start is not compatible with the silent_mode=on

sobomax@freebsd.org writes:

> There is a problem with pg_ctl when it tries to start the server with the
> "silent_mode=on" option enabled in postgresql.conf. Specifically, this
> option causes postgres to fork once more after starting. There are two
> problems caused by that:
> 1. The pm_pid recorded by pg_ctl when doing fork+execve no longer
> matches the PID in the postmaster.pid file. This causes pg_ctl to bail out
> immediately.
> 2. The method that pg_ctl uses to check whether postgres exited prematurely
> no longer works.

> For some reason it was not a problem until recently.

After reviewing recent commits, I realize that this was caused by
https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=c869a7d5b
While generally that was a good thing, in hindsight it's obvious that
it doesn't work with silent_mode. That's a non-issue in 9.2 and later
since silent_mode is gone anyway, but it is an issue for 9.1.

> The attached patch fixes this issue by changing the way pg_ctl polls the
> child status.

There is no need for this complication in >= 9.2. We could maybe apply it
in the 9.1 branch only, but I am quite loath to accept such a nontrivial
and portability-sensitive change that way, mainly because 9.1 is almost
EOL: its next minor release might well be its last. If there's anything
wrong with this approach, we may not find out about it until after 9.1 is
out of support and won't get patched anymore.

What seems like a more conservative answer to me is to revert c869a7d5b
in 9.1 only, and address the buildfarm stability issue it sought to
resolve by increasing the fixed timeout from 5 seconds to, say, 10.

Thoughts?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3 Michael Paquier
michael@paquier.xyz
In reply to: Tom Lane (#2)
Re: [BUGS] BUG #14199: The pg_ctl status check on server start is not compatible with the silent_mode=on

On Sun, Jun 19, 2016 at 12:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> What seems like a more conservative answer to me is to revert c869a7d5b
> in 9.1 only, and address the buildfarm stability issue it sought to
> resolve by increasing the fixed timeout from 5 seconds to, say, 10.

+1 for doing that. Knowing that silent_mode will be out of community
support scope in a couple of months, that's the right answer to this
bug call.
-- 
Michael


#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#3)
Re: [HACKERS] Re: BUG #14199: The pg_ctl status check on server start is not compatible with the silent_mode=on

Michael Paquier <michael.paquier@gmail.com> writes:

> On Sun, Jun 19, 2016 at 12:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> What seems like a more conservative answer to me is to revert c869a7d5b
>> in 9.1 only, and address the buildfarm stability issue it sought to
>> resolve by increasing the fixed timeout from 5 seconds to, say, 10.
>
> +1 for doing that. Knowing that silent_mode will be out of community
> support scope in a couple of months, that's the right answer to this
> bug call.

Done that way.

regards, tom lane
