We should Axe /contrib/start-scripts
... for the simple reason that nobody is maintaining it. Wheeler just
pointed out to me today that the OSX startup script hasn't been updated
since 7.4 and contains misinformation and dangerous scripting.
Other startup scripts there are equally dilapidated, and aren't used by
the linux distros regardless.
Unless someone volunteers to be *permanent* maintainer of this
directory, I vote that we remove it from 8.5.
--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Josh Berkus wrote:
... for the simple reason that nobody is maintaining it. Wheeler just
pointed out to me today that the OSX startup script hasn't been updated
since 7.4 and contains misinformation and dangerous scripting.
Other startup scripts there are equally dilapidated, and aren't used by
the linux distros regardless.
IMHO, those scripts (at least the Linux one) have great value. There are
numerous people that compile and install from source, and for them at
least some of these scripts are used quite a bit.
AFAIK, the Linux one is functional (yes, sub-optimal in some ways, but
it works just fine)...
--
Chander Ganesan
Open Technology Group, Inc.
One Copley Parkway, Suite 210
Morrisville, NC 27560
919-463-0999/877-258-8987
http://www.otg-nc.com
Chander Ganesan <chander@otg-nc.com> writes:
Josh Berkus wrote:
... for the simple reason that nobody is maintaining it. Wheeler just
pointed out to me today that the OSX startup script hasn't been updated
since 7.4 and contains misinformation and dangerous scripting.
Other startup scripts there are equally dilapidated, and aren't used by
the linux distros regardless.
IMHO, those scripts (at least the Linux one) have great value. There are
numerous people that compile and install from source, and for them at
least some of these scripts are used quite a bit.
AFAIK, the Linux one is functional (yes, sub-optimal in some ways, but
it works just fine)...
There are lots of files in our distribution that don't get changed for
years at a time. I think if there's something wrong with these, they
should get fixed. I'm certainly not prepared to remove them on the
basis of an unsubstantiated second-hand report of unspecified problems.
(Personally, I use scripts based on start-scripts/osx/ for a number of
services on my own machines, so if there's something wrong with them
I'd definitely like to know what it is.)
regards, tom lane
Tom,
(Personally, I use scripts based on start-scripts/osx/ for a number of
services on my own machines, so if there's something wrong with them
I'd definitely like to know what it is.)
I quote:
"# What to use to start up the postmaster (we do NOT use pg_ctl for this,
# as it adds no value and can cause the postmaster to misrecognize a stale
# lock file)
DAEMON="$prefix/bin/postmaster"
--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
On Aug 19, 2009, at 11:48 AM, Tom Lane wrote:
(Personally, I use scripts based on start-scripts/osx/ for a number of
services on my own machines, so if there's something wrong with them
I'd definitely like to know what it is.)
+1. Please don't remove the start scripts. I use them on every system
on which I install PostgreSQL. They're very handy for those of us who
don't use distro packages.
But do fix them if they need it.
Thanks,
David
Josh Berkus wrote:
... for the simple reason that nobody is maintaining it. Wheeler just
pointed out to me today that the OSX startup script hasn't been updated
since 7.4 and contains misinformation and dangerous scripting.
Other startup scripts there are equally dilapidated, and aren't used by
the linux distros regardless.
If they are unmaintained, why not replace them with a pointer to some
script that is maintained?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Tom Lane wrote:
(Personally, I use scripts based on start-scripts/osx/ for a number of
services on my own machines, so if there's something wrong with them
I'd definitely like to know what it is.)
What kind of "based on"? I mean, are there some changes of yours that
could be applied to the contrib script?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Josh Berkus <josh@agliodbs.com> writes:
Tom,
(Personally, I use scripts based on start-scripts/osx/ for a number of
services on my own machines, so if there's something wrong with them
I'd definitely like to know what it is.)
I quote:
"# What to use to start up the postmaster (we do NOT use pg_ctl for this,
# as it adds no value and can cause the postmaster to misrecognize a stale
# lock file)
DAEMON="$prefix/bin/postmaster"
And? That statement was and remains perfectly correct. We don't use
pg_ctl to start the postmaster in the Linux initscripts, either.
regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes:
Tom Lane wrote:
(Personally, I use scripts based on start-scripts/osx/ for a number of
services on my own machines, so if there's something wrong with them
I'd definitely like to know what it is.)
What kind of "based on"? I mean, are there some changes of yours that
could be applied to the contrib script?
No, I've just modified them to start other services (bind, sendmail, ...).
I'm sure OSX startup scripts are documented elsewhere, but this is a
resource right under my nose, so I used it.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
Josh Berkus <josh@agliodbs.com> writes:
we do NOT use pg_ctl for [postmaster start], as it adds no value
and can cause the postmaster to misrecognize a stale lock file
And? That statement was and remains perfectly correct.
Is this mentioned in the documentation somewhere that I've missed?
I'm curious what the issues are, and why we can solve it in a bash
script but not pg_ctl.
-Kevin
Tom,
"# What to use to start up the postmaster (we do NOT use pg_ctl for this,
# as it adds no value and can cause the postmaster to misrecognize a stale
# lock file)
DAEMON="$prefix/bin/postmaster"
And? That statement was and remains perfectly correct. We don't use
pg_ctl to start the postmaster in the Linux initscripts, either.
Then WTF do we have pg_ctl, and why do our docs tell people to use it?
--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Josh Berkus wrote:
Tom,
"# What to use to start up the postmaster (we do NOT use pg_ctl for this,
# as it adds no value and can cause the postmaster to misrecognize a stale
# lock file)
DAEMON="$prefix/bin/postmaster"
And? That statement was and remains perfectly correct. We don't use
pg_ctl to start the postmaster in the Linux initscripts, either.
Then WTF do we have pg_ctl, and why do our docs tell people to use it?
I assume on boot we _know_ there can't be another postmaster running,
while normal pg_ctl does not know that. Perhaps we just need to state
that in the comment.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
we do NOT use pg_ctl for [postmaster start], as it adds no value
and can cause the postmaster to misrecognize a stale lock file
And? That statement was and remains perfectly correct.
Is this mentioned in the documentation somewhere that I've missed?
I'm curious what the issues are, and why we can solve it in a bash
script but not pg_ctl.
It's been covered repeatedly in the archives, but I'm not sure if it's
in the docs anywhere. The problem is that after a system crash and
reboot, an old postmaster.pid file might be left behind. The postmaster
can only safely remove this lock file if it is *certain* that it doesn't
represent another live postmaster process. Otherwise it is honor-bound
to commit hara-kiri instead of starting up. It can tell whether or not
the PID in the file belongs to a live process and whether that process
belongs to the postgres userid (by attempting kill(PID, 0) and seeing
what it gets). If not, it can remove the file with a clear conscience.
However, because of the way that Unix startup works, it is very likely
that successive system boots will assign nearly (but not necessarily
exactly) the same PID that the postmaster had on the previous cycle.
So there's a high probability of a false positive from this test.
If the PID matches our own exactly, we can discount it as a false
positive. If it matches our parent's exactly, we can also discount it
(knowing that a postmaster would never launch another postmaster
directly, and being able to get the parent's PID via getppid()).
But further up the chain, we're out of luck, because there is no
"get grandparent pid" operation in Unix.
What this all leads to is that it's safe to launch a postmaster from
an init script via something like
su - postgres sh -c "postmaster ..."
The postmaster's parent process is a shell belonging to postgres,
which it can discount via getppid(), and all further-up ancestors
belong to root, so we can discount them via the kill test. So a
false PID match cannot lead to failing to start. (You still have to
be a bit careful about the form of the shell command, or there might
be an intermediate postgres-owned shell process.)
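The two checks used in that argument (discount our own PID and our parent's PID, then probe with kill(PID, 0)) can be sketched in a few lines of Python. This is only an illustration, not the postmaster's actual C code: the function name `lock_pid_blocks_startup` is made up here, and `os.kill(pid, 0)` stands in for the kill(PID, 0) call. It also assumes the caller is an unprivileged user such as postgres; run as root, the probe would succeed for every live PID.

```python
import os


def lock_pid_blocks_startup(lock_pid: int) -> bool:
    """Return True if the PID from a leftover lock file must be treated
    as a possibly-live postmaster (hypothetical helper, for illustration)."""
    if lock_pid <= 0:
        raise ValueError("lock file PID must be positive")

    # A postmaster never launches another postmaster directly, so a match
    # with our own PID or our parent's PID is a known false positive.
    if lock_pid in (os.getpid(), os.getppid()):
        return False

    try:
        # Signal 0 delivers nothing; it only performs the existence and
        # permission checks.
        os.kill(lock_pid, 0)
    except ProcessLookupError:
        # ESRCH: no such process, so the lock file is stale.
        return False
    except PermissionError:
        # EPERM: the process exists but belongs to another userid,
        # so it cannot be a postmaster running as us.
        return False

    # A live process owned by our own userid: we cannot rule out a
    # conflicting postmaster, so startup has to be refused.
    return True
```

Note that there is no third exemption for the grandparent, because the sketch, like the real check, has no portable way to ask for it.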
On the other hand, if you do
su - postgres sh -c "pg_ctl ..."
then the postmaster's parent process is pg_ctl, and its grandparent
is a postgres-owned shell process, and it cannot tell that
postgres-owned shell process apart from a genuine conflicting
postmaster. So a chance match of the shell process's PID to what is in
the leftover postmaster.pid file will force it to refuse to start.
And that chance match is not a low probability --- in my experience
it's one in ten or worse, in a reasonably stable system environment.
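The difference between the two launch forms can be modelled with a toy ancestry chain. This is a sketch under stated assumptions: the `Proc` record, the function `postmaster_may_start`, and every PID and process name in the two chains are hypothetical, and only the ancestors of the new postmaster are modelled as live processes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    pid: int
    owner: str
    name: str


def postmaster_may_start(stale_pid, chain, me="postgres"):
    """`chain` lists the new postmaster first, then its parent, and so on.

    Own PID and parent PID are discounted; any other live process owned
    by `me` with the stale PID looks like a conflicting postmaster.
    """
    self_proc, parent = chain[0], chain[1]
    if stale_pid in (self_proc.pid, parent.pid):
        return True
    owners = {p.pid: p.owner for p in chain}
    # A missing PID or a PID owned by someone else fails the kill test,
    # so the lock file can be removed.
    return owners.get(stale_pid) != me


# Hypothetical chain for: su - postgres sh -c "postmaster ..."
DIRECT = [
    Proc(105, "postgres", "postmaster"),
    Proc(104, "postgres", "sh"),
    Proc(103, "root", "initscript"),
    Proc(1, "root", "init"),
]

# Hypothetical chain for: su - postgres sh -c "pg_ctl ..."
VIA_PG_CTL = [
    Proc(106, "postgres", "postmaster"),
    Proc(105, "postgres", "pg_ctl"),
    Proc(104, "postgres", "sh"),
    Proc(103, "root", "initscript"),
    Proc(1, "root", "init"),
]

# Stale PIDs that would make the postmaster refuse to start.
blocked_direct = [p.pid for p in DIRECT if not postmaster_may_start(p.pid, DIRECT)]
blocked_pg_ctl = [p.pid for p in VIA_PG_CTL if not postmaster_may_start(p.pid, VIA_PG_CTL)]
```

In this model `blocked_direct` is empty, while `blocked_pg_ctl` contains the PID of the postgres-owned shell, which is now the grandparent and can no longer be discounted.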
You can imagine various workarounds involving having pg_ctl pass down
its parent's PID, but you'll still get screwed if the initscript author
is careless about how many levels of postgres-owned shell process there
are. The long and the short of it is that it's best to not use pg_ctl.
As mentioned, it doesn't buy much of anything for an initscript anyway.
These considerations don't apply to ordinary hand launching of the
postmaster, for the primary reason that the chance of a false PID match
is several orders of magnitude smaller when you're talking about a
manual restart --- the likely postmaster PID now ranges over the whole
PID space instead of being within a few counts of the same thing. So we
don't need to discourage people from using pg_ctl for ordinary restarts.
The whole thing is really only a problem for initscript authors (who all
know about it by now ;-))
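A back-of-the-envelope version of that comparison, as a sketch only. It assumes the colliding process's PID is uniformly distributed over a range of candidates, which is a simplification. The window of 10 just echoes the "one in ten" figure above, and 32768 is an assumed PID space size (a traditional Linux default for pid_max); neither number is a measurement.

```python
import math


def false_match_probability(candidate_pids: int) -> float:
    """Chance of hitting one specific stale PID, assuming the colliding
    process's PID is uniform over `candidate_pids` possible values."""
    if candidate_pids < 1:
        raise ValueError("need at least one candidate PID")
    return 1.0 / candidate_pids


BOOT_WINDOW = 10    # assumption: PIDs at boot land within a few counts
PID_SPACE = 32768   # assumption: manual restart ranges over the whole PID space

at_boot = false_match_probability(BOOT_WINDOW)
manual = false_match_probability(PID_SPACE)

# How many times likelier a false match is at boot, and the same
# figure expressed in orders of magnitude.
ratio = at_boot / manual
orders_of_magnitude = math.log10(ratio)
```

Under these assumptions the boot-time case comes out a few thousand times likelier than the manual one, which is in line with "several orders of magnitude".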
regards, tom lane
On Aug 19, 2009, at 2:03 PM, Tom Lane wrote:
These considerations don't apply to ordinary hand launching of the
postmaster, for the primary reason that the chance of a false PID match
is several orders of magnitude smaller when you're talking about a
manual restart --- the likely postmaster PID now ranges over the whole
PID space instead of being within a few counts of the same thing. So we
don't need to discourage people from using pg_ctl for ordinary restarts.
The whole thing is really only a problem for initscript authors (who all
know about it by now ;-))
Nice summary, Tom. Do the distro packagers know this, though? Do we
have notes for distro packagers somewhere?
Best,
David
Should we add a comment to the startup scripts linking this email?
http://archives.postgresql.org/message-id/28922.1250715832@sss.pgh.pa.us
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
"David E. Wheeler" <david@kineticode.com> writes:
Nice summary, Tom. Do the distro packagers know this, though?
All the active ones I know of learned it the hard way, or were paying
attention when someone else did. Still, it wouldn't be a bad thing
for us to document it somewhere.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
The problem is that after a system crash and reboot, an old
postmaster.pid file might be left behind. The postmaster can only
safely remove this lock file if it is *certain* that it doesn't
represent another live postmaster process. Otherwise it is honor-bound
to commit hara-kiri instead of starting up. It can tell
whether or not the PID in the file belongs to a live process and
whether that process belongs to the postgres userid (by attempting
kill(PID, 0) and seeing what it gets). If not, it can remove the
file with a clear conscience.
Right -- we did run into this in spades when our backup server,
running dozens of instances of PostgreSQL in "warm standby" to confirm
the integrity of the files received, crashed hard. I wasn't sure if
this was the problem being addressed. One obvious solution, which we
now rigorously observe, is to use a different OS user for each
PostgreSQL instance. I assume that pg_ctl is safe in such an
environment?
The long and the short of it is that it's best to not use pg_ctl.
As mentioned, it doesn't buy much of anything for an initscript
anyway.
It must buy something in our environment, because our attempts to use
the sample script with minimal modification were problematic.
Unfortunately I forget the details, but our problems vanished when we
switched to pg_ctl. (Well, except for that one unfortunate episode
mentioned above.)
The whole thing is really only a problem for initscript authors (who
all know about it by now ;-))
Well, one of them (at least) didn't quite understand the whole issue
until receiving your email. Thanks for the clear description.
-Kevin
On Wed, Aug 19, 2009 at 10:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
What this all leads to is that it's safe to launch a postmaster from
an init script via something like
su - postgres sh -c "postmaster ..."
Surely you don't want "-"? If you run postgres's .profile etc. then
random user customization for the postgres user could interfere with
your startup process.
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Right -- we did run into this in spades when our backup server,
running dozens of instances of PostgreSQL in "warm standby" to confirm
the integrity of the files received, crashed hard. I wasn't sure if
this was the problem being addressed. One obvious solution, which we
now rigorously observe, is to use a different OS user for each
PostgreSQL instance. I assume that pg_ctl is safe in such an
environment?
Well, using a different user per instance is a good idea because then
the safety analysis I gave holds rigorously for each instance. It
doesn't get you out of the problem by itself, because the problem as
described can happen with just one instance.
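One reading of that point, as a toy model: separate users stop one instance's processes from being mistaken for another instance's postmaster, but a same-user match within a single instance still blocks startup. Everything here is hypothetical, including the user names `pg_a` and `pg_b`, the PID 2002, and the function name; a dict stands in for the process table.

```python
def kill_probe_succeeds(stale_pid, process_owners, me):
    """Model of kill(stale_pid, 0) issued by unprivileged user `me`:
    it succeeds only if the PID is live and owned by `me`."""
    return process_owners.get(stale_pid) == me


STALE_PID = 2002  # hypothetical PID left in instance A's postmaster.pid

# Shared user: after reboot, PID 2002 is instance B's postmaster.
shared_user = kill_probe_succeeds(STALE_PID, {2002: "postgres"}, me="postgres")

# Separate users: the same PID now belongs to pg_b, so pg_a's probe fails
# with a permission error and the lock file is treated as stale.
separate_users = kill_probe_succeeds(STALE_PID, {2002: "pg_b"}, me="pg_a")

# Single instance started via pg_ctl: PID 2002 is pg_a's own grandparent
# shell, which still looks like a conflicting postmaster.
single_instance = kill_probe_succeeds(STALE_PID, {2002: "pg_a"}, me="pg_a")
```

Here `shared_user` and `single_instance` come out true (startup refused), and only `separate_users` comes out false.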
It must buy something in our environment, because our attempts to use
the sample script with minimal modification were problematic.
Unfortunately I forget the details, but our problems vanished when we
switched to pg_ctl. (Well, except for that one unfortunate episode
mentioned above.)
Hmm. As stated, I would expect pg_ctl to make it worse. It would be
interesting to have a closer look at your before-and-after scripts.
regards, tom lane
Greg Stark <gsstark@mit.edu> writes:
On Wed, Aug 19, 2009 at 10:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
What this all leads to is that it's safe to launch a postmaster from
an init script via something like
su - postgres sh -c "postmaster ..."
Surely you don't want "-"? If you run postgres's .profile etc. then
random user customization for the postgres user could interfere with
your startup process.
Well, people who put random things into the postgres user's .profile
deserve what they get. But it is useful to be able to customize the
postmaster's PATH and other variables. As far as I know, all the Linux
distros do use "su -" or equivalent, and so do the contrib start-scripts.
regards, tom lane