postmaster.pid file auto-clean up?
I vaguely remember reading in the release notes (around the time 9.x was released) something about it automatically clearing out the postmaster.pid file if it was found to be stale/invalid when starting the database server, however I cannot find any reference to this anymore.
Was this something that did, in fact, exist at one point, and was pulled?
Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
I vaguely remember reading in the release notes (around the time 9.x was released) something about it automatically clearing out the postmaster.pid file if it was found to be stale/invalid when starting the database server, however I cannot find any reference to this anymore.
It's always done that.
We occasionally see startup scripts that "helpfully" remove the .pid
file. They are, without exception, wrong and dangerous. The postmaster
is much more likely to get this right by itself.
regards, tom lane
Is this mechanism documented anywhere (besides source code)?
It looks like PG will only clean it up if there's no process at all running with the PID listed in the postmaster.pid file; it won't clean it up if some process holds that PID, even when that process isn't a PG process or there's no server running on the data directory (as per `pg_ctl status`).
On Aug 20 2012, at 1:31 PM, Tom Lane wrote:
Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
I vaguely remember reading in the release notes (around the time 9.x was released) something about it automatically clearing out the postmaster.pid file if it was found to be stale/invalid when starting the database server, however I cannot find any reference to this anymore.
It's always done that.
We occasionally see startup scripts that "helpfully" remove the .pid
file. They are, without exception, wrong and dangerous. The postmaster
is much more likely to get this right by itself.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
Is this mechanism documented anywhere (besides source code)?
No, not really.
It looks like PG will only clean it up if there's no process at all running with the PID listed in the postmaster.pid file; it won't clean it up if some process holds that PID, even when that process isn't a PG process or there's no server running on the data directory (as per `pg_ctl status`).
Not sure what you're looking at, but the above is wrong in at least one
critical detail, namely that there's a process-ownership check via
kill(). There are also checks to ensure no children of the previous
postmaster are still alive. These are not things you want to lightly
bypass: two sets of postmaster children running against the same data
directory *will* result in unrecoverable data corruption.
If you're trying to claim you've seen a false-positive situation, it
would be interesting to hear actual details.
regards, tom lane
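The kill()-based ownership check Tom describes can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not PostgreSQL's actual C code, and the function names are hypothetical. Sending signal 0 with kill() runs only the existence and permission checks and delivers nothing. ESRCH means no process has that PID. EPERM means the process exists but belongs to another user, so, assuming the postmaster always runs under the same UID as whoever is starting it, that process cannot be our postmaster. The separate check for surviving children of the old postmaster is not modelled here.

```python
import os


def probe_lock_pid(pid: int) -> str:
    """Classify the PID recorded in a lock file using kill(pid, 0).

    Signal 0 is never delivered; the kernel only performs the existence
    and permission checks. Hypothetical helper, for illustration only.
    """
    if pid <= 0:
        # kill() with pid 0 or a negative pid targets process groups,
        # so such values can never name a single postmaster.
        return "invalid"
    try:
        os.kill(pid, 0)
    except ProcessLookupError:  # ESRCH: nothing has this PID any more
        return "stale"
    except PermissionError:  # EPERM: exists, but owned by another user
        return "other-user"
    # Exists and we may signal it (same UID, or we are root).
    return "alive"


def lock_looks_live(pid: int) -> bool:
    """Assumed policy: only a signalable live process blocks startup.

    A missing process (ESRCH) or a foreign-UID process (EPERM) is treated
    as a stale lock. An unrelated program running under the same UID
    still makes the lock look live.
    """
    return probe_lock_pid(pid) == "alive"


if __name__ == "__main__":
    print(lock_looks_live(os.getpid()))  # our own process: True
```

The last comment in `lock_looks_live` is the interesting limitation: a same-UID check cannot tell a recycled PID belonging to an unrelated same-user program from a real postmaster.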
On Mon, Aug 20, 2012 at 11:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Sebastien Boisvert <sebastienboisvert@yahoo.com> writes:
Is this mechanism documented anywhere (besides source code)?
No, not really.
It looks like PG will only clean it up if there's no process at all
running with the PID listed in the postmaster.pid file; it won't clean it
up if some process holds that PID, even when that process isn't a PG
process or there's no server running on the data directory (as per
`pg_ctl status`).
Not sure what you're looking at, but the above is wrong in at least one
critical detail, namely that there's a process-ownership check via
kill(). There are also checks to ensure no children of the previous
postmaster are still alive. These are not things you want to lightly
bypass: two sets of postmaster children running against the same data
directory *will* result in unrecoverable data corruption.
If you're trying to claim you've seen a false-positive situation, it
would be interesting to hear actual details.
Hello, I work with Seb, and have been investigating this deeper.
It does in fact appear that we are getting false positives.
When trying to start PG using pg_ctl, I am getting this response:
pg_ctl: another server might be running; trying to start server anyway
2012-08-26 04:46:02.211 GMT [] - FATAL: lock file "postmaster.pid" already exists
2012-08-26 04:46:02.211 GMT [] - HINT: Is another postmaster (PID 8574) running in data directory "/Users/mclark/Library/Application Support/com.marketcircle.Daylite4/StorageDebug.dlpdb/Data/9_1"?
pg_ctl: this data directory appears to be running a pre-existing postmaster
pg_ctl: could not start server
Examine the log output.
PID 8574 is actually iTunes, not PG, and PG was cleanly brought down on
its last run; there are no child processes running.
Seb figured out how to contrive this situation.
Run PG, copy the pid file, stop pg, copy the copied pid file back to the
data dir and edit it, replacing the old PID with that of another running
process.
At first we thought our software was to blame, because it checks the PID
from PG's pid file to see if a process is running with that PID, and if
none are found then we call pg_ctl, otherwise we just continue launching
our software and trying to connect to PG.
I just added an additional check to see if the process name for the PID is
postgres, and if not then try to start PG with pg_ctl, thinking it would
figure it out and remove the pid file as it would if there was no process
running with that pid.
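The application-side check described above can be sketched as follows. This is a hypothetical reconstruction in Python for illustration, not the application's actual code. It assumes the PID sits on the first line of postmaster.pid (the data directory follows on the next line), and it leaves out the process-name lookup, which is platform-specific.

```python
import os
from pathlib import Path
from typing import Optional


def parse_lock_pid(text: str) -> Optional[int]:
    """Return the PID from the first line of postmaster.pid contents.

    Returns None for empty or malformed contents.
    """
    lines = text.splitlines()
    if not lines:
        return None
    try:
        return int(lines[0].strip())
    except ValueError:
        return None


def read_lock_pid(pid_file: Path) -> Optional[int]:
    """Read and parse a pid file; a missing file yields None."""
    try:
        return parse_lock_pid(pid_file.read_text())
    except FileNotFoundError:
        return None


def pid_exists(pid: int) -> bool:
    """Existence-only check: says nothing about *which* program has the PID."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def should_call_pg_ctl(lock_text: str) -> bool:
    """Hypothetical app rule: start PG only if nothing holds the recorded PID."""
    pid = parse_lock_pid(lock_text)
    return pid is None or not pid_exists(pid)
```

For example, `should_call_pg_ctl(f"{os.getpid()}\n")` returns False. The calling process stands in for an unrelated same-user program such as iTunes, and the existence-only rule treats the lock as live.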
Is this considered a bug? Should PG do a similar check on the process
name, or is the way we contrived this doing something unexpected?
Thanks,
Michael.
On 08/25/12 9:56 PM, Michael Clark wrote:
PID 8574 is actually iTunes, not PG, and PG was cleanly brought down
on its last run; there are no child processes running.
when postgres is cleanly brought down, the postmaster.pid file is
supposed to be removed. that file contains the PID that pg_ctl uses.
could you be running a pg_ctl from a different version, in the wrong
directory?
--
john r pierce N 37, W 122
santa cruz ca mid-left coast
Michael Clark <codingninja@gmail.com> writes:
It does in fact appear that we are getting false-positives.
When trying to start PG using pg_ctl, I am getting this response:
pg_ctl: another server might be running; trying to start server anyway
2012-08-26 04:46:02.211 GMT [] - FATAL: lock file "postmaster.pid" already exists
2012-08-26 04:46:02.211 GMT [] - HINT: Is another postmaster (PID 8574) running in data directory "/Users/mclark/Library/Application Support/com.marketcircle.Daylite4/StorageDebug.dlpdb/Data/9_1"?
PID 8574 is actually iTunes, not PG,
iTunes? What is that doing running under PG's userid?
If you mean that you're launching PG under some random user's UID, you
might want to think about giving it a dedicated UID instead, so as to
improve the selectivity of the same-UID check. This would also give
a good deal more protection to the database files.
and PG was cleanly brought down on
its last run; there are no child processes running.
As John pointed out, if PG was in fact stopped cleanly, the pid file
would not be there.
The symptoms you've described so far seem consistent with the idea that
PG was not stopped "cleanly", but rather by kill -9 on the postmaster
(with the child processes exiting either on their own, or as soon as
they noticed they were orphans). This is not recommended practice.
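As a hedged sketch of the alternatives to kill -9: the PostgreSQL documentation on shutting down the server describes three signal-driven shutdown modes for the postmaster (SIGTERM for smart, SIGINT for fast, SIGQUIT for immediate), and `pg_ctl stop` selects among them with `-m`. The Python helper names below are hypothetical. The "fast" default is a choice made for this sketch, not necessarily pg_ctl's own default in every version. Signal names assume a POSIX system. Building an argv instead of running it keeps the sketch free of side effects.

```python
import signal
from typing import List

# Postmaster shutdown modes and the signal each corresponds to, per the
# PostgreSQL "Shutting Down the Server" documentation. SIGKILL is
# deliberately absent: it gives the postmaster no chance to clean up.
SHUTDOWN_SIGNALS = {
    "smart": signal.SIGTERM,      # refuse new connections, wait for sessions to end
    "fast": signal.SIGINT,        # roll back active transactions, disconnect clients
    "immediate": signal.SIGQUIT,  # abort; crash recovery on next start
}


def shutdown_signal(mode: str) -> signal.Signals:
    """Return the signal for a named shutdown mode (hypothetical helper)."""
    try:
        return SHUTDOWN_SIGNALS[mode]
    except KeyError:
        raise ValueError(f"unknown shutdown mode: {mode!r}") from None


def pg_ctl_stop_argv(data_dir: str, mode: str = "fast") -> List[str]:
    """Build a pg_ctl stop command line; pass it to subprocess.run() to execute."""
    shutdown_signal(mode)  # validate the mode name before building the command
    return ["pg_ctl", "stop", "-D", data_dir, "-m", mode]
```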
Seb figured out how to contrive this situation.
Run PG, copy the pid file, stop pg, copy the copied pid file back to the
data dir and edit it, replacing the old PID with that of another running
process.
You're kidding, right? If you intentionally set out to break the
postmaster interlock, you will doubtless be able to do that, and would
still be able to break any other algorithm we might devise. Let's
confine this discussion to scenarios that could arise without
intentional interference.
regards, tom lane
On Sun, Aug 26, 2012 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Clark <codingninja@gmail.com> writes:
PID 8574 is actually iTunes, not PG,
iTunes? What is that doing running under PG's userid?
We back our client application with PG; each OSX user gets their own
instance of PG.
It runs as that OSX user.
Seb figured out how to contrive this situation.
Run PG, copy the pid file, stop pg, copy the copied pid file back to the
data dir and edit it, replacing the old PID with that of another running
process.
You're kidding, right? If you intentionally set out to break the
postmaster interlock, you will doubtless be able to do that, and would
still be able to break any other algorithm we might devise. Let's
confine this discussion to scenarios that could arise without
intentional interference.
We were presented with a problem we didn't understand.
We set out to try and figure out how we could replicate the problem, for
debugging purposes.
We managed to do so to see how our application behaves, and to see how PG
behaves.
In the wild this scenario has arisen without intentional interference. In
debugging, yes, we contrived the situation to replicate the behaviour.
Mind you, we may be using PG in an environment that isn't advisable.
We just started this discussion to learn and understand, and to see if this
is a situation that would be expected to be handled.
Thanks,
Michael.
On 26 Aug 2012, at 17:21, Michael Clark wrote:
On Sun, Aug 26, 2012 at 10:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Clark <codingninja@gmail.com> writes:
PID 8574 is actually iTunes, not PG,
iTunes? What is that doing running under PG's userid?
We back our client application with PG,
each OSX user gets their own instance of PG.
Are you certain that's necessary? It's generally a better idea to run a single PG server with a database for each user. Having multiple copies running has its use-cases, but the necessity is quite uncommon.
You could compare what you're doing to giving every user their own copy of OS X. There are situations in which you'd want that, but generally it's considered a bad idea.
You'd never have even thought to do that if you were, for example, using Oracle for the database. That's a hugely expensive database license for every user on the system, while you really only need one.
It runs as that OSX user.
Seb figured out how to contrive this situation.
Run PG, copy the pid file, stop pg, copy the copied pid file back to the
data dir and edit it, replacing the old PID with that of another running
process.
You're kidding, right? If you intentionally set out to break the
postmaster interlock, you will doubtless be able to do that, and would
still be able to break any other algorithm we might devise. Let's
confine this discussion to scenarios that could arise without
intentional interference.We were presented with a problem we didn't understand.
We set out to try and figure out how we could replicate the problem, for debugging purposes.
We managed to do so to see how our application behaves, and to see how PG behaves.
In the wild this scenario has arisen without intentional interference. In debugging, yes, we contrived the situation to replicate the behaviour. Mind you, we may be using PG in an environment that isn't advisable.
What you replicated is not what happens when your problem occurs. Processes don't do things like that with each other's PID files.
What's probably happening in your case is that there's a conflict with another copy of Postgres running; perhaps it's running under the same user-id twice (or more) or on the same port?
My suggestion would be to get rid of those extra copies of PG and just run one instance.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
On Sun, Aug 26, 2012 at 1:25 PM, Alban Hertroys <haramrae@gmail.com> wrote:
We back our client application with PG,
each OSX user gets their own instance of PG.
Are you certain that's necessary?
It was a decision made, weighing various trade-offs, 4 years ago now.
In the wild this scenario has arisen without intentional interference.
In debugging, yes, we contrived the situation to replicate the behaviour.
Mind you, we may be using PG in an environment that isn't advisable.
What you replicated is not what happens when your problem occurs.
Processes don't do things like that with each other's PID files.
That is true.
But the system does recycle pids, especially after a reboot.
I appreciate all the feedback and input from everyone who responded.
Thank you!! You have answered our questions, and it gives us food for
thought.
Michael.