postmaster.pid

Started by Adrian Maier | over 23 years ago | 23 messages | general

Happy New Year to everyone (and especially to the PostgreSQL Developers)
--------------------------

My computer shut down unexpectedly because of a short-circuit in
another room. The next time I powered on the computer, PostgreSQL
did not start automatically: the old postmaster.pid was still there.
So, I had to remove that file manually.
(Everything was ok: the database did not suffer from this shock.)

Tip Suggestion:
I think that somewhere in the docs the administrators
should be advised to delete postmaster.pid automatically at boot
time (for example, at the same time when the files in /tmp are
deleted).
This way, in case of an unexpected shutdown, PostgreSQL would be
able to restart without any manual intervention.

--
Adrian Maier
(am@fx.ro, 0723.002.097)

#2Patric Bechtel
bechtel@ipcon.de
In reply to: Adrian Maier (#1)
Re: postmaster.pid [Viruschecked]

On Fri, 3 Jan 2003 19:38:49 +0200, am@fx.ro wrote:

Hello,

just my 2 cents:

I have come across many cases in which the .pid file was still there but
postgres wasn't running anymore. Normally it's safe to delete the
pid file if the process noted in it is not postgres anymore.
The main purpose of the pid file is to ensure that two different
copies of the postmaster NEVER EVER start on the same database cluster.
Anyway, if the aforementioned postmaster isn't running anymore,
the first step is to start the postmaster. It will recover from the
former fault and be ready within seconds.
I have worked with Postgres for the last 24 months. I have killed it (kill -9)
and done very crude things to it (taking the hard disks away and such),
but I never had a problem it did not recover from. One of our
customers even installed it on notebooks (30 pcs), and the users
tend to switch them off, or the batteries run empty, or the
notebooks just doze off and never wake up again (BIOS problem).
Postgres did a good job there, proving it can take a lot of punishment,
so I think you won't get problems either.
I once tried SAP-DB, which I tore to pieces within minutes this
way... :-)

<blink><red>no panic</red></blink>

Patric

#3Shridhar Daithankar
shridhar_daithankar@persistent.co.in
In reply to: Patric Bechtel (#2)
Re: postmaster.pid [Viruschecked]

On Friday 03 January 2003 09:41 pm, you wrote:

I have worked with Postgres for the last 24 months. I have killed it (kill -9)
and done very crude things to it (taking the hard disks away and such),
but I never had a problem it did not recover from. One of our
customers even installed it on notebooks (30 pcs), and the users
tend to switch them off, or the batteries run empty, or the
notebooks just doze off and never wake up again (BIOS problem).
Postgres did a good job there, proving it can take a lot of punishment,
so I think you won't get problems either.

That is brutal and violent. If I ever release my own software, I would not
like to hear such stories even if it works..;-)

Shridhar

#4Joerg Hessdoerfer
Joerg.Hessdoerfer@sea-gmbh.com
In reply to: Adrian Maier (#1)
Re: postmaster.pid

Hi,

On Friday 03 January 2003 12:54, you wrote:

Happy New Year to everyone (and especially to the PostgreSQL Developers)
--------------------------

My computer shut down unexpectedly because of a short-circuit in
another room. The next time I powered on the computer, PostgreSQL
did not start automatically: the old postmaster.pid was still there.
So, I had to remove that file manually.
(Everything was ok: the database did not suffer from this shock.)

Tip Suggestion:
I think that somewhere in the docs the administrators
should be advised to delete postmaster.pid automatically at boot
time (for example, at the same time when the files in /tmp are
deleted).
This way, in case of an unexpected shutdown, PostgreSQL would be
able to restart without any manual intervention.

Well, this is a *BAD IDEA*. Suppose, for example, your data is corrupt in
some special way, and due to your removal of the pid file, postmaster tries to
recover the database automatically and probably destroys all or part of
the data. You would like to have been able to do a filesystem-level backup
first...

Greetings,
Joerg
--
Leading SW developer - S.E.A GmbH
Mail: joerg.hessdoerfer@sea-gmbh.com
WWW: http://www.sea-gmbh.com

#5Joerg Hessdoerfer
Joerg.Hessdoerfer@sea-gmbh.com
In reply to: Joerg Hessdoerfer (#4)
Re: postmaster.pid

Hi,

On Friday 03 January 2003 18:38, you wrote:

I think that somewhere in the docs the administrators
should be advised to delete postmaster.pid automatically at boot
time (for example, at the same time when the files in /tmp are
deleted).
This way, in case of an unexpected shutdown, PostgreSQL would be
able to restart without any manual intervention.

Well, this is a *BAD IDEA*. Suppose, for example, your data is corrupt in
some special way, and due to your removal of the pid file, postmaster
tries to recover the database automatically and probably destroys all or
part of the data. You would like to have been able to do a
filesystem-level backup first...

Ooopss... it seems like I am too optimistic about such situations.

Just wondering: if the database is so heavily damaged that the
postmaster would not be able to recover, are there any chances to
restore anything from a filesystem backup, manually?

With a lot of work, sometimes... it all depends on how much the data is worth
to you (read: when did you do your last backup? ;-)

.........

Background:
- our team is working on a project for a small company.
- there will be one server and 3-4 clients
- It is the first time we use PostgreSQL.
- Little space.
- Many people around.
- No UPS (yet).
- unpredictable,untrained,desperate,'crazy' users

Possible problems:
- the users will need time to learn how to use the program
(until then they could make many mistakes, such as turning off the
server before leaving the office without issuing the proper
shutdown command, simply by pushing the power switch.)
- someone might stumble over the cable and unplug the server by mistake.

Why should someone switch off a server at all? Except for maintenance or
repair?

Questions:
How resistant is PostgreSQL to such shocks?

It never failed me after a crash. But I don't push it... (I can remember only
3 crashes in all, and these were NOT PostgreSQL's fault!)

Are there alternative solutions to handling unexpected shutdowns
automatically? (except deleting postmaster.pid at boot time)

The best idea for your situation seems to be the advice: make automated
backups as often as possible, to avoid too much data loss. Then you could go
the route of 'delete pid file automatically'.

If a really bad thing happens, you can always restore from the last backup.

Best wishes,

Adrian Maier
(am@fx.ro)

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

HTH,
Joerg
--
Leading SW developer - S.E.A GmbH
Mail: joerg.hessdoerfer@sea-gmbh.com
WWW: http://www.sea-gmbh.com

#6Jean-Luc Lachance
jllachan@nsd.ca
In reply to: Adrian Maier (#1)
Field with default not being set on copy from.

Hello all,

It seems that a field declared with a default is not being set when
executing COPY FROM with fewer fields than the table has.

Should I use a rule or a trigger for setting the default value?

I run 7.3.1

JLL

#7Adrian Maier
am@fx.ro
In reply to: Joerg Hessdoerfer (#5)
Re: postmaster.pid

I think that somewhere in the docs the administrators
should be advised to delete postmaster.pid automatically at boot
time (for example, at the same time when the files in /tmp are
deleted).
This way, in case of an unexpected shutdown, PostgreSQL would be
able to restart without any manual intervention.

Well, this is a *BAD IDEA*. Suppose, for example, your data is corrupt in
some special way, and due to your removal of the pid file, postmaster tries to
recover the database automatically and probably destroys all or part of
the data. You would like to have been able to do a filesystem-level backup
first...

Ooopss... it seems like I am too optimistic about such situations.

Just wondering: if the database is so heavily damaged that the
postmaster would not be able to recover, are there any chances to
restore anything from a filesystem backup, manually ?

.........

Background:
- our team is working on a project for a small company.
- there will be one server and 3-4 clients
- It is the first time we use PostgreSQL.
- Little space.
- Many people around.
- No UPS (yet).
- unpredictable,untrained,desperate,'crazy' users

Possible problems:
- the users will need time to learn how to use the program
(until then they could make many mistakes, such as turning off the
server before leaving the office without issuing the proper
shutdown command, simply by pushing the power switch.)
- someone might stumble over the cable and unplug the server by mistake.

Questions:
How resistant is PostgreSQL to such shocks?

Are there alternative solutions to handling unexpected shutdowns
automatically? (except deleting postmaster.pid at boot time)

Best wishes,

Adrian Maier
(am@fx.ro)

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Adrian Maier (#7)
Re: postmaster.pid

am@fx.ro writes:

I think that somewhere in the docs the administrators
should be advised to delete postmaster.pid automatically at boot
time (for example, at the same time when the files in /tmp are
deleted).
This way, in case of an unexpected shutdown, PostgreSQL would be
able to restart without any manual intervention.

Well, this is a *BAD IDEA*. Suppose, for example, your data is corrupt in
some special way, and due to your removal of the pid file, postmaster tries to
recover the database automatically and probably destroys all or part of
the data. You would like to have been able to do a filesystem-level backup
first...

This is nonsense. Removal of the postmaster.pid file won't make any
difference one way or the other to recoverability of the data.

Ooopss... it seems like I am too optimistic about such situations.

The real risk of having a script that automatically removes the
postmaster.pid file is that the script might get run after the
postmaster has started.

Even then, you're not necessarily hosed; but you no longer have any
protection against accidentally starting a second postmaster in the same
database directory. (Which would be disastrous: the two postmasters
won't know about each other and will make unsynchronized changes in the
database.)

Note also that under most circumstances, a stale postmaster.pid file
should not prevent the postmaster from starting (because it will ignore
the old .pid file if it can see that there is no process with that PID
alive anymore). The case where you lose is only when there is another
process running that by chance has the same PID that was assigned to the
old postmaster on the system's previous uptime cycle. The postmaster
can't tell that such a process isn't really a conflicting postmaster,
so it gives up for safety's sake.
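
A minimal sketch of the check described above, written in Python for Unix-like systems. It is not the postmaster's actual code: the function names are hypothetical, and the only facts it relies on are that the first line of postmaster.pid holds the PID and that signal 0 probes for a process without delivering anything. It shows why a PID that was reused after a reboot looks exactly like a live postmaster to this kind of test.

```python
import os


def pid_in_use(pid: int) -> bool:
    """True if any process currently has this PID (it need not be a postmaster)."""
    if pid <= 0:
        raise ValueError("PID must be positive")
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing; it only probes for existence
    except ProcessLookupError:
        return False     # no such process: the pid file is stale
    except PermissionError:
        return True      # the process exists but belongs to another user
    return True


def pidfile_blocks_startup(path: str) -> bool:
    """Hypothetical helper: should a new server refuse to start because of this file?"""
    try:
        with open(path) as f:
            first_line = f.readline().strip()
    except FileNotFoundError:
        return False     # no pid file, so nothing to conflict with
    try:
        pid = int(first_line)
    except ValueError:
        return True      # unreadable contents: refuse, for safety's sake
    if pid <= 0:
        return True      # nonsensical PID: refuse as well
    return pid_in_use(pid)
```

The sketch deliberately errs on the side of refusing to start: an unrelated process that happens to hold the recorded PID makes `pidfile_blocks_startup` return True, which is the annoying but safe outcome.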

If you can be absolutely certain that your script will *only* get run
early in system boot, then having it remove postmaster.pid is arguably
a reasonable thing to do. (Putting "rm postmaster.pid" into the startup
script for the postmaster itself would not be reasonable, since you
might well use that script to restart the postmaster --- with the rm in
place, you've just fried the interlock against starting two postmasters.)

Whether the benefits outweigh the risks is up to you to decide.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jean-Luc Lachance (#6)
Re: Field with default not being set on copy from.

Jean-Luc Lachance <jllachan@nsd.ca> writes:

It seems that a field declared with a default is not being set when
executing COPY FROM with fewer fields than the table has.

I think you are mistaken. This works in 7.3:

regression=# create table foo (f1 int, f2 text, f3 int default 42);
CREATE TABLE
regression=# \copy foo(f1,f2) from stdin
1 first
2 second
\.
regression=# select * from foo;
 f1 |   f2   | f3
----+--------+----
  1 | first  | 42
  2 | second | 42
(2 rows)

regards, tom lane

#10Jean-Luc Lachance
jllachan@nsd.ca
In reply to: Adrian Maier (#1)
Re: Field with default not being set on copy from.

Tom,

I am sure you are right.

The last part of my message should have read:

I am running 7.2.3

Time to plan an upgrade...

JLL

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jean-Luc Lachance (#10)
Re: Field with default not being set on copy from.

Jean-Luc Lachance <jllachan@nsd.ca> writes:

The last part of my message should have read:
I am running 7.2.3

Ah. In that case, you are right: COPY fills missing fields with nulls,
not default values. This is fixed in 7.3.

regards, tom lane

#12Dan Langille
dan@langille.org
In reply to: Tom Lane (#8)
Re: postmaster.pid

On Fri, 3 Jan 2003, Tom Lane wrote:

Note also that under most circumstances, a stale postmaster.pid file
should not prevent the postmaster from starting (because it will ignore
the old .pid file if it can see that there is no process with that PID
alive anymore). The case where you lose is only when there is another
process running that by chance has the same PID that was assigned to the
old postmaster on the system's previous uptime cycle. The postmaster
can't tell that such a process isn't really a conflicting postmaster,
so it gives up for safety's sake.

This is a situation which I've often wondered about, for other scripts,
not PostgreSQL. I've not found a happy solution yet.

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Dan Langille (#12)
Re: postmaster.pid

Dan Langille <dan@langille.org> writes:

On Fri, 3 Jan 2003, Tom Lane wrote:

Note also that under most circumstances, a stale postmaster.pid file
should not prevent the postmaster from starting (because it will ignore
the old .pid file if it can see that there is no process with that PID
alive anymore). The case where you lose is only when there is another
process running that by chance has the same PID that was assigned to the
old postmaster on the system's previous uptime cycle. The postmaster
can't tell that such a process isn't really a conflicting postmaster,
so it gives up for safety's sake.

This is a situation which I've often wondered about, for other scripts,
not PostgreSQL. I've not found a happy solution yet.

Yeah, if you search the archives you will find previous discussions of
how the check for a pre-existing postmaster could be made more resistant
to false matches. It seems to be a hard problem to solve in a way
that's both portable and 100% safe (while false positives are annoying,
false negatives are completely not acceptable). AFAIR all the
alternative methods that we've heard about have their own downsides.

It's worth noting that Postgres is not the only program with this issue.
Sendmail, for example, uses the same pidfile lock method, and I have
seen sendmail fail to restart after a system crash because it was fooled
by another process with matching pid.

regards, tom lane

#14Adrian Maier
am@fx.ro
In reply to: Tom Lane (#8)
Re: postmaster.pid

On Fri, Jan 03, 2003 at 04:03:47PM -0500, Tom Lane wrote:

If you can be absolutely certain that your script will *only* get run
early in system boot, then having it remove postmaster.pid is arguably
a reasonable thing to do. (Putting "rm postmaster.pid" into the startup
script for the postmaster itself would not be reasonable, since you
might well use that script to restart the postmaster --- with the rm in
place, you've just fried the interlock against starting two postmasters.)

Whether the benefits outweigh the risks is up to you to decide.

Then, deleting postmaster.pid at boot time (when it is impossible to have
any postmaster running) is an acceptable solution after all,
at least for me.

Thank you all for your replies.

--
Adrian Maier
(am@fx.ro)

#15Kevin Brown
kevin@sysexperts.com
In reply to: Tom Lane (#13)
Re: postmaster.pid

Sorry for jumping in late on this...

Tom Lane wrote:

Dan Langille <dan@langille.org> writes:

On Fri, 3 Jan 2003, Tom Lane wrote:

Note also that under most circumstances, a stale postmaster.pid file
should not prevent the postmaster from starting (because it will ignore
the old .pid file if it can see that there is no process with that PID
alive anymore). The case where you lose is only when there is another
process running that by chance has the same PID that was assigned to the
old postmaster on the system's previous uptime cycle. The postmaster
can't tell that such a process isn't really a conflicting postmaster,
so it gives up for safety's sake.

This is a situation which I've often wondered about, for other scripts,
not PostgreSQL. I've not found a happy solution yet.

Yeah, if you search the archives you will find previous discussions of
how the check for a pre-existing postmaster could be made more resistant
to false matches. It seems to be a hard problem to solve in a way
that's both portable and 100% safe (while false positives are annoying,
false negatives are completely not acceptable). AFAIR all the
alternative methods that we've heard about have their own downsides.

I assume one of those alternatives was for the postmaster to open and
lock a predefined file in $PGDATA (say, postmaster.lock) using fcntl
or flock style locking? File locking is such a basic service that I
can't imagine any of the OSes PostgreSQL currently supports not
supporting it.

PID files are useful for administrative purposes but the various Unix
tricks used to "lock" and serialize access to files (hard links, for
instance) are (or should be!) no longer necessary. Fcntl locking is
specified in POSIX.1 (or so says the Linux fcntl(2) manpage), so
that's what I'd go with by default.

Since only one postmaster can run per $PGDATA directory, it seems
reasonable to have the postmaster obtain an exclusive lock on a file
(via fcntl or flock on Unix platforms) in that directory. No need to
explicitly unlock the file on exit, either: the OS will take care of
that (the OS is broken if it doesn't). If it fails to acquire the
lock, then another postmaster must be running. The only way to
sabotage this mechanism is by deleting the lock file, which is why it
would be desirable to obtain a lock on the $PGDATA directory itself if
possible and reasonable.
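
The proposal above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not PostgreSQL code: the function name is hypothetical, `postmaster.lock` is the example file name from the message, and `fcntl.lockf` is used because it wraps the POSIX fcntl() record-locking call.

```python
import errno
import fcntl
import os


def acquire_data_dir_lock(data_dir: str, name: str = "postmaster.lock"):
    """Try to take an exclusive, non-blocking lock on data_dir/name.

    Returns the open file descriptor on success (keep it open for the life
    of the process), or None if another process already holds the lock.
    """
    fd = os.open(os.path.join(data_dir, name), os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EAGAIN, errno.EACCES):
            return None  # lock is held elsewhere: another server must be running
        raise            # anything else is a real error, not a lock conflict
    return fd
```

Nothing has to be unlocked explicitly: the kernel drops the lock when the holding process exits or closes the descriptor, so a crash cannot leave a stale lock behind. POSIX record locks are per-process, so a second attempt from within the same process succeeds; the conflict is only visible to a different process.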

Thoughts?

--
Kevin Brown kevin@sysexperts.com

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Brown (#15)
Re: postmaster.pid

Kevin Brown <kevin@sysexperts.com> writes:

Tom Lane wrote:

Yeah, if you search the archives you will find previous discussions of
how the check for a pre-existing postmaster could be made more resistant
to false matches. It seems to be a hard problem to solve in a way
that's both portable and 100% safe (while false positives are annoying,
false negatives are completely not acceptable). AFAIR all the
alternative methods that we've heard about have their own downsides.

I assume one of those alternatives was for the postmaster to open and
lock a predefined file in $PGDATA (say, postmaster.lock) using fcntl
or flock style locking?

Yes, that was discussed. I think the primary objection was that it's
very non-robust if the $PGDATA directory is mounted via NFS. (Quite
a few of us think that if you run a database over NFS, you deserve to
lose ;-( ... but there seem to be more than a few people out there doing
it anyway.)

Also, the fact that you even had to mention two different ways of doing
it is prima facie evidence that there are portability issues...

regards, tom lane

#17Kevin Brown
kevin@sysexperts.com
In reply to: Tom Lane (#16)
Re: postmaster.pid

Tom Lane wrote:

Kevin Brown <kevin@sysexperts.com> writes:

Tom Lane wrote:

Yeah, if you search the archives you will find previous discussions of
how the check for a pre-existing postmaster could be made more resistant
to false matches. It seems to be a hard problem to solve in a way
that's both portable and 100% safe (while false positives are annoying,
false negatives are completely not acceptable). AFAIR all the
alternative methods that we've heard about have their own downsides.

I assume one of those alternatives was for the postmaster to open and
lock a predefined file in $PGDATA (say, postmaster.lock) using fcntl
or flock style locking?

Yes, that was discussed. I think the primary objection was that it's
very non-robust if the $PGDATA directory is mounted via NFS. (Quite
a few of us think that if you run a database over NFS, you deserve to
lose ;-( ... but there seem to be more than a few people out there doing
it anyway.)

Oh, my stomach. I was afraid of that. If necessary you can put the
lock file elsewhere, I suppose...

Also, the fact that you even had to mention two different ways of doing
it is prima facie evidence that there are portability issues...

Well, that doesn't necessarily follow, but even if it did, we have
autoconf, we can very easily select a method as appropriate based on
the results of testing the platform from within configure (I mean,
what else is autoconf really for?). But my bet is that fcntl style
locking is supported on all the platforms PostgreSQL supports because
it's a POSIX.1 standard.

But even if fcntl locking isn't supported on all the platforms
PostgreSQL supports, it should be no harder a problem than which
method of syncing a file to disk to use...

--
Kevin Brown kevin@sysexperts.com

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Brown (#17)
Re: postmaster.pid

Kevin Brown <kevin@sysexperts.com> writes:

Tom Lane wrote:

Also, the fact that you even had to mention two different ways of doing
it is prima facie evidence that there are portability issues...

Well, that doesn't necessarily follow, but even if it did, we have
autoconf, we can very easily select a method as appropriate based on
the results of testing the platform from within configure (I mean,
what else is autoconf really for?).

True. Probably we could combine it with the old logic as a fallback.
Are you volunteering to do the work?

regards, tom lane

#19ahoward
ahoward@fsl.noaa.gov
In reply to: Tom Lane (#16)
Re: postmaster.pid

tgl@sss.pgh.pa.us (Tom Lane) wrote in message news:<29250.1043126140@sss.pgh.pa.us>...

Yes, that was discussed. I think the primary objection was that it's
very non-robust if the $PGDATA directory is mounted via NFS. (Quite
a few of us think that if you run a database over NFS, you deserve to
lose ;-( ... but there seem to be more than a few people out there doing
it anyway.)

can anyone speak with _authority_ on this issue please. i've been
researching it for a week and there seems to be much misunderstanding
surrounding the issue, even the developers of linux nfs seem to
disagree on the semantics of sync/lock issues on nfs!

my understanding is that, using a netapp with nvram and nfs3, if one

* exported the PGDATA filesystem as sync
* mounted the file system as sync

there should not be ANY issues using postgresql against an nfs
filesystem with the possible exception of rpc/lock issues (anyone?
anyone?). furthermore, performance will most likely INCREASE over a
local disk since writes to nvram, even network attached nvram, can be
MUCH faster than writes to, for example, an IDE hard drive.

-a

#20Medi Montaseri
medi.montaseri@intransa.com
In reply to: Dan Langille (#12)
Re: postmaster.pid

I validate my pid by examining /proc/pid , something as simple as

# the first line of postmaster.pid holds the PID; look for it under /proc
if [ -d "/proc/$(head -n 1 postmaster.pid)" ]
then
    echo postmaster is running
else
    echo postmaster is not running
fi
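
A /proc lookup can also say what the process with that PID is, which narrows down the false matches discussed earlier in the thread. The following Python sketch assumes a Linux-style /proc with a `comm` file per process, and assumes the server process is named "postmaster" or "postgres"; both function names and the `proc_root` parameter are hypothetical and exist only for illustration.

```python
import os
from typing import Optional, Tuple


def process_name(pid: int, proc_root: str = "/proc") -> Optional[str]:
    """Return the command name recorded in <proc_root>/<pid>/comm, or None."""
    try:
        with open(os.path.join(proc_root, str(pid), "comm")) as f:
            return f.read().strip()
    except OSError:
        return None  # no such process, or no /proc on this platform


def looks_like_postmaster(
    pid: int,
    expected: Tuple[str, ...] = ("postmaster", "postgres"),  # assumed names
    proc_root: str = "/proc",
) -> bool:
    """True only if the PID exists and its command name is one of `expected`."""
    return process_name(pid, proc_root) in expected
```

This is Linux-specific, so it narrows the problem on one platform rather than solving it portably, and a name match still does not prove that the process is serving the same data directory.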

#21Kevin Brown
kevin@sysexperts.com
In reply to: Tom Lane (#18)
#22Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: Tom Lane (#16)
#23Bruce Momjian
bruce@momjian.us
In reply to: Kevin Brown (#21)