pgsql: Perform an immediate shutdown if the postmaster.pid file is remo

Started by Tom Lane over 10 years ago, 5 messages, committers
#1 Tom Lane
tgl@sss.pgh.pa.us

Perform an immediate shutdown if the postmaster.pid file is removed.

The postmaster now checks every minute or so (worst case, at most two
minutes) that postmaster.pid is still there and still contains its own PID.
If not, it performs an immediate shutdown, as though it had received
SIGQUIT.
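
As a rough illustration of the check described above, here is a minimal sketch in C. It is not the code from this commit: the function name lock_file_still_valid and its parameters are hypothetical, and it assumes only that the first line of the lock file holds the owning process's PID. It also treats every failure to open or parse the file as "invalid", which is a simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not the actual PostgreSQL function: returns true
 * only if the file at lock_path can be opened and its first line parses
 * to expected_pid.
 */
bool
lock_file_still_valid(const char *lock_path, pid_t expected_pid)
{
    FILE   *fp;
    char    line[64];
    char   *end;
    long    file_pid;

    assert(lock_path != NULL);

    fp = fopen(lock_path, "r");
    if (fp == NULL)
        return false;           /* file removed or unreadable */

    if (fgets(line, sizeof(line), fp) == NULL)
    {
        fclose(fp);
        return false;           /* file is empty */
    }
    fclose(fp);

    errno = 0;
    file_pid = strtol(line, &end, 10);
    if (errno != 0 || end == line)
        return false;           /* first line is not a number */

    return file_pid == (long) expected_pid;
}
```

Under these assumptions, a long-running server would call the helper roughly once a minute with its own PID and begin an immediate shutdown as soon as it returns false.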

The original goal behind this change was to ensure that failed buildfarm
runs would get fully cleaned up, even if the test scripts had left a
postmaster running, which is not an infrequent occurrence. When the
buildfarm script removes a test postmaster's $PGDATA directory, its next
check on postmaster.pid will fail and cause it to exit. Previously, manual
intervention was often needed to get rid of such orphaned postmasters,
since they'd block new test postmasters from obtaining the expected socket
address.

However, by checking postmaster.pid and not something else, we can provide
additional robustness: manual removal of postmaster.pid is a frequent DBA
mistake, and now we can at least limit the damage that will ensue if a new
postmaster is started while the old one is still alive.

Back-patch to all supported branches, since we won't get the desired
improvement in buildfarm reliability otherwise.

Branch
------
REL9_3_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/31bc563b9be306623c5e9a52816b432945fa6df9

Modified Files
--------------
src/backend/postmaster/postmaster.c | 52 ++++++++++++++++++++------
src/backend/utils/init/miscinit.c | 70 +++++++++++++++++++++++++++++++++++
src/include/miscadmin.h | 1 +
3 files changed, 112 insertions(+), 11 deletions(-)

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#2 Thom Brown
thom@linux.com
In reply to: Tom Lane (#1)
Re: pgsql: Perform an immediate shutdown if the postmaster.pid file is remo

On 6 October 2015 at 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Perform an immediate shutdown if the postmaster.pid file is removed.
>
> The postmaster now checks every minute or so (worst case, at most two
> minutes) that postmaster.pid is still there and still contains its own PID.
> If not, it performs an immediate shutdown, as though it had received
> SIGQUIT.
>
> The original goal behind this change was to ensure that failed buildfarm
> runs would get fully cleaned up, even if the test scripts had left a
> postmaster running, which is not an infrequent occurrence. When the
> buildfarm script removes a test postmaster's $PGDATA directory, its next
> check on postmaster.pid will fail and cause it to exit. Previously, manual
> intervention was often needed to get rid of such orphaned postmasters,
> since they'd block new test postmasters from obtaining the expected socket
> address.
>
> However, by checking postmaster.pid and not something else, we can provide
> additional robustness: manual removal of postmaster.pid is a frequent DBA
> mistake, and now we can at least limit the damage that will ensue if a new
> postmaster is started while the old one is still alive.
>
> Back-patch to all supported branches, since we won't get the desired
> improvement in buildfarm reliability otherwise.
>
> Branch
> ------
> REL9_3_STABLE
>
> Details
> -------
> http://git.postgresql.org/pg/commitdiff/31bc563b9be306623c5e9a52816b432945fa6df9
>
> Modified Files
> --------------
> src/backend/postmaster/postmaster.c | 52 ++++++++++++++++++++------
> src/backend/utils/init/miscinit.c | 70 +++++++++++++++++++++++++++++++++++
> src/include/miscadmin.h | 1 +
> 3 files changed, 112 insertions(+), 11 deletions(-)

The log contains misleading output following the removal of the pid file:

2015-10-09 15:39:32 BST [31507]: [4-1] user=,db=,client= LOG: could not open file "postmaster.pid": No such file or directory
2015-10-09 15:39:32 BST [31507]: [5-1] user=,db=,client= LOG: performing immediate shutdown because data directory lock file is invalid
2015-10-09 15:39:32 BST [31507]: [6-1] user=,db=,client= LOG: received immediate shutdown request
2015-10-09 15:39:32 BST [31556]: [1-1] user=,db=,client= WARNING: terminating connection because of crash of another server process
2015-10-09 15:39:32 BST [31556]: [2-1] user=,db=,client= DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2015-10-09 15:39:32 BST [31556]: [3-1] user=,db=,client= HINT: In a moment you should be able to reconnect to the database and repeat your command.

Is this anything we need to worry about?

--
Thom

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thom Brown (#2)
Re: pgsql: Perform an immediate shutdown if the postmaster.pid file is remo

Thom Brown <thom@linux.com> writes:

> The log contains misleading output following the removal of the pid file:
>
> 2015-10-09 15:39:32 BST [31507]: [4-1] user=,db=,client= LOG: could not open file "postmaster.pid": No such file or directory
> 2015-10-09 15:39:32 BST [31507]: [5-1] user=,db=,client= LOG: performing immediate shutdown because data directory lock file is invalid
> 2015-10-09 15:39:32 BST [31507]: [6-1] user=,db=,client= LOG: received immediate shutdown request
> 2015-10-09 15:39:32 BST [31556]: [1-1] user=,db=,client= WARNING: terminating connection because of crash of another server process
> 2015-10-09 15:39:32 BST [31556]: [2-1] user=,db=,client= DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2015-10-09 15:39:32 BST [31556]: [3-1] user=,db=,client= HINT: In a moment you should be able to reconnect to the database and repeat your command.

Looks as-expected to me. We're forcing a panic stop.

regards, tom lane

#4 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Tom Lane (#3)
Re: pgsql: Perform an immediate shutdown if the postmaster.pid file is remo

Tom Lane wrote:

> Thom Brown <thom@linux.com> writes:
>
>> The log contains misleading output following the removal of the pid file:
>>
>> 2015-10-09 15:39:32 BST [31507]: [4-1] user=,db=,client= LOG: could not open file "postmaster.pid": No such file or directory
>> 2015-10-09 15:39:32 BST [31507]: [5-1] user=,db=,client= LOG: performing immediate shutdown because data directory lock file is invalid
>> 2015-10-09 15:39:32 BST [31507]: [6-1] user=,db=,client= LOG: received immediate shutdown request
>> 2015-10-09 15:39:32 BST [31556]: [1-1] user=,db=,client= WARNING: terminating connection because of crash of another server process
>> 2015-10-09 15:39:32 BST [31556]: [2-1] user=,db=,client= DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>> 2015-10-09 15:39:32 BST [31556]: [3-1] user=,db=,client= HINT: In a moment you should be able to reconnect to the database and repeat your command.
>
> Looks as-expected to me. We're forcing a panic stop.

I think he's complaining that the final HINT is misleading.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#4)
Re: pgsql: Perform an immediate shutdown if the postmaster.pid file is remo

Alvaro Herrera <alvherre@2ndquadrant.com> writes:

> Tom Lane wrote:
>
>> Looks as-expected to me. We're forcing a panic stop.
>
> I think he's complaining that the final HINT is misleading.

Well, all the particular backend knows is that it got SIGQUIT.
Maybe we should rewrite the message text for that entirely, but
that didn't seem in-scope for this patch.
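
To illustrate the point about what the receiving process can know, here is a small sketch in C. The names record_signal and message_for_signal are hypothetical and are not PostgreSQL functions. The sketch assumes only standard C signal handling, where a handler is passed the signal number and nothing about why the signal was sent, so the log text has to be chosen from that number alone.

```c
#include <signal.h>

/* Written by the handler; sig_atomic_t is safe to set from a handler. */
volatile sig_atomic_t last_signal = 0;

/* A signal handler receives only the signal number, never a reason. */
void
record_signal(int signo)
{
    last_signal = signo;
}

/*
 * Hypothetical: pick the log text from the signal number alone.  The
 * SIGQUIT text is the WARNING line from the log quoted in this thread.
 */
const char *
message_for_signal(int signo)
{
    if (signo == SIGQUIT)
        return "terminating connection because of crash of another server process";
    return "unexpected signal";
}
```

In this sketch the same text comes out whether the sender signalled because of a crash or because of an invalid lock file, which is the kind of limitation described above.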

regards, tom lane
