Immediate shutdown during recovery

Started by Fujii Masao about 17 years ago, 4 messages
#1 Fujii Masao
masao.fujii@gmail.com

Hi,

An immediate shutdown (pg_ctl -m i stop) might fail to kill the
startup process during archive recovery. This is because the
startup process calls system() to execute restore_command, and
system() ignores SIGQUIT while the command runs. As a result, the
startup process alone might survive the immediate shutdown and
continue redo to the end. Is this desirable behavior? It sounds
odd to me.
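
To illustrate the system() behavior described above, here is a minimal sketch (not PostgreSQL code; the function name is hypothetical). The child shell sends SIGQUIT to its parent while the parent is blocked inside system(). POSIX specifies that system() ignores SIGINT and SIGQUIT in the caller until the command completes, so the caller is expected to survive and the call to return normally:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Hypothetical demo helper: the shell started by system() sends SIGQUIT
 * to its parent ($PPID), which is the process blocked in system().
 * Because system() ignores SIGQUIT in the caller while it waits, the
 * signal is discarded and this function returns instead of dying.
 */
static bool
survives_sigquit_inside_system(void)
{
    int rc = system("kill -QUIT $PPID");

    /* The shell itself should have exited normally with status 0. */
    return rc != -1 && WIFEXITED(rc) && WEXITSTATUS(rc) == 0;
}
```

If the same SIGQUIT arrived outside system(), with the default disposition in place, the process would be terminated.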

To prevent the startup process from surviving, I think it should
periodically check whether the postmaster is still alive. The
archiver process, which also calls system() to execute
archive_command, already uses this approach.
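
A minimal sketch of the liveness-check idea, assuming the child process knows its parent's PID. parent_is_alive() is a hypothetical helper written for illustration and is not the function PostgreSQL uses; it relies only on the POSIX rule that kill() with signal 0 performs the existence and permission checks without delivering a signal:

```c
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: report whether the process with the given PID
 * still exists. Signal 0 sends nothing; kill() only checks the target.
 */
static bool
parent_is_alive(pid_t parent_pid)
{
    if (kill(parent_pid, 0) == 0)
        return true;

    /* EPERM means the process exists but we may not signal it. */
    return errno == EPERM;
}
```

Under this assumption, a process that loops over system() calls could run such a check between invocations and exit as soon as it returns false.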

What is your opinion?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#2 Fujii Masao
masao.fujii@gmail.com
In reply to: Fujii Masao (#1)
Re: Immediate shutdown during recovery

Hi,

On Fri, Nov 28, 2008 at 6:56 PM, Fujii Masao <masao.fujii@gmail.com> wrote:

> Hi,
>
> The immediate shutdown (pg_ctl -m i stop) might not be able to
> kill the startup process during archive recovery. It's because
> the startup process calls system() which ignores SIGQUIT for
> executing the restore_command. So, only the startup process
> might survive the immediate shutdown and continue redoing up
> to the end. Is this desirable behavior? This sounds odd for me.

In RestoreArchivedFile(), the following code acts as a safeguard
for the case where restore_command is terminated by a signal. But
the safeguard might not work if restore_command installs its own
SIGQUIT handler, as pg_standby does.

    signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;

    ereport(signaled ? FATAL : DEBUG2,
            (errmsg("could not restore file \"%s\" from archive: return code %d",
                    xlogfname, rc)));
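
To make the gap concrete, here is a rough sketch that applies the same test to the wait status returned by system(). Both helper names are hypothetical and the code is an illustration, not part of RestoreArchivedFile():

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Same condition as the snippet above, wrapped in a hypothetical helper. */
static bool
looks_signaled(int rc)
{
    return WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
}

/* Hypothetical helper: run a shell command and classify its wait status. */
static bool
command_looks_signaled(const char *command)
{
    int rc = system(command);

    /* -1 means system() itself failed, so there is no status to classify. */
    return rc != -1 && looks_signaled(rc);
}
```

A command that dies from a signal, such as "kill -KILL $$", is classified as signaled. A command that catches the signal and exits on its own, such as "trap 'exit 1' QUIT; kill -QUIT $$", terminates normally with status 1, so the check does not fire. That is the case a restore_command with its own SIGQUIT handler falls into.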

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

#3 Simon Riggs
simon@2ndQuadrant.com
In reply to: Fujii Masao (#2)
Re: Immediate shutdown during recovery

On Fri, 2008-11-28 at 19:53 +0900, Fujii Masao wrote:

> In RestoreArchivedFile(), there is the following code as the safeguard
> against the termination of restore_command by signal. But the
> safeguard might not work if restore_command defines its own signal
> handler for SIGQUIT like pg_standby.
>
>     signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
>
>     ereport(signaled ? FATAL : DEBUG2,
>             (errmsg("could not restore file \"%s\" from archive: return code %d",
>                     xlogfname, rc)));

Agree there is an existing problem.

Suggest we fix it after the main patches are committed.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

#4 Fujii Masao
masao.fujii@gmail.com
In reply to: Simon Riggs (#3)
Re: Immediate shutdown during recovery

Hello,

On Sat, Nov 29, 2008 at 12:40 AM, Simon Riggs <simon@2ndquadrant.com> wrote:

> Agree there is an existing problem.
>
> Suggest we fix it after the main patches are committed.

OK, thanks.

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center