Weird failure with latches in curculio on v15

andres@anarazel.de

over 3 years ago

In reply to: Michael Paquier (#1)

Re: Weird failure with latches in curculio on v15

Hi,

On 2023-02-01 10:53:17 +0900, Michael Paquier wrote:

While browsing the buildfarm, I have noticed this failure on curcilio:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2023-02-01%2001%3A05%3A17

The test that has reported a failure is the check on the archive
module callback:
# Failed test 'check shutdown callback of shell archive module'
# at t/020_archive_status.pl line 248.
# Looks like you failed 1 test of 17.
[02:28:06] t/020_archive_status.pl ..............
Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/17 subtests

Looking closer, this is a result of an assertion failure in the latch
code:
2023-02-01 02:28:05.615 CET [6961:8] LOG: received fast shutdown request
2023-02-01 02:28:05.615 CET [6961:9] LOG: aborting any active transactions
2023-02-01 02:28:05.616 CET [30681:9] LOG: process 30681 releasing ProcSignal slot 33, but it contains 0
TRAP: FailedAssertion("latch->owner_pid == MyProcPid", File: "latch.c", Line: 451, PID: 30681)

Given the ProcSignal LOG message before it, I don't think this is about
latches.

The information available in standby2.log shows that 30681 is the
startup process. I am not sure what all that means, yet.

Thoughts or comments welcome.

Perhaps a wild write overwriting shared memory state?

Greetings,

Andres Freund

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2023-02-01%2001:05:17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2023-01-11%2011:16:40
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2022-11-22%2012:19:21
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-11-17%2021:47:02

thomas.munro@gmail.com

over 3 years ago

In reply to: Andres Freund (#2)

Re: Weird failure with latches in curculio on v15

My database off assertion failures has four like that, all 15 and master:

It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM
handler which is allowed to call that while in_restore_command is
true.

Here's a different one, some kind of latch corruption in the WAL
writer under 017_shm:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2022-01-20%2016:26:54
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2022-01-20%2016:26:54

andres@anarazel.de

over 3 years ago

In reply to: Thomas Munro (#3)

Re: Weird failure with latches in curculio on v15

Hi,

On 2023-02-01 16:21:16 +1300, Thomas Munro wrote:

My database off assertion failures has four like that, all 15 and master:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2023-02-01%2001:05:17
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2023-01-11%2011:16:40
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2022-11-22%2012:19:21
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-11-17%2021:47:02

It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM
handler which is allowed to call that while in_restore_command is
true.

Ugh, no wonder we're getting crashes. This whole business seems bogus as
hell.

RestoreArchivedFile():
...
/*
* Check signals before restore command and reset afterwards.
*/
PreRestoreCommand();

/*
* Copy xlog from archival storage to XLOGDIR
*/
ret = shell_restore(xlogfname, xlogpath, lastRestartPointFname);

PostRestoreCommand();

/* SIGTERM: set flag to abort redo and exit */
static void
StartupProcShutdownHandler(SIGNAL_ARGS)
{
int save_errno = errno;

if (in_restore_command)
proc_exit(1);
else
shutdown_requested = true;
WakeupRecovery();

errno = save_errno;
}

Where PreRestoreCommand()/PostRestoreCommand() set in_restore_command.

There's *a lot* of stuff happening inside shell_restore() that's not
compatible with doing proc_exit() inside a signal handler. We're
allocating memory! Interact with stdout.

There's also the fact that system() isn't signal safe, but that's a much
less likely problematic issue.

This appears to have gotten worse over a sequence of commits. The
following commits each added something betwen PreRestoreCommand() and
PostRestoreCommand().

commit 1b06d7bac901e5fd20bba597188bae2882bf954b
Author: Fujii Masao <fujii@postgresql.org>
Date: 2021-11-22 10:28:21 +0900

Report wait events for local shell commands like archive_command.

added pgstat_report_wait_start/end. Unlikely to cause big issues, but
not good.

commit 7fed801135bae14d63b11ee4a10f6083767046d8
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: 2022-08-29 13:55:38 -0400

Clean up inconsistent use of fflush().

Made it a bit worse by adding an fflush(). That certainly seems like it
could cause hangs.

commit 9a740f81eb02e04179d78f3df2ce671276c27b07
Author: Michael Paquier <michael@paquier.xyz>
Date: 2023-01-16 16:31:43 +0900

Refactor code in charge of running shell-based recovery commands

which completely broke the mechanism. We suddenly run the entirety of
shell_restore(), which does pallocs etc to build the string passed to
system, and raises errors, all within a section in which a signal
handler can invoke proc_exit(). That's just completely broken.

Sorry, but particularly in this area, you got to be a heck of a lot more
careful.

I don't see a choice but to revert the recent changes. They need a
fairly large rewrite.

Greetings,

Andres Freund

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andres Freund (#4)

Re: Weird failure with latches in curculio on v15

Andres Freund <andres@anarazel.de> writes:

On 2023-02-01 16:21:16 +1300, Thomas Munro wrote:

It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM
handler which is allowed to call that while in_restore_command is
true.

Ugh, no wonder we're getting crashes. This whole business seems bogus as
hell.

Indeed :-(

I don't see a choice but to revert the recent changes. They need a
fairly large rewrite.

9a740f81e clearly made things a lot worse, but it wasn't great
before. Can we see a way forward to removing the problem entirely?

The fundamental issue is that we have no good way to break out
of system(), and I think the original idea was that
in_restore_command would be set *only* for the duration of the
system() call. That's clearly been lost sight of completely,
but maybe as a stopgap we could try to get back to that.

regards, tom lane

nathandbossart@gmail.com

over 3 years ago

In reply to: Tom Lane (#5)

Re: Weird failure with latches in curculio on v15

On Wed, Feb 01, 2023 at 10:12:26AM -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2023-02-01 16:21:16 +1300, Thomas Munro wrote:

It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM
handler which is allowed to call that while in_restore_command is
true.

Ugh, no wonder we're getting crashes. This whole business seems bogus as
hell.

Indeed :-(

Ugh. My bad.

The fundamental issue is that we have no good way to break out
of system(), and I think the original idea was that
in_restore_command would be set *only* for the duration of the
system() call. That's clearly been lost sight of completely,
but maybe as a stopgap we could try to get back to that.

+1. I'll produce some patches.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

andres@anarazel.de

over 3 years ago

In reply to: Tom Lane (#5)

Re: Weird failure with latches in curculio on v15

Hi,

On 2023-02-01 10:12:26 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2023-02-01 16:21:16 +1300, Thomas Munro wrote:

It's always in proc_exit() in StartupProcShutdownHandler(), a SIGTERM
handler which is allowed to call that while in_restore_command is
true.

Ugh, no wonder we're getting crashes. This whole business seems bogus as
hell.

Indeed :-(

I don't see a choice but to revert the recent changes. They need a
fairly large rewrite.

9a740f81e clearly made things a lot worse, but it wasn't great
before. Can we see a way forward to removing the problem entirely?

Yea, I think we can - we should stop relying on system(). If we instead
run the command properly as a subprocess, we don't need to do bad things
in the signal handler anymore.

The fundamental issue is that we have no good way to break out
of system(), and I think the original idea was that
in_restore_command would be set *only* for the duration of the
system() call. That's clearly been lost sight of completely,
but maybe as a stopgap we could try to get back to that.

We could push the functions setting in_restore_command down into
ExecuteRecoveryCommand(). But I don't think that'd end up necessarily
being right either - we'd now use the mechanism in places we previously
didn't (cleanup/end commands).

And there's just plenty other stuff in the 14bdb3f13de 9a740f81eb0 that
doesn't look right:
- We now have two places open-coding what BuildRestoreCommand did

- I'm doubtful that the new shell_* functions are the base for a good
API to abstract restoring files

- the error message for a failed restore command seems to have gotten
worse:
could not restore file \"%s\" from archive: %s"
->
"%s \"%s\": %s", commandName, command

- shell_* imo is not a good namespace for something called from xlog.c,
xlogarchive.c. I realize the intention is that shell_archive.c is
going to be its own "restore module", but for now it imo looks odd

- The comment moved out of RestoreArchivedFile() doesn't seems less
useful at its new location

- explanation of why we use GetOldestRestartPoint() is halfway lost

My name is listed as the first Reviewed-by, but I certainly haven't done
any meaningful review of these patches. I just replied to top-level
email proposing "recovery modules".

Greetings,

Andres Freund

robertmhaas@gmail.com

over 3 years ago

In reply to: Andres Freund (#7)

Re: Weird failure with latches in curculio on v15

On Wed, Feb 1, 2023 at 11:58 AM Andres Freund <andres@anarazel.de> wrote:

9a740f81e clearly made things a lot worse, but it wasn't great
before. Can we see a way forward to removing the problem entirely?

Yea, I think we can - we should stop relying on system(). If we instead
run the command properly as a subprocess, we don't need to do bad things
in the signal handler anymore.

I like the idea of not relying on system(). In most respects, doing
fork() + exec() ourselves seems superior. We can control where the
output goes, what we do while waiting, etc. But system() runs the
command through the shell, so that for example you don't have to
invent your own way of splitting a string into words to be passed to
exec[whatever](). I've never understood how you're supposed to get
that behavior other than by calling system().

--
Robert Haas
EDB: http://www.enterprisedb.com

andres@anarazel.de

over 3 years ago

In reply to: Robert Haas (#8)

Re: Weird failure with latches in curculio on v15

Hi,

On 2023-02-01 12:08:24 -0500, Robert Haas wrote:

On Wed, Feb 1, 2023 at 11:58 AM Andres Freund <andres@anarazel.de> wrote:

9a740f81e clearly made things a lot worse, but it wasn't great
before. Can we see a way forward to removing the problem entirely?

Yea, I think we can - we should stop relying on system(). If we instead
run the command properly as a subprocess, we don't need to do bad things
in the signal handler anymore.

I like the idea of not relying on system(). In most respects, doing
fork() + exec() ourselves seems superior. We can control where the
output goes, what we do while waiting, etc. But system() runs the
command through the shell, so that for example you don't have to
invent your own way of splitting a string into words to be passed to
exec[whatever](). I've never understood how you're supposed to get
that behavior other than by calling system().

We could just exec the shell in the forked process, using -c to invoke
the command. That should give us pretty much the same efficiency as
system(), with a lot more control.

I think we already do that somewhere. <dig>. Ah, yes, spawn_process() in
pg_regress.c. I suspect we couldn't use exec for restore_command etc,
as I think it's not uncommon to use && in the command.

Perhaps we should abstract the relevant pieces of spawn_process() that
into something more general? The OS specifics are sufficiently
complicated that I don't think it'd be good to have multiple copies.

It's too bad that we have the history of passing things to shell,
otherwise we could define a common argument handling of the GUC and just
execve ourselves, but that ship has sailed.

Greetings,

Andres Freund

#10

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andres Freund (#9)

Re: Weird failure with latches in curculio on v15

Andres Freund <andres@anarazel.de> writes:

On 2023-02-01 12:08:24 -0500, Robert Haas wrote:

I like the idea of not relying on system(). In most respects, doing
fork() + exec() ourselves seems superior. We can control where the
output goes, what we do while waiting, etc. But system() runs the
command through the shell, so that for example you don't have to
invent your own way of splitting a string into words to be passed to
exec[whatever](). I've never understood how you're supposed to get
that behavior other than by calling system().

We could just exec the shell in the forked process, using -c to invoke
the command. That should give us pretty much the same efficiency as
system(), with a lot more control.

The main thing that system() brings to the table is platform-specific
knowledge of where the shell is. I'm not very sure that we want to
wire in "/bin/sh".

regards, tom lane

#11

nathandbossart@gmail.com

over 3 years ago

In reply to: Andres Freund (#7)

Re: Weird failure with latches in curculio on v15

On Wed, Feb 01, 2023 at 08:58:01AM -0800, Andres Freund wrote:

On 2023-02-01 10:12:26 -0500, Tom Lane wrote:

The fundamental issue is that we have no good way to break out
of system(), and I think the original idea was that
in_restore_command would be set *only* for the duration of the
system() call. That's clearly been lost sight of completely,
but maybe as a stopgap we could try to get back to that.

We could push the functions setting in_restore_command down into
ExecuteRecoveryCommand(). But I don't think that'd end up necessarily
being right either - we'd now use the mechanism in places we previously
didn't (cleanup/end commands).

Right, we'd only want to set it for restore_command. I think that's
doable.

And there's just plenty other stuff in the 14bdb3f13de 9a740f81eb0 that
doesn't look right:
- We now have two places open-coding what BuildRestoreCommand did

This was done because BuildRestoreCommand() had become a thin wrapper
around replace_percent_placeholders(). I can add it back if you don't
think this was the right decision.

- I'm doubtful that the new shell_* functions are the base for a good
API to abstract restoring files

Why?

- the error message for a failed restore command seems to have gotten
worse:
could not restore file \"%s\" from archive: %s"
->
"%s \"%s\": %s", commandName, command

Okay, I'll work on improving this message.

- shell_* imo is not a good namespace for something called from xlog.c,
xlogarchive.c. I realize the intention is that shell_archive.c is
going to be its own "restore module", but for now it imo looks odd

What do you propose instead? FWIW this should go away with recovery
modules. This is just an intermediate state to simplify those patches.

- The comment moved out of RestoreArchivedFile() doesn't seems less
useful at its new location

Where do you think it should go?

- explanation of why we use GetOldestRestartPoint() is halfway lost

Okay, I'll work on adding more context here.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#12

andres@anarazel.de

over 3 years ago

In reply to: Tom Lane (#10)

Re: Weird failure with latches in curculio on v15

Hi,

On 2023-02-01 12:27:19 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

On 2023-02-01 12:08:24 -0500, Robert Haas wrote:

I like the idea of not relying on system(). In most respects, doing
fork() + exec() ourselves seems superior. We can control where the
output goes, what we do while waiting, etc. But system() runs the
command through the shell, so that for example you don't have to
invent your own way of splitting a string into words to be passed to
exec[whatever](). I've never understood how you're supposed to get
that behavior other than by calling system().

We could just exec the shell in the forked process, using -c to invoke
the command. That should give us pretty much the same efficiency as
system(), with a lot more control.

The main thing that system() brings to the table is platform-specific
knowledge of where the shell is. I'm not very sure that we want to
wire in "/bin/sh".

We seem to be doing OK with using SHELLPROG in pg_regress, which just
seems to be using $SHELL from the build environment.

Greetings,

Andres Freund

#13

nathandbossart@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#11)

Re: Weird failure with latches in curculio on v15

On Wed, Feb 01, 2023 at 09:58:06AM -0800, Nathan Bossart wrote:

On Wed, Feb 01, 2023 at 08:58:01AM -0800, Andres Freund wrote:

On 2023-02-01 10:12:26 -0500, Tom Lane wrote:

The fundamental issue is that we have no good way to break out
of system(), and I think the original idea was that
in_restore_command would be set *only* for the duration of the
system() call. That's clearly been lost sight of completely,
but maybe as a stopgap we could try to get back to that.

We could push the functions setting in_restore_command down into
ExecuteRecoveryCommand(). But I don't think that'd end up necessarily
being right either - we'd now use the mechanism in places we previously
didn't (cleanup/end commands).

Right, we'd only want to set it for restore_command. I think that's
doable.

Here is a first draft for the proposed stopgap fix. If we want to proceed
with this, I can provide patches for the back branches.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#14

michael@paquier.xyz

over 3 years ago

In reply to: Andres Freund (#12)

Re: Weird failure with latches in curculio on v15

On Wed, Feb 01, 2023 at 10:18:27AM -0800, Andres Freund wrote:

On 2023-02-01 12:27:19 -0500, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:
The main thing that system() brings to the table is platform-specific
knowledge of where the shell is. I'm not very sure that we want to
wire in "/bin/sh".

We seem to be doing OK with using SHELLPROG in pg_regress, which just
seems to be using $SHELL from the build environment.

It looks like this had better centralize a bit more of the logic from
pg_regress.c if that were to happen, to avoid more fuzzy logic with
WIN32. That becomes invasive for a back-patch.

By the way, there is something that's itching me a bit here. 9a740f8
has enlarged by a lot the window between PreRestoreCommand() and
PostRestoreCommand(), however curculio has reported a failure on
REL_15_STABLE, where we only manipulate my_wait_event_info while the
flag is on. Or I am getting that right that there is no way out of it
unless we remove the dependency to system() even in the back-branches?
Could there be an extra missing piece here?
--
Michael

#15

michael@paquier.xyz

over 3 years ago

In reply to: Nathan Bossart (#13)

Re: Weird failure with latches in curculio on v15

On Wed, Feb 01, 2023 at 02:35:55PM -0800, Nathan Bossart wrote:

Here is a first draft for the proposed stopgap fix. If we want to proceed
with this, I can provide patches for the back branches.

+	/*
+	 * PreRestoreCommand() is used to tell the SIGTERM handler for the startup
+	 * process that it is okay to proc_exit() right away on SIGTERM.  This is
+	 * done for the duration of the system() call because there isn't a good
+	 * way to break out while it is executing.  Since we might call proc_exit()
+	 * in a signal handler here, it is extremely important that nothing but the
+	 * system() call happens between the calls to PreRestoreCommand() and
+	 * PostRestoreCommand().  Any additional code must go before or after this
+	 * section.
+	 */
+	if (exitOnSigterm)
+		PreRestoreCommand();
+
rc = system(command);
+
+	if (exitOnSigterm)
+		PostRestoreCommand();
+
pgstat_report_wait_end();

Hmm. Isn't that something that we should also document in startup.c
where both routines are defined? If we begin to use
PreRestoreCommand() and PostRestoreCommand() in more code paths in the
future, that could be again an issue. That looks enough to me to
reduce the window back to what it was before 9a740f8, as exitOnSigterm
is only used for restore_command. There is a different approach
possible here: rely more on wait_event_info rather than failOnSignal
and exitOnSigterm to decide which code path should do what.

Andres Freund wrote:

- the error message for a failed restore command seems to have gotten
worse:
could not restore file \"%s\" from archive: %s"
->
"%s \"%s\": %s", commandName, command

IMO, we don't lose any context with this method: the command type and
the command string itself are the bits actually relevant. Perhaps
something like that would be more intuitive? One idea:
"could not execute command for %s: %s", commandName, command

- shell_* imo is not a good namespace for something called from xlog.c,
xlogarchive.c. I realize the intention is that shell_archive.c is
going to be its own "restore module", but for now it imo looks odd

shell_restore.c does not sound that bad to me, FWIW. The parallel
with the archive counterparts is here. My recent history is not that
good when it comes to naming, based on the feedback I received,
though.

And there's just plenty other stuff in the 14bdb3f13de 9a740f81eb0 that
doesn't look right:
- We now have two places open-coding what BuildRestoreCommand did

Yeah, BuildRestoreCommand() was just a small wrapper on top of the new
percentrepl.c, making it rather irrelevant at this stage, IMO. For
the two code paths where it was called.

- The comment moved out of RestoreArchivedFile() doesn't seems less
useful at its new location

We are talking about that:
- /*
- * Remember, we rollforward UNTIL the restore fails so failure here is
- * just part of the process... that makes it difficult to determine
- * whether the restore failed because there isn't an archive to restore,
- * or because the administrator has specified the restore program
- * incorrectly. We have to assume the former.
- *
- * However, if the failure was due to any sort of signal, it's best to
- * punt and abort recovery. (If we "return false" here, upper levels will
- * assume that recovery is complete and start up the database!) It's
- * essential to abort on child SIGINT and SIGQUIT, because per spec
- * system() ignores SIGINT and SIGQUIT while waiting; if we see one of
- * those it's a good bet we should have gotten it too.
- *
- * On SIGTERM, assume we have received a fast shutdown request, and exit
- * cleanly. It's pure chance whether we receive the SIGTERM first, or the
- * child process. If we receive it first, the signal handler will call
- * proc_exit, otherwise we do it here. If we or the child process received
- * SIGTERM for any other reason than a fast shutdown request, postmaster
- * will perform an immediate shutdown when it sees us exiting
- * unexpectedly.
- *
- * We treat hard shell errors such as "command not found" as fatal, too.
- */

The failure processing is stuck within the way we build and handle the
command given down to system(), so keeping this in shell_restore.c (or
whatever name you think would be a better fit) makes sense to me.
Now, thinking a bit more of this, we could just push the description
down to ExecuteRecoveryCommand(), that actually does the work,
adaptinh the comment based on the refactored internals of the
routine.
--
Michael

#16

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Michael Paquier (#15)

Re: Weird failure with latches in curculio on v15

Michael Paquier <michael@paquier.xyz> writes:

Hmm. Isn't that something that we should also document in startup.c
where both routines are defined? If we begin to use
PreRestoreCommand() and PostRestoreCommand() in more code paths in the
future, that could be again an issue.

I was vaguely wondering about removing both of those functions
in favor of an integrated function that does a system() call
with those things before and after it.

regards, tom lane

#17

michael@paquier.xyz

over 3 years ago

In reply to: Tom Lane (#16)

Re: Weird failure with latches in curculio on v15

On Wed, Feb 01, 2023 at 09:34:44PM -0500, Tom Lane wrote:

Michael Paquier <michael@paquier.xyz> writes:

Hmm. Isn't that something that we should also document in startup.c
where both routines are defined? If we begin to use
PreRestoreCommand() and PostRestoreCommand() in more code paths in the
future, that could be again an issue.

I was vaguely wondering about removing both of those functions
in favor of an integrated function that does a system() call
with those things before and after it.

It seems to me that this is pretty much the same as storing
in_restore_command in shell_restore.c, and that for recovery modules
this comes down to the addition of an extra callback called in
startup.c to check if the flag is up or not. Now the patch is doing
things the opposite way: like on HEAD, store the flag in startup.c but
switch it at will with the routines in startup.c. I find the approach
of the patch a bit more intuitive, TBH, as that makes the interface
simpler for other recovery modules that may want to switch the flag
back-and-forth, and I suspect that there may be cases in recovery
modules where we'd still want to switch the flag, but not necessarily
link it to system().
--
Michael

#18

nathandbossart@gmail.com

over 3 years ago

In reply to: Michael Paquier (#17)

Re: Weird failure with latches in curculio on v15

On Thu, Feb 02, 2023 at 01:24:15PM +0900, Michael Paquier wrote:

On Wed, Feb 01, 2023 at 09:34:44PM -0500, Tom Lane wrote:

I was vaguely wondering about removing both of those functions
in favor of an integrated function that does a system() call
with those things before and after it.

It seems to me that this is pretty much the same as storing
in_restore_command in shell_restore.c, and that for recovery modules
this comes down to the addition of an extra callback called in
startup.c to check if the flag is up or not. Now the patch is doing
things the opposite way: like on HEAD, store the flag in startup.c but
switch it at will with the routines in startup.c. I find the approach
of the patch a bit more intuitive, TBH, as that makes the interface
simpler for other recovery modules that may want to switch the flag
back-and-forth, and I suspect that there may be cases in recovery
modules where we'd still want to switch the flag, but not necessarily
link it to system().

Hm. I don't know if we want to encourage further use of
in_restore_command since it seems to be prone to misuse. Here's a v2 that
demonstrateѕ Tom's idea (bikeshedding on names and comments is welcome). I
personally like this approach a bit more.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#19

robertmhaas@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#18)

Re: Weird failure with latches in curculio on v15

On Thu, Feb 2, 2023 at 3:10 PM Nathan Bossart <nathandbossart@gmail.com> wrote:

Hm. I don't know if we want to encourage further use of
in_restore_command since it seems to be prone to misuse. Here's a v2 that
demonstrateѕ Tom's idea (bikeshedding on names and comments is welcome). I
personally like this approach a bit more.

+       /*
+        * When exitOnSigterm is set and we are in the startup process, use the
+        * special wrapper for system() that enables exiting immediately upon
+        * receiving SIGTERM.  This ensures we can break out of system() if
+        * required.
+        */

This comment, for me, raises more questions than it answers. Why do we
only do this in the startup process?

Also, and this part is not the fault of this patch but a defect of the
pre-existing comments, under what circumstances do we not want to exit
when we get a SIGTERM? It's standard behavior for PostgreSQL backends
to exit when they receive SIGTERM, so the question isn't why we
sometimes exit immediately but why we ever don't. The existing code
calls ExecuteRecoveryCommand with exitOnSigterm true in some cases and
false in other cases, and AFAICS there are zero words of comments
explaining the reasoning.

+       if (exitOnSigterm && MyBackendType == B_STARTUP)
+               rc = RunInterruptibleShellCommand(command);
+       else
+               rc = system(command);

And this looks like pure magic. I'm all in favor of not relying on
system(), but using it under some opaque set of conditions and
otherwise doing something else is not the way. At the very least this
needs to be explained a whole lot better.

--
Robert Haas
EDB: http://www.enterprisedb.com

#20

[0]: /messages/by-id/499047FE.9090407@enterprisedb.com

nathandbossart@gmail.com

over 3 years ago

In reply to: Robert Haas (#19)

Re: Weird failure with latches in curculio on v15

On Thu, Feb 02, 2023 at 04:14:54PM -0500, Robert Haas wrote:

+       /*
+        * When exitOnSigterm is set and we are in the startup process, use the
+        * special wrapper for system() that enables exiting immediately upon
+        * receiving SIGTERM.  This ensures we can break out of system() if
+        * required.
+        */

This comment, for me, raises more questions than it answers. Why do we
only do this in the startup process?

Currently, this functionality only exists in the startup process because it
is only used for restore_command. More below...

Also, and this part is not the fault of this patch but a defect of the
pre-existing comments, under what circumstances do we not want to exit
when we get a SIGTERM? It's standard behavior for PostgreSQL backends
to exit when they receive SIGTERM, so the question isn't why we
sometimes exit immediately but why we ever don't. The existing code
calls ExecuteRecoveryCommand with exitOnSigterm true in some cases and
false in other cases, and AFAICS there are zero words of comments
explaining the reasoning.

I've been digging into the history here. This e-mail seems to have the
most context [0]/messages/by-id/499047FE.9090407@enterprisedb.com. IIUC this was intended to prevent "fast" shutdowns from
escalating to "immediate" shutdowns because the restore command died
unexpectedly. This doesn't apply to archive_cleanup_command because we
don't FATAL if it dies unexpectedly. It seems like this idea should apply
to recovery_end_command, too, but AFAICT it doesn't use the same approach.
My guess is that this hasn't come up because it's less likely that both 1)
recovery_end_command is used and 2) someone initiates shutdown while it is
running.

BTW the relevant commits are cdd46c7 (added SIGTERM handling for
restore_command), 9e403c2 (added recovery_end_command), and c21ac0b (added
what is today called archive_cleanup_command).

+       if (exitOnSigterm && MyBackendType == B_STARTUP)
+               rc = RunInterruptibleShellCommand(command);
+       else
+               rc = system(command);
And this looks like pure magic. I'm all in favor of not relying on
system(), but using it under some opaque set of conditions and
otherwise doing something else is not the way. At the very least this
needs to be explained a whole lot better.

If we applied this exit-on-SIGTERM behavior to recovery_end_command, I
think we could combine failOnSignal and exitOnSigterm into one flag, and
then it might be a little easier to explain what is going on. In any case,
I agree that this deserves a lengthy explanation, which I'll continue to
work on.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#21

nathandbossart@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#20)

#22

nathandbossart@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#21)

#23

andres@anarazel.de

over 3 years ago

In reply to: Michael Paquier (#14)

#24

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andres Freund (#23)

#25

thomas.munro@gmail.com

over 3 years ago

In reply to: Tom Lane (#24)

#26

andres@anarazel.de

over 3 years ago

In reply to: Nathan Bossart (#21)

#27

thomas.munro@gmail.com

over 3 years ago

In reply to: Andres Freund (#26)

#28

andres@anarazel.de

over 3 years ago

In reply to: Tom Lane (#24)

#29

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Thomas Munro (#27)

#30

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Andres Freund (#28)

#31

andres@anarazel.de

over 3 years ago

In reply to: Thomas Munro (#25)

#32

andres@anarazel.de

over 3 years ago

In reply to: Tom Lane (#30)

#33

thomas.munro@gmail.com

over 3 years ago

In reply to: Andres Freund (#32)

#34

andres@anarazel.de

over 3 years ago

In reply to: Thomas Munro (#33)

#35

nathandbossart@gmail.com

over 3 years ago

In reply to: Andres Freund (#28)

#36

michael@paquier.xyz

over 3 years ago

In reply to: Nathan Bossart (#35)

#37

andres@anarazel.de

over 3 years ago

In reply to: Nathan Bossart (#35)

#38

andres@anarazel.de

over 3 years ago

In reply to: Michael Paquier (#15)

#39

nathandbossart@gmail.com

over 3 years ago

In reply to: Andres Freund (#38)

#40

michael@paquier.xyz

over 3 years ago

In reply to: Nathan Bossart (#39)

#41

andres@anarazel.de

over 3 years ago

In reply to: Nathan Bossart (#39)

#42

nathandbossart@gmail.com

over 3 years ago

In reply to: Michael Paquier (#40)

#43

andres@anarazel.de

over 3 years ago

In reply to: Nathan Bossart (#42)

#44

michael@paquier.xyz

over 3 years ago

In reply to: Michael Paquier (#40)

#45

nathandbossart@gmail.com

over 3 years ago

In reply to: Andres Freund (#43)

#46

andres@anarazel.de

over 3 years ago

In reply to: Nathan Bossart (#45)

#47

nathandbossart@gmail.com

over 3 years ago

In reply to: Andres Freund (#46)

#48

robertmhaas@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#47)

#49

nathandbossart@gmail.com

over 3 years ago

In reply to: Robert Haas (#48)

#50

robertmhaas@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#49)

#51

nathandbossart@gmail.com

over 3 years ago

In reply to: Robert Haas (#50)

#52

michael@paquier.xyz

over 3 years ago

In reply to: Nathan Bossart (#51)

#53

nathandbossart@gmail.com

over 3 years ago

In reply to: Michael Paquier (#52)

#54

robertmhaas@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#53)

#55

tgl@sss.pgh.pa.us

over 3 years ago

In reply to: Robert Haas (#54)

#56

robertmhaas@gmail.com

over 3 years ago

In reply to: Tom Lane (#55)

#57

nathandbossart@gmail.com

over 3 years ago

In reply to: Robert Haas (#56)

#58

andres@anarazel.de

over 3 years ago

In reply to: Robert Haas (#56)

#59

nathandbossart@gmail.com

over 3 years ago

In reply to: Andres Freund (#58)

#60

robertmhaas@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#57)

#61

robertmhaas@gmail.com

over 3 years ago

In reply to: Andres Freund (#58)

#62

andres@anarazel.de

over 3 years ago

In reply to: Robert Haas (#61)

#63

nathandbossart@gmail.com

over 3 years ago

In reply to: Robert Haas (#60)

#64

robertmhaas@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#63)

#65

robertmhaas@gmail.com

over 3 years ago

In reply to: Andres Freund (#62)

#66

andres@anarazel.de

over 3 years ago

In reply to: Robert Haas (#65)

#67

robertmhaas@gmail.com

over 3 years ago

In reply to: Andres Freund (#66)

#68

michael@paquier.xyz

over 3 years ago

In reply to: Robert Haas (#67)

#69

Stephen Frost

sfrost@snowman.net

over 3 years ago

In reply to: Michael Paquier (#68)

#70

michael@paquier.xyz

over 3 years ago

In reply to: Nathan Bossart (#59)

#71

thomas.munro@gmail.com

over 3 years ago

In reply to: Michael Paquier (#70)

#72

nathandbossart@gmail.com

over 3 years ago

In reply to: Michael Paquier (#70)

#73

thomas.munro@gmail.com

over 3 years ago

In reply to: Nathan Bossart (#72)

#74

nathandbossart@gmail.com

over 3 years ago

In reply to: Thomas Munro (#73)

#75

andres@anarazel.de

over 3 years ago

In reply to: Robert Haas (#67)

#76

nathandbossart@gmail.com

about 3 years ago

In reply to: Andres Freund (#75)

#77

Stephen Frost

sfrost@snowman.net

about 3 years ago

In reply to: Nathan Bossart (#76)

#78