[PATCH] Don't call ereport(ERROR) from recovery target GUC assign hooks
The five recovery target GUC assign hooks -- assign_recovery_target,
assign_recovery_target_lsn, assign_recovery_target_name,
assign_recovery_target_time, and assign_recovery_target_xid -- all call
error_multiple_recovery_targets() when a second conflicting target is
detected, which invokes ereport(ERROR). The GUC README
(src/backend/utils/misc/README) is explicit: "There is no provision for
a failure result code. assign_hooks should never fail." Raising an
error from an assign hook leaves guc.c's internal state inconsistent
before the abort, because the abort handling path was not designed to
run mid-assign.
The code acknowledges this with an XXX comment that has been there for
years:
XXX this code is broken by design. Throwing an error from a GUC
assign hook breaks fundamental assumptions of guc.c. So long as
all the variables for which this can happen are PGC_POSTMASTER, the
consequences are limited, since we'd just abort postmaster startup
anyway. Nonetheless it's likely that we have odd behaviors such as
unexpected GUC ordering dependencies.
The "limited consequences" argument is true enough that this hasn't
caused visible failures in practice, but fixing a known contract
violation seems worthwhile.
The fix is to remove the conflict check from all five assign hooks and
detect conflicts in validateRecoveryParameters() instead. That
function already runs after all GUCs have been loaded (called from
InitWalRecovery() in the startup process), so it can safely read each
GUC's current value via GetConfigOption() and count how many are
non-empty. If more than one is set, it reports FATAL, consistent with
the other validation errors already in that function.
This changes when the error fires: it now happens in the startup process
rather than in the postmaster's ProcessConfigFile. The outcome is the
same (server does not start), but guc.c's state is no longer disturbed.
There is one secondary behavioral change: when recovery is not actually
requested (ArchiveRecoveryRequested is false), validateRecoveryParameters
returns early and never checks for conflicts, so conflicting recovery
target settings are silently ignored. The old code would reject them
even then, since assign hooks fire unconditionally during ProcessConfigFile.
I think the new behavior is arguably more correct -- those GUCs have no
effect when recovery is not requested, so there is no reason to treat
their values as an error. The existing TAP test in 003_recovery_targets.pl
already covers the conflict-in-recovery case; this patch adds a test for
the new behavior (conflicting targets accepted outside recovery).
Patch attached.
Attachments:
0001-Don-t-call-ereport-ERROR-from-recovery-target-GUC-as.patchapplication/octet-stream; name=0001-Don-t-call-ereport-ERROR-from-recovery-target-GUC-as.patchDownload+66-44
Hi Shin,
Thanks for taking this one on. The XXX comment has been there for a
long time and fixing a known contract violation is worthwhile. The
mechanical part of the patch, moving the conflict check from the
assign hooks into validateRecoveryParameters(), looks right to me.
A few things worth discussing before this gets a committer's eye:
1. Behavioral change when recovery is not requested.
The patch's commit message notes that conflicting recovery target
settings are now silently accepted when ArchiveRecoveryRequested is
false, and argues this is "arguably more correct" because those GUCs
have no effect outside recovery. I am not sure I agree. Today a
misconfigured postgresql.conf (say, both recovery_target_time and
recovery_target_xid present) is caught immediately at postmaster
start. After the patch, that same misconfiguration boots
successfully, and the operator only finds out later when they add
recovery.signal and the startup process FATALs. That is a real
downgrade in error-detection timing.
Is it feasible to keep the early-detection behavior without violating
the assign-hook contract? One option: emit a WARNING (not ERROR)
from the assign hook, or defer the check to a post-config-file pass
(GUC_check_hook chain, or a one-shot check in PostmasterMain before
the startup process is forked). I do not have a strong opinion on
the mechanism, but I think the user-visible behavior (conflicting
recovery_target_* settings are caught at server start) is worth
preserving if possible.
If preserving that is not feasible, I think the commit message should
flag the timing change more prominently, since it is a user-visible
change to when postgres refuses to start.
2. Test coverage of the conflict-in-recovery path.
The existing 003_recovery_targets.pl has a test (not modified by
this patch, as far as I can tell) that exercises the multiple-targets
rejection. With v1, that rejection fires from
validateRecoveryParameters() rather than from an assign hook, so the
error message source changes even if the text is identical. CFBot
is green, so the existing regex must still match, but it would be
good to confirm which test covers this and to verify the test's
assertions actually validate the new FATAL path, not just "server
failed to start for some reason".
3. errdetail wording.
errdetail("At most one of \"recovery_target\", ..., "
"\"recovery_target_xid\" may be set.")
The error-message style guide (doc/src/sgml/error-style-guide.sgml)
says to avoid "may" because it reads as permission rather than
ability. Suggest "can be set" instead. Trivial, but since we are
already touching this message.
4. Test cleanup fragility.
The new test case does:
$node_primary->append_conf('postgresql.conf', "recovery_target_name = ...
recovery_target_time = ...");
$node_primary->start;
...
ALTER SYSTEM RESET recovery_target_name;
ALTER SYSTEM RESET recovery_target_time;
$node_primary->restart;
append_conf permanently extends postgresql.conf. The ALTER SYSTEM
RESET writes to postgresql.auto.conf which takes precedence, so the
stale postgresql.conf lines are effectively masked for subsequent
test steps, but they remain in the file, and any later test that
relies on a predictable postgresql.conf content (for example a test
that inspects ConfigFileVar or does its own append) could be
confused. Using ALTER SYSTEM SET for the setup would be cleaner,
or using a dedicated temporary cluster for this one case. Also,
the ok() predicate `defined $primary_pid && $primary_pid ne ''` is
redundant. safe_psql would have died earlier on start failure.
5. Nit: block comment.
The new block inside validateRecoveryParameters() could benefit from
noting that ArchiveRecoveryRequested is guaranteed true at this
point (because of the early return above it), so a reader does not
have to scroll up to confirm.
Nothing here is a blocker. I think the overall direction is right.
The behavioral change question in item 1 is the main design decision
I would want the author's thinking on.
Thanks,
Greg
On Mon, Apr 13, 2026 at 5:21 PM JoongHyuk Shin <sjh910805@gmail.com> wrote:
The "limited consequences" argument is true enough that this hasn't
caused visible failures in practice, but fixing a known contract
violation seems worthwhile.
+1
The fix is to remove the conflict check from all five assign hooks and
detect conflicts in validateRecoveryParameters() instead. That
function already runs after all GUCs have been loaded (called from
InitWalRecovery() in the startup process), so it can safely read each
GUC's current value via GetConfigOption() and count how many are
non-empty. If more than one is set, it reports FATAL, consistent with
the other validation errors already in that function.
In the master, when the following two recovery targets are specified,
the recovery target assign hook detects that multiple targets were given
and reports an error. With the patch, however, the same settings do not
raise an error, recoveryTarget is set to RECOVERY_TARGET_UNSET, and
recovery unexpectedly proceeds with no target. Could this be a bug
in the patch?
recovery_target_xid = '9999'
recovery_target_time = ''
Regards,
--
Fujii Masao