Improve checks for GUC recovery_target_xid
Hackers,
Currently check_recovery_target_xid() converts invalid values to 0. So,
for example, the following configuration added to postgresql.conf
followed by a startup:
recovery_target_xid = 'bogus'
recovery_target_xid = '1.1'
recovery_target_xid = '0'
... does not generate an error but recovery does not complete. There are
many values that can prevent recovery from completing but we should at
least catch obvious misconfiguration by the user.
The origin of the problem is that we do not perform a range check in the
GUC value passed-in for recovery_target_xid. This commit improves the
situation by using adding end checking to strtou64() and by providing
stricter range checks. Some test cases are added for the cases of an
incorrect or a lower-bound timeline value, checking the sanity of the
reports based on the contents of the server logs.
Also add a comment that truncation of the input value is expected since
users will generally be using the output from pg_current_xact_id() (or
similar) to set recovery_target_xid (just as our tests do).
This patch was promised in [1]/messages/by-id/cf04b7c6-7774-4ffb-86f5-ca85462d5fd6@pgbackrest.org -- here at last!
Regards,
-David
[1]: /messages/by-id/cf04b7c6-7774-4ffb-86f5-ca85462d5fd6@pgbackrest.org
/messages/by-id/cf04b7c6-7774-4ffb-86f5-ca85462d5fd6@pgbackrest.org
Attachments:
recovery-target-xid-v1.patchtext/plain; charset=UTF-8; name=recovery-target-xid-v1.patchDownload+54-3
Hi David,
The approach LGTM and checked the previous patch too. I have a few things to add.
The following grammar can be changed by adding "without epoch must be greater than or equal to %u"
+ GUC_check_errdetail("\"%s\" without epoch must greater than or equal to %u.",
+ "recovery_target_xid",
+ FirstNormalTransactionId);
Secondly,
The comment on the lower-bound XID test says # Timeline target out of min range — should be # XID target out of min range.
+# Timeline target out of min range
+$node_standby->append_conf('postgresql.conf',
+ "recovery_target_xid = '0'");
+
When it comes to *endp validations I suppose the validation passes when we provide recovery_target_xid = '-1'. This passes the endp validation and FirstNormalTransactionId checks. Is it a valid approach to allow negative values to this GUC ?
When -1 is provided the following checks allow them to be a valid GUC.
+ /*
+ * This cast will remove the epoch, if any
+ */
+ xid = (TransactionId) strtou64(*newval, &endp, 0);
+
+ if (*endp != '\0' || errno == EINVAL || errno == ERANGE)
+ {
+ GUC_check_errdetail("\"%s\" is not a valid number.",
+ "recovery_target_xid");
+ return false;
+ }
Regards.
On Fri, Feb 20, 2026 at 2:42 PM David Steele <david@pgbackrest.org> wrote:
Hackers,
Currently check_recovery_target_xid() converts invalid values to 0. So,
for example, the following configuration added to postgresql.conf
followed by a startup:recovery_target_xid = 'bogus'
recovery_target_xid = '1.1'
recovery_target_xid = '0'... does not generate an error but recovery does not complete. There are
many values that can prevent recovery from completing but we should at
least catch obvious misconfiguration by the user.
+1
The origin of the problem is that we do not perform a range check in the
GUC value passed-in for recovery_target_xid. This commit improves the
situation by using adding end checking to strtou64() and by providing
stricter range checks. Some test cases are added for the cases of an
incorrect or a lower-bound timeline value, checking the sanity of the
reports based on the contents of the server logs.Also add a comment that truncation of the input value is expected since
users will generally be using the output from pg_current_xact_id() (or
similar) to set recovery_target_xid (just as our tests do).This patch was promised in [1] -- here at last!
Thanks for the patch!
+ GUC_check_errdetail("\"%s\" without epoch must greater than or equal to %u.",
"must greater" shiould be "must be greater"?
"without epoch" seems not necessary to me.
+ /*
+ * This cast will remove the epoch, if any
+ */
+ xid = (TransactionId) strtou64(*newval, &endp, 0);
Would it be better to use strtouint32_strict() instead of strtou64()?
That would allow us to detect invalid XID values larger than 2^32 and
report an error, similar to what pg_resetwal -x does.
Regards,
--
Fujii Masao
Thanks for having a look at this!
On 2/26/26 14:20, Hüseyin Demir wrote:
The following grammar can be changed by adding "without epoch must be greater than or equal to %u" + GUC_check_errdetail("\"%s\" without epoch must greater than or equal to %u.", + "recovery_target_xid", + FirstNormalTransactionId);
Oops - fixed!
The comment on the lower-bound XID test says # Timeline target out of min range — should be # XID target out of min range.
I have fixed this and made the comments more consistent overall.
When it comes to *endp validations I suppose the validation passes when we provide recovery_target_xid = '-1'. This passes the endp validation and FirstNormalTransactionId checks. Is it a valid approach to allow negative values to this GUC ?
When -1 is provided the following checks allow them to be a valid GUC.
Yeah, -1 should not be allowed here. I've updated the code to error on
negative numbers but probably we should import strtou64_strict from the
front end code or use strtou32_strict, though that needs to be discussed
separately.
Thanks,
-David
Attachments:
recovery-target-xid-v2.patchtext/plain; charset=UTF-8; name=recovery-target-xid-v2.patchDownload+63-3
On 2/27/26 08:12, Fujii Masao wrote:
On Fri, Feb 20, 2026 at 2:42 PM David Steele <david@pgbackrest.org> wrote:
+ GUC_check_errdetail("\"%s\" without epoch must greater than or equal to %u.",
"must greater" shiould be "must be greater"?
Fixed in v2 attached to the prior email.
"without epoch" seems not necessary to me.
I guess that depends on whether or not we error if the epoch is present,
see below.
+ /* + * This cast will remove the epoch, if any + */ + xid = (TransactionId) strtou64(*newval, &endp, 0);Would it be better to use strtouint32_strict() instead of strtou64()?
That would allow us to detect invalid XID values larger than 2^32 and
report an error, similar to what pg_resetwal -x does.
This was my first instinct, but it causes our integration tests to fail
because pg_current_xact_id() returns the xid with epoch. You can fix
this by casting pg_current_xact_id()::xid but this seems like a pretty
big change in usage. I'm OK with it but we'd definitely need to update
the documentation to match.
What do you think?
Regards
-David
On Wed, Mar 4, 2026 at 2:11 PM David Steele <david@pgbackrest.org> wrote:
On 2/27/26 08:12, Fujii Masao wrote:
On Fri, Feb 20, 2026 at 2:42 PM David Steele <david@pgbackrest.org> wrote:
+ GUC_check_errdetail("\"%s\" without epoch must greater than or equal to %u.",
"must greater" shiould be "must be greater"?
Fixed in v2 attached to the prior email.
"without epoch" seems not necessary to me.
I guess that depends on whether or not we error if the epoch is present,
see below.+ /* + * This cast will remove the epoch, if any + */ + xid = (TransactionId) strtou64(*newval, &endp, 0);Would it be better to use strtouint32_strict() instead of strtou64()?
That would allow us to detect invalid XID values larger than 2^32 and
report an error, similar to what pg_resetwal -x does.This was my first instinct, but it causes our integration tests to fail
because pg_current_xact_id() returns the xid with epoch. You can fix
this by casting pg_current_xact_id()::xid but this seems like a pretty
big change in usage. I'm OK with it but we'd definitely need to update
the documentation to match.What do you think?
I see your point, and I'm fine with the patch.
My earlier comment was based on my misunderstanding. I thought that only
32-bit transaction IDs were allowed for recovery_target_xid, and therefore
values larger than 2^32 should be rejected.
However, recovery_target_xid currently accepts both 32-bit XIDs and 64-bit
full transaction IDs (epoch + 32-bit XID). When a 64-bit value is specified,
only the 32-bit XID portion is used as the recovery target.
Given this behavior, your change seems to make sense.
Regarding the regression test, if the purpose is to verify the GUC hook
for recovery_target_xid, it might be simpler to test whether
"ALTER SYSTEM SET recovery_target_xid TO ..." succeeds or fails as expected
as follows, rather than starting the server with that setting. That said,
since recovery_target_timeline is already tested in a similar way, I understand
why you followed the same pattern here. So I'm ok with the current approach.
my ($result, $stdout, $stderr) = $node_primary->psql('postgres',
"ALTER SYSTEM SET recovery_target_xid TO 'bogus'");
like(
$stderr,
qr/is not a valid number/,
"invalid recovery_target_xid (bogus value)");
If we think it's better to use ALTER SYSTEM SET for testing invalid
recovery_target_xxx settings to keep the regression tests simpler,
we can revisit this later and address it in a separate patch.
Regards,
--
Fujii Masao
On Thu, Mar 5, 2026 at 12:41 AM Fujii Masao <masao.fujii@gmail.com> wrote:
My earlier comment was based on my misunderstanding. I thought that only
32-bit transaction IDs were allowed for recovery_target_xid, and therefore
values larger than 2^32 should be rejected.However, recovery_target_xid currently accepts both 32-bit XIDs and 64-bit
full transaction IDs (epoch + 32-bit XID). When a 64-bit value is specified,
only the 32-bit XID portion is used as the recovery target.
Given this behavior, your change seems to make sense.
I'm tempted to clarify this behavior by adding something like
the following text to the description of recovery_target_xid
in config.sgml...:
-------------------------------
The value can be specified as either a 32-bit transaction ID or a 64-bit
transaction ID (consisting of an epoch and a 32-bit ID), such as the value
returned by pg_current_xact_id(). When a 64-bit transaction ID is provided,
only its 32-bit transaction ID portion is used as the recovery target.
For example, the values 4294968296 (epoch 1) and 8589935592 (epoch 2)
both refer to the same 32-bit transaction ID, 1000.
The effective transaction ID (the 32-bit portion) must be greater than
or equal to 3.
-------------------------------
Regards,
--
Fujii Masao
On 3/4/26 22:41, Fujii Masao wrote:
Regarding the regression test, if the purpose is to verify the GUC hook
for recovery_target_xid, it might be simpler to test whether
"ALTER SYSTEM SET recovery_target_xid TO ..." succeeds or fails as expected
as follows, rather than starting the server with that setting. That said,
since recovery_target_timeline is already tested in a similar way, I understand
why you followed the same pattern here. So I'm ok with the current approach.
I wrote the tests for recovery_target_timeline but I was not too
satisfied with them because starting Postgres is fairly expensive.
my ($result, $stdout, $stderr) = $node_primary->psql('postgres',
"ALTER SYSTEM SET recovery_target_xid TO 'bogus'");
like(
$stderr,
qr/is not a valid number/,
"invalid recovery_target_xid (bogus value)");If we think it's better to use ALTER SYSTEM SET for testing invalid
recovery_target_xxx settings to keep the regression tests simpler,
we can revisit this later and address it in a separate patch.
I've updated the patch to do it this way. Not only is it faster but you
get a better message when the expected value is incorrect.
I can update the tests for recovery_target_timeline in a separate patch.
Regards,
-David
Attachments:
recovery-target-xid-v3.patchtext/plain; charset=UTF-8; name=recovery-target-xid-v3.patchDownload+71-3
On 3/4/26 23:19, Fujii Masao wrote:
On Thu, Mar 5, 2026 at 12:41 AM Fujii Masao <masao.fujii@gmail.com> wrote:
My earlier comment was based on my misunderstanding. I thought that only
32-bit transaction IDs were allowed for recovery_target_xid, and therefore
values larger than 2^32 should be rejected.However, recovery_target_xid currently accepts both 32-bit XIDs and 64-bit
full transaction IDs (epoch + 32-bit XID). When a 64-bit value is specified,
only the 32-bit XID portion is used as the recovery target.
Given this behavior, your change seems to make sense.I'm tempted to clarify this behavior by adding something like
the following text to the description of recovery_target_xid
in config.sgml...:-------------------------------
The value can be specified as either a 32-bit transaction ID or a 64-bit
transaction ID (consisting of an epoch and a 32-bit ID), such as the value
returned by pg_current_xact_id(). When a 64-bit transaction ID is provided,
only its 32-bit transaction ID portion is used as the recovery target.
For example, the values 4294968296 (epoch 1) and 8589935592 (epoch 2)
both refer to the same 32-bit transaction ID, 1000.The effective transaction ID (the 32-bit portion) must be greater than
or equal to 3.
-------------------------------
+1. I added this to the v3 patch.
Regards,
-David
On Thu, Mar 05, 2026 at 03:40:44AM +0000, David Steele wrote:
I wrote the tests for recovery_target_timeline but I was not too satisfied
with them because starting Postgres is fairly expensive.
+# Invalid recovery_target_xid tests
+$node_standby = PostgreSQL::Test::Cluster->new('standby_10');
+$node_standby->init_from_backup($node_primary, 'my_backup',
+ has_restoring => 1);
+$node_standby->start;
+
+my ($result, $stdout, $stderr) = $node_primary->psql('postgres',
+ "ALTER SYSTEM SET recovery_target_xid TO 'bogus'");
+like(
+ $stderr,
+ qr/is not a valid number/,
+ "invalid recovery_target_xid (bogus value)");
Smart move to rely on ALTER SYSTEM to check how the GUC callback is
reacting on incorrect input values. Why do you need to create and
start a new standby if it is not used, then?
--
Michael
On 3/5/26 11:03, Michael Paquier wrote:
On Thu, Mar 05, 2026 at 03:40:44AM +0000, David Steele wrote:
I wrote the tests for recovery_target_timeline but I was not too satisfied
with them because starting Postgres is fairly expensive.+# Invalid recovery_target_xid tests +$node_standby = PostgreSQL::Test::Cluster->new('standby_10'); +$node_standby->init_from_backup($node_primary, 'my_backup', + has_restoring => 1); +$node_standby->start; + +my ($result, $stdout, $stderr) = $node_primary->psql('postgres', + "ALTER SYSTEM SET recovery_target_xid TO 'bogus'"); +like( + $stderr, + qr/is not a valid number/, + "invalid recovery_target_xid (bogus value)");Smart move to rely on ALTER SYSTEM to check how the GUC callback is
reacting on incorrect input values. Why do you need to create and
start a new standby if it is not used, then?
The prior standby is not running because of the invalid config. I
figured it was better to start clean but when I update the
recovery_target_timeline tests I was planning to use the same standby
for all the new tests.
Regards,
-David
On Thu, Mar 5, 2026 at 1:21 PM David Steele <david@pgbackrest.org> wrote:
The prior standby is not running because of the invalid config. I
figured it was better to start clean but when I update the
recovery_target_timeline tests I was planning to use the same standby
for all the new tests.
Alternatively, we can use $node_primary, since ALTER SYSTEM SET with
invalid recovery_target_timeline or recovery_target_xid does not
affect the primary.
Regards,
--
Fujii Masao
On 3/5/26 12:03, Fujii Masao wrote:
On Thu, Mar 5, 2026 at 1:21 PM David Steele <david@pgbackrest.org> wrote:
The prior standby is not running because of the invalid config. I
figured it was better to start clean but when I update the
recovery_target_timeline tests I was planning to use the same standby
for all the new tests.Alternatively, we can use $node_primary, since ALTER SYSTEM SET with
invalid recovery_target_timeline or recovery_target_xid does not
affect the primary.
Well, as it turns out I was using the primary after all because I copied
your example and forgot to update the host. Seems weird to set these
GUCs on the primary but as long as we get the expected errors I don't
suppose it matters.
Regards,
-David
Attachments:
recovery-target-xid-v4.patchtext/plain; charset=UTF-8; name=recovery-target-xid-v4.patchDownload+66-3
On Thu, Mar 5, 2026 at 5:15 PM David Steele <david@pgbackrest.org> wrote:
On 3/5/26 12:03, Fujii Masao wrote:
On Thu, Mar 5, 2026 at 1:21 PM David Steele <david@pgbackrest.org> wrote:
The prior standby is not running because of the invalid config. I
figured it was better to start clean but when I update the
recovery_target_timeline tests I was planning to use the same standby
for all the new tests.Alternatively, we can use $node_primary, since ALTER SYSTEM SET with
invalid recovery_target_timeline or recovery_target_xid does not
affect the primary.Well, as it turns out I was using the primary after all because I copied
your example and forgot to update the host. Seems weird to set these
GUCs on the primary but as long as we get the expected errors I don't
suppose it matters.
Thanks for updating the patch! I've pushed the patch.
Regards,
--
Fujii Masao
On 3/5/26 19:42, Fujii Masao wrote:
On Thu, Mar 5, 2026 at 5:15 PM David Steele <david@pgbackrest.org> wrote:
On 3/5/26 12:03, Fujii Masao wrote:
On Thu, Mar 5, 2026 at 1:21 PM David Steele <david@pgbackrest.org> wrote:
The prior standby is not running because of the invalid config. I
figured it was better to start clean but when I update the
recovery_target_timeline tests I was planning to use the same standby
for all the new tests.Alternatively, we can use $node_primary, since ALTER SYSTEM SET with
invalid recovery_target_timeline or recovery_target_xid does not
affect the primary.Well, as it turns out I was using the primary after all because I copied
your example and forgot to update the host. Seems weird to set these
GUCs on the primary but as long as we get the expected errors I don't
suppose it matters.Thanks for updating the patch! I've pushed the patch.
Excellent, thank you!
Attached are the test changes for recovery_target_timeline. I can start
a new thread and add it to the next CF if you like, but since it is just
test changes maybe we can fast track it.
Regards,
-David
Attachments:
recovery-target-timeline-v1.patchtext/plain; charset=UTF-8; name=recovery-target-timeline-v1.patchDownload+20-49
On Thu, Mar 5, 2026 at 10:32 PM David Steele <david@pgbackrest.org> wrote:
On 3/5/26 19:42, Fujii Masao wrote:
On Thu, Mar 5, 2026 at 5:15 PM David Steele <david@pgbackrest.org> wrote:
On 3/5/26 12:03, Fujii Masao wrote:
On Thu, Mar 5, 2026 at 1:21 PM David Steele <david@pgbackrest.org> wrote:
The prior standby is not running because of the invalid config. I
figured it was better to start clean but when I update the
recovery_target_timeline tests I was planning to use the same standby
for all the new tests.Alternatively, we can use $node_primary, since ALTER SYSTEM SET with
invalid recovery_target_timeline or recovery_target_xid does not
affect the primary.Well, as it turns out I was using the primary after all because I copied
your example and forgot to update the host. Seems weird to set these
GUCs on the primary but as long as we get the expected errors I don't
suppose it matters.Thanks for updating the patch! I've pushed the patch.
Excellent, thank you!
Attached are the test changes for recovery_target_timeline. I can start
a new thread and add it to the next CF if you like, but since it is just
test changes maybe we can fast track it.
Yes, let's discuss and review the patch in this thread.
Thanks for the patch! It looks good to me. Barring any objections, I
will commit it.
Regards,
--
Fujii Masao
On Fri, Mar 06, 2026 at 12:04:00AM +0900, Fujii Masao wrote:
Thanks for the patch! It looks good to me. Barring any objections, I
will commit it.
Thanks.
--
Michael
On Fri, Mar 6, 2026 at 3:15 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Mar 06, 2026 at 12:04:00AM +0900, Fujii Masao wrote:
Thanks for the patch! It looks good to me. Barring any objections, I
will commit it.Thanks.
I've pushed the patch. Thanks!
Regards,
--
Fujii Masao
On 3/6/26 14:05, Fujii Masao wrote:
On Fri, Mar 6, 2026 at 3:15 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Mar 06, 2026 at 12:04:00AM +0900, Fujii Masao wrote:
Thanks for the patch! It looks good to me. Barring any objections, I
will commit it.Thanks.
I've pushed the patch. Thanks!
Thank you and great idea on ALTER SYSTEM. I've been hesitant to add more
tests in this area because they are so expensive but now I feel much
better about it. But that's the last for this CF since there is more
important stuff to be done.
Regards,
-David