Improve verification of recovery_target_timeline GUC.
Hackers,
Currently check_recovery_target_timeline() converts any value that is
not current, latest, or a valid integer to 0. So for example:
recovery_target_timeline = 'currrent'
results in the following error:
FATAL: 22023: recovery target timeline 0 does not exist
Since there is no range checking for uint32 (but there is a cast from
unsigned long) this setting:
recovery_target_timeline = '9999999999'
results in the following error (on 64-bit):
FATAL: 22023: recovery target timeline 1410065407 does not exist
Improve by adding endptr checking to catch conversion errors and add
range checking to exclude values < 2 and greater than UINT_MAX.
I discovered this while testing on Postgres versions < 12 where
recovery_target_timeline = 'current'
sets the target timeline to 'latest', which is a pretty surprising
result. I'm only noting it in case somebody is searching the archives
since the versions in question are EOL. But that odd result made me
wonder if Postgres >= 12 had a similar problem. While it is not quite so
confusing as Postgres < 12 I think it still bears improving.
The tests are probably excessive but I needed something to show that the
verification works as expected.
Regards,
-David
Attachments:
timeline-log-v1.patchtext/plain; charset=UTF-8; name=timeline-log-v1.patchDownload+3-4
On Thu, Jan 23, 2025 at 02:53:39PM +0000, David Steele wrote:
Currently check_recovery_target_timeline() converts any value that is not
current, latest, or a valid integer to 0. So for example:recovery_target_timeline = 'currrent'
results in the following error:
FATAL: 22023: recovery target timeline 0 does not exist
Indeed, that's confusing. There is nothing telling you what's
actually wrong here, particularly if there is a typo of any kind in
the parameter.
I discovered this while testing on Postgres versions < 12 where
The tests are probably excessive but I needed something to show that the
verification works as expected.
Even with your patch, specifying an incorrect name results in a
complaint about a timeline of 0. Wouldn't it be better to strengthen
the parsing in check_recovery_target_timeline() and/or the error
message reported?
--
Michael
On 1/24/25 01:44, Michael Paquier wrote:
On Thu, Jan 23, 2025 at 02:53:39PM +0000, David Steele wrote:
I discovered this while testing on Postgres versions < 12 where
The tests are probably excessive but I needed something to show that the
verification works as expected.Even with your patch, specifying an incorrect name results in a
complaint about a timeline of 0. Wouldn't it be better to strengthen
the parsing in check_recovery_target_timeline() and/or the error
message reported?
I attached the wrong patch. Oops!
Regards,
-David
Attachments:
timeline-check-v1.patchtext/plain; charset=UTF-8; name=timeline-check-v1.patchDownload+72-3
On Fri, Jan 24, 2025 at 01:36:45PM +0000, David Steele wrote:
I attached the wrong patch. Oops!
Thanks for the new patch.
+ timeline = strtoull(*newval, &endp, 0); + + if (*endp != '\0' || errno == EINVAL || errno == ERANGE) { GUC_check_errdetail("\"recovery_target_timeline\" is not a valid number."); return false; }
Why not using strtou64() here? That's more portable depending on
SIZEOF_LONG (aka the 4 bytes on WIN32 vs 8 bytes for the rest of the
world).
+ + if (timeline < 2 || timeline > UINT_MAX) + {
A recovery_target_timeline of 1 is a valid value, isn't it?
+ GUC_check_errdetail( + "\"recovery_target_timeline\" must be between 2 and 4,294,967,295."); + return false; + }
I would suggest to rewrite that as \"%s\" must be between %u and %u,
which would be more generic for translations in case there is an
overlap with something else.
+$logfile = slurp_file($node_standby->logfile()); +ok($logfile =~ qr/must be between 2 and 4,294,967,295/, + 'target timelime out of max range');
These sequences of tests could be improved:
- Let's use start locations for the logs slurped, reducing the scope
of the logs scanned.
- Perhaps it would be better to rely on wait_for_log() to make sure
that the expected log contents are generated without any kind of race
condition?
--
Michael
On 2/14/25 02:42, Michael Paquier wrote:
On Fri, Jan 24, 2025 at 01:36:45PM +0000, David Steele wrote:
+ timeline = strtoull(*newval, &endp, 0); + + if (*endp != '\0' || errno == EINVAL || errno == ERANGE) { GUC_check_errdetail("\"recovery_target_timeline\" is not a valid number."); return false; }Why not using strtou64() here? That's more portable depending on
SIZEOF_LONG (aka the 4 bytes on WIN32 vs 8 bytes for the rest of the
world).
Done.
+ + if (timeline < 2 || timeline > UINT_MAX) + {A recovery_target_timeline of 1 is a valid value, isn't it?
+ GUC_check_errdetail( + "\"recovery_target_timeline\" must be between 2 and 4,294,967,295."); + return false; + }I would suggest to rewrite that as \"%s\" must be between %u and %u,
which would be more generic for translations in case there is an
overlap with something else.
Done. This means there are no commas in the upper bound but I don't
think it's a big deal and it more closely matches other GUC messages.
+$logfile = slurp_file($node_standby->logfile()); +ok($logfile =~ qr/must be between 2 and 4,294,967,295/, + 'target timelime out of max range');These sequences of tests could be improved:
- Let's use start locations for the logs slurped, reducing the scope
of the logs scanned.
Done.
- Perhaps it would be better to rely on wait_for_log() to make sure
that the expected log contents are generated without any kind of race
condition?
Done.
I'm also now using a single cluster for all three tests rather than
creating a new one for each test. This is definitely more efficient.
Having said that, I think these tests are awfully expensive for a single
GUC. Unit tests would definitely be preferable but that's not an option
for GUCs, so far as I know.
Thanks,
-David
Attachments:
timeline-check-v2.patchtext/plain; charset=UTF-8; name=timeline-check-v2.patchDownload+64-4
On Thu, Apr 24, 2025 at 03:34:29PM +0000, David Steele wrote:
Done. This means there are no commas in the upper bound but I don't think
it's a big deal and it more closely matches other GUC messages.
One thing that I have double-checked is if we have similar messages
for GUCs that are defined as strings while requiring some range
checks. While we have similar messages (tablesample code is one,
opclass a second), we don't have similar thing for GUCs. I'd bet that
this would be a win in the long-run anyway.
I'm also now using a single cluster for all three tests rather than creating
a new one for each test. This is definitely more efficient.Having said that, I think these tests are awfully expensive for a single
GUC. Unit tests would definitely be preferable but that's not an option for
GUCs, so far as I know.
On my laptop, 003_recovery_targets.pl increases from 8.2s to 8.7s,
which seems OK here for the coverage we have.
Undoing the changes in xlogrecovery.c causes the tests to fail, so we
are good.
"invalid recovery startup fails" is used three times repeatedly for
the tests with bogus and out-of-bound values. I'd suggest to make
these more verbose, with three extras "bogus value", "lower bound
check" and "upper bound check" added.
Side thing noted while reading the area: check_recovery_target_xid()
does not have any GUC_check_errdetail() giving details for the EINVAL
and ERANGE cases. Your addition for recovery_target_timeline is a
good idea, so perhaps we could do the same there? No need to do that,
that's just something I've noticed in passing..
And you are following the fat-comma convention for the command lines,
thanks for doing that. Note that I'm not planning to do anything here
for v18 now that we are in feature freeze, these would be for v19 once
the branch opens.
--
Michael
On 4/24/25 20:12, Michael Paquier wrote:
On Thu, Apr 24, 2025 at 03:34:29PM +0000, David Steele wrote:
Having said that, I think these tests are awfully expensive for a single
GUC. Unit tests would definitely be preferable but that's not an option for
GUCs, so far as I know.On my laptop, 003_recovery_targets.pl increases from 8.2s to 8.7s,
which seems OK here for the coverage we have.
Multiply half a second by similar tests on all the GUCs and it would add
up to a lot. But no argument here on keeping the tests.
"invalid recovery startup fails" is used three times repeatedly for
the tests with bogus and out-of-bound values. I'd suggest to make
these more verbose, with three extras "bogus value", "lower bound
check" and "upper bound check" added.
I have updated the labels to be more descriptive.
Side thing noted while reading the area: check_recovery_target_xid()
does not have any GUC_check_errdetail() giving details for the EINVAL
and ERANGE cases. Your addition for recovery_target_timeline is a
good idea, so perhaps we could do the same there? No need to do that,
that's just something I've noticed in passing..
I don't want to add that to this commit but I have added a note to my
list of PG improvements to make when I have time.
And you are following the fat-comma convention for the command lines,
thanks for doing that.
Just trying to follow the convention from existing tests, but you are
welcome!
Note that I'm not planning to do anything here
for v18 now that we are in feature freeze, these would be for v19 once
the branch opens.
That was my expectation. I just had some time to get this patch updated
so took the opportunity.
Regards,
-David
Attachments:
timeline-check-v3.patchtext/plain; charset=UTF-8; name=timeline-check-v3.patchDownload+64-4
On Fri, Apr 25, 2025 at 01:50:16PM +0000, David Steele wrote:
That was my expectation. I just had some time to get this patch updated so
took the opportunity.
And well, now is the time to get this patch done. So, applied after
an extra lookup, with a switch from UINT_MAX to PG_UINT32_MAX in the
code.
Thanks for your patience!
--
Michael
On 7/2/25 22:24, Michael Paquier wrote:
On Fri, Apr 25, 2025 at 01:50:16PM +0000, David Steele wrote:
That was my expectation. I just had some time to get this patch updated so
took the opportunity.And well, now is the time to get this patch done. So, applied after
an extra lookup, with a switch from UINT_MAX to PG_UINT32_MAX in the
code.
Excellent, thank you! I'm also planning a similar improvement for
check_recovery_target_xid as we discussed up thread.
Regards,
-David
On Fri, Feb 20, 2026 at 03:25:30AM +0000, David Steele wrote:
I noticed that when you replaced UINT_MAX with PG_UINT32_MAX you missed one
instance. Fixed in the attached.
Added pgsql-hackers in CC for awareness, with your proposed patch
attached, David.
It looks so. Will fix, thanks!
--
Michael
Attachments:
uint32.patchtext/plain; charset=us-asciiDownload+1-1
Import Notes
Reply to msg id not found: 7c1eb8fe-7a19-43d0-84cb-9840d8bcdd14@pgbackrest.org