Use strtoi64() in pgbench, replacing its open-coded implementation
Here's a small patch to replace the int64 parsing code in pgbench with a
call to strtoi64(). Makes it a little simpler.
Spotted this while grepping for all the different integer parsing
functions we have. We could probably consolidate them some more, we
still have quite a different integer-parsing routines in the backend and
in the frontend. But this is one small, straightforward step in that
direction.
- Heikki
Attachments:
0001-Use-strtoi64-in-pgbench-replacing-its-open-coded-imp.patchtext/x-patch; charset=UTF-8; name=0001-Use-strtoi64-in-pgbench-replacing-its-open-coded-imp.patchDownload
From 507f2abcea1b111c332c168acf1052e7bffd7249 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date: Wed, 19 Nov 2025 17:37:04 +0200
Subject: [PATCH 1/1] Use strtoi64() in pgbench, replacing its open-coded
implementation
Makes the code a little simpler.
The old implementation accepted trailing whitespace, but that seemed
unnecessary. Firstly, its sibling function for parsing decimals,
strtodouble(), does not accept trailing whitespace. Secondly, none of
the callers can pass a string with trailing whitespace to it.
Discussion: XXX
---
src/bin/pgbench/pgbench.c | 78 +++++++++------------------------------
1 file changed, 17 insertions(+), 61 deletions(-)
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index a425176ecdc..a9c36d06980 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -982,13 +982,17 @@ usage(void)
progname, progname, PACKAGE_BUGREPORT, PACKAGE_NAME, PACKAGE_URL);
}
-/* return whether str matches "^\s*[-+]?[0-9]+$" */
+/*
+ * Return whether str matches "^\s*[-+]?[0-9]+$"
+ *
+ * This should agree with strtoint64() on what's accepted, ignoring overflows.
+ */
static bool
is_an_int(const char *str)
{
const char *ptr = str;
- /* skip leading spaces; cast is consistent with strtoint64 */
+ /* skip leading spaces */
while (*ptr && isspace((unsigned char) *ptr))
ptr++;
@@ -1012,9 +1016,6 @@ is_an_int(const char *str)
/*
* strtoint64 -- convert a string to 64-bit integer
*
- * This function is a slightly modified version of pg_strtoint64() from
- * src/backend/utils/adt/numutils.c.
- *
* The function returns whether the conversion worked, and if so
* "*result" is set to the result.
*
@@ -1023,71 +1024,26 @@ is_an_int(const char *str)
bool
strtoint64(const char *str, bool errorOK, int64 *result)
{
- const char *ptr = str;
- int64 tmp = 0;
- bool neg = false;
-
- /*
- * Do our own scan, rather than relying on sscanf which might be broken
- * for long long.
- *
- * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
- * value as a negative number.
- */
-
- /* skip leading spaces */
- while (*ptr && isspace((unsigned char) *ptr))
- ptr++;
-
- /* handle sign */
- if (*ptr == '-')
- {
- ptr++;
- neg = true;
- }
- else if (*ptr == '+')
- ptr++;
+ char *end;
- /* require at least one digit */
- if (unlikely(!isdigit((unsigned char) *ptr)))
- goto invalid_syntax;
+ errno = 0;
+ *result = strtoi64(str, &end, 10);
- /* process digits */
- while (*ptr && isdigit((unsigned char) *ptr))
+ if (unlikely(errno != 0))
{
- int8 digit = (*ptr++ - '0');
-
- if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
- unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
- goto out_of_range;
+ if (!errorOK)
+ pg_log_error("value \"%s\" is out of range for type bigint", str);
+ return false;
}
- /* allow trailing whitespace, but not other trailing chars */
- while (*ptr != '\0' && isspace((unsigned char) *ptr))
- ptr++;
-
- if (unlikely(*ptr != '\0'))
- goto invalid_syntax;
-
- if (!neg)
+ if (unlikely(end == str || *end != '\0'))
{
- if (unlikely(tmp == PG_INT64_MIN))
- goto out_of_range;
- tmp = -tmp;
+ if (!errorOK)
+ pg_log_error("invalid input syntax for type bigint: \"%s\"", str);
+ return false;
}
- *result = tmp;
return true;
-
-out_of_range:
- if (!errorOK)
- pg_log_error("value \"%s\" is out of range for type bigint", str);
- return false;
-
-invalid_syntax:
- if (!errorOK)
- pg_log_error("invalid input syntax for type bigint: \"%s\"", str);
- return false;
}
/* convert string to double, detecting overflows/underflows */
--
2.47.3
Hi Heikki:
On Thu, Nov 20, 2025 at 8:45 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Here's a small patch to replace the int64 parsing code in pgbench with a
call to strtoi64(). Makes it a little simpler.Spotted this while grepping for all the different integer parsing
functions we have. We could probably consolidate them some more, we
still have quite a different integer-parsing routines in the backend and
in the frontend. But this is one small, straightforward step in that
direction.
I wrote a small program to test your patch and found that for strings like "
12 ", it does not handle the trailing spaces and considers the input
invalid. However, the original "strtoint64" function processes the trailing
spaces correctly. Below is the small program I used:
#include <stdio.h>
#include <errno.h>
typedef unsigned long long int64;
#define strtoi64(str, endptr, base) ((int64) strtol(str, endptr, base))
int main(int argc, char **argv)
{
unsigned long long result;
char *end;
char *str = argv[1];
result = strtoi64(str, &end, 10);
if (errno != 0)
{
printf("%s\n", strerror(errno));
return 1;
}
if (end == str || *end != '\0')
{
printf("invalid input syntax for type bigint: \"%s\"\n", str);
return 1;
}
return 0;
}
When running ./test " 12 ", the output is:
invalid input syntax for type bigint: " 12 "
$ uname -a
Linux dev 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64
x86_64 x86_64 GNU/Linux
Regards,
Shi Yuefei
Hi,
On Thu, Nov 20, 2025 at 8:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Here's a small patch to replace the int64 parsing code in pgbench with a
call to strtoi64(). Makes it a little simpler.
+1 on this simplification – it definitely makes the code cleaner.
One small note: the updated code doesn’t handle trailing spaces in the
input string. Should we consider this a concern?
Neil Chen <carpenter.nail.cz@gmail.com> writes:
+1 on this simplification – it definitely makes the code cleaner.
One small note: the updated code doesn’t handle trailing spaces in the
input string. Should we consider this a concern?
Heikki's draft commit message addresses that point:
The old implementation accepted trailing whitespace, but that seemed
unnecessary. Firstly, its sibling function for parsing decimals,
strtodouble(), does not accept trailing whitespace. Secondly, none of
the callers can pass a string with trailing whitespace to it.
I didn't try to verify the latter assertion, but if it's true,
we don't need the extra complication.
regards, tom lane
On Thu, Nov 20, 2025 at 9:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Heikki's draft commit message addresses that point:
The old implementation accepted trailing whitespace, but that seemed
unnecessary. Firstly, its sibling function for parsing decimals,
strtodouble(), does not accept trailing whitespace. Secondly, none of
the callers can pass a string with trailing whitespace to it.
My mistake – Heikki is absolutely right.
Looking at the two call sites of the function: one filters out trailing
spaces within the 'is_an_int' function, and the other in exprscan.l won’t
pass strings with trailing spaces either.
I didn't try to verify the latter assertion, but if it's true,
we don't need the extra complication.
make sense
On Nov 19, 2025, at 23:37, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Here's a small patch to replace the int64 parsing code in pgbench with a call to strtoi64(). Makes it a little simpler.
Spotted this while grepping for all the different integer parsing functions we have. We could probably consolidate them some more, we still have quite a different integer-parsing routines in the backend and in the frontend. But this is one small, straightforward step in that direction.
- Heikki
<0001-Use-strtoi64-in-pgbench-replacing-its-open-coded-imp.patch>
I have no concern if we decide to no longer support tailing spaces, while I still got a couple of small comments:
1
```
-/* return whether str matches "^\s*[-+]?[0-9]+$" */
+/*
+ * Return whether str matches "^\s*[-+]?[0-9]+$"
+ *
+ * This should agree with strtoint64() on what's accepted, ignoring overflows.
+ */
static bool
is_an_int(const char *str)
```
Here you added a comment saying "ignoring overflows”, yes, is_an_int() doesn’t check if the integer overflows.
But looking at where the function is called:
```
else if (is_an_int(var->svalue))
{
/* if it looks like an int, it must be an int without overflow */
int64 iv;
if (!strtoint64(var->svalue, false, &iv))
return false;
setIntValue(&var->value, iv);
}
```
The comment says “it must be an int without overflow”, so this comment should be updated as well.
2
```
+ if (unlikely(errno != 0))
{
- int8 digit = (*ptr++ - '0');
-
- if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
- unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
- goto out_of_range;
+ if (!errorOK)
+ pg_log_error("value \"%s\" is out of range for type bigint", str);
+ return false;
}
```
Here we log an “out out range” error when errno is not 0, which is inaccurate, we should check ERANGE.
strtoi64() maps to strtol()/strtoll(), the functions could return more errors than ERANGE.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On 20/11/2025 05:36, Chao Li wrote:
I have no concern if we decide to no longer support tailing spaces, while I still got a couple of small comments:
1 ``` -/* return whether str matches "^\s*[-+]?[0-9]+$" */ +/* + * Return whether str matches "^\s*[-+]?[0-9]+$" + * + * This should agree with strtoint64() on what's accepted, ignoring overflows. + */ static bool is_an_int(const char *str) ```Here you added a comment saying "ignoring overflows”, yes, is_an_int() doesn’t check if the integer overflows.
But looking at where the function is called:
```
else if (is_an_int(var->svalue))
{
/* if it looks like an int, it must be an int without overflow */
int64 iv;if (!strtoint64(var->svalue, false, &iv))
return false;setIntValue(&var->value, iv);
}
```The comment says “it must be an int without overflow”, so this comment should be updated as well.
Hmm, I don't think it's wrong as it is, and this patch doesn't change
that behavior. That comment is a little vague though.
How about the following phrasing:
/*
* If it looks like an integer, treat it as such. If it turns out to be
* too large for 'int64', return failure rather than fall back to 'double'.
*/
I don't feel the urge to refactor this myself right now, but we probably
could simplify this further. For example, I wonder if we should remove
is_an_int() altogether and rely on strtoi64() to return failure if the
input does't look like a integer. Also, strtodouble() is never called
with "errorOk != false".
2 ``` + if (unlikely(errno != 0)) { - int8 digit = (*ptr++ - '0'); - - if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) || - unlikely(pg_sub_s64_overflow(tmp, digit, &tmp))) - goto out_of_range; + if (!errorOK) + pg_log_error("value \"%s\" is out of range for type bigint", str); + return false; } ```Here we log an “out out range” error when errno is not 0, which is inaccurate, we should check ERANGE.
strtoi64() maps to strtol()/strtoll(), the functions could return more errors than ERANGE.
Good point. The existing strtodouble() function, which uses strtod(),
has the same issue (per POSIX spec at
https://pubs.opengroup.org/onlinepubs/000095399/functions/strtod.html):
These functions may fail if:
[EINVAL]
[CX] [Option Start] No conversion could be performed. [Option End]
Fixed that and committed. Thanks for the review!
- Heikki
On 2025-Nov-21, Heikki Linnakangas wrote:
I don't feel the urge to refactor this myself right now, but we probably
could simplify this further. For example, I wonder if we should remove
is_an_int() altogether and rely on strtoi64() to return failure if the input
does't look like a integer.
I had the same thought -- is_an_int() is not doing anything useful and
it would be better to get rid of it. If we do have an integer-looking
that doesn't fit in int64, then maybe treating it as a double is not
wrong. (I suppose if we wanted to have numeric values beyond int64
range and not lose precision, we would have to add separate support for
that.)
--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/