refactoring - share str2*int64 functions
Hello Tom,
Yep, but ISTM that it is down to 32 bits,
Only on 32-bit-long machines, which are a dwindling minority (except
for Windows, which I don't really care about).So the third short is now always 0. Hmmm. I'll propose another option over
the week-end.I suppose we could put pg_strtouint64 somewhere where pgbench can use it,
but TBH I don't think it's worth the trouble. The set of people using
the --random-seed=int option at all is darn near empty, I suspect,
and the documentation only says you can write an int there.
Although I agree it is not worth a lot of trouble, and even if I don't do
Windows, I think it valuable that the behavior is the same on all
platform. The attached match shares pg_str2*int64 functions between
frontend and backend by moving them to "common/", which avoids some code
duplication.
This is more refactoring, and it fixes the behavior change on 32 bit
architectures.
--
Fabien.
As usual, it is better with the attachement. Sorry for the noise.
Yep, but ISTM that it is down to 32 bits,
Only on 32-bit-long machines, which are a dwindling minority (except
for Windows, which I don't really care about).So the third short is now always 0. Hmmm. I'll propose another option over
the week-end.I suppose we could put pg_strtouint64 somewhere where pgbench can use it,
but TBH I don't think it's worth the trouble. The set of people using
the --random-seed=int option at all is darn near empty, I suspect,
and the documentation only says you can write an int there.Although I agree it is not worth a lot of trouble, and even if I don't do
Windows, I think it valuable that the behavior is the same on all platform.
The attached match shares pg_str2*int64 functions between frontend and
backend by moving them to "common/", which avoids some code duplication.This is more refactoring, and it fixes the behavior change on 32 bit
architectures.
--
Fabien.
Attachments:
share-strtoX64-functions-1.patchtext/x-diff; name=share-strtoX64-functions-1.patchDownload+137-235
Although I agree it is not worth a lot of trouble, and even if I don't do
Windows, I think it valuable that the behavior is the same on all platform.
The attached match shares pg_str2*int64 functions between frontend and
backend by moving them to "common/", which avoids some code duplication.This is more refactoring, and it fixes the behavior change on 32 bit
architectures.
V2 is a rebase.
--
Fabien.
Attachments:
share-strtoX64-functions-2.patchtext/x-diff; name=share-strtoX64-functions-2.patchDownload+137-235
On Fri, May 24, 2019 at 3:23 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Although I agree it is not worth a lot of trouble, and even if I don't do
Windows, I think it valuable that the behavior is the same on all platform.
The attached match shares pg_str2*int64 functions between frontend and
backend by moving them to "common/", which avoids some code duplication.This is more refactoring, and it fixes the behavior change on 32 bit
architectures.V2 is a rebase.
Hi Fabien,
Here's some semi-automated feedback, noted while going through
failures on cfbot.cputube.org. You have a stray editor file
src/backend/parser/parse_node.c.~1~. Something is failing to compile
while doing the temp-install in make check-world, which probably
indicates that some test or contrib module is using the interface you
changed?
--
Thomas Munro
https://enterprisedb.com
On Mon, Jul 8, 2019 at 3:22 PM Thomas Munro <thomas.munro@gmail.com> wrote:
Here's some semi-automated feedback, noted while going through
failures on cfbot.cputube.org. You have a stray editor file
src/backend/parser/parse_node.c.~1~. Something is failing to compile
while doing the temp-install in make check-world, which probably
indicates that some test or contrib module is using the interface you
changed?
Please disregard the the comment about the ".~1~" file, my mistake.
As for the check-world failure, it's here:
pg_stat_statements.c:1024:11: error: implicit declaration of function
'pg_strtouint64' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
rows = pg_strtouint64(completionTag + 5, NULL, 10);
^
Apparently it needs to include common/string.h.
--
Thomas Munro
https://enterprisedb.com
Hello Thomas,
pg_stat_statements.c:1024:11: error: implicit declaration of function
'pg_strtouint64' is invalid in C99
[-Werror,-Wimplicit-function-declaration]
rows = pg_strtouint64(completionTag + 5, NULL, 10);
^
Apparently it needs to include common/string.h.
Yep, I gathered that as well, but did not act promptly on your help.
Thanks for it!
Here is the updated patch on which I checked "make check-world".
--
Fabien.
Attachments:
share-strtoX64-functions-3.patchtext/x-diff; name=share-strtoX64-functions-3.patchDownload+138-235
On Sun, Jul 14, 2019 at 3:22 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Here is the updated patch on which I checked "make check-world".
Thanks! So, we're moving pg_strtouint64() to a place where frontend
code can use it, and getting rid of some duplication. I like it. I
wanted this once before myself[1]/messages/by-id/CAEepm=2KeC8xDbEWgDTDObXGqPHFW4kcD7BZXR6NMfiHjjnKhQ@mail.gmail.com.
+extern bool pg_strtoint64(const char *str, bool errorOK, int64 *result);
+extern uint64 pg_strtouint64(const char *str, char **endptr, int base);
One of these things is not like the other. Let's see... the int64
version is used only by pgbench and is being promoted to common where
it can be used by more code. With a name like that, wouldn't it make
sense to bring it into line with the uint64 interface, and then move
pgbench's error reporting stuff back into pgbench? The uint64 one
derives its shape from the family of standard functions like strtol()
so I think it wins.
[1]: /messages/by-id/CAEepm=2KeC8xDbEWgDTDObXGqPHFW4kcD7BZXR6NMfiHjjnKhQ@mail.gmail.com
--
Thomas Munro
https://enterprisedb.com
Hello Thomas,
+extern bool pg_strtoint64(const char *str, bool errorOK, int64 *result); +extern uint64 pg_strtouint64(const char *str, char **endptr, int base);One of these things is not like the other.
Indeed.
I agree that it is unfortunate, and it was bothering me a little as well.
Let's see... the int64 version is used only by pgbench and is being
promoted to common where it can be used by more code.
No and yes.
The pgbench code was a copy of server-side internal "scanint8", so it is
used both by pgbench and the server-side handling of "int8", it is used
significantly, taking advantage of its versatile error reporting feature
on both sides.
With a name like that, wouldn't it make sense to bring it into line with
the uint64 interface, and then move pgbench's error reporting stuff back
into pgbench?
That would need moving the server-side error handling as well, which I
would not really be happy with.
The uint64 one derives its shape from the family of standard functions
like strtol() so I think it wins.
Yep, it cannot be changed either.
I do not think that changing the error handling capability is appropriate,
it is really a feature of the function. The function could try to use an
internal pg_strtoint64 which would look like the other unsigned version,
but then it would not differentiate the various error conditions (out of
range vs syntax error).
The compromise I can offer is to change the name of the first one, say to
"pg_scanint8" to reflect its former backend name. Attached a v4 which does
a renaming so as to avoid the name similarity but signature difference. I
also made both error messages identical.
--
Fabien.
Attachments:
share-strtoX64-functions-4.patchtext/x-diff; name=share-strtoX64-functions-4.patchDownload+138-235
On Mon, Jul 15, 2019 at 5:08 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
The compromise I can offer is to change the name of the first one, say to
"pg_scanint8" to reflect its former backend name. Attached a v4 which does
a renaming so as to avoid the name similarity but signature difference. I
also made both error messages identical.
Cool. I'm not exactly sure when we should include 'pg_' in identifier
names. It seems to be used for functions/macros that wrap or replace
something else with a similar name, like pg_pwrite(),
pg_attribute_noreturn(), ... In this case it's just our own code that
we're moving, so I'm wondering if we should just call it scanint8().
If you agree, I think this is ready to commit.
--
Thomas Munro
https://enterprisedb.com
On 2019-Jul-16, Thomas Munro wrote:
On Mon, Jul 15, 2019 at 5:08 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
The compromise I can offer is to change the name of the first one, say to
"pg_scanint8" to reflect its former backend name. Attached a v4 which does
a renaming so as to avoid the name similarity but signature difference. I
also made both error messages identical.Cool. I'm not exactly sure when we should include 'pg_' in identifier
names. It seems to be used for functions/macros that wrap or replace
something else with a similar name, like pg_pwrite(),
pg_attribute_noreturn(), ... In this case it's just our own code that
we're moving, so I'm wondering if we should just call it scanint8().
Isn't it annoying that pg_strtouint64() has an implementation that
suggests that it ought to be in src/port? The fact that the signatures
are so different suggests to me that we should indeed put them separate.
--
�lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jul 16, 2019 at 11:16:31AM +1200, Thomas Munro wrote:
Cool. I'm not exactly sure when we should include 'pg_' in identifier
names. It seems to be used for functions/macros that wrap or replace
something else with a similar name, like pg_pwrite(),
pg_attribute_noreturn(), ... In this case it's just our own code that
we're moving, so I'm wondering if we should just call it scanint8().
FWIW, I was looking forward to putting my hands on this patch and try
to get it merged so as we can get rid of those duplications. Here are
some comments.
+#ifdef FRONTEND
+ fprintf(stderr,
+ "invalid input syntax for type %s: \"%s\"\n",
"bigint", str);
+#else
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("invalid input syntax for type %s: \"%s\"",
+ "bigint", str)));
+#endif
Have you looked at using the wrapper pg_log_error() here?
+extern bool pg_scanint8(const char *str, bool errorOK, int64
*result);
+extern uint64 pg_strtouint64(const char *str, char **endptr, int
base);
Hmm. With this patch we have strtoint and pg_strtouint64, which makes
the whole set inconsistent.
+
#endif /* COMMON_STRING_H */
Noise diff..
numutils.c also has pg_strtoint16 and pg_strtoint32, so the locations
become rather inconsistent with inconsistent APIs for the manipulation
of int2 and int4 fields, and scanint8 is just a derivative of the same
logic. We have two categories of routines here:
- The wrappers on top of strtol and strtoul & co, which are named
respectively strtoint and pg_strtouint64 with the patch. The naming
part is inconsistent, and we only handle uint64 and int32. We don't
need to worry about int64 and uint32 because they are not used?
That's fine by me, but at least let's have a consistent naming.
Prefixing the functions with pg_* is a better practice in my opinion
as we will unlikely run into conflicts this way.
- The str->integer conversion routines, which actually have very
similar characteristics to the strtol families as they remove trailing
whitespaces first, check for a sign, etc, except that they work only
on base 10. And here we get into a state where pg_scanint8 should be
actually called pg_strtoint64, with an interface inconsistent with its
int32/int16 relatives now only in the backend. Could we consider more
consolidation here please? Like moving the whole set to src/common/?
If you agree, I think this is ready to commit.
Thomas, are you planning to look at this patch as committer? I had it
in my agenda, and was planning to look at it sooner than later. Now
if you are on it, I won't step on your toes.
--
Michael
On Tue, Jul 16, 2019 at 7:11 PM Michael Paquier <michael@paquier.xyz> wrote:
Thomas, are you planning to look at this patch as committer? I had it
in my agenda, and was planning to look at it sooner than later. Now
if you are on it, I won't step on your toes.
Hi Michael, please go ahead, it's all yours.
--
Thomas Munro
https://enterprisedb.com
Hello Thomas,
On Mon, Jul 15, 2019 at 5:08 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
The compromise I can offer is to change the name of the first one, say to
"pg_scanint8" to reflect its former backend name. Attached a v4 which does
a renaming so as to avoid the name similarity but signature difference. I
also made both error messages identical.Cool. I'm not exactly sure when we should include 'pg_' in identifier
names. It seems to be used for functions/macros that wrap or replace
something else with a similar name, like pg_pwrite(),
pg_attribute_noreturn(), ... In this case it's just our own code that
we're moving, so I'm wondering if we should just call it scanint8().
I added the pg_ prefix as a poor man's namespace because the function can
be used by external tools (eg contribs), so as to avoid potential name
conflicts.
I agree that such conflicts are less probable if the name does not replace
something existing.
If you agree, I think this is ready to commit.
It can be removed, or not. So you do as you feel.
--
Fabien.
On Jul 16, 2019, at 3:30 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Cool. I'm not exactly sure when we should include 'pg_' in identifier
names. It seems to be used for functions/macros that wrap or replace
something else with a similar name, like pg_pwrite(),
pg_attribute_noreturn(), ... In this case it's just our own code that
we're moving, so I'm wondering if we should just call it scanint8().I added the pg_ prefix as a poor man's namespace because the function can be used by external tools (eg contribs), so as to avoid potential name conflicts.
Yeah, I think if we are going to expose it to front end code there is a good argument for some kind of prefix that makes it sound PostgreSQL-related.
...Robert
Robert Haas <robertmhaas@gmail.com> writes:
On Jul 16, 2019, at 3:30 AM, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
Cool. I'm not exactly sure when we should include 'pg_' in identifier
names.
I added the pg_ prefix as a poor man's namespace because the function can be used by external tools (eg contribs), so as to avoid potential name conflicts.
Yeah, I think if we are going to expose it to front end code there is a good argument for some kind of prefix that makes it sound PostgreSQL-related.
Yeah, I'd tend to err in favor of including "pg_". We might get away
without that as long as the name is never exposed to non-PG code, but
for stuff that's going into src/common/ or src/port/ I think that's
a risky assumption to make.
I'm also in agreement with Michael's comments in
<20190716071144.GF1439@paquier.xyz> that this would be a good time
to bring some consistency to the naming of related functions.
regards, tom lane
Hi,
On 2019-07-15 07:08:42 +0200, Fabien COELHO wrote:
I do not think that changing the error handling capability is appropriate,
it is really a feature of the function. The function could try to use an
internal pg_strtoint64 which would look like the other unsigned version, but
then it would not differentiate the various error conditions (out of range
vs syntax error).
The compromise I can offer is to change the name of the first one, say to
"pg_scanint8" to reflect its former backend name. Attached a v4 which does a
renaming so as to avoid the name similarity but signature difference. I also
made both error messages identical.
I think the interface of that function is not that good, and the "scan"
in the name isn't great for discoverability (for one it's a different
naming than pg_strtoint16 etc), and the *8 meaning 64bit is confusing
enough in the backend, we definitely shouldn't extend that to frontend
code.
Referencing "bigint" and "input syntax" from frontend code imo doesn't
make a lot of sense. And int8in is the only caller that uses
errorOK=False anyway, so there's currently no need for the frontend
error strings afaict.
ISTM that something like
typedef enum StrToIntConversion
{
STRTOINT_OK = 0,
STRTOINT_SYNTAX_ERROR = 1,
STRTOINT_OUT_OF_RANGE = 2
} StrToIntConversion;
StrToIntConversion pg_strtoint64(const char *str, int64 *result);
would make more sense.
There is the issue that there already is pg_strtoint16 and
pg_strtoint32, which do not have the option to not raise an error. I'd
probably name the non-error throwing ones something like pg_strtointNN_e
(for extended, or error handling), and have pg_strtointNN wrappers that
just handle the errors, or reverse the naming (which'd cause a bit of
churn, but not that much).
That'd also make the code for pg_strtointNN a bit nicer, because we'd
not need the gotos anymore, they're just there to avoid redundant error
messages - which'd not be an issue if the error handling were just a
switch in a separate function. E.g.
int32
pg_strtoint32(const char *s)
{
int32 result;
switch (pg_strtoint32_e(s, &result))
{
case STRTOINT_OK:
return result;
case STRTOINT_SYNTAX_ERROR:
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
errmsg("value \"%s\" is out of range for type %s",
s, "integer")));
case STRTOINT_OUT_OF_RANGE:
ereport(ERROR,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("invalid input syntax for type %s: \"%s\"",
"integer", s)));
}
return 0; /* keep compiler quiet */
}
which does seem nicer than what we have right now.
Greetings,
Andres Freund
Hi,
On 2019-07-16 16:11:44 +0900, Michael Paquier wrote:
numutils.c also has pg_strtoint16 and pg_strtoint32, so the locations
become rather inconsistent with inconsistent APIs for the manipulation
of int2 and int4 fields, and scanint8 is just a derivative of the same
logic. We have two categories of routines here:
- The wrappers on top of strtol and strtoul & co, which are named
respectively strtoint and pg_strtouint64 with the patch. The naming
part is inconsistent, and we only handle uint64 and int32. We don't
need to worry about int64 and uint32 because they are not used?
That's fine by me, but at least let's have a consistent naming.
Yea, consistent naming seems like a strong requirement
here. Additionally I think we should just provide a consistent set
rather than what's needed just now. That'll just lead to people
inventing their own again down the line.
Prefixing the functions with pg_* is a better practice in my opinion
as we will unlikely run into conflicts this way.
+1
- The str->integer conversion routines, which actually have very
similar characteristics to the strtol families as they remove trailing
whitespaces first, check for a sign, etc, except that they work only
on base 10.
There's afaict neither a caller that needs the base argument at the
moment, nor one in the tree previously. I'd argue for just making
pg_strtouint64's API consistent.
I'd probably also just use the implementation we have for signed
integers (minus the relevant negation and overflow checks, obviously) -
it's a lot faster, and I think there's value in keeping the
implementations in sync.
Greetings,
Andres Freund
On Tue, Jul 16, 2019 at 01:18:38PM -0700, Andres Freund wrote:
Hi,
On 2019-07-16 16:11:44 +0900, Michael Paquier wrote:
Yea, consistent naming seems like a strong requirement
here. Additionally I think we should just provide a consistent set
rather than what's needed just now. That'll just lead to people
inventing their own again down the line.
Agreed. The first versions of pg_rewind in the tree have been using
copy_file_range(), which has been introduced in Linux.
- The str->integer conversion routines, which actually have very
similar characteristics to the strtol families as they remove trailing
whitespaces first, check for a sign, etc, except that they work only
on base 10.There's afaict neither a caller that needs the base argument at the
moment, nor one in the tree previously. I'd argue for just making
pg_strtouint64's API consistent.
Good point, indeed, this could be much more simplified. I have not
paid attention at that part.
I'd probably also just use the implementation we have for signed
integers (minus the relevant negation and overflow checks, obviously) -
it's a lot faster, and I think there's value in keeping the
implementations in sync.
You mean that it is much faster than the set of wrappers for strtol
than we have? Is that because we don't care about the base?
--
Michael
On Tue, Jul 16, 2019 at 01:04:38PM -0700, Andres Freund wrote:
There is the issue that there already is pg_strtoint16 and
pg_strtoint32, which do not have the option to not raise an error. I'd
probably name the non-error throwing ones something like pg_strtointNN_e
(for extended, or error handling), and have pg_strtointNN wrappers that
just handle the errors, or reverse the naming (which'd cause a bit of
churn, but not that much).That'd also make the code for pg_strtointNN a bit nicer, because we'd
not need the gotos anymore, they're just there to avoid redundant error
messages - which'd not be an issue if the error handling were just a
switch in a separate function. E.g.
Agreed on that. I am wondering if we should use a common wrapper for
all the internal functions taking in input a set of bits16 flags to
control its behavior and put all that into common/script.c:
- Issue an error.
- Check for signedness.
- Base length: 16, 32 or 64.
This would have the advantage to move the error string generation, the
trailing whitespace check, sign handling and such in a single place.
We could have the internal routine return uint64 which is casted
afterwards to a proper result depending on what we use. (Perhaps
that's what you actually meant?)
I would also rather not touch the strtol wrappers that we have able to
handle the base. There is nothing in the tree using it, but likely
there are extensions relying on it. Switching all the existing
callers in the tree to the new routines sounds good to me of course.
Consolidating all that still needs more work, so for now I am
switching the patch as waiting on author.
--
Michael
Bonjour Michaᅵl,
FWIW, I was looking forward to putting my hands on this patch and try
to get it merged so as we can get rid of those duplications. Here are
some comments.+#ifdef FRONTEND + fprintf(stderr, + "invalid input syntax for type %s: \"%s\"\n", "bigint", str); +#else + ereport(ERROR, + (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION), + errmsg("invalid input syntax for type %s: \"%s\"", + "bigint", str))); +#endif Have you looked at using the wrapper pg_log_error() here?
I have not.
I have simply merged the two implementations (pgbench & backend) as they
were.
+extern bool pg_scanint8(const char *str, bool errorOK, int64 *result); +extern uint64 pg_strtouint64(const char *str, char **endptr, int base);
Hmm. With this patch we have strtoint and pg_strtouint64, which makes
the whole set inconsistent.
I understand that you mean bits vs bytes? Indeed it can bite!
+
#endif /* COMMON_STRING_H */
Noise diff..
Indeed.
numutils.c also has pg_strtoint16 and pg_strtoint32, so the locations
become rather inconsistent with inconsistent APIs for the manipulation
of int2 and int4 fields, and scanint8 is just a derivative of the same
logic. We have two categories of routines here:
Yep, but the int2/4 functions are not used elsewhere.
- The wrappers on top of strtol and strtoul & co, which are named
respectively strtoint and pg_strtouint64 with the patch. The naming
part is inconsistent, and we only handle uint64 and int32. We don't
need to worry about int64 and uint32 because they are not used?
Indeed, it seems that they are not needed/used by client code, AFAICT.
That's fine by me, but at least let's have a consistent naming.
Ok.
Prefixing the functions with pg_* is a better practice in my opinion
as we will unlikely run into conflicts this way.
Ok.
- The str->integer conversion routines, which actually have very
similar characteristics to the strtol families as they remove trailing
whitespaces first, check for a sign, etc, except that they work only
on base 10. And here we get into a state where pg_scanint8 should be
actually called pg_strtoint64,
I just removed that:-)
ISTM that the issue is that the error handling of these functions is
pretty different.
with an interface inconsistent with its int32/int16 relatives now only
in the backend.
We can, but I'm not at ease with changing the error handling approach.
Could we consider more consolidation here please? Like moving the whole
set to src/common/?
My initial plan was simply to remove direct code duplications between
front-end and back-end, not to re-engineer the full set of string to int
conversion functions:-)
On the re-engineering front: Given the various points on the thread, ISTM
that there should probably be two behaviors for str to signed/unsigned
int{16,32,64}, and having only one kind of signature for all types would
be definitely better.
One low-level one that does the conversion or return an error.
Another higher-level one which possibly adds an error message (stderr for
front-end, log for back-end).
One choice is whether there are two functions (the higher one calling the
lower one and adding the messages) or just one with a boolean to trigger
the messages. I do not have a strong opinion. Maybe one function would be
simpler. Andres boolean-compatible enum return looks like a good option.
Overall, this leads to something like:
enum { STRTOINT_OK, STRTOINT_OVERFLOW_ERROR, STRTOINT_SYNTAX_ERROR }
pg_strto{,u}int{8?,16,32,64}
(const char * string, const TYPE * result, const bool verbose);
--
Fabien.