Allowing printf("%m") only where it actually works
<digression>
For amusement's sake, I was playing around with NetBSD-current (9-to-be)
today, and tried to compile Postgres on it. It works OK --- and I can
even confirm that our new code for using ARM v8 CRC instructions works
there --- but I got a boatload of compile warnings like this:
latch.c:1180:4: warning: %m is only allowed in syslog(3) like functions [-Wformat=]
ereport(ERROR,
^~~~~~~
A bit of googling turned up the patch that caused this [1], which was
soon followed by some well-reasoned push-back [2]; but the warning's
still there, so evidently the forces of bullheadedness won. I was
ready to discount the whole thing as being another badly designed
no-wonder-gcc-upstream-won't-take-it compiler warning, when I noticed that
the last few warnings in my output were pointing out a live bug, to wit
using %m with plain old printf rather than elog/ereport. So I fixed
that [3], but I'm thinking that we need to take a bit more care here.
</digression>
Looking around, we have circa seventy-five functions declared with
pg_attribute_printf in our tree right now, and of those, *only* the
elog/ereport support functions can be relied on to process %m correctly.
However, anyone who's accustomed to working in backend code is likely to
not think hard about using %m in an error message, as indeed the authors
and reviewers of pg_verify_checksums did not. Worse yet, such cases
actually will work as long as you're testing on glibc platforms, only
to fail everywhere else.
So I think we need to try to make provisions for getting compiler warnings
when %m is used in a function that doesn't support it. gcc on Linux-ish
platforms isn't going to be very helpful with this, but that doesn't mean
that we should confuse %m-supporting and not-%m-supporting functions,
as we do right now.
Hence, I think we need something roughly like the attached patch, which
arranges to use "gnu_printf" (if available) as the format archetype for
the elog/ereport functions, and plain "printf" for all the rest. With
some additional hackery not included here, this can be ju-jitsu'd into
compiling warning-free on NetBSD-current. (The basic idea is to extend
PGAC_C_PRINTF_ARCHETYPE so it will select __syslog__ as the archetype
if available; but then you need some hack to suppress the follow-on
warnings complained of in [2]. I haven't decided what's the least ugly
solution for the latter, so I'm not proposing such a patch yet.)
What I'm mainly concerned about at this stage is what effects this'll
have on Windows builds. The existing comment for PGAC_C_PRINTF_ARCHETYPE
claims that using gnu_printf silences complaints about "%lld" and related
formats on Windows, but I wonder whether that is still true on Windows
versions we still support. As I mentioned in [4]/messages/by-id/13103.1526749980@sss.pgh.pa.us, I don't think we really
work any longer on platforms that don't use "%lld" for "long long" values,
and it seems that Windows does accept that in post-XP versions --- but has
gcc gotten the word?
If this does work as desired on Windows, then that would be a fairly
mainstream platform that can produce warnings about wrong uses of %m,
even if gcc-on-Linux doesn't. If worst comes to worst, somebody could
crank up a buildfarm machine with a newer NetBSD release.
Anyway, I don't feel a need to cram this into v11, since I just fixed
the live bugs of this ilk in HEAD; it seems like it can wait for v12.
So I'll add this to the next commitfest.
regards, tom lane
[1]: https://mail-index.netbsd.org/tech-userlevel/2015/08/21/msg009282.html
[2]: https://mail-index.netbsd.org/tech-userlevel/2015/10/23/msg009371.html
[3]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=a13b47a59ffce6f3c13c8b777738a3aab1db10d3
[4]: /messages/by-id/13103.1526749980@sss.pgh.pa.us
Attachments:
get-compiler-warnings-for-misuse-of-percent-m-1.patch (text/x-diff)
On Mon, May 21, 2018 at 12:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
For amusement's sake, I was playing around with NetBSD-current (9-to-be)
today, and tried to compile Postgres on it. It works OK --- and I can
even confirm that our new code for using ARM v8 CRC instructions works
Excellent news.
there --- but I got a boatload of compile warnings like this:
latch.c:1180:4: warning: %m is only allowed in syslog(3) like functions [-Wformat=]
ereport(ERROR,
^~~~~~~
A bit of googling turned up the patch that caused this [1], which was
soon followed by some well-reasoned push-back [2]; but the warning's
still there, so evidently the forces of bullheadedness won. I was
ready to discount the whole thing as being another badly designed
no-wonder-gcc-upstream-won't-take-it compiler warning, when I noticed that
the last few warnings in my output were pointing out a live bug, to wit
using %m with plain old printf rather than elog/ereport. So I fixed
that [3], but I'm thinking that we need to take a bit more care here.
I tried this on macOS and FreeBSD using GCC and Clang: both accept
printf("%m") without warning and then just print out "m". It'll be
interesting to see if the NetBSD patch/idea travels further or some
other solution can be found. I've raised this on the freebsd-hackers
list, let's see... I bet there's other software out there that just
prints out "m" when things go wrong. It's arguably something that
you'd want the compiler to understand as a C dialect thing.
--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
I tried this on macOS and FreeBSD using GCC and Clang: both accept
printf("%m") without warning and then just print out "m". It'll be
interesting to see if the NetBSD patch/idea travels further or some
other solution can be found. I've raised this on the freebsd-hackers
list, let's see... I bet there's other software out there that just
prints out "m" when things go wrong. It's arguably something that
you'd want the compiler to understand as a C dialect thing.
Yeah. ISTM that the netbsd guys didn't get this quite right. The
gcc docs are perfectly clear about what they think the semantics should
be:
The parameter archetype determines how the format string is
interpreted, and should be printf, scanf, strftime, gnu_printf,
gnu_scanf, gnu_strftime or strfmon. ... archetype values such as
printf refer to the formats accepted by the system’s C runtime
library, while values prefixed with ‘gnu_’ always refer to the formats
accepted by the GNU C Library.
Therefore, what netbsd should have done was leave the semantics of
"gnu_printf" alone (because glibc undoubtedly does take %m), and just emit
a warning with the "printf" archetype --- which is supposed to describe
the behavior of the platform's stdio, which in their case is known not
to take %m. If they'd done it that way then they'd have a patch that gcc
upstream certainly ought to accept, because it follows gcc's own stated
semantics for the check. This would also partially resolve the complaint
Roy Marples had in the thread I alluded to, ie he could use "gnu_printf"
to describe a function that accepts %m. (There might still need to be
some work to be done to avoid bogus -Wmissing-format-attribute complaints,
not sure.) I'm not sure whether a separate archetype for syslog is really
needed. Formally you could say that distinguishing syslog from GNU printf
is wise, but I'm not sure I see a practical need for it.
regards, tom lane
... and, while we're thinking about this, how can we prevent the reverse
problem of using strerror(errno) where you should have used %m? That
is evidently not academic either, cf 81256cd.
I am wondering whether the elog/ereport macros can locally define some
version of "errno" that would cause a compile failure if it's referenced
within the macro args. But I'm too tired to work it out in detail.
regards, tom lane
On Mon, May 21, 2018 at 4:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... and, while we're thinking about this, how can we prevent the reverse
problem of using strerror(errno) where you should have used %m? That
is evidently not academic either, cf 81256cd.
I am wondering whether the elog/ereport macros can locally define some
version of "errno" that would cause a compile failure if it's referenced
within the macro args. But I'm too tired to work it out in detail.
Here's an experimental way to do that, if you don't mind depending on
gory details of libc implementations (ie knowledge of what it expands
to). Not sure how to avoid that since it's a macro on all modern
systems, and we don't have a way to temporarily redefine a macro. If
you enable it for just ereport(), it compiles cleanly after 81256cd
(but fails on earlier commits). If you enable it for elog() too then
it finds problems with exec.c.
Another idea: if there are any systems in the build farm where it
isn't a macro as permitted by the standard (#ifndef errno), you could
perhaps define it as something uncompilable and then undefine it at
the end of the scope, so we'd at least have a post-commit canary.
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
prevent-errno.patch (application/octet-stream)
Thomas Munro <thomas.munro@enterprisedb.com> writes:
On Mon, May 21, 2018 at 4:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I am wondering whether the elog/ereport macros can locally define some
version of "errno" that would cause a compile failure if it's referenced
within the macro args. But I'm too tired to work it out in detail.
Here's an experimental way to do that, if you don't mind depending on
gory details of libc implementations (ie knowledge of what it expands
to). Not sure how to avoid that since it's a macro on all modern
systems, and we don't have a way to temporarily redefine a macro. If
you enable it for just ereport(), it compiles cleanly after 81256cd
(but fails on earlier commits). If you enable it for elog() too then
it finds problems with exec.c.
Hmm ... that's pretty duff code in exec.c, isn't it. Aside from the
question of errno unsafety, it's using elog where it really ought to be
using ereport, it's not taking any thought for the reported SQLSTATE,
etc. I'm hesitant to mess with it mere hours before the beta wrap,
but we really oughta improve that.
I noticed another can of worms here, too: on Windows, doesn't use of
GetLastError() in elog/ereport have exactly the same hazard as errno?
Or is there some reason to think it can't change value during errstart()?
regards, tom lane
On Mon, May 21, 2018 at 2:41 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
I've raised this on the freebsd-hackers
list, let's see... I bet there's other software out there that just
prints out "m" when things go wrong. It's arguably something that
you'd want the compiler to understand as a C dialect thing.
Result: FreeBSD printf() now supports %m, and an issue is being raised
with clang about lack of warnings on OSes that don't grok %m, and lack
of warnings when you say -Wformat-non-iso on any OS. Let's see how
that goes.
--
Thomas Munro
http://www.enterprisedb.com
I wrote:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
Here's an experimental way to do that, if you don't mind depending on
gory details of libc implementations (ie knowledge of what it expands
to). Not sure how to avoid that since it's a macro on all modern
systems, and we don't have a way to temporarily redefine a macro. If
you enable it for just ereport(), it compiles cleanly after 81256cd
(but fails on earlier commits). If you enable it for elog() too then
it finds problems with exec.c.
Hmm ... that's pretty duff code in exec.c, isn't it. Aside from the
question of errno unsafety, it's using elog where it really ought to be
using ereport, it's not taking any thought for the reported SQLSTATE,
etc. I'm hesitant to mess with it mere hours before the beta wrap,
but we really oughta improve that.
I wrote up a patch that makes src/common/exec.c do error reporting more
like other frontend/backend-common files (attached). Now that I've done
so, though, I'm having second thoughts. The thing that I don't like
about this is that it doubles the number of translatable strings created
by this file. While there's not *that* many of them, translators have to
deal with each one several times because this file is included by several
different frontend programs. So that seems like a rather high price to
pay to deal with what, at present, is a purely hypothetical hazard.
(Basically what this would protect against is elog_start changing errno,
which it doesn't.) Improving the errcode situation is somewhat useful,
but still maybe it's not worth the cost.
Another approach we could consider is keeping exec.c's one-off approach
to error handling and letting it redefine pg_prevent_errno_in_scope() as
empty. But that's ugly.
Or we could make the affected call sites work like this:
int save_errno = errno;
log_error(_("could not identify current directory: %s"),
strerror(save_errno));
which on the whole might be the most expedient thing.
regards, tom lane
Attachments:
make-exec.c-do-error-reporting-like-everyplace-else.patch (text/x-diff)
On Sun, May 27, 2018 at 4:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... So that seems like a rather high price to
pay to deal with what, at present, is a purely hypothetical hazard.
(Basically what this would protect against is elog_start changing errno,
which it doesn't.)
Hmm. It looks like errstart() preserves errno to protect %m not from
itself, but from the caller's other arguments to the elog facility.
That seems reasonable, but do we really need to prohibit direct use of
errno in expressions? The only rogue actor likely to trash errno is
you, the caller. I mean, if you call elog(LOG, "foo %d %d", errno,
fsync(bar)) it's obviously UB and your own fault, but who would do
anything like that? Or maybe I misunderstood the motivation.
Another approach we could consider is keeping exec.c's one-off approach
to error handling and letting it redefine pg_prevent_errno_in_scope() as
empty. But that's ugly.
Or we could make the affected call sites work like this:
int save_errno = errno;
log_error(_("could not identify current directory: %s"),
strerror(save_errno));
which on the whole might be the most expedient thing.
That was what I was going to propose, until I started wondering why we
need to do anything here.
--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
On Sun, May 27, 2018 at 4:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
(Basically what this would protect against is elog_start changing errno,
which it doesn't.)
Hmm. It looks like errstart() preserves errno to protect %m not from
itself, but from the caller's other arguments to the elog facility.
That seems reasonable, but do we really need to prohibit direct use of
errno in expressions? The only rogue actor likely to trash errno is
you, the caller. I mean, if you call elog(LOG, "foo %d %d", errno,
fsync(bar)) it's obviously UB and your own fault, but who would do
anything like that? Or maybe I misunderstood the motivation.
Right, the concern is really about things like
elog(..., f(x), strerror(errno));
If f() trashes errno --- perhaps only some of the time --- then this
is problematic. It's especially problematic because whether f() is
evaluated before or after strerror() is platform-dependent. So even
if the original author had tested things thoroughly, it might break
for somebody else.
The cases in exec.c all seem safe enough, but we have lots of examples
in the backend where we call nontrivial functions in the arguments of
elog/ereport. It doesn't take much to make one nontrivial either. If
memory serves, malloc() can trash errno on some platforms such as macOS,
so even just a palloc creates a hazard of a hard-to-reproduce problem.
At least in the case of ereport, all it takes to create a hazard is
more than one sub-function, eg this is risky:
ereport(..., errmsg(..., strerror(errno)), errdetail(...));
because errdetail() might run first and malloc some memory for its
constructed string.
So I think a blanket policy of "don't trust errno within the arguments"
is a good idea, even though it might be safe to violate it in the
existing cases in exec.c.
regards, tom lane
I wrote:
... It doesn't take much to make one nontrivial either. If
memory serves, malloc() can trash errno on some platforms such as macOS,
so even just a palloc creates a hazard of a hard-to-reproduce problem.
After digging around in the archives, the closest thing that we seem to
know for certain is that some versions of free() can trash errno, cf
/messages/by-id/E1UcmpR-0004nn-2i@wrigleys.postgresql.org
as a result of possibly calling madvise() which might or might not
succeed. But in the light of that knowledge, how much do you want
to bet that malloc() can't change errno? And there's the possibility
that a called function does both a palloc and a pfree ...
regards, tom lane
On Sun, May 27, 2018 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
At least in the case of ereport, all it takes to create a hazard is
more than one sub-function, eg this is risky:
ereport(..., errmsg(..., strerror(errno)), errdetail(...));
because errdetail() might run first and malloc some memory for its
constructed string.
So I think a blanket policy of "don't trust errno within the arguments"
is a good idea, even though it might be safe to violate it in the
existing cases in exec.c.
Right, malloc() is a hazard I didn't think about. I see that my local
malloc() makes an effort to save and restore errno around syscalls,
but even if all allocators were so thoughtful, which apparently they
aren't, there is also the problem that malloc itself can deliberately
set errno to ENOMEM per spec. I take your more general point that you
can't rely on anything we didn't write not trashing errno, even libc.
On Tue, May 22, 2018 at 4:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
I noticed another can of worms here, too: on Windows, doesn't use of
GetLastError() in elog/ereport have exactly the same hazard as errno?
Or is there some reason to think it can't change value during errstart()?
Yeah, on Windows the same must apply, not in errstart() itself but any
time you pass more than one value to elog() using expressions that
call functions we can't audit for last-error-stomping.
Out of curiosity I tried adding a GetLastError variable for Windows
(to hide the function of that name and break callers) to the earlier
experimental patch (attached). I had to give it an initial value to
get rid of a warning about an unused variable (by my reading of the
documentation, __pragma(warning(suppress:4101)) can be used in macros
(unlike #pragma) and should shut that warning up, but it doesn't work
for me, not sure why). Of course that produces many errors since we
do that all over the place:
https://ci.appveyor.com/project/macdice/postgres/build/1.0.184
--
Thomas Munro
http://www.enterprisedb.com
Attachments:
prevent-errno-v2.patch (application/octet-stream)
On 2018-May-27, Thomas Munro wrote:
Out of curiosity I tried adding a GetLastError variable for Windows
(to hide the function of that name and break callers) to the earlier
experimental patch (attached). I had to give it an initial value to
get rid of a warning about an unused variable (by my reading of the
documentation, __pragma(warning(suppress:4101)) can be used in macros
(unlike #pragma) and should shut that warning up, but it doesn't work
for me, not sure why). Of course that produces many errors since we
do that all over the place:
https://ci.appveyor.com/project/macdice/postgres/build/1.0.184
Ouch.
This seems to say that we oughta assign GetLastError() to saved_errno
during errstart, then use %m in the errmsg() instead.
--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
This seems to say that we oughta assign GetLastError() to saved_errno
during errstart, then use %m in the errmsg() instead.
No, because in some parts of the code, errno does mean something,
even in Windows builds.
I think the right fix is to leave %m alone, and instead:
(1) save GetLastError's result along with errno during errstart.
(2) provide a function, say errlastwinerror(), to be used where
elog/ereport calls currently call GetLastError():
elog(ERROR, "something went wrong: error code %lu",
errlastwinerror());
What it does, of course, is to reach into the current error stack
level and retrieve the saved result.
We could use some hack along the line of what Thomas suggested
to enforce that this function is used rather than GetLastError().
It's amusing to speculate about whether we could actually cause
GetLastError() appearing within elog/ereport to get mapped into
this function, thus removing the need to change any code in call
sites. But I suspect there's no adequately compiler-independent
way to do that.
I wonder whether we need to back-patch this. I don't recall
seeing any field reports of misleading Windows error codes,
but I wonder how many people even look those up ...
regards, tom lane
In the hopes of getting the cfbot un-stuck (it's currently trying to
test a known-not-to-work patch), here are updated versions of the two
live patches we have in this thread.
0001 is the patch I originally proposed to adjust printf archetypes.
0002 is Thomas's patch to blow up on errno references in ereport/elog,
plus changes in src/common/exec.c to prevent that from blowing up.
(I went with the minimum-footprint way, for now; making exec.c's
error handling generally nicer seems like a task for another day.)
I think 0002 is probably pushable, really. The unresolved issue about
0001 is whether it will create a spate of warnings on Windows builds,
and if so what to do about it. We might learn something from the
cfbot about that, but I think the full buildfarm is going to be the
only really authoritative answer.
There's also the matter of providing similar safety for GetLastError
calls, but I think that deserves to be a separate patch ... and I don't
really want to take point on it since I lack a Windows machine.
regards, tom lane
Attachments:
0001-get-compiler-warnings-for-misuse-of-percent-m-2.patch (text/x-diff)
0002-prevent-errno-in-ereport-3.patch (text/x-diff)
I wrote:
I think 0002 is probably pushable, really. The unresolved issue about
0001 is whether it will create a spate of warnings on Windows builds,
and if so what to do about it. We might learn something from the
cfbot about that, but I think the full buildfarm is going to be the
only really authoritative answer.
Ah, cfbot has a run already, and it reports no warnings or errors in
its Windows build.
At this point I'm inclined to push both of those patches so we can
see what the buildfarm makes of them.
regards, tom lane
I wrote:
At this point I'm inclined to push both of those patches so we can
see what the buildfarm makes of them.
So I did that, and while not all of the buildfarm has reported in,
enough of it has that I think we can draw conclusions. The only member
that's showing any new warnings, AFAICT, is jacana (gcc 4.9 on Windows).
It had no format-related warnings yesterday, but now it has a boatload
of 'em, and it appears that every single one traces to not believing
that printf and friends understand 'l' and 'z' length modifiers.
The reason for this seems to be that we unconditionally replace the
printf function family with snprintf.c on Windows, and port.h causes
those functions to be marked with pg_attribute_printf, which this
patch caused to mean just "printf" not "gnu_printf". So this gcc
evidently thinks the platform printf doesn't know 'l' and 'z'
(which may or may not be true in reality, but it's irrelevant)
and it complains.
There are also interesting warnings showing up in elog.c, such as
Aug 11 14:26:32 c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/utils/error/elog.c:807:2: warning: function might be possible candidate for 'ms_printf' format attribute [-Wsuggest-attribute=format]
I think what is happening here is that gcc notices that those functions
call appendStringInfoVA, which is now annotated with the printf
archetype not gnu_printf, so it decides that maybe we marked the elog.c
functions with the wrong archetype. I have no idea why it's suggesting
"ms_printf" though --- I can find nothing on the web that even admits
that gcc knows such an archetype.
So this is all pretty much of a mess. If we annotate the elog functions
differently from printf's annotation then we risk getting these complaints
in elog.c, but if we don't do that then we can't really describe their
semantics correctly. We could possibly mark the replacement snprintf
functions with gnu_printf, but that's a lie with respect to the very
point at issue about %m. Unless we were to teach snprintf.c about %m
... which probably wouldn't be hard, but I'm not sure I want to go there.
That line of thought leads to deciding that we should treat "printf
doesn't know %m" as a reason to use snprintf.c over the native printf;
and I think we probably do not want to do that, if only because the
native printf is probably more efficient than snprintf.c. (There are
other reasons to question that too: we probably can't tell without a
run-time test in configure, and even if we detect it correctly, gcc
might be misconfigured to believe the opposite thing about %m support
and hence warn, or fail to warn, anyway. clang at least seems to get
this wrong frequently.) But if we do not do such replacement then we
still end up wondering how to mark printf wrapper functions such as
appendStringInfoVA.
At this point I'm inclined to give up and revert 3a60c8ff8. It's not
clear that we can really make the warning situation better, as opposed
to just moving the warnings from one platform to another.
regards, tom lane
[...]
At this point I'm inclined to give up and revert 3a60c8ff8. It's not
clear that we can really make the warning situation better, as opposed
to just moving the warnings from one platform to another.
Indeed, there are hundreds of warnings around "pg_printf_attribute_m"
added with gcc 7.3.0 on my Ubuntu 18.04 laptop, thanks to 3a60c8ff.
A revert would help.
--
Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes:
Indeed, there are hundreds of warnings around "pg_printf_attribute_m"
added with gcc 7.3.0 on by ubuntu 18.04 laptop, thanks to 3a60c8ff.
Oh? What warnings exactly? I would not expect any new warnings except
on platforms where gcc believes the local printf is not POSIX-compliant,
which certainly ought not happen on any non-obsolete Linux.
regards, tom lane
I wrote:
So this is all pretty much of a mess. If we annotate the elog functions
differently from printf's annotation then we risk getting these complaints
in elog.c, but if we don't do that then we can't really describe their
semantics correctly. We could possibly mark the replacement snprintf
functions with gnu_printf, but that's a lie with respect to the very
point at issue about %m. Unless we were to teach snprintf.c about %m
... which probably wouldn't be hard, but I'm not sure I want to go there.
Actually ... the more I think about this, the less insane that idea seems.
Consider the following approach:
1. Teach src/port/snprintf.c about %m. While I've not written a patch
for this, it looks pretty trivial.
2. Teach configure to test for %m and if it's not there, use the
replacement snprintf. (Note: we're already forcing snprintf replacement
in cross-compiles, so the added run-time test isn't losing anything.)
3. Get rid of elog.c's hand-made substitution of %m strings, and instead
just let it pass the correct errno value down. (We'd likely need to do
some fooling in appendStringInfoVA and related functions to preserve
call-time errno, but that's not complicated, nor expensive.)
4. (Optional) Get rid of strerror(errno) calls in favor of %m, even in
frontend code.
Once we've done this, we have uniform printf semantics across all
platforms, which is kind of nice from a testing standpoint, as well as
being less of a cognitive load for developers. And we can stick with
the existing approach of using the gnu_printf archetype across the board;
that's no longer a lie for the snprintf.c code.
One objection to this is the possible performance advantage of the native
printf functions over snprintf.c. I did a bit of investigation of that
using the attached testbed, and found that the quality of implementation
of the native functions seems pretty variable:
RHEL6's glibc on x86_64 (this is just a comparison point, since we'd not
be replacing glibc's printf anyway):
snprintf time = 756.795 ms total, 0.000756795 ms per iteration
pg_snprintf time = 824.643 ms total, 0.000824643 ms per iteration
macOS High Sierra on x86_64:
snprintf time = 264.071 ms total, 0.000264071 ms per iteration
pg_snprintf time = 348.41 ms total, 0.00034841 ms per iteration
FreeBSD 11.0 on x86_64:
snprintf time = 628.873 ms total, 0.000628873 ms per iteration
pg_snprintf time = 606.684 ms total, 0.000606684 ms per iteration
OpenBSD 6.0 on x86_64 (same hardware as FreeBSD test):
snprintf time = 331.245 ms total, 0.000331245 ms per iteration
pg_snprintf time = 539.849 ms total, 0.000539849 ms per iteration
NetBSD 8.99 on armv6:
snprintf time = 2423.39 ms total, 0.00242339 ms per iteration
pg_snprintf time = 3769.16 ms total, 0.00376916 ms per iteration
So we would be taking a hit on most platforms, but I've not really
seen sprintf as a major component of very many profiles. Moreover,
at least for the elog/ereport use-case, we'd be buying back some
nontrivial part of that hit by getting rid of expand_fmt_string().
Also worth noting is that we've never made any effort at all to
micro-optimize snprintf.c, so maybe there's some gold to be mined
there to reduce the penalty.
A different objection, possibly more serious than the performance
one, is that if we get in the habit of using %m in frontend code
then at some point we'd inevitably back-patch such a usage. (Worse,
it'd pass testing on glibc platforms, only to fail elsewhere.)
I don't see a bulletproof answer to that except to back-patch this
set of changes, which might be a bridge too far.
Aside from the back-patching angle, though, this seems pretty
promising. Thoughts?
regards, tom lane
PS: here's the testbed I used for the above numbers. Feel free to
try other platforms or other test-case formats. Compile this with
something like
gcc -Wall -O2 -I pgsql/src/include -I pgsql/src/port timeprintf.c
(point the -I switches into a configured PG source tree); you might
need to add "-lrt" or some such to get clock_gettime(). Then run
with "./a.out 1000000" or so.