64 bit numbers vs format strings

Started by Thomas Munroover 1 year ago21 messageshackers
Jump to latest
#1Thomas Munro
thomas.munro@gmail.com

Hi,

Having learned some things about gettext based on clues[1]/messages/by-id/b936d2fb-590d-49c3-a615-92c3a88c6c19@eisentraut.org from Peter
E, I decided to see what it would take to expunge all (long long) and
similar casts now that we're using the standard types with system
support.

The short version is tha given uint64 x:

Old: errmsg("hello %llu", (unsigned long long) x)
New: errmsg("hello %" PRIu64, x)

(And all other printf-like interfaces). That d can be x, X, u, etc
and you can put the usual stuff between % and the macro, so it's cut
up slightly differently than our own macros for that stuff.

You can't make your own macros for this, if you want to localise it at
least, because gettext() and the tools that scan your source tree only
treat the ones from <inttypes.h> specially.

65 files changed, 358 insertions(+), 367 deletions(-)

(Not counting the pgbench bits, that's non mechanical.)

Thoughts? I know it's a fair bit of code churn, but not quite as bad
as I was expecting. It would create an occasional back-patching
speed-bump, but it's mechanical at least. I wonder how often bug
fixes could conflict with this stuff. And no doubt .po file churn ...
I don't know what tooling is used for that sort of stuff but I imagine
that automated replacement might go a long way.

[1]: /messages/by-id/b936d2fb-590d-49c3-a615-92c3a88c6c19@eisentraut.org

Attachments:

0001-Use-PRI-64-instead-of-ll-in-format-strings.patchtext/x-patch; charset=US-ASCII; name=0001-Use-PRI-64-instead-of-ll-in-format-strings.patchDownload+358-368
0002-pgbench-Make-set_random_seed-64-bit-everywhere.patchtext/x-patch; charset=US-ASCII; name=0002-pgbench-Make-set_random_seed-64-bit-everywhere.patchDownload+3-8
0003-pgbench-Rationalize-types-in-parseScriptWeight.patchtext/x-patch; charset=US-ASCII; name=0003-pgbench-Rationalize-types-in-parseScriptWeight.patchDownload+4-5
0004-pgbench-Modernize-integer-parsing-routine.patchtext/x-patch; charset=US-ASCII; name=0004-pgbench-Modernize-integer-parsing-routine.patchDownload+11-62
#2Melanie Plageman
melanieplageman@gmail.com
In reply to: Thomas Munro (#1)
Re: 64 bit numbers vs format strings

On Thu, Dec 5, 2024 at 5:12 PM Thomas Munro <thomas.munro@gmail.com> wrote:

Having learned some things about gettext based on clues[1] from Peter
E, I decided to see what it would take to expunge all (long long) and
similar casts now that we're using the standard types with system
support.

The short version is tha given uint64 x:

Old: errmsg("hello %llu", (unsigned long long) x)
New: errmsg("hello %" PRIu64, x)

(And all other printf-like interfaces). That d can be x, X, u, etc
and you can put the usual stuff between % and the macro, so it's cut
up slightly differently than our own macros for that stuff.

Yay!

I didn't look at the pgbench bits and don't know the answer to any of
the questions you asked in your mail (re difficulty introduced when
backporting etc), but big +1 from me on doing this.

So, will this fix the issue that when I do:

uint64 somevar = 0;
errmsg("hello %lu", somevar)

locally on my x86-64 linux machine running ubuntu with whatever gcc-11
comes out of the box, it compiles sans warnings, but in the
mingw_cross_warning task in CI, it warns with:

error: format ‘%lu’ expects argument of type ‘long unsigned int’, but
argument 2 has type ‘uint64’ {aka ‘long long unsigned int’}

- Melanie

#3Thomas Munro
thomas.munro@gmail.com
In reply to: Melanie Plageman (#2)
Re: 64 bit numbers vs format strings

On Fri, Dec 6, 2024 at 1:15 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:

So, will this fix the issue that when I do:

uint64 somevar = 0;
errmsg("hello %lu", somevar)

locally on my x86-64 linux machine running ubuntu with whatever gcc-11
comes out of the box, it compiles sans warnings, but in the
mingw_cross_warning task in CI, it warns with:

error: format ‘%lu’ expects argument of type ‘long unsigned int’, but
argument 2 has type ‘uint64’ {aka ‘long long unsigned int’}

Yes, just never use %l or %ll again for fixed width types.

My experience while hacking that up and pushing it through CI quite a
few times was that only off_t remains painful like that. The
CompileWarnings' mingw_cross_warning test blows up if you use PRId64
and then feed it an off_t, without casting it to (pgoff_t), because it
has sizeof(offt_t) == 4. It may be the only defence we have because:

* The CI Visual Studio build also has sizeof(off_t) == 4 but we
haven't decorated our printf stuff with whatever it needs to check
them so it doesn't care (if it even can, IDK).
* The BF animals testing Visual Studio are the same.
* The CI MinGW build that probably few ever run has sizeof(off_t) ==
8, because it's using meson and it sets -D_FIILE_OFFSET_BITS=64 and
that makes it do that, unlike mingw_cross_warning which is using
configure.
* The BF animal fairywren is the same (you can't tell from the
outside, meson doesn't define or print SIZEOF_OFF_T, maybe it should
to make things less mysterious).

Perhaps there is a clever way to "poison" off_t so that it is
incompatible with int64_t. For example, if it's 64 bit and if we can
figure out that int64_t is really long under the covers, make it long
long, and vice versa, so that you get a warning. But that's a bit
weird and might have other problems. Printing off_t is not too common
though so maybe mingw_cross_warning is enough...

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Thomas Munro (#1)
Re: 64 bit numbers vs format strings

On 2024-Dec-06, Thomas Munro wrote:

And no doubt .po file churn ...
I don't know what tooling is used for that sort of stuff but I imagine
that automated replacement might go a long way.

There's a msgfilter utility that can be used for automated updates. I
documented one usage a few years ago for another mechanical change
nightmare
/messages/by-id/20190428173733.GA12483@alvherre.pgsql
So it sounds like it might not be _that_ difficult.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu. Five minutes later I realize that it's also talking
about food" (Donald Knuth)

#5Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#1)
Re: 64 bit numbers vs format strings

On 05.12.24 23:18, Thomas Munro wrote:

Having learned some things about gettext based on clues[1] from Peter
E, I decided to see what it would take to expunge all (long long) and
similar casts now that we're using the standard types with system
support.

The short version is tha given uint64 x:

Old: errmsg("hello %llu", (unsigned long long) x)
New: errmsg("hello %" PRIu64, x)

I have committed the subset of this patch for pg_checksums.c so that the
translators and whoever else might be affected can try this out at small
scale. (I don't expect any particular problems.) Then we can move on
to the rest in a few weeks, I think.

#6Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#5)
Re: 64 bit numbers vs format strings

On Mon, Mar 3, 2025 at 6:21 AM Peter Eisentraut <peter@eisentraut.org> wrote:

On 05.12.24 23:18, Thomas Munro wrote:

Old: errmsg("hello %llu", (unsigned long long) x)
New: errmsg("hello %" PRIu64, x)

I have committed the subset of this patch for pg_checksums.c so that the
translators and whoever else might be affected can try this out at small
scale. (I don't expect any particular problems.) Then we can move on
to the rest in a few weeks, I think.

Good plan, thanks. Here's a rebase.

I also added one more patch that I expect to be more contentious as it
is a UX change. Why do we display LSNs with a slash? I believe there
are two reasons: (1) Back in 2000 didn't require the existence of a 64
bit type, so XLogRecPtr was a struct holding two uint32 values. The
author could still have used "%08X%08X" for both printf and scanf if
that was the only reason. (2) It initially had a real semantic
division into two parts log ID and log offset, which the author
apparently wanted to convey to readers. I didn't check the full
history but I think at some point our log segments (first 16MB, now
initdb-time variable) superseded the log ID concept, which I think
originally had 4GB segments? (I could also have had something to do
with the abandoned undo system's needs, IDK.) That leads to the idea
of ditching the slash and displaying them in the more obvious (to my
aesthetic, anyway, YMMV):

 SELECT pg_lsn(23783416::numeric);
-  pg_lsn
------------
- 0/16AE7F8
+      pg_lsn
+------------------
+ 00000000016AE7F8

And likewise wherever they appear or are parsed in tools, protocols,
command lines etc.

/me activates flame-proof force field

I realised while contemplating that that my treatment of pgoff_t might
not be quite right in the first patch. It casts off_t (eg from struct
stat) to (pgoff_t) and display as "%" PRId64, which is correct for
Windows where pgoff_t is a typedef to __int64 (actually int64_t would
be more logical, but presumably int64_t is __int64 on that system, not
sure whether that is truly a distinct type according to its native
compiler), but on non-Windows we use the system off_t whose printf
type is unknown to us, and might in theory be a different signed 64
bit type and provoke a warning from GCC/Clang printf attributes.
Perhaps we should define just pgoff_t as int64_t everywhere? There
are no warnings on any of our CI OSes so I assume that those OSes
coincidentally define off_t the same way they define int64_t. That
being the case we could just ignore it for now, but another system
using GCC/Clang printf attributes (eg illumos or the other BSDs) might
not happen to agree. Not done yet.

And one more thing like that: in a couple of places we see warnings on
macOS CI that I'd missed: when printing the result of i64abs() as
PRId64, because it happens to use labs() and it happens to define
int64_t as long long, and when printing a Datum as PRIx64, because
Datum is uintptr_t and it happens to define that as unsigned long. I
suppose we should cast to int64 in the definition of c.h's i64abs()
macro and a couple of similar things, and cast Datum to uint64 in that
one place that wants to print it out. Not done yet, so you can still
see this on macOS CI's build step.

Attachments:

v2-0001-Use-PRI-64-instead-of-ll-in-format-strings-contin.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Use-PRI-64-instead-of-ll-in-format-strings-contin.patchDownload+354-364
v2-0002-pgbench-Make-set_random_seed-64-bit-everywhere.patchtext/x-patch; charset=US-ASCII; name=v2-0002-pgbench-Make-set_random_seed-64-bit-everywhere.patchDownload+3-8
v2-0003-pgbench-Rationalize-types-in-parseScriptWeight.patchtext/x-patch; charset=US-ASCII; name=v2-0003-pgbench-Rationalize-types-in-parseScriptWeight.patchDownload+4-5
v2-0004-pgbench-Modernize-integer-parsing-routine.patchtext/x-patch; charset=US-ASCII; name=v2-0004-pgbench-Modernize-integer-parsing-routine.patchDownload+11-62
v2-0005-Remove-historical-slashes-from-LSN-format.patchtext/x-patch; charset=US-ASCII; name=v2-0005-Remove-historical-slashes-from-LSN-format.patchDownload+1098-1055
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#6)
Re: 64 bit numbers vs format strings

Thomas Munro <thomas.munro@gmail.com> writes:

I also added one more patch that I expect to be more contentious as it
is a UX change. Why do we display LSNs with a slash?

While there's surely little reason to do that anymore, I think the
blast radius of such a change will be vastly greater than is warranted
by aesthetics. It's not only our code that will be affected --- I'm
pretty sure there is a great deal of replication tooling out there
that this will break. Don't expect me to defend you from the
villagers with pitchforks.

regards, tom lane

#8Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#6)
Re: 64 bit numbers vs format strings

On 02.03.25 22:08, Thomas Munro wrote:

And one more thing like that: in a couple of places we see warnings on
macOS CI that I'd missed: when printing the result of i64abs() as
PRId64, because it happens to use labs() and it happens to define
int64_t as long long, and when printing a Datum as PRIx64, because
Datum is uintptr_t and it happens to define that as unsigned long. I
suppose we should cast to int64 in the definition of c.h's i64abs()
macro and a couple of similar things,

agreed

and cast Datum to uint64 in that
one place that wants to print it out.

Since Datum is uintptr_t, it should be printed using the format PRIxPTR.
Then it should work out.

#9Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#6)
Re: 64 bit numbers vs format strings

On 02.03.25 22:08, Thomas Munro wrote:

On Mon, Mar 3, 2025 at 6:21 AM Peter Eisentraut <peter@eisentraut.org> wrote:

On 05.12.24 23:18, Thomas Munro wrote:

Old: errmsg("hello %llu", (unsigned long long) x)
New: errmsg("hello %" PRIu64, x)

I have committed the subset of this patch for pg_checksums.c so that the
translators and whoever else might be affected can try this out at small
scale. (I don't expect any particular problems.) Then we can move on
to the rest in a few weeks, I think.

Good plan, thanks. Here's a rebase.

I think this went ok, and we can proceed here.

I looked through the v2-0001 patch in detail. Most of it is mechanical,
so no problems. I couple of issues you already mentioned:

- correct placeholder for Datum (uintptr_t)

- i64abs() definition needs return cast

- I don't think it's proper to assume that pgoff_t or off_t matches int64_t.

A few additional comments:

- In src/backend/access/transam/xlogreader.c, you change a cast that is
part of an arithmetic expression:

-                                     ((long long) total_len) - gotlen,
+                                     total_len - gotlen,

Is this no longer required to keep the math correct? Both total_len and
gotlen are uint32. Maybe this was meant to convert to signed arithmetic?

- In src/backend/backup/basebackup.c, you change

-static long long int total_checksum_failures;
+static int64 total_checksum_failures;

I don't think it is required, and I don't think it should be encouraged,
to expunge all uses of long long int, or something like that. I think
you should use long long int for "I need a big counter" and int64 when
you want to control the storage layout. Similar to how you might choose
int vs. int32. So I would leave this file alone.

- In src/bin/pg_verifybackup/astreamer_verify.c, you change the
signedness of some arguments, e.g., in member_verify_header():

         report_backup_error(mystreamer->context,
-                           "\"%s\" has size %llu in \"%s\" but size 
%llu in the manifest",
+                           "\"%s\" has size %" PRId64 " in \"%s\" but 
size %" PRId64 " in the manifest",

The first signedness change is correct (member->size is pgoff_t), but
the second is not (m->size is uint64).

I think it might be better to keep this patch as a mechanical change and
fix up the signedness issues separately. (There are actually a few more
that were previously hidden by casts but will now show up with something
like -Wformat-signedness.)

- In src/fe_utils/print.c, there is also a format change in the second
hunk, but if we're going to do that one, we should also make the same
change in the first hunk. Also, in the first hunk, the second format
should be %zu not %zd.

- In src/test/modules/libpq_pipeline/libpq_pipeline.c, you're changing
the shift base from 1LL (signed) to UINT64_C(1) (unsigned). This
appears to be a semantic change separate from this patch? But if this
change is desired, then the signedness of the format argument should
also be adjusted.

About the subsequent pgbench patches:

v2-0002: ok

v2-0003: Again, I'm not sure insisting on int64 use is needed here, and
I don't know that the existing code is incorrect. If we don't like
using "long", we could just switch to "long long" here.

v2-0004: ok

About the LSN format patch, I'm generally sympathetic about this, and I
think I've sort of asked for a similar change some years ago, but it's
probably not worth pursuing for this release (if ever).

#10Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#9)
Re: 64 bit numbers vs format strings

On Mon, Mar 10, 2025 at 10:49 PM Peter Eisentraut <peter@eisentraut.org> wrote:

On 02.03.25 22:08, Thomas Munro wrote:

Good plan, thanks. Here's a rebase.

I think this went ok, and we can proceed here.

Cool, I'll post a new patch soon, but first a question about this bit:

- I don't think it's proper to assume that pgoff_t or off_t matches int64_t.

So we should make pgoff_t a typedef for int64_t everywhere. It's a
bit annoying that we have to teach everyone to remember to use PRId64
to print it, though. How reasonable would it be to add an extra
filter into whatever script is used to run xgettext on our source
tree? It could replace a very small number of agreed useful tokens to
match some macros that we would also define in our tree, so that we
could write PRI_PGOFF_T in our messages, but xgettext would see PRId64
and still emit those magic %<PRId64> tokens that GNU/NetBSD/Solaris
gettext() know how to translate on the fly when loading message
catalogues. I'm not sure how many other candidates there would be,
not many (and LSN is obviously an attractive but thorny one...).

(For those who missed the reason why: I believe xgettext only treats
the <inttypes.h> macros with special portability gloves when used
directly, so if you wrapped them in your own macros and did nothing
else, you'd get the fully expanded macros as defined on the system
that runs xgettext. if I understood correctly. Concretely, if that's
a 64 bit glibc system where PRId64 is "ld", the resulting catalogues
wouldn't work on a Windows or 32 bit system where sizeof(long) <
sizeof(int64_t). You might be able to get away with it if you
hijacked those macros as seen by xgettext and made them all "lld"
everywhere since that's at least the right size on all known systems,
but that'd be a bit gross and not in the spirit of this exercise...)

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#10)
Re: 64 bit numbers vs format strings

On 15.03.25 03:42, Thomas Munro wrote:

- I don't think it's proper to assume that pgoff_t or off_t matches int64_t.

So we should make pgoff_t a typedef for int64_t everywhere. It's a
bit annoying that we have to teach everyone to remember to use PRId64
to print it, though.

The ramifications of such a change are not clear to me. I thought
pgoff_t is supposed to be off_t on Unix systems. If we change that,
then how will this affect pointer type arguments, function pointers,
etc. This seems to be a much larger problem than what this thread is
originally about.

I think we should leave pgoff_t the way it is (mostly?) done now: Cast
to long long int and print using %lld.

How reasonable would it be to add an extra
filter into whatever script is used to run xgettext on our source
tree? It could replace a very small number of agreed useful tokens to
match some macros that we would also define in our tree, so that we
could write PRI_PGOFF_T in our messages, but xgettext would see PRId64
and still emit those magic %<PRId64> tokens that GNU/NetBSD/Solaris
gettext() know how to translate on the fly when loading message
catalogues.

This is not really possible. The <PRIxxx> behavior is baked deeply into
the gettext code. (Also note that you don't only need support in
xgettext, which is part of our build system, but also in the runtime
library, which we don't control.)

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#11)
Re: 64 bit numbers vs format strings

Peter Eisentraut <peter@eisentraut.org> writes:

This is not really possible. The <PRIxxx> behavior is baked deeply into
the gettext code. (Also note that you don't only need support in
xgettext, which is part of our build system, but also in the runtime
library, which we don't control.)

Hmm, I find that comment fairly scary. How do we know that the
runtime library actually gets this right on every supported platform?
It's surely not because we test it, because we do not.

regards, tom lane

#13Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#12)
Re: 64 bit numbers vs format strings

On Mon, Mar 17, 2025 at 8:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Peter Eisentraut <peter@eisentraut.org> writes:

This is not really possible. The <PRIxxx> behavior is baked deeply into
the gettext code. (Also note that you don't only need support in
xgettext, which is part of our build system, but also in the runtime
library, which we don't control.)

Hmm, I find that comment fairly scary. How do we know that the
runtime library actually gets this right on every supported platform?
It's surely not because we test it, because we do not.

I don't know too much about libintl and its history other than what
I've looked up for these discussions, but I can't find any other
implementations other than Sun's, GNU's and NetBSD's. Sun/Oracle and
NetBSD went out of their way to understand these and other GNUisms. I
am not sure if they should even be called "extensions"... extensions
to what? I guess the historical answer would have been "Sun's
version", but see below for a new development which raises
philosophical questions.

1. Solaris -- the original implementation has special support for the
things GNU's added, and certainly covers this <inttypes.h> stuff:

https://docs.oracle.com/cd/E36784_01/html/E39536/gnkbn.html#ILDEVgnosj

I just tried it out on a cfarm Solaris box (well I thought I already
knew this from an earlier round of discussions about this but wanted
to check again before replying and found my old test program still
there...). Note the "<PRId64>" in the catalogue:

tmunro@s11-sparc:~/gettext-hacking$ uname -a
SunOS s11-sparc.cfarm 5.11 11.4.78.189.2 sun4v sparc sun4v logical-domain
tmunro@s11-sparc:~/gettext-hacking$ tail -5 locales/fr/LC_MESSAGES/messages.po

#: test.c:8
#, c-format
msgid "the answer is %<PRId64>\n"
msgstr "la réponse est %<PRId64>\n"
tmunro@s11-sparc:~/gettext-hacking$ cat test.c
#include <inttypes.h>
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

#define GETTEXT_DOMAIN "messages"
#define GETTEXT_OUTPUT_DIR "locales"

int
main()
{
setenv("LANGUAGE", "fr", 1);
setlocale(LC_ALL, "fr_FR.UTF-8");
bindtextdomain(GETTEXT_DOMAIN, GETTEXT_OUTPUT_DIR);
textdomain(GETTEXT_DOMAIN);
printf(gettext("the answer is %" PRId64 "\n"), (int64_t) 42);
}
tmunro@s11-sparc:~/gettext-hacking$ gcc test.c
tmunro@s11-sparc:~/gettext-hacking$ ./a.out
la réponse est 42

You can also see that stuff in the illumos source tree:

https://github.com/illumos/illumos-gate/blob/master/usr/src/lib/libc/port/i18n/gettext_gnu.c

2. NetBSD -- I haven't try it myself (I can send my test program if
you are interested) but it clearly knows about GNU's system-dependent
macros, and its stated goal was to be "based on the specifications
from GNU gettext":

https://wiki.netbsd.org/projects/project/libintl/
https://github.com/NetBSD/src/blob/trunk/lib/libintl/sysdep.c

What aspect of that might not work portably? Are there any other
implementations I'm missing? What standard would an implementation
follow, if it were to exist?

POSIX 2024 also finally standardised gettext() and associated tools.
I don't see these macros mentioned there (after an admittedly cursory
scan of the relevant sections), or for that matter any mention of the
portability problem they solve (perhaps I'll write in about that), but
it doesn't seem to make any sense to deprive ourselves of features
supported by all known implementations that solve a real problem, just
because a standard suddenly appeared retroactively rendering them
"extensions" in some sense. I mean, GNU is clearly functioning as a
of de facto standard of very long standing, which I think the POSIX
discussion[1]https://www.austingroupbugs.net/view.php?id=1122 acknowledged succinctly in the description field "POSIX
defines catgets() but most software rather uses gettext()". I don't
think I've ever seen catgets() in several decades around C and Unix.

(Amusingly the GNU maintainer showed up to say (paraphrasing) "don't
do it", and (paraphrasing) "if you want to solve a problem that we
actually have why don't you add all the missing _l function so we can
write portable multithreaded programs". Hear hear!)

[1]: https://www.austingroupbugs.net/view.php?id=1122

#14Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#11)
Re: 64 bit numbers vs format strings

On Mon, Mar 17, 2025 at 8:03 PM Peter Eisentraut <peter@eisentraut.org> wrote:

On 15.03.25 03:42, Thomas Munro wrote:

So we should make pgoff_t a typedef for int64_t everywhere. It's a
bit annoying that we have to teach everyone to remember to use PRId64
to print it, though.

The ramifications of such a change are not clear to me. I thought
pgoff_t is supposed to be off_t on Unix systems. If we change that,
then how will this affect pointer type arguments, function pointers,
etc. This seems to be a much larger problem than what this thread is
originally about.

I think we should leave pgoff_t the way it is (mostly?) done now: Cast
to long long int and print using %lld.

WFM.

How reasonable would it be to add an extra
filter into whatever script is used to run xgettext on our source
tree? It could replace a very small number of agreed useful tokens to
match some macros that we would also define in our tree, so that we
could write PRI_PGOFF_T in our messages, but xgettext would see PRId64
and still emit those magic %<PRId64> tokens that GNU/NetBSD/Solaris
gettext() know how to translate on the fly when loading message
catalogues.

This is not really possible. The <PRIxxx> behavior is baked deeply into
the gettext code. (Also note that you don't only need support in
xgettext, which is part of our build system, but also in the runtime
library, which we don't control.)

Hmm, but that's why I was asking about filtering the source *before*
xgettext sees it, but it sounds like I may still be confused about how
that works and I'm very happy to abandon that idea and leave those
bits unchanged. Will update the patch shortly to incorporate your
other feedback.

Thanks!

#15Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#13)
Re: 64 bit numbers vs format strings

On Mon, Mar 17, 2025 at 11:52 PM Thomas Munro <thomas.munro@gmail.com> wrote:

tmunro@s11-sparc:~/gettext-hacking$ gcc test.c
tmunro@s11-sparc:~/gettext-hacking$ ./a.out
la réponse est 42

And just to be paranoid, I checked a few more things: the .mo
definitely contains the literal "PRId64" (rearranged as
"^@PRId64^@the answer is %") and it's definitely using gettext() from
libc and not somehow automatically finding a GNU library in some
search path. (And woop, this cfarm Sun box has received the new
preadv()/pwritev() in its libc, that they added for PostgreSQL.)

And since I remembered that I had a NetBSD vagrant VM handy from
investigating Champion's libpq troubles the other day:

[vagrant@netbsd9 gettext-hacking]$ cc test.c -lintl
[vagrant@netbsd9 gettext-hacking]$ ldd a.out
a.out:
-lintl.1 => /usr/lib/libintl.so.1
-lc.12 => /usr/lib/libc.so.12
[vagrant@netbsd9 gettext-hacking]$ ./a.out
la réponse est 42

Not that I had much doubt but I checked that the library is indeed the
NetBSD code and not somehow GNU code, based on clearly identifiable
strings.

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Munro (#13)
Re: 64 bit numbers vs format strings

Thomas Munro <thomas.munro@gmail.com> writes:

On Mon, Mar 17, 2025 at 8:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm, I find that comment fairly scary. How do we know that the
runtime library actually gets this right on every supported platform?

I don't know too much about libintl and its history other than what
I've looked up for these discussions, but I can't find any other
implementations other than Sun's, GNU's and NetBSD's. Sun/Oracle and
NetBSD went out of their way to understand these and other GNUisms.

Okay, that reduces the size of the problem considerably.

2. NetBSD -- I haven't try it myself (I can send my test program if
you are interested)

I'd be happy to try it, but I see downthread that you already did,
so that seems unnecessary.

I still wonder if we shouldn't have more than zero testing of our
NLS behavior, but that's just a generalized worry not a concern
over any specific feature.

regards, tom lane

#17Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Munro (#6)
Re: 64 bit numbers vs format strings

On 02.03.25 22:08, Thomas Munro wrote:

And one more thing like that: in a couple of places we see warnings on
macOS CI that I'd missed: when printing the result of i64abs() as
PRId64, because it happens to use labs() and it happens to define
int64_t as long long, [...]. I
suppose we should cast to int64 in the definition of c.h's i64abs()
macro

I have committed a fix for that.

#18Peter Eisentraut
peter_e@gmx.net
In reply to: Peter Eisentraut (#9)
Re: 64 bit numbers vs format strings

On 10.03.25 10:49, Peter Eisentraut wrote:

On 02.03.25 22:08, Thomas Munro wrote:

On Mon, Mar 3, 2025 at 6:21 AM Peter Eisentraut <peter@eisentraut.org>
wrote:

On 05.12.24 23:18, Thomas Munro wrote:

      Old: errmsg("hello %llu", (unsigned long long) x)
      New: errmsg("hello %" PRIu64, x)

I have committed the subset of this patch for pg_checksums.c so that the
translators and whoever else might be affected can try this out at small
scale.  (I don't expect any particular problems.)  Then we can move on
to the rest in a few weeks, I think.

Good plan, thanks.  Here's a rebase.

I think this went ok, and we can proceed here.

I looked through the v2-0001 patch in detail.  Most of it is mechanical,
so no problems.  I couple of issues you already mentioned:

I have committed v2-0001, omitting the parts that I had flagged in my
review. I have also committed v2-0002. From my perspective, this can
conclude this thread.

#19Thomas Munro
thomas.munro@gmail.com
In reply to: Peter Eisentraut (#18)
Re: 64 bit numbers vs format strings

On Sun, Mar 30, 2025 at 6:24 AM Peter Eisentraut <peter@eisentraut.org> wrote:

I have committed v2-0001, omitting the parts that I had flagged in my
review. I have also committed v2-0002. From my perspective, this can
conclude this thread.

Thank you! Fingers crossed that the translation updates go as
smoothly as predicted.

#20Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#16)
Re: 64 bit numbers vs format strings

On Tue, Mar 18, 2025 at 4:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Thomas Munro <thomas.munro@gmail.com> writes:

On Mon, Mar 17, 2025 at 8:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hmm, I find that comment fairly scary. How do we know that the
runtime library actually gets this right on every supported platform?

I don't know too much about libintl and its history other than what
I've looked up for these discussions, but I can't find any other
implementations other than Sun's, GNU's and NetBSD's. Sun/Oracle and
NetBSD went out of their way to understand these and other GNUisms.

Okay, that reduces the size of the problem considerably.

For the record, I found one more hiding on Alpine Linux. It has two
implementations available:

1. You can install the regular GNU library and tools with "gettext"
and "gettext-dev". Then I assume this all just works.

2. There is a cleanroom implementation reachable with "musl-libintl".
It supplies a different /usr/include/libintl.h that activates an
implementation inside musl itself, and IIRC "gettext-tiny" gives you
msgfmt etc. I bet this new 64-bit stuff doesn't work though: they do
support %<PRIu64> etc, but the lookup table[1]https://github.com/sabotage-linux/gettext-tiny/blob/master/src/poparser.c seems a little on the
short side for our usage. gettext-tiny's README.md also explains that
they pre-chew them in msgfmt, so GNU/non-GNU combinations probably
break once you start using these macros, if they ever worked.

It looks like the packages and docker scripts people use to run
PostgreSQL on Alpine don't enable nls anyway, so I doubt we'll hear
anything about this from the field. It might still be interesting to
know which msgfmt/libintl pair the BF animals are using (CC Wolfgang),
and whether this stuff actually works. If not, installing "gettext"
and "gettext-dev" would likely be the solution, though if the musl
version already works for %<PRIu64>, perhaps a small PR could get the
other variants to work too?

[1]: https://github.com/sabotage-linux/gettext-tiny/blob/master/src/poparser.c

#21Wolfgang Walther
walther@technowledgy.de
In reply to: Thomas Munro (#20)