Improving the ngettext() patch

Started by Tom Lane about 17 years ago | 7 messages | hackers

tgl@sss.pgh.pa.us

about 17 years ago

After looking through the current uses of ngettext(), I think that it
wouldn't be too difficult to modify the patch to address the concerns
I had about it. What I propose doing is to add an additional elog.h
function

    errmsg_plural(const char *fmt_singular, const char *fmt_plural,
                  unsigned long n, ...)

and replace the current errmsg(ngettext(...)) calls with this.
Similarly add errdetail_plural to replace errdetail(ngettext(...)).
(We could also add errhint_plural and so on, but right offhand these
seem unlikely to be useful.) The advantage of doing this is that
we avoid double translation and eliminate the current kluge whereby
usages in PL code have to be different from usages anywhere else.
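
As a rough illustration of the single-lookup idea (not the actual patch), the sketch below shows a standalone helper that picks the plural form with one ngettext() call and then formats the result. The name format_plural and its buffer-based interface are hypothetical; a real errmsg_plural would live in the backend's elog machinery and would also have to deal with text domains, which this sketch ignores.

```c
#include <libintl.h>    /* ngettext() */
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the proposed errmsg_plural(): the caller hands
 * over both untranslated format strings plus the count n, and the helper
 * does exactly one catalog lookup before formatting.  Returns whatever
 * vsnprintf() returns.
 */
int
format_plural(char *buf, size_t buflen,
              const char *fmt_singular, const char *fmt_plural,
              unsigned long n, ...)
{
    va_list     args;
    int         len;

    /* The only translation step: choose and translate the format once. */
    const char *fmt = ngettext(fmt_singular, fmt_plural, n);

    va_start(args, n);
    len = vsnprintf(buf, buflen, fmt, args);
    va_end(args);

    return len;
}
```

A caller would write something like format_plural(buf, sizeof buf, "%lu row", "%lu rows", n, n): n is passed once to select the form and once more as the value for %lu, mirroring the proposed errmsg_plural signature.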

I don't feel a need to touch the usages in client programs (pg_dump and
so on). In principle the double-translation risk still exists there,
but it seems much less likely to be a real hazard because any one client
program has a *far* smaller pool of translatable messages than the
backend does. Also, there's only one active text domain in a client
program, so the problem of needing to use dngettext in special cases
doesn't exist.

There are a few usages of ngettext() in the backend that are not tied
to ereport calls, but I think they can be left as-is. There's no
double-translation risk, and with so few of them I don't see much of
a risk of wrongly copying the usage in PL code, either.

Also: one thought that came to me while looking at the existing usages
is that there are several places that are plural-ized that seem
completely pointless; why are we making our translators work
harder on them? For example

    ereport(ERROR,
            (errcode(ERRCODE_TOO_MANY_ARGUMENTS),
             errmsg(ngettext("functions cannot have more than %d argument",
                             "functions cannot have more than %d arguments",
                             FUNC_MAX_ARGS),
                    FUNC_MAX_ARGS)));

It seems extremely far-fetched that FUNC_MAX_ARGS would ever be small
enough that it would make any language's special cases kick in. Or
how about this one:

#if 0
    write_msg(modulename, ngettext("read %lu byte into lookahead buffer\n",
                                   "read %lu bytes into lookahead buffer\n",
                                   AH->lookaheadLen),
              (unsigned long) AH->lookaheadLen);
#endif

I'm not sure why this debug support is still there at all, but surely
it's a crummy candidate for making translators sweat over. So I'd like
to revert these.

Comments, objections?

regards, tom lane

Sergey Burladyan

eshkinkot@gmail.com

about 17 years ago

In reply to: Tom Lane (#1)

Re: Improving the ngettext() patch

Tom Lane <tgl@sss.pgh.pa.us> writes:

    ereport(ERROR,
            (errcode(ERRCODE_TOO_MANY_ARGUMENTS),
             errmsg(ngettext("functions cannot have more than %d argument",
                             "functions cannot have more than %d arguments",
                             FUNC_MAX_ARGS),
                    FUNC_MAX_ARGS)));

It seems extremely far-fetched that FUNC_MAX_ARGS would ever be small
enough that it would make any language's special cases kick in.

Russian plural forms for 100, 101, 102, etc. are different, just as they are for 0, 1, 2.

--
Sergey Burladyan

Noname

pg@thetdh.com

about 17 years ago

In reply to: Sergey Burladyan (#2)

Re: Improving the ngettext() patch

Russian plural forms for 100, 101, 102, etc. are different, just as they are for 0, 1, 2.

True. The rule IIRC is that except for 11-14 and for collective numerals, declination follows the last digit.

It would be possible to generalize declination via a language-specific message-selector function, especially if the number of numerical complements were limited to 1.

How awkward would it be to re-word the style of messages to avoid declination? For example, the Russian equivalent of "X rows" could be something like "#rows -- X".

David Hudson


Tom Lane

tgl@sss.pgh.pa.us

about 17 years ago

In reply to: Noname (#3)

Re: Improving the ngettext() patch

pg@thetdh.com writes:

Russian plural forms for 100, 101, 102, etc. are different, just as they are for 0, 1, 2.

True. The rule IIRC is that except for 11-14 and for collective numerals, declination follows the last digit.

Wow. So how does anyone represent that in the .po files? AFAICT the
notation the gettext machinery provides isn't really powerful enough
for this.

regards, tom lane

Aidan Van Dyk

aidan@highrise.ca

about 17 years ago

In reply to: Tom Lane (#4)

Re: Improving the ngettext() patch

* Tom Lane <tgl@sss.pgh.pa.us> [090604 10:22]:

pg@thetdh.com writes:

Russian plural forms for 100, 101, 102, etc. are different, just as they are for 0, 1, 2.

True. The rule IIRC is that except for 11-14 and for collective numerals, declination follows the last digit.

Wow. So how does anyone represent that in the .po files? AFAICT the
notation the gettext machinery provides isn't really powerful enough
for this.

Well, the C/English "template" one includes just the msgid and
msgid_plural strings.

When the Russian translators get to it, they make a Russian .po which
contains (something like) the following in the msgid "" header:
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

And then they provide msgstr[0], msgstr[1], and msgstr[2] to fill the 3
slots that the above Plural-Forms rule can use when translating plural-form
strings.

It's all encapsulated in the gettext tools and libraries, and the C
(non-translated) base just always uses ngettext(singular, plural, n), and
ngettext will (if the compiled catalog has different plural forms) use
whatever the catalog specifies, or fall back to the simple
    n == 1 ? singular : plural
type choice when no translated catalog is available.
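
To make the selection concrete, here is a small sketch of the two choices described above, written as plain C functions. The function names are invented for illustration; in practice gettext evaluates the Plural-Forms expression from the catalog header at run time, and nobody writes this by hand.

```c
/*
 * Fallback used when no translated catalog is available:
 * index 0 = singular string, index 1 = plural string.
 */
unsigned int
fallback_plural_index(unsigned long n)
{
    return (n == 1) ? 0 : 1;
}

/*
 * The Russian Plural-Forms expression quoted above, unrolled into
 * if-statements.  The result is the index of the msgstr[] slot to use.
 */
unsigned int
russian_plural_index(unsigned long n)
{
    /* 1, 21, 31, ... 101, but not 11, 111, ... */
    if (n % 10 == 1 && n % 100 != 11)
        return 0;

    /* 2-4, 22-24, ... 102-104, but not 12-14, 112-114, ... */
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1;

    /* everything else: 0, 5-20, 25-30, ... 100, ... */
    return 2;
}
```

With this rule, 1, 21 and 101 select msgstr[0]; 2, 22 and 102 select msgstr[1]; and 0, 5, 11 through 14, and 100 select msgstr[2]. That is why a limit of 100 and a limit of 101 can need different wording in Russian, even though the English fallback treats both as plain plurals.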

--
Aidan Van Dyk Create like a god,
aidan@highrise.ca command like a king,
http://www.highrise.ca/ work like a slave.

Tom Lane

tgl@sss.pgh.pa.us

about 17 years ago

In reply to: Aidan Van Dyk (#5)

Re: Improving the ngettext() patch

Aidan Van Dyk <aidan@highrise.ca> writes:

When the Russian translators get to it, they make a Russian .po which
contains (something like) the following in the msgid "" header:
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

Oh, I see. I didn't realize there was a mapping mechanism available
to the translator.

Okay, so the bottom line there is that there is some value in
pluralizing the messages about FUNC_MAX_ARGS --- I withdraw the
suggestion to undo that. Anyone wish to defend the ones that
are ifdef'd out?

regards, tom lane

Noname

pg@thetdh.com

about 17 years ago

In reply to: Tom Lane (#6)

Re: Improving the ngettext() patch

(Grrr, declension, not declination.)

"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

Thanks. The above (ignoring backslash-EOL) is the form recommended for Russian (inter alia(s)) in the Texinfo manual for gettext ("info gettext"). FWIW this might be an alternative:

"Plural-Forms: nplurals=3; plural=((n - 1) % 10) >= (5-1) || (((n - 1) % 100) <= (14-1) && ((n - 1) % 100) >= (11 - 1)) ? 2 : ((n - 1) % 10) == (1 - 1) ? 0 : 1;\n"
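
One way to gain confidence in the alternative is to compare it with the recommended expression over a range of values. The sketch below does that. The function names are made up for this example, and it assumes n is an unsigned long (as in the ngettext() prototype), so that n - 1 wraps around to a large value for n = 0 rather than going negative; with a signed type the two expressions would disagree at 0.

```c
/* Expression recommended for Russian in the gettext manual. */
unsigned int
plural_recommended(unsigned long n)
{
    return n % 10 == 1 && n % 100 != 11 ? 0 :
           n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20) ? 1 : 2;
}

/* The alternative expression suggested above, copied term by term. */
unsigned int
plural_alternative(unsigned long n)
{
    return ((n - 1) % 10) >= (5 - 1) ||
           (((n - 1) % 100) <= (14 - 1) && ((n - 1) % 100) >= (11 - 1)) ? 2 :
           ((n - 1) % 10) == (1 - 1) ? 0 : 1;
}

/* Count the values in [0, limit) for which the two expressions disagree. */
unsigned long
count_mismatches(unsigned long limit)
{
    unsigned long n;
    unsigned long mismatches = 0;

    for (n = 0; n < limit; n++)
    {
        if (plural_recommended(n) != plural_alternative(n))
            mismatches++;
    }
    return mismatches;
}
```

Both expressions depend only on n % 10 and n % 100 for n >= 1, so checking a few hundred consecutive values already covers every case; count_mismatches(10000) would be expected to return 0 if the two are equivalent.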

David Hudson
