Get rid of translation strings that only contain punctuation

Started by David Rowley, 10 days ago, 9 messages, hackers
#1 David Rowley
dgrowleyml@gmail.com

(Follow-on work from [1])

We've got a few parts of the code that translate strings that contain
only a single punctuation character. I'm not a translator, but I
suspect that these would be tricky to deal with as such short strings
could be used for various different things, and if the required
translation was to differ between requirements, then you're out of
luck.

I looked at: git grep -A 1 "msgid \", \"" and I see French is the only
translation to do anything different with the ", " string, and only in
psql.

src/bin/psql/po/fr.po:msgid ", "
src/bin/psql/po/fr.po-msgstr " , "

This is used for suffixing "unique" or "unique nulls not distinct". I
adjusted the logic there to get rid of the short translation string.
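
A minimal sketch of one way such a restructuring could look. It is not the actual describe.c change: the helper name `append_index_uniqueness`, the fixed-size buffer handling, and the `_()` stub are assumptions made for illustration. The idea is that the ", " becomes part of each translatable phrase, so no punctuation-only msgid remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for gettext(); a real build would look the string up in a catalog. */
#define _(msgid) (msgid)

/*
 * Hypothetical helper, not the real describe.c code: append the uniqueness
 * label for an index to buf.  The trailing ", " is part of each translatable
 * phrase, so a translator sees the separator together with its context.
 */
static void
append_index_uniqueness(char *buf, size_t buflen, bool nulls_not_distinct)
{
    size_t      used = strlen(buf);

    assert(used < buflen);
    snprintf(buf + used, buflen - used, "%s",
             nulls_not_distinct ? _("unique nulls not distinct, ")
                                : _("unique, "));
}
```

With this shape a translation such as the French " , " spacing can still be expressed, because the separator travels with the phrase it follows.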

Quite a few are new to v19: fd366065e (AmitK), 48efefa6c (AmitK),
0fc33b005 (PeterE)
The relation.c one is from v18: 8fcd80258 (AmitK)
The describe.c one is from v15: 94aa7cc5f (PeterE)

Should we get rid of these?

David

[1]: /messages/by-id/CAApHDvohYOdrvhVxXzCJNX_GYMSWBfjTTtB6hgDauEtZ8Nar2A@mail.gmail.com

Attachments:

get_rid_of_single_punctuation_char_translation_strings.patch (application/octet-stream), +17 -58
#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Rowley (#1)
Re: Get rid of translation strings that only contain punctuation

David Rowley <dgrowleyml@gmail.com> writes:

> We've got a few parts of the code that translate strings that contain
> only a single punctuation character. I'm not a translator, but I
> suspect that these would be tricky to deal with as such short strings
> could be used for various different things, and if the required
> translation was to differ between requirements, then you're out of
> luck.

Yeah. I concur with your feeling that a separate translatable string
containing just a punctuation mark is probably the Wrong Thing. But
just removing the translation marker doesn't fix the problem. You
need more extensive restructuring so that what needs to be translated
is a coherent message.

We previously discussed the append_tuple_value_detail case [1], and
I opined that the right fix was to change things so that what that
function produces is a string that doesn't need translation because
it matches SQL syntax for a row constructor. It doesn't look like
that's happened yet.
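
As a rough illustration of output that "matches SQL syntax for a row constructor", the sketch below renders a list of values as `('1', 'abc', NULL)`. The function names and the buffer-based interface are hypothetical and are not the signature of append_tuple_value_detail. Every non-NULL value is emitted as a quoted literal for simplicity. With a single column the ROW keyword would be needed for the result to be a row constructor in the strict sense.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append src to buf (capacity buflen), truncating if needed. */
static void
append_str(char *buf, size_t buflen, const char *src)
{
    size_t      used = strlen(buf);

    assert(used < buflen);
    snprintf(buf + used, buflen - used, "%s", src);
}

/* Append val as an SQL string literal, doubling embedded single quotes. */
static void
append_sql_literal(char *buf, size_t buflen, const char *val)
{
    append_str(buf, buflen, "'");
    for (const char *p = val; *p != '\0'; p++)
    {
        char        tmp[3] = {*p, '\0', '\0'};

        if (*p == '\'')
            tmp[1] = '\'';
        append_str(buf, buflen, tmp);
    }
    append_str(buf, buflen, "'");
}

/*
 * Hypothetical formatter: render values as (v1, v2, ...).  A NULL pointer
 * stands for SQL NULL.  Nothing here is marked for translation, because
 * the punctuation is SQL syntax rather than natural-language text.
 */
static void
format_row_constructor(char *buf, size_t buflen,
                       const char *const *vals, int nvals)
{
    assert(buflen > 0);
    buf[0] = '\0';
    append_str(buf, buflen, "(");
    for (int i = 0; i < nvals; i++)
    {
        if (i > 0)
            append_str(buf, buflen, ", ");
        if (vals[i] == NULL)
            append_str(buf, buflen, "NULL");
        else
            append_sql_literal(buf, buflen, vals[i]);
    }
    append_str(buf, buflen, ")");
}
```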

regards, tom lane

[1]: /messages/by-id/227279.1775956328@sss.pgh.pa.us

#3 Peter Smith
smithpb2250@gmail.com
In reply to: David Rowley (#1)
Re: Get rid of translation strings that only contain punctuation

On Wed, Apr 22, 2026 at 10:30 AM David Rowley <dgrowleyml@gmail.com> wrote:

> (Follow-on work from [1])
>
> We've got a few parts of the code that translate strings that contain
> only a single punctuation character. I'm not a translator, but I
> suspect that these would be tricky to deal with as such short strings
> could be used for various different things, and if the required
> translation was to differ between requirements, then you're out of
> luck.
>
> I looked at: git grep -A 1 "msgid \", \"" and I see French is the only
> translation to do anything different with the ", " string, and only in
> psql.
>
> src/bin/psql/po/fr.po:msgid ", "
> src/bin/psql/po/fr.po-msgstr " , "
>
> This is used for suffixing "unique" or "unique nulls not distinct". I
> adjusted the logic there to get rid of the short translation string.
>
> Quite a few are new to v19: fd366065e (AmitK), 48efefa6c (AmitK),
> 0fc33b005 (PeterE)
> The relation.c one is from v18: 8fcd80258 (AmitK)
> The describe.c one is from v15: 94aa7cc5f (PeterE)
>
> Should we get rid of these?

This question overlaps with another thread of mine [1].

There, I was told that a punctuation double-quote (") *should* be translated.

OTOH, I did not see why the comma separator (,) should be translated
-- my patch did so only to be the same as existing code.

======
[1]: /messages/by-id/CAHut+Pui7RaQ8OfJEVn2ry-ykjnGc+3ujsFmcHDFw9FsXw_tRw@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Smith (#3)
Re: Get rid of translation strings that only contain punctuation

Peter Smith <smithpb2250@gmail.com> writes:

> On Wed, Apr 22, 2026 at 10:30 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
>> Should we get rid of these?
>
> This question overlaps with another thread of mine [1].
> There, I was told that a punctuation double-quote (") *should* be translated.

It should be, but it *has to be translated as part of a coherent
message*. As the examples in [1] show, several languages translate
opening and closing double-quotes differently. So if you write _("\"")
there is zero hope of that being usefully translatable.

This all goes back to the translatability guideline about not
constructing messages out of parts [2]. If you've got a single
punctuation mark as a separate string, you are violating both the
letter and the spirit of that guideline, and that has consequences
for translatability.
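
To make the opening/closing quote problem concrete, here is a small self-contained sketch. The toy catalog, the `toy_gettext` lookup, the message text, and the German strings are all illustrative assumptions, not real PostgreSQL catalog entries. It contrasts a message translated as a whole with one assembled from a separately translated quote mark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy catalog standing in for a .po file; the entries are made up. */
struct catalog_entry
{
    const char *msgid;
    const char *msgstr;
};

static const struct catalog_entry toy_de_catalog[] = {
    {"publication \"%s\" does not exist", "Publikation „%s“ existiert nicht"},
    /* A lone quote has one msgid, so only one of the two marks can be chosen. */
    {"\"", "„"},
};

/* Look up msgid in the toy catalog; fall back to the msgid itself. */
static const char *
toy_gettext(const char *msgid)
{
    assert(msgid != NULL);
    for (size_t i = 0; i < sizeof(toy_de_catalog) / sizeof(toy_de_catalog[0]); i++)
    {
        if (strcmp(toy_de_catalog[i].msgid, msgid) == 0)
            return toy_de_catalog[i].msgstr;
    }
    return msgid;
}

#define _(msgid) toy_gettext(msgid)

/* Coherent message: both quote marks live inside the translatable string. */
static void
format_whole(char *buf, size_t buflen, const char *name)
{
    snprintf(buf, buflen, _("publication \"%s\" does not exist"), name);
}

/* Built from parts: opening and closing quote come from the same msgid. */
static void
format_quoted_piecewise(char *buf, size_t buflen, const char *name)
{
    snprintf(buf, buflen, "%s%s%s", _("\""), name, _("\""));
}
```

In this toy setup `format_whole` can produce distinct opening and closing marks, while `format_quoted_piecewise` necessarily emits the same mark on both sides.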

regards, tom lane

[1]: /messages/by-id/CAHut+Pui7RaQ8OfJEVn2ry-ykjnGc+3ujsFmcHDFw9FsXw_tRw@mail.gmail.com
[2]: https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES

#5 Peter Smith
smithpb2250@gmail.com
In reply to: Tom Lane (#4)
Re: Get rid of translation strings that only contain punctuation

On Wed, Apr 22, 2026 at 11:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Peter Smith <smithpb2250@gmail.com> writes:
>
>> On Wed, Apr 22, 2026 at 10:30 AM David Rowley <dgrowleyml@gmail.com> wrote:
>>
>>> Should we get rid of these?
>>
>> This question overlaps with another thread of mine [1].
>> There, I was told that a punctuation double-quote (") *should* be translated.
>
> It should be, but it *has to be translated as part of a coherent
> message*. As the examples in [1] show, several languages translate
> opening and closing double-quotes differently. So if you write _("\"")
> there is zero hope of that being usefully translatable.
>
> This all goes back to the translatability guideline about not
> constructing messages out of parts [2]. If you've got a single
> punctuation mark as a separate string, you are violating both the
> letter and the spirit of that guideline, and that has consequences
> for translatability.
>
> regards, tom lane
>
> [1] /messages/by-id/CAHut+Pui7RaQ8OfJEVn2ry-ykjnGc+3ujsFmcHDFw9FsXw_tRw@mail.gmail.com
> [2] https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES

To my knowledge, we aren't violating that guideline because our
substituted parts aren't words of a sentence; they are quoted names in
a list.

e.g.

Case#1: publication "XXX" has a problem

Case#2: the following publications have a problem: "XXX", "YYY", "ZZZ"

~~~

Case#1 is easy. "publication \"%s\" has a problem"
The quotes are part of the message, so they get translated as normal.

Case#2 is more fiddly. "the following publications have a problem: %s"
The substituted quoted-name list is constructed at runtime, but still,
we require those quotes to be translated so that quoted-names in cases
#1 and #2 look the same.
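
One possible way to build the Case#2 list at runtime while keeping each translatable unit self-describing is sketched below. This is an assumption for illustration only and is not necessarily what the real code or any posted patch does: the helper name `build_quoted_list`, the item format `"\"%s\""`, and the join format `"%s, %s"` are all hypothetical. The item format holds both quote marks together, and the join format holds the separator with a placeholder on each side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for gettext(); a real build would consult the message catalog. */
#define _(msgid) (msgid)

/*
 * Hypothetical helper: build a list such as "XXX", "YYY", "ZZZ" in buf.
 * Names are assumed to be short enough for the fixed-size scratch buffers.
 */
static void
build_quoted_list(char *buf, size_t buflen,
                  const char *const *names, int n)
{
    char        item[128];
    char        joined[512];

    assert(buflen > 0 && n >= 0);
    buf[0] = '\0';
    for (int i = 0; i < n; i++)
    {
        assert(strlen(names[i]) + 2 < sizeof(item));

        /* Opening and closing quote are translated as one unit. */
        snprintf(item, sizeof(item), _("\"%s\""), names[i]);

        if (i == 0)
            snprintf(joined, sizeof(joined), "%s", item);
        else
            snprintf(joined, sizeof(joined), _("%s, %s"), buf, item);

        snprintf(buf, buflen, "%s", joined);
    }
}
```

The resulting string would then be substituted into "the following publications have a problem: %s".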

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#6 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Smith (#5)
Re: Get rid of translation strings that only contain punctuation

Peter Smith <smithpb2250@gmail.com> writes:

> Case#1: publication "XXX" has a problem
> Case#2: the following publications have a problem: "XXX", "YYY", "ZZZ"

Entirely aside from the mechanics of producing the output,
I am not sure I buy that emitting that is a desirable goal.
It seems to be based on an English-centric notion that singular
and indefinitely-many plural are the only two categories.
This is incorrect (see the documentation for ngettext()).

Is there a good reason not to output a separate message for
each publication? If we need to throw an ereport(ERROR)
covering them all, maybe list them in separate sentences
in a DETAIL message.
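
Two short sketches of the points above. The plural-index functions follow the plural rules listed in the GNU gettext manual for English (two forms) and Russian (three forms), which shows why "one" versus "many" is not enough. The second function, `build_detail_lines`, is a hypothetical helper that emits one complete sentence per publication, as might go into a DETAIL message. Its name, interface, and message text are assumptions taken from the example in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for gettext(); a real build would consult the message catalog. */
#define _(msgid) (msgid)

/* English: nplurals=2; plural=(n != 1) */
static unsigned
plural_index_en(unsigned long n)
{
    return n != 1;
}

/* Russian: nplurals=3, with the form chosen by the last digits of n. */
static unsigned
plural_index_ru(unsigned long n)
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1;
    return 2;
}

/*
 * Hypothetical helper: one self-contained sentence per publication,
 * separated by newlines, so no list punctuation needs translating.
 */
static void
build_detail_lines(char *buf, size_t buflen,
                   const char *const *names, int n)
{
    assert(buflen > 0);
    buf[0] = '\0';
    for (int i = 0; i < n; i++)
    {
        size_t      used = strlen(buf);

        if (i > 0 && used < buflen - 1)
        {
            buf[used++] = '\n';
            buf[used] = '\0';
        }
        snprintf(buf + used, buflen - used,
                 _("publication \"%s\" has a problem"), names[i]);
    }
}
```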

regards, tom lane

#7 Peter Smith
smithpb2250@gmail.com
In reply to: Tom Lane (#6)
Re: Get rid of translation strings that only contain punctuation

On Wed, Apr 22, 2026 at 12:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Peter Smith <smithpb2250@gmail.com> writes:
>
>> Case#1: publication "XXX" has a problem
>> Case#2: the following publications have a problem: "XXX", "YYY", "ZZZ"
>
> Entirely aside from the mechanics of producing the output,
> I am not sure I buy that emitting that is a desirable goal.
> It seems to be based on an English-centric notion that singular
> and indefinitely-many plural are the only two categories.
> This is incorrect (see the documentation for ngettext()).

Those case#1 and case#2 were just illustrative. The real code is using
`errmsg_plural` and `errdetail_plural`, so I think that makes it ok
for languages that have multiple forms of "plural".

======
Kind Regards,
Peter Smith.
Fujitsu Australia

#8 Amit Kapila
amit.kapila16@gmail.com
In reply to: Tom Lane (#2)
Re: Get rid of translation strings that only contain punctuation

On Wed, Apr 22, 2026 at 6:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> We previously discussed the append_tuple_value_detail case [1], and
> I opined that the right fix was to change things so that what that
> function produces is a string that doesn't need translation because
> it matches SQL syntax for a row constructor. It doesn't look like
> that's happened yet.

I'll look into your suggestions.

--
With Regards,
Amit Kapila.

#9 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: David Rowley (#1)
Re: Get rid of translation strings that only contain punctuation

On 2026-Apr-22, David Rowley wrote:

> We've got a few parts of the code that translate strings that contain
> only a single punctuation character. I'm not a translator, but I
> suspect that these would be tricky to deal with as such short strings
> could be used for various different things, and if the required
> translation was to differ between requirements, then you're out of
> luck.

Yeah.

> I looked at: git grep -A 1 "msgid \", \"" and I see French is the only
> translation to do anything different with the ", " string, and only in
> psql.

Japanese also uses different punctuation characters, so I don't think we
should get rid of translating these characters.

Instead we should do what Tom says and integrate these characters into a
larger string. I showed one example in a nearby thread from Peter Smith
[1] done in the same way.

As for the one in guc.c, I think what we should do is change
config_enum_get_options() to have an API similar to GetPublicationsStr:
instead of receiving prefix, suffix and separator, we should tell that
function that we're constructing a list to be used as an SQL value
(GetConfigOptionValues), or one to be displayed to the user
(parse_and_validate_value); and have the function add the separators and
other decoration as needed, using the same technique. (The other prefix
"Available values: " can be added by the caller, I think, and maybe the
braces also, not sure.)
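
A rough sketch of what such a purpose-driven interface might look like. Everything here is hypothetical: the `EnumListPurpose` type, the `format_enum_options` name and signature, and the `"%s, %s"` join format are assumptions, and the real config_enum_get_options() works on different data structures. The function decides on quoting and separators from the stated purpose, and leaves the "Available values: " prefix and any braces to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for gettext(); a real build would consult the message catalog. */
#define _(msgid) (msgid)

/* Hypothetical: what the caller intends to do with the list. */
typedef enum
{
    ENUM_LIST_SQL_VALUE,        /* machine-readable, never translated */
    ENUM_LIST_USER_DISPLAY      /* shown to the user, join is translatable */
} EnumListPurpose;

/*
 * Hypothetical helper: join option names according to purpose.
 * Option names are assumed to fit the fixed-size scratch buffers.
 */
static void
format_enum_options(char *buf, size_t buflen,
                    const char *const *opts, int n,
                    EnumListPurpose purpose)
{
    char        item[64];
    char        joined[512];

    assert(buflen > 0 && n >= 0);
    buf[0] = '\0';
    for (int i = 0; i < n; i++)
    {
        assert(strlen(opts[i]) + 2 < sizeof(item));

        if (purpose == ENUM_LIST_SQL_VALUE)
            snprintf(item, sizeof(item), "\"%s\"", opts[i]);
        else
            snprintf(item, sizeof(item), "%s", opts[i]);

        if (i == 0)
            snprintf(joined, sizeof(joined), "%s", item);
        else if (purpose == ENUM_LIST_SQL_VALUE)
            snprintf(joined, sizeof(joined), "%s,%s", buf, item);
        else
            snprintf(joined, sizeof(joined), _("%s, %s"), buf, item);

        snprintf(buf, buflen, "%s", joined);
    }
}
```

A caller wanting the SQL form would wrap the result in braces itself, while a caller producing a hint would substitute the display form into a translatable message such as "Available values: %s.".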

[1]: /messages/by-id/aeniYoOwCQmtWtQW@alvherre.pgsql

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/