Day and month name localization uses wrong locale category

Started by Peter Eisentraut over 19 years ago | 26 messages | hackers
#1 Peter Eisentraut
peter_e@gmx.net

In 8.2, utils/adt/formatting.c uses our NLS mechanism to localize day and
month names (I assume for use by to_char). But since this necessarily ties
the outcome to the LC_MESSAGES setting, this comes out inconsistently with
Unix locale behavior, e.g.,

pei@bell:~$ locale
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

pei@bell:~$ date +%A
Friday

pei@bell:~$ LC_MESSAGES=de_DE@euro date +%A
Friday

pei@bell:~$ LC_TIME=de_DE@euro date +%A
Freitag

Is there no API to get the localized names from the C library so that LC_TIME
takes effect?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#2 Euler Taveira de Oliveira
euler@timbira.com
In reply to: Peter Eisentraut (#1)
Re: Day and month name localization uses wrong locale category

Peter Eisentraut wrote:

pei@bell:~$ date +%A
Friday

pei@bell:~$ LC_MESSAGES=de_DE@euro date +%A
Friday

pei@bell:~$ LC_TIME=de_DE@euro date +%A
Freitag

Is there no API to get the localized names from the C library so that LC_TIME
takes effect?

What about using strftime()? Then we wouldn't have to worry about gettext
translations; "all" is localized.
Why didn't I think of it before? :-)

I'll try to code a patch later today if no one objects.

PS> I was thinking about changing this for the same reasons Peter
pointed out; I just didn't have the time to do it before.
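
As a rough illustration of the strftime() idea (a sketch, not the patch itself): the helper below, whose name weekday_name_in is made up for this example, switches one locale category, asks strftime() for %A, and switches the category back to "C". It assumes a POSIX system where LC_MESSAGES is defined in <locale.h>, and a process that otherwise runs in the "C" locale. If the requested locale is not installed, setlocale() fails and the English name comes back.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <time.h>

/*
 * Format the name of weekday `wday` (0 = Sunday ... 6 = Saturday) after
 * trying to switch one locale category to `locale_name`.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or 0 if the buffer is too small.
 * Hypothetical helper, for illustration only.
 */
size_t
weekday_name_in(int category, const char *locale_name, int wday,
                char *out, size_t outlen)
{
    struct tm   tm = {0};
    size_t      n;

    assert(out != NULL && outlen > 0);
    assert(wday >= 0 && wday < 7);

    tm.tm_wday = wday;                  /* %A only consults tm_wday */

    /* On failure setlocale() returns NULL and leaves the category alone. */
    setlocale(category, locale_name);
    n = strftime(out, outlen, "%A", &tm);

    /* Assumption: the caller normally runs with this category set to "C". */
    setlocale(category, "C");
    return n;
}
```

Called with LC_TIME and an installed German locale, this should produce the German day name, while passing LC_MESSAGES leaves the result in English, which matches the date(1) session quoted above.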

--
Euler Taveira de Oliveira
http://www.timbira.com/

#3 Peter Eisentraut
peter_e@gmx.net
In reply to: Euler Taveira de Oliveira (#2)
Re: Day and month name localization uses wrong locale category

Euler Taveira de Oliveira wrote:

What about using strftime()? Then we wouldn't have to worry about gettext
translations; "all" is localized.
Why didn't I think of it before? :-)

I'll try to code a patch later today if no one objects.

How is this going?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#4 Euler Taveira de Oliveira
euler@timbira.com
In reply to: Peter Eisentraut (#3)
Re: Day and month name localization uses wrong locale category

Peter Eisentraut wrote:

What about using strftime()? Then we wouldn't have to worry about gettext
translations; "all" is localized.
Why didn't I think of it before? :-)

I'll try to code a patch later today if no one objects.

How is this going?

Finished. Sorry for the delay; I had some trouble understanding how the
backend handles locale settings (Neil pointed the way).
TM mode now returns strftime() output. It would be nice to switch to
pg_strftime() in the future, but unfortunately that function is not
internationalized. :(

template1=# show lc_time;
 lc_time
---------
 pt_BR
(1 registro)

template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY');
          to_char
---------------------------
 Segunda, 20 Novembro 2006
(1 registro)

template1=# set lc_time to 'C';
SET
template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY');
         to_char
--------------------------
 Monday, 20 November 2006
(1 registro)

template1=# set lc_time to 'de_DE';
SET
template1=# select to_char(now(), 'TMDay, DD TMMonth YYYY');
         to_char
--------------------------
 Montag, 20 November 2006
(1 registro)

template1=#

Comments?

--
Euler Taveira de Oliveira
http://www.timbira.com/

Attachments:

tm.diff (text/plain; charset=us-ascii, +163 -283)

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Euler Taveira de Oliveira (#4)
Re: Day and month name localization uses wrong locale category

Euler Taveira de Oliveira <euler@timbira.com> writes:

+ /*
+  * Return the LC_TIME information
+  */
+ char *
+ pg_get_lc_time(void)
+ {
+ 	return locale_time;
+ }

locale_time is a global GUC variable, so there is surely no point in the
above function. I have not looked at the rest of the patch.

regards, tom lane

#6 Euler Taveira de Oliveira
euler@timbira.com
In reply to: Tom Lane (#5)
Re: Day and month name localization uses wrong locale category

Tom Lane wrote:

+ /*
+  * Return the LC_TIME information
+  */
+ char *
+ pg_get_lc_time(void)
+ {
+ 	return locale_time;
+ }

locale_time is a global GUC variable, so there is surely no point in the
above function. I have not looked at the rest of the patch.

I know that. But if I didn't use it, how could I know the current
LC_TIME setting? LC_TIME in the backend is always C, so I need to change
it to xx_XX briefly, do the job (strftime), and then set it back to C. Am
I wrong? That's what the patch does.
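
The switch-and-restore sequence described here could look roughly like the following. This is a sketch under stated assumptions, not the code from the attached patch: strftime_with_lc_time is a hypothetical name, and the file-level locale_time variable is only a stand-in for the backend's setting of the same name. The current LC_TIME value is copied before switching, because the string returned by setlocale() may be overwritten by a later call.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Stand-in for the backend's lc_time setting (assumption for this sketch). */
static const char *locale_time = "C";

/*
 * Run strftime() with LC_TIME temporarily set to locale_time, then put
 * the previous LC_TIME value back. Returns strftime()'s result, or 0 if
 * the current locale could not be saved.
 */
size_t
strftime_with_lc_time(char *out, size_t outlen, const char *fmt,
                      const struct tm *tm)
{
    const char *cur;
    char       *saved;
    size_t      n;

    assert(out != NULL && fmt != NULL && tm != NULL);

    /* Query the current value and keep a private copy of it. */
    cur = setlocale(LC_TIME, NULL);
    if (cur == NULL)
        return 0;
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return 0;
    strcpy(saved, cur);

    /* If the locale is not installed this fails and LC_TIME is unchanged. */
    setlocale(LC_TIME, locale_time);
    n = strftime(out, outlen, fmt, tm);

    /* Restore whatever was in effect before the call. */
    setlocale(LC_TIME, saved);
    free(saved);
    return n;
}
```

Keeping the non-C window this short matters for the error-handling concern raised later in the thread: nothing between the two setlocale() calls should be able to report an error.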

--
Euler Taveira de Oliveira
http://www.timbira.com/

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Euler Taveira de Oliveira (#6)
Re: Day and month name localization uses wrong locale category

Euler Taveira de Oliveira <euler@timbira.com> writes:

Tom Lane wrote:

locale_time is a global GUC variable, so there is surely no point in the
above function. I have not looked at the rest of the patch.

I know that. But if I didn't use it, how could I know the current
LC_TIME setting?

You just look at the variable directly. While there's sometimes value
in an encapsulation function, I fail to see any here.

regards, tom lane

#8 Euler Taveira de Oliveira
euler@timbira.com
In reply to: Tom Lane (#7)
Re: Day and month name localization uses wrong locale category

Tom Lane wrote:

You just look at the variable directly. While there's sometimes value
in an encapsulation function, I fail to see any here.

Oh, my :-) That's the consequence of not sleeping at least a little at
night.
The attached patch corrects what Tom pointed out (thanks).

PS> going to bed right now :-)

--
Euler Taveira de Oliveira
http://www.timbira.com/

Attachments:

tm2.diff (text/plain; charset=us-ascii, +150 -283)

#9 Bruce Momjian
bruce@momjian.us
In reply to: Euler Taveira de Oliveira (#8)
Re: Day and month name localization uses wrong

Is this for 8.2?


--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#10 Euler Taveira de Oliveira
euler@timbira.com
In reply to: Bruce Momjian (#9)
Re: Day and month name localization uses wrong locale category

Bruce Momjian wrote:

Is this for 8.2?

This patch "fixes" (reimplements) a feature that was written for 8.2. So
I think it's a must-fix. That patch is not so huge or invasive.
Comments?

--
Euler Taveira de Oliveira
http://www.timbira.com/

#11 Bruce Momjian
bruce@momjian.us
In reply to: Euler Taveira de Oliveira (#10)
Re: Day and month name localization uses wrong

Euler Taveira de Oliveira wrote:

Bruce Momjian wrote:

Is this for 8.2?

This patch "fixes" (reimplements) a feature that was written for 8.2. So
I think it's a must-fix. That patch is not so huge or invasive.
Comments?

Agreed, patch applied:

Fix to_char() locale handling to honor LC_TIME, not LC_MESSAGES.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com


#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Euler Taveira de Oliveira (#10)
Re: Day and month name localization uses wrong locale category

Euler Taveira de Oliveira <euler@timbira.com> writes:

Bruce Momjian wrote:

Is this for 8.2?

This patch "fixes" (reimplements) a feature that was written for 8.2. So
I think it's a must-fix. That patch is not so huge or invasive.
Comments?

Exactly how bad could the consequences get if someone sets LC_TIME to a
value not encoding-compatible with the database encoding? One of the
reasons LC_MESSAGES is superuser-only is that you can PANIC the backend
by choosing an incompatible value --- will that happen now for LC_TIME
too?

I think it might be OK, because the reason for the PANIC in the bogus
message case is that the encoding-violation error happens recursively
inside error processing, and that shouldn't need to happen here. But
one thing we'll need to be damn sure of is that control can't get into
elog.c while we've got LC_TIME set to a non-C value, else the same
recursion scenario could occur due to log_line_prefix expansion.

regards, tom lane

#13 Peter Eisentraut
peter_e@gmx.net
In reply to: Euler Taveira de Oliveira (#4)
Re: Day and month name localization uses wrong locale category

Am Dienstag, 21. November 2006 00:52 schrieb Euler Taveira de Oliveira:

Finished. Sorry for the delay; I had some trouble understanding how the
backend handles locale settings (Neil pointed the way).
TM mode now returns strftime() output. It would be nice to switch to
pg_strftime() in the future, but unfortunately that function is not
internationalized. :(

What's concerning me about the way this is written is that it calls
setlocale() for each formatting instance, which will be very slow.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#14 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#13)
Re: Day and month name localization uses wrong locale category

Peter Eisentraut <peter_e@gmx.net> writes:

What's concerning me about the way this is written is that it calls
setlocale() for each formatting instance, which will be very slow.

Perhaps, the first time the info is needed, do setlocale(), ask strftime
for the 12+7 strings we need and save them away, then revert to C locale
and proceed from there.
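
A minimal sketch of this caching idea, with hypothetical names throughout (load_time_names, cached_day_name, cached_month_name, and the 64-byte name limit are all assumptions of the example): on first use it switches LC_TIME once, collects the 7 day names and 12 month names through strftime(), restores the previous locale, and serves every later request from the saved copies.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define TIME_NAME_LEN 64            /* assumed upper bound for one name */

static char day_names[7][TIME_NAME_LEN];
static char month_names[12][TIME_NAME_LEN];
static bool names_valid = false;

/* Fill both tables with a single LC_TIME switch, then restore the locale. */
static void
load_time_names(const char *lc_time)
{
    const char *cur = setlocale(LC_TIME, NULL);
    char       *saved = NULL;
    struct tm   tm = {0};
    int         i;

    /* Copy the current value; later setlocale() calls may overwrite it. */
    if (cur != NULL && (saved = malloc(strlen(cur) + 1)) != NULL)
        strcpy(saved, cur);

    setlocale(LC_TIME, lc_time);    /* no-op if the locale is not installed */

    for (i = 0; i < 7; i++)
    {
        tm.tm_wday = i;
        if (strftime(day_names[i], TIME_NAME_LEN, "%A", &tm) == 0)
            day_names[i][0] = '\0';
    }
    for (i = 0; i < 12; i++)
    {
        tm.tm_mon = i;
        if (strftime(month_names[i], TIME_NAME_LEN, "%B", &tm) == 0)
            month_names[i][0] = '\0';
    }

    if (saved != NULL)
    {
        setlocale(LC_TIME, saved);
        free(saved);
    }
    names_valid = true;
}

/* wday: 0 = Sunday ... 6 = Saturday */
const char *
cached_day_name(const char *lc_time, int wday)
{
    assert(wday >= 0 && wday < 7);
    if (!names_valid)
        load_time_names(lc_time);
    return day_names[wday];
}

/* mon: 0 = January ... 11 = December */
const char *
cached_month_name(const char *lc_time, int mon)
{
    assert(mon >= 0 && mon < 12);
    if (!names_valid)
        load_time_names(lc_time);
    return month_names[mon];
}
```

The sketch never invalidates its cache. Something would have to reset names_valid whenever the lc_time setting changes, otherwise names from the old locale keep being returned.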

regards, tom lane

#15 Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#13)
Re: Day and month name localization uses wrong

Peter Eisentraut wrote:

Am Dienstag, 21. November 2006 00:52 schrieb Euler Taveira de Oliveira:

Finished. Sorry for the delay; I had some trouble understanding how the
backend handles locale settings (Neil pointed the way).
TM mode now returns strftime() output. It would be nice to switch to
pg_strftime() in the future, but unfortunately that function is not
internationalized. :(

What's concerning me about the way this is written is that it calls
setlocale() for each formatting instance, which will be very slow.

Should we have it set from a guc hook on lc_time?

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com


#16 Bruce Momjian
bruce@momjian.us
In reply to: Euler Taveira de Oliveira (#4)
Re: Day and month name localization uses wrong

It is too close to the RC1 release to apply this patch. I have added
documentation that "TM"'s locale is controlled by "lc_messages".

This has been saved for the 8.3 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold


--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com


Attachments:

/bjm/diff (text/x-diff, +2 -2)
/bjm/diff (text/x-diff, +2 -2)

#17 Bruce Momjian
bruce@momjian.us
In reply to: Euler Taveira de Oliveira (#4)
Re: Day and month name localization uses wrong

I now remember a new problem with this feature, regardless of whether
we use 'lc_messages' or 'lc_time'.

The problem is having a function's output affected by a GUC variable.
If you create an expression index using the function, and later query
the index with a different GUC value, or you do inserts with different
GUC values, the index will not work.

I know we have had this problem in the past, but I can't remember if or
how we addressed it.
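
The failure mode can be modelled with a toy program. Everything below is hypothetical and stands in for the real machinery: session_locale plays the role of the setting, tm_day() plays the role of to_char(..., 'TMDay'), and the small array plays the role of an expression index whose keys are computed once, at insert time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a session setting such as lc_time (assumption). */
static const char *session_locale = "C";

/* Toy setting-dependent function: its result changes with session_locale. */
const char *
tm_day(int wday)
{
    static const char *const en[] = {"Sunday", "Monday", "Tuesday",
        "Wednesday", "Thursday", "Friday", "Saturday"};
    static const char *const de[] = {"Sonntag", "Montag", "Dienstag",
        "Mittwoch", "Donnerstag", "Freitag", "Samstag"};

    assert(wday >= 0 && wday < 7);
    return strcmp(session_locale, "de_DE") == 0 ? de[wday] : en[wday];
}

/* Toy "expression index": stores the key computed when the row was added. */
#define MAX_ROWS 8
static const char *index_keys[MAX_ROWS];
static int  index_nrows = 0;

void
index_insert(int wday)
{
    assert(index_nrows < MAX_ROWS);
    index_keys[index_nrows++] = tm_day(wday);
}

/* Count matches by recomputing the expression under the current setting. */
int
index_lookup(int wday)
{
    const char *probe = tm_day(wday);
    int         hits = 0;
    int         i;

    for (i = 0; i < index_nrows; i++)
        if (strcmp(index_keys[i], probe) == 0)
            hits++;
    return hits;
}
```

Inserting weekday 1 while session_locale is "C" stores the key "Monday". A lookup for the same weekday after switching session_locale to "de_DE" probes for "Montag" and finds nothing, even though the row is there.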


--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com


#18 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Bruce Momjian (#17)
Re: Day and month name localization uses wrong

Bruce Momjian wrote:

I now remember a new problem with this feature, regardless of whether
we use 'lc_messages' or 'lc_time'.

The problem is having a function's output affected by a GUC variable.
If you create an expression index using the function, and later query
the index with a different GUC value, or you do inserts with different
GUC values, the index will not work.

I know we have had this problem in the past, but I can't remember if or
how we addressed it.

Mark the function as volatile, which precludes using it in a
functional index?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#19 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alvaro Herrera (#18)
Re: Day and month name localization uses wrong

Alvaro Herrera <alvherre@commandprompt.com> writes:

Bruce Momjian wrote:

The problem is having a function's output affected by a GUC variable.

Mark the function as volatile, which precludes using it in a
functional index?

Stable, not volatile.

It looks like we have some of the variants marked stable already for
their dependence on TimeZone, but the dependence on lc_messages is new.

regards, tom lane

#20 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#19)
Re: Day and month name localization uses wrong

I wrote:

It looks like we have some of the variants marked stable already for
their dependence on TimeZone, but the dependence on lc_messages is new.

Actually, now that I look at it, *most* of the variants of to_char,
to_number, and friends have been broken on this score since day one.
There's been a dependency on LC_NUMERIC for the numeric variants all
along, but they're marked immutable :-(

regards, tom lane

#21 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#20)
#22 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#21)
#23 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#22)
#24 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#22)
#25 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#14)
#26 Bruce Momjian
bruce@momjian.us
In reply to: Euler Taveira de Oliveira (#4)