pgsql: setlocale() on Windows doesn't work correctly if the locale name

Started by Heikki Linnakangas over 14 years ago, 12 messages
#1Heikki Linnakangas
heikki.linnakangas@iki.fi

setlocale() on Windows doesn't work correctly if the locale name contains
apostrophes or dots. There isn't much hope of Microsoft fixing it any time
soon, it's been like that for ages, so we better work around it. So, map a
few common Windows locale names known to cause problems to aliases that work.
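A minimal sketch of what such an alias mapping could look like, in C. This is not the code from the commit: the function name `map_locale_name` and the table layout are hypothetical, and the only entry shown is the Macao pair reported later in this thread ("Macao S.A.R." rejected, "MCO" accepted).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical alias table; only the Macao pair comes from this thread. */
struct locale_alias
{
    const char *problem;    /* substring setlocale() is reported to reject */
    const char *alias;      /* replacement reported to work */
};

static const struct locale_alias aliases[] = {
    {"Macao S.A.R.", "MCO"},
};

/*
 * Return a malloc'd copy of "name" in which the first matching problem
 * substring is replaced by its alias.  Names without a match are copied
 * unchanged.  Returns NULL if out of memory.
 */
static char *
map_locale_name(const char *name)
{
    size_t      i;
    char       *out;

    for (i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
    {
        const char *hit = strstr(name, aliases[i].problem);

        if (hit != NULL)
        {
            size_t      prefix = (size_t) (hit - name);
            size_t      plen = strlen(aliases[i].problem);
            size_t      alen = strlen(aliases[i].alias);

            out = malloc(strlen(name) - plen + alen + 1);
            if (out == NULL)
                return NULL;
            memcpy(out, name, prefix);                  /* text before match */
            memcpy(out + prefix, aliases[i].alias, alen);
            strcpy(out + prefix + alen, hit + plen);    /* text after match */
            return out;
        }
    }

    /* No problematic substring found: return a plain copy. */
    out = malloc(strlen(name) + 1);
    if (out != NULL)
        strcpy(out, name);
    return out;
}
```

With this table, `map_locale_name("Chinese (Traditional)_Macao S.A.R..950")` would yield `Chinese (Traditional)_MCO.950`; the caller frees the result.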

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/d5a7bf8c11c8b66c822bbb1a6c90e1a14425bd6e

Modified Files
--------------
src/bin/initdb/initdb.c | 89 +++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 82 insertions(+), 7 deletions(-)

#2Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Heikki Linnakangas (#1)
Re: pgsql: setlocale() on Windows doesn't work correctly if the locale name

(2011/04/16 2:56), Heikki Linnakangas wrote:

setlocale() on Windows doesn't work correctly if the locale name contains
apostrophes or dots.

As for apostrophes, isn't the cause that initdb loses the single quote
of locale? ([BUGS] BUG #5818: initdb lose the single quote of locale)

As the bug reporter mentions, initdb loses the single quote in reality.
Concretely speaking, scanstr() called from bootscanner.l loses it.
I'm not sure if it's suitable for the bootstrap code to call scanstr().

regards,
Hiroshi Inoue

There isn't much hope of Microsoft fixing it any time
soon, it's been like that for ages, so we better work around it. So, map a
few common Windows locale names known to cause problems to aliases that work.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/d5a7bf8c11c8b66c822bbb1a6c90e1a14425bd6e

Modified Files
--------------
src/bin/initdb/initdb.c | 89 +++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 82 insertions(+), 7 deletions(-)

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#2)
Re: [HACKERS] Re: pgsql: setlocale() on Windows doesn't work correctly if the locale name

Hiroshi Inoue <inoue@tpf.co.jp> writes:

(2011/04/16 2:56), Heikki Linnakangas wrote:

setlocale() on Windows doesn't work correctly if the locale name contains
apostrophes or dots.

As for apostrophes, isn't the cause that initdb loses the single quote
of locale? ([BUGS] BUG #5818: initdb lose the single quote of locale)

As the bug reporter mentions, initdb loses the single quote in reality.
Concretely speaking, scanstr() called from bootscanner.l loses it.
I'm not sure if it's suitable for the bootstrap code to call scanstr().

Huh? Bootstrap mode just deals with the data found in
src/include/catalog/*.h. The locale names found by initdb.c are stuck
in there afterwards, using regular SQL commands. I don't know where the
problem really comes from, but I doubt the connection you're trying to
make above.

regards, tom lane

#4Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Tom Lane (#3)
Re: [HACKERS] Re: pgsql: setlocale() on Windows doesn't work correctly if the locale name

(2011/04/20 9:22), Tom Lane wrote:

Hiroshi Inoue <inoue@tpf.co.jp> writes:

(2011/04/16 2:56), Heikki Linnakangas wrote:

setlocale() on Windows doesn't work correctly if the locale name contains
apostrophes or dots.

As for apostrophes, isn't the cause that initdb loses the single quote
of locale? ([BUGS] BUG #5818: initdb lose the single quote of locale)

As the bug reporter mentions, initdb loses the single quote in reality.
Concretely speaking, scanstr() called from bootscanner.l loses it.
I'm not sure if it's suitable for the bootstrap code to call scanstr().

Huh? Bootstrap mode just deals with the data found in
src/include/catalog/*.h. The locale names found by initdb.c are stuck
in there afterwards, using regular SQL commands.

bootstrap_template1() in initdb runs the BKI script in bootstrap
mode to create template1. Some symbols (LC_COLLATE, LC_CTYPE in
pg_database etc) in the BKI script are substituted by actual values
using replace_token(). Isn't it correct?
ISTM replace_token() takes care of nothing about single quotes
in its input values but the comment in scanstr() says
/*
 * Note: if scanner is working right, unescaped quotes can only
 * appear in pairs, so there should be another character.
 */

regards,
Hiroshi Inoue

I don't know where the
problem really comes from, but I doubt the connection you're trying to
make above.

regards, tom lane

#5Andrew Dunstan
andrew@dunslane.net
In reply to: Hiroshi Inoue (#4)
Re: [HACKERS] Re: pgsql: setlocale() on Windows doesn't work correctly if the locale name

On 04/19/2011 09:42 PM, Hiroshi Inoue wrote:

bootstrap_template1() in initdb runs the BKI script in bootstrap
mode to create template1. Some symbols (LC_COLLATE, LC_CTYPE in
pg_database etc) in the BKI script are substituted by actual values
using replace_token(). Isn't it correct?
ISTM replace_token() takes care of nothing about single quotes
in its input values but the comment in scanstr() says
/*
 * Note: if scanner is working right, unescaped quotes can only
 * appear in pairs, so there should be another character.
 */

That's perfectly true, but only one of the replaced locale names
contains a single quote mark. So clearly there's more going on here than
just the bug you're referring to. Heikki's commit message specifically
refers to dots in locale names, which shouldn't cause a problem of that
type, I believe.

cheers

andrew

#6Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Andrew Dunstan (#5)
Re: [HACKERS] Re: pgsql: setlocale() on Windows doesn't work correctly if the locale name

(2011/04/20 12:25), Andrew Dunstan wrote:

On 04/19/2011 09:42 PM, Hiroshi Inoue wrote:

bootstrap_template1() in initdb runs the BKI script in bootstrap
mode to create template1. Some symbols (LC_COLLATE, LC_CTYPE in
pg_database etc) in the BKI script are substituted by actual values
using replace_token(). Isn't it correct?
ISTM replace_token() takes care of nothing about single quotes
in its input values but the comment in scanstr() says
/*
 * Note: if scanner is working right, unescaped quotes can only
 * appear in pairs, so there should be another character.
 */

That's perfectly true, but only one of the replaced locale names
contains a single quote mark. So clearly there's more going on here than
just the bug you're referring to. Heikki's commit message specifically
refers to dots in locale names, which shouldn't cause a problem of that
type, I believe.

Yes, the dots are a completely separate issue.
I can find no concrete reference to problems about locale
names containing dots. Is the following an example?

In my environment (Windows Vista using VC8)

setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950
but
setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
fails.
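The observation above can be checked with a small round-trip probe. The following is only a sketch in portable C; the function name `locale_round_trips` is hypothetical. It asks setlocale() for the current name of a category and feeds that same name back in.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns 1 if the name reported by setlocale(category, NULL) is accepted
 * when passed back to setlocale(), 0 if it is rejected, and -1 if the
 * current name cannot be obtained or copied.
 */
static int
locale_round_trips(int category)
{
    const char *current = setlocale(category, NULL);
    char       *copy;
    int         ok;

    if (current == NULL)
        return -1;

    /* Copy first: the returned buffer may be overwritten by later calls. */
    copy = malloc(strlen(current) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, current);

    ok = (setlocale(category, copy) != NULL);
    free(copy);
    return ok ? 1 : 0;
}
```

Going by the report above, calling this after `setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950")` on the affected Windows setup would be expected to return 0, since the name handed back contains "Macao S.A.R.".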

regards,
Hiroshi Inoue

#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Hiroshi Inoue (#6)
Re: Re: [COMMITTERS] pgsql: setlocale() on Windows doesn't work correctly if the locale name

On 20.04.2011 06:48, Hiroshi Inoue wrote:

I can find no concrete reference to problems about locale
names containing dots. Is the following an example?

Yes.

In my environment (Windows Vista using VC8)

setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950

Interesting. According to Microsoft's documentation, the codes are
three-letter country codes specified by ISO-3166
(http://msdn.microsoft.com/en-us/library/cdax410z%28v=VS.100%29.aspx).
However, according to Wikipedia, MCO stands for Monaco, not Macau
(https://secure.wikimedia.org/wikipedia/en/wiki/ISO_3166-1_alpha-3).

So according to bug #5818, the problem with "People's Republic of China"
was different from "Hong Kong S.A.R.", "Macau S.A.R.", and "U.A.E.".
setlocale() handles apostrophe fine, but it's not escaped correctly in
the BKI file. I'll remove the "People's Republic of China" -> "China"
mapping I committed, and fix the escaping instead.
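As an illustration of the kind of escaping meant here, the sketch below doubles every single quote in a value before it is substituted into a quoted string. This is an assumption based on the scanstr() comment quoted earlier in the thread (unescaped quotes appear in pairs); the helper name `double_single_quotes` is hypothetical and is not the function used in initdb.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd copy of "src" in which every
 * single quote is doubled, so a scanner that treats '' as one literal
 * quote reads the original value back.  Returns NULL if out of memory.
 */
static char *
double_single_quotes(const char *src)
{
    /* Worst case: every character is a quote and gets doubled. */
    char       *dst = malloc(2 * strlen(src) + 1);
    char       *p = dst;

    if (dst == NULL)
        return NULL;

    for (; *src != '\0'; src++)
    {
        if (*src == '\'')
            *p++ = '\'';        /* emit the extra quote */
        *p++ = *src;
    }
    *p = '\0';
    return dst;
}
```

For example, `People's Republic of China` becomes `People''s Republic of China`.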

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#6)
Re: Re: [COMMITTERS] pgsql: setlocale() on Windows doesn't work correctly if the locale name

Hiroshi Inoue <inoue@tpf.co.jp> writes:

In my environment (Windows Vista using VC8)

setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950
but
setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
fails.

Interesting. This example suggests that maybe Windows' setlocale can
only cope with dot as introducing a codepage number. Are there any
cases where a dot works as part of the basic locale name?

regards, tom lane

#9Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Heikki Linnakangas (#7)
Re: Re: [COMMITTERS] pgsql: setlocale() on Windows doesn't work correctly if the locale name

(2011/04/20 15:30), Heikki Linnakangas wrote:

On 20.04.2011 06:48, Hiroshi Inoue wrote:

I can find no concrete reference to problems about locale
names containing dots. Is the following an example?

Yes.

In my environment (Windows Vista using VC8)

setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950

but
setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
fails.

I see another issue for the behavior.

For example, the following code in src/backend/utils/adt/pg_locale.c
won't work as expected in case the current locale is Hong Kong, Macao or
UAE because the last setlocale() in the code would fail. I can
find such save & restore operations of locales in several places.

bool
check_locale(int category, const char *value)
{
    char       *save;
    bool        ret;

    save = setlocale(category, NULL);
    if (!save)
        return false;           /* won't happen, we hope */

    /* save may be pointing at a modifiable scratch variable, see above */
    save = pstrdup(save);

    /* set the locale with setlocale, to see if it accepts it. */
    ret = (setlocale(category, value) != NULL);

    setlocale(category, save);  /* assume this won't fail */
    pfree(save);

    return ret;
}
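For illustration only, here is a standalone variant of the same save-and-restore pattern that reports whether the restore step succeeded instead of assuming it. It is a sketch, not a proposed patch: the name `check_locale_checked` and the `restored` out-parameter are hypothetical, and malloc()/free() stand in for pstrdup()/pfree() so that it compiles outside the backend.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Like check_locale(), but *restored tells the caller whether the
 * previous locale could be put back afterwards.
 */
static bool
check_locale_checked(int category, const char *value, bool *restored)
{
    const char *current = setlocale(category, NULL);
    char       *save;
    bool        ret;

    *restored = false;
    if (current == NULL)
        return false;

    /* Copy the name; the returned buffer may be reused by setlocale(). */
    save = malloc(strlen(current) + 1);
    if (save == NULL)
        return false;
    strcpy(save, current);

    /* See whether setlocale() accepts the candidate value. */
    ret = (setlocale(category, value) != NULL);

    /* Restore, and record the outcome rather than assuming success. */
    *restored = (setlocale(category, save) != NULL);
    free(save);

    return ret;
}
```

If the restore fails, as described above for the Hong Kong, Macao and UAE names, the caller at least learns that the process is no longer running under its original locale.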

regards,
Hiroshi Inoue

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#9)
Re: Re: [COMMITTERS] pgsql: setlocale() on Windows doesn't work correctly if the locale name

Hiroshi Inoue <inoue@tpf.co.jp> writes:

I see another issue for the behavior.

For example, the following code in src/backend/utils/adt/pg_locale.c
won't work as expected in case the current locale is Hong Kong, Macao or
UAE because the last setlocale() in the code would fail. I can
find such save & restore operations of locales in several places.

Well, if Windows' setlocale is too brain-dead to accept its own output,
there's nothing to be done about it except to file a bug with Microsoft.
There isn't anything in the POSIX API that would let us avoid using
setlocale with a previous result value to restore the previous setting.

regards, tom lane

#11Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Tom Lane (#8)
Re: Re: [COMMITTERS] pgsql: setlocale() on Windows doesn't work correctly if the locale name

(2011/04/20 22:08), Tom Lane wrote:

Hiroshi Inoue <inoue@tpf.co.jp> writes:

In my environment (Windows Vista using VC8)

setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950
but
setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
fails.

Interesting. This example suggests that maybe Windows' setlocale can
only cope with dot as introducing a codepage number.

ACP or OCP as well as codepage number seem to be allowed.

Are there any
cases where a dot works as part of the basic locale name?

Unfortunately, I don't know of any explanation of where dots are allowed.

regards,
Hiroshi Inoue

#12Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Heikki Linnakangas (#7)
Re: Re: [COMMITTERS] pgsql: setlocale() on Windows doesn't work correctly if the locale name

(2011/04/20 15:30), Heikki Linnakangas wrote:

On 20.04.2011 06:48, Hiroshi Inoue wrote:

I can find no concrete reference to problems about locale
names containing dots. Is the following an example?

Yes.

In my environment (Windows Vista using VC8)

setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950

Interesting. According to Microsoft's documentation, the codes are
three-letter country codes specified by ISO-3166
(http://msdn.microsoft.com/en-us/library/cdax410z%28v=VS.100%29.aspx).
However, according to Wikipedia, MCO stands for Monaco, not Macau
(https://secure.wikimedia.org/wikipedia/en/wiki/ISO_3166-1_alpha-3).

Hmm, the Windows locale system seems to be inconsistent: the same
country code (MCO) corresponds to different countries.
ZHM_MCO corresponds to Chinese (Traditional)_Macao S.A.R..950, whereas
FRM_MCO corresponds to French_Principality of Monaco.

regards,
Hiroshi Inoue