pgsql: setlocale() on Windows doesn't work correctly if the locale name
setlocale() on Windows doesn't work correctly if the locale name contains
apostrophes or dots. There isn't much hope of Microsoft fixing it any time
soon; it's been like that for ages, so we had better work around it. So, map a
few common Windows locale names known to cause problems to aliases that work.
Branch
------
master
Details
-------
http://git.postgresql.org/pg/commitdiff/d5a7bf8c11c8b66c822bbb1a6c90e1a14425bd6e
Modified Files
--------------
src/bin/initdb/initdb.c | 89 +++++++++++++++++++++++++++++++++++++++++++----
1 files changed, 82 insertions(+), 7 deletions(-)
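The commit message above describes mapping problematic Windows locale names to aliases that setlocale() accepts. A minimal sketch of that idea, assuming simple substring replacement: the helper `map_locale_name` and its single table entry are hypothetical. The entry is taken from Hiroshi's Macao example later in this thread, and it is not the mapping list from the actual commit.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical alias table: each entry replaces a substring that Windows
 * setlocale() is reported to reject with one it accepts. Illustrative
 * only; not the list from the real initdb.c change.
 */
static const struct
{
	const char *bad_part;
	const char *good_part;
} locale_aliases[] = {
	{"Macao S.A.R.", "MCO"},
};

/*
 * Return a malloc'd copy of 'name' with the first matching alias applied,
 * or an unchanged copy if nothing matches. Returns NULL on out-of-memory.
 * The caller frees the result.
 */
static char *
map_locale_name(const char *name)
{
	size_t		i;
	char	   *out;

	assert(name != NULL);
	for (i = 0; i < sizeof(locale_aliases) / sizeof(locale_aliases[0]); i++)
	{
		const char *hit = strstr(name, locale_aliases[i].bad_part);

		if (hit != NULL)
		{
			size_t		prefix = (size_t) (hit - name);
			size_t		bad_len = strlen(locale_aliases[i].bad_part);
			size_t		good_len = strlen(locale_aliases[i].good_part);
			size_t		rest = strlen(hit + bad_len);

			out = malloc(prefix + good_len + rest + 1);
			if (out == NULL)
				return NULL;
			memcpy(out, name, prefix);
			memcpy(out + prefix, locale_aliases[i].good_part, good_len);
			/* copy the tail, including the terminating NUL */
			memcpy(out + prefix + good_len, hit + bad_len, rest + 1);
			return out;
		}
	}

	/* no alias matched: hand back an unchanged copy */
	out = malloc(strlen(name) + 1);
	if (out != NULL)
		strcpy(out, name);
	return out;
}
```

Returning a fresh copy in both cases keeps ownership uniform for the caller, whether or not an alias was applied.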
(2011/04/16 2:56), Heikki Linnakangas wrote:
setlocale() on Windows doesn't work correctly if the locale name contains
apostrophes or dots.
As for apostrophes, isn't the cause that initdb loses the single quote
in the locale name? ([BUGS] BUG #5818: initdb lose the single quote of locale)
As the bug reporter mentions, initdb does in fact lose the single quote.
Specifically, scanstr(), called from bootscanner.l, drops it.
I'm not sure it's appropriate for the bootstrap code to call scanstr().
regards,
Hiroshi Inoue
Hiroshi Inoue <inoue@tpf.co.jp> writes:
(2011/04/16 2:56), Heikki Linnakangas wrote:
setlocale() on Windows doesn't work correctly if the locale name contains
apostrophes or dots.
As for apostrophes, isn't the cause that initdb loses the single quote
of locale? ([BUGS] BUG #5818: initdb lose the single quote of locale)
As the bug reporter mentions, initdb loses the single quote in reality.
Concretely speaking, scanstr() called from bootscanner.l loses it.
I'm not sure if it's suitable for the bootstrap code to call scanstr().
Huh? Bootstrap mode just deals with the data found in
src/include/catalog/*.h. The locale names found by initdb.c are stuck
in there afterwards, using regular SQL commands. I don't know where the
problem really comes from, but I doubt the connection you're trying to
make above.
regards, tom lane
(2011/04/20 9:22), Tom Lane wrote:
Hiroshi Inoue <inoue@tpf.co.jp> writes:
(2011/04/16 2:56), Heikki Linnakangas wrote:
setlocale() on Windows doesn't work correctly if the locale name contains
apostrophes or dots.As for apostrophes, isn't the cause that initdb loses the single quote
of locale? ([BUGS] BUG #5818: initdb lose the single quote of locale)As the bug reporter mentions, initdb loses the single quote in reality.
Concretely speaking, scanstr() called from bootscanner.l loses it.
I'm not sure if it's suitable for the bootstrap code to call scanstr().
Huh? Bootstrap mode just deals with the data found in
src/include/catalog/*.h. The locale names found by initdb.c are stuck
in there afterwards, using regular SQL commands.
bootstrap_template1() in initdb runs the BKI script in bootstrap
mode to create template1. Some symbols in the BKI script (LC_COLLATE,
LC_CTYPE in pg_database, etc.) are substituted with actual values
using replace_token(). Isn't that correct?
ISTM replace_token() does nothing to handle single quotes in its input
values, but the comment in scanstr() says
/*
 * Note: if scanner is working right, unescaped quotes can only
 * appear in pairs, so there should be another character.
 */
regards,
Hiroshi Inoue
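To make Hiroshi's argument concrete, here is a simplified model of the quote rule that the scanstr() comment describes: a quote is assumed to be the first half of an escaped pair, so it is skipped and the next character is copied. The function `collapse_quotes` is hypothetical and is not the real scanstr(), which also handles other escapes. Under this model an unescaped apostrophe substituted by replace_token() would simply disappear.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of the pair-collapsing quote rule only.
 * Returns a malloc'd string, or NULL on out-of-memory.
 */
static char *
collapse_quotes(const char *s)
{
	size_t		len;
	size_t		i;
	size_t		j = 0;
	char	   *out;

	assert(s != NULL);
	len = strlen(s);
	out = malloc(len + 1);
	if (out == NULL)
		return NULL;
	for (i = 0; i < len; i++)
	{
		if (s[i] == '\'')
			i++;				/* skip the quote, assuming its pair follows */
		out[j++] = s[i];		/* copy the character after the quote */
	}
	out[j] = '\0';
	return out;
}
```

Fed a doubled quote, the model yields the intended text. Fed a lone apostrophe, it silently drops it, which matches the symptom reported in bug #5818.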
On 04/19/2011 09:42 PM, Hiroshi Inoue wrote:
bootstrap_template1() in initdb runs the BKI script in bootstrap
mode to create template1. Some symbols (LC_COLLATE, LC_CTYPE in
pg_database etc) in the BKI script are substituted by actual values
using replace_token(). Isn't it correct?
ISTM replace_token() takes care of nothing about single quotes
in its input values but the comment in scanstr() says
/*
 * Note: if scanner is working right, unescaped quotes can only
 * appear in pairs, so there should be another character.
 */
That's perfectly true, but only one of the replaced locale names
contains a single quote mark. So clearly there's more going on here than
just the bug you're referring to. Heikki's commit message specifically
refers to dots in locale names, which shouldn't cause a problem of that
type, I believe.
cheers
andrew
(2011/04/20 12:25), Andrew Dunstan wrote:
On 04/19/2011 09:42 PM, Hiroshi Inoue wrote:
bootstrap_template1() in initdb runs the BKI script in bootstrap
mode to create template1. Some symbols (LC_COLLATE, LC_CTYPE in
pg_database etc) in the BKI script are substituted by actual values
using replace_token(). Isn't it correct?
ISTM replace_token() takes care of nothing about single quotes
in its input values but the comment in scanstr() says
/*
 * Note: if scanner is working right, unescaped quotes can only
 * appear in pairs, so there should be another character.
 */
That's perfectly true, but only one of the replaced locale names
contains a single quote mark. So clearly there's more going on here than
just the bug you're referring to. Heikki's commit message specifically
refers to dots in locale names, which shouldn't cause a problem of that
type, I believe.
Yes, the dots are a completely separate issue.
I can find no concrete reference to problems about locale
names containing dots. Is the following an example?
In my environment (Windows Vista using VC8)
setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950
but
setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
fails.
regards,
Hiroshi Inoue
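Hiroshi's example is a round-trip failure: a name accepted by setlocale() is reported back in a form that setlocale() then rejects. A sketch of a portable check for that property, using only standard `<locale.h>` calls; the helper names `copy_string` and `locale_round_trips` are hypothetical. Per the report above, it would return false on Windows Vista/VC8 for "Chinese (Traditional)_MCO.950"; on other platforms it should simply succeed for valid names.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc'd copy of s, or NULL on out-of-memory. */
static char *
copy_string(const char *s)
{
	char	   *out = malloc(strlen(s) + 1);

	if (out != NULL)
		strcpy(out, s);
	return out;
}

/*
 * Set 'name', read back the name setlocale() reports, and check whether
 * that reported name is itself accepted. The previous setting is restored
 * on a best-effort basis.
 */
static bool
locale_round_trips(int category, const char *name)
{
	const char *cur;
	char	   *saved;
	bool		ok = false;

	assert(name != NULL);
	cur = setlocale(category, NULL);
	if (cur == NULL || (saved = copy_string(cur)) == NULL)
		return false;

	if (setlocale(category, name) != NULL)
	{
		/* copy: the string setlocale() returns may be overwritten */
		const char *got = setlocale(category, NULL);
		char	   *reported = got ? copy_string(got) : NULL;

		if (reported != NULL)
		{
			/* this is the call that fails in the Macao example */
			ok = (setlocale(category, reported) != NULL);
			free(reported);
		}
	}

	setlocale(category, saved);	/* best-effort restore */
	free(saved);
	return ok;
}
```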
On 20.04.2011 06:48, Hiroshi Inoue wrote:
I can find no concrete reference to problems about locale
names containing dots. Is the following an example?
Yes.
In my environment (Windows Vista using VC8)
setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950
Interesting. According to Microsoft's documentation, the codes are
three-letter country codes specified by ISO-3166
(http://msdn.microsoft.com/en-us/library/cdax410z%28v=VS.100%29.aspx).
However, according to Wikipedia, MCO stands for Monaco, not Macau
(https://secure.wikimedia.org/wikipedia/en/wiki/ISO_3166-1_alpha-3).
So according to bug #5818, the problem with "People's Republic of China"
was different from "Hong Kong S.A.R.", "Macau S.A.R.", and "U.A.E.".
setlocale() handles the apostrophe fine, but it's not escaped correctly in
the BKI file. I'll remove the "People's Republic of China" -> "China"
mapping I committed, and fix the escaping instead.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
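Heikki's proposed fix is to escape the value correctly in the BKI file rather than map the name away. A minimal sketch, assuming the pair convention that the scanstr() comment describes (a literal quote written as two quotes); `double_single_quotes` is a hypothetical name, it handles quotes only (not backslashes), and it is not the change that was actually committed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Double every single quote in 'src' so that a pair-collapsing unescape
 * restores the original text. Returns a malloc'd string, or NULL on
 * out-of-memory; the caller frees it.
 */
static char *
double_single_quotes(const char *src)
{
	size_t		quotes = 0;
	const char *p;
	char	   *out;
	char	   *q;

	assert(src != NULL);
	for (p = src; *p != '\0'; p++)
		if (*p == '\'')
			quotes++;

	out = malloc(strlen(src) + quotes + 1);
	if (out == NULL)
		return NULL;
	for (p = src, q = out; *p != '\0'; p++)
	{
		if (*p == '\'')
			*q++ = '\'';		/* emit the extra quote of the pair */
		*q++ = *p;
	}
	*q = '\0';
	return out;
}
```

With this escaping applied before substitution, "People's Republic of China" survives the bootstrap scanner intact, so no alias for it is needed.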
Hiroshi Inoue <inoue@tpf.co.jp> writes:
In my environment (Windows Vista using VC8)
setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950
but
setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
fails.
Interesting. This example suggests that maybe Windows' setlocale can
only cope with dot as introducing a codepage number. Are there any
cases where a dot works as part of the basic locale name?
regards, tom lane
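Tom's hypothesis can be phrased as a simple test on the name: treat the text after the last dot as the code page, and ask whether the remaining base name still contains a dot. A sketch under that assumption only; `base_name_has_dot` is hypothetical, and whether Windows really rejects every such name is exactly the open question here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the part of 'locale_name' before its last dot (the
 * presumed code page separator) contains another dot.
 */
static bool
base_name_has_dot(const char *locale_name)
{
	const char *last_dot;

	assert(locale_name != NULL);
	last_dot = strrchr(locale_name, '.');
	if (last_dot == NULL)
		return false;			/* no code page suffix at all */
	return memchr(locale_name, '.', (size_t) (last_dot - locale_name)) != NULL;
}
```

For Hiroshi's pair, the reported name "Chinese (Traditional)_Macao S.A.R..950" is flagged while the accepted "Chinese (Traditional)_MCO.950" is not, which is consistent with the hypothesis.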
(2011/04/20 15:30), Heikki Linnakangas wrote:
On 20.04.2011 06:48, Hiroshi Inoue wrote:
I can find no concrete reference to problems about locale
names containing dots. Is the following an example?
Yes.
In my environment (Windows Vista using VC8)
setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950
but
setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
fails.
This behavior causes another issue.
For example, the following code in src/backend/utils/adt/pg_locale.c
won't work as expected if the current locale is Hong Kong, Macao or
UAE, because the last setlocale() call in the code would fail. I can
find similar save-and-restore operations on locales in several places.
bool
check_locale(int category, const char *value)
{
	char	   *save;
	bool		ret;

	save = setlocale(category, NULL);
	if (!save)
		return false;			/* won't happen, we hope */

	/* save may be pointing at a modifiable scratch variable, see above */
	save = pstrdup(save);

	/* set the locale with setlocale, to see if it accepts it. */
	ret = (setlocale(category, value) != NULL);

	setlocale(category, save);	/* assume this won't fail */
	pfree(save);

	return ret;
}
regards,
Hiroshi Inoue
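The check_locale() pattern quoted above assumes the restoring setlocale() call cannot fail. A sketch of a variant that at least detects the failed restore Hiroshi describes; the name `check_locale_reporting_restore` and the `restored` out-parameter are hypothetical, and malloc/free stand in for the backend's pstrdup/pfree so it compiles outside PostgreSQL. It only reports the problem; what a caller should do about it is left open.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return whether 'value' is accepted for 'category'. '*restored' is set
 * to whether the previous setting could be put back afterwards.
 */
static bool
check_locale_reporting_restore(int category, const char *value,
							   bool *restored)
{
	const char *cur;
	char	   *save;
	bool		ret;

	assert(value != NULL && restored != NULL);
	*restored = false;
	cur = setlocale(category, NULL);
	if (cur == NULL)
		return false;

	/* the returned string may be overwritten by later calls, so copy it */
	save = malloc(strlen(cur) + 1);
	if (save == NULL)
		return false;
	strcpy(save, cur);

	/* set the locale with setlocale(), to see if it accepts it */
	ret = (setlocale(category, value) != NULL);

	/* the step reported to fail on Windows for Hong Kong, Macao, UAE */
	*restored = (setlocale(category, save) != NULL);
	free(save);
	return ret;
}
```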
Hiroshi Inoue <inoue@tpf.co.jp> writes:
I see another issue for the behavior.
For example, the following code in src/backend/utils/adt/pg_locale.c
won't work as expected in case the current locale is Hong Kong, Macao or
UAE because the last setlocale() in the code would fail. I can
find such save & restore operations of locales in several places.
Well, if Windows' setlocale is too brain-dead to accept its own output,
there's nothing to be done about it except to file a bug with Microsoft.
There isn't anything in the POSIX API that would let us avoid using
setlocale with a previous result value to restore the previous setting.
regards, tom lane
(2011/04/20 22:08), Tom Lane wrote:
Hiroshi Inoue <inoue@tpf.co.jp> writes:
In my environment (Windows Vista using VC8)
setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950
but
setlocale(LC_XXXX, "Chinese (Traditional)_Macao S.A.R..950");
fails.
Interesting. This example suggests that maybe Windows' setlocale can
only cope with dot as introducing a codepage number.
ACP or OCP, as well as a codepage number, seem to be allowed after the dot.
Are there any
cases where a dot works as part of the basic locale name?
Unfortunately, I don't know of any documentation explaining where dots are allowed.
regards,
Hiroshi Inoue
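Hiroshi's observation suggests what may follow the dot: a numeric code page or the keywords ACP / OCP. A sketch that classifies only that suffix, under that assumption; `codepage_suffix_looks_valid` is hypothetical, compares the keywords case-sensitively as a simplification, and makes no claim about anything else Windows might accept.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the text after the last dot is all digits, or exactly
 * "ACP" or "OCP".
 */
static bool
codepage_suffix_looks_valid(const char *locale_name)
{
	const char *dot;
	const char *p;

	assert(locale_name != NULL);
	dot = strrchr(locale_name, '.');
	if (dot == NULL || dot[1] == '\0')
		return false;			/* no suffix after the last dot */
	p = dot + 1;
	if (strcmp(p, "ACP") == 0 || strcmp(p, "OCP") == 0)
		return true;
	for (; *p != '\0'; p++)
	{
		if (!isdigit((unsigned char) *p))
			return false;
	}
	return true;
}
```

Note that "Chinese (Traditional)_Macao S.A.R..950" passes this suffix check; the trouble in that name lies in the dots before the code page, not in the suffix itself.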
(2011/04/20 15:30), Heikki Linnakangas wrote:
On 20.04.2011 06:48, Hiroshi Inoue wrote:
I can find no concrete reference to problems about locale
names containing dots. Is the following an example?
Yes.
In my environment (Windows Vista using VC8)
setlocale(LC_XXXX, "Chinese (Traditional)_MCO.950");
works and
setlocale(LC_XXXX, NULL);
returns
Chinese (Traditional)_Macao S.A.R..950
Interesting. According to Microsoft's documentation, the codes are
three-letter country codes specified by ISO-3166
(http://msdn.microsoft.com/en-us/library/cdax410z%28v=VS.100%29.aspx).
However, according to Wikipedia, MCO stands for Monaco, not Macau
(https://secure.wikimedia.org/wikipedia/en/wiki/ISO_3166-1_alpha-3).
Hmm, the Windows locale system seems to be inconsistent: the same
country code (MCO) corresponds to different countries.
ZHM_MCO corresponds to Chinese (Traditional)_Macao S.A.R..950 whereas
FRM_MCO corresponds to French_Principality of Monaco.
regards,
Hiroshi Inoue
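Because the country code MCO alone is ambiguous, any lookup of these short codes would have to key on the language abbreviation as well. An illustrative table using only the two pairs reported above; the struct and `lookup_locale_code` are hypothetical and do not correspond to a Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct locale_code_entry
{
	const char *lang;			/* Windows language abbreviation */
	const char *country;		/* Windows country code */
	const char *full_name;		/* name as reported in this thread */
};

static const struct locale_code_entry locale_codes[] = {
	{"ZHM", "MCO", "Chinese (Traditional)_Macao S.A.R..950"},
	{"FRM", "MCO", "French_Principality of Monaco"},
};

/* Return the full name for a language/country pair, or NULL if unknown. */
static const char *
lookup_locale_code(const char *lang, const char *country)
{
	size_t		i;

	assert(lang != NULL && country != NULL);
	for (i = 0; i < sizeof(locale_codes) / sizeof(locale_codes[0]); i++)
	{
		if (strcmp(locale_codes[i].lang, lang) == 0 &&
			strcmp(locale_codes[i].country, country) == 0)
			return locale_codes[i].full_name;
	}
	return NULL;
}
```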