pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Started by Nonameover 17 years ago31 messages
#1Noname
mha@postgresql.org

Log Message:
-----------
Explicitly bind gettext() to the UTF8 locale when in use.
This is required on Windows due to the special locale
handling for UTF8 that doesn't change the full environment.

Fixes crash with translated error messages per bugs 4180
and 4196.

Tom Lane

Modified Files:
--------------
pgsql/src/backend/utils/mb:
mbutils.c (r1.70 -> r1.71)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/mb/mbutils.c?r1=1.70&r2=1.71)

#2Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Noname (#1)
1 attachment(s)
Re: pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hi Magnus and all,

Magnus Hagander wrote:

Log Message:
-----------
Explicitly bind gettext() to the UTF8 locale when in use.
This is required on Windows due to the special locale
handling for UTF8 that doesn't change the full environment.

Thanks to this change UTF-8 case was solved but Japanese users
are still unhappy with Windows databases with EUC_JP encoding.
Shift_JIS which is a Japanese encoding under Windows doesn't
match any server encoding and causes a crash with the use of
gettext. So Saito-san removed ja message catalog just before
the 8.3 release.

Attached is a simple patch to avoid the crash and enable the
use of Japanese message catalog.
Please apply the patch if there's no problem.

regards,
Hiroshi Inoue

Show quoted text

Fixes crash with translated error messages per bugs 4180
and 4196.

Tom Lane

Modified Files:
--------------
pgsql/src/backend/utils/mb:
mbutils.c (r1.70 -> r1.71)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/mb/mbutils.c?r1=1.70&r2=1.71)

Attachments:

mbutils.patchtext/plain; name=mbutils.patchDownload
*** mbutils.c.orig	Sun Nov 23 08:42:57 2008
--- mbutils.c	Sun Nov 23 13:52:41 2008
***************
*** 825,830 ****
--- 825,832 ----
  void
  SetDatabaseEncoding(int encoding)
  {
+ 	const char *target_codeset = NULL;
+ 
  	if (!PG_VALID_BE_ENCODING(encoding))
  		elog(ERROR, "invalid database encoding: %d", encoding);
  
***************
*** 846,852 ****
  	 */
  #ifdef ENABLE_NLS
  	if (encoding == PG_UTF8)
! 		if (bind_textdomain_codeset("postgres", "UTF-8") == NULL)
  			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
--- 848,860 ----
  	 */
  #ifdef ENABLE_NLS
  	if (encoding == PG_UTF8)
! 		target_codeset = "UTF-8";
! #ifdef	WIN32
! 	else if (encoding == PG_EUC_JP)
! 		target_codeset = "EUC_JP";
! #endif /* WIN32 */
! 	if (NULL != target_codeset)
! 		if (bind_textdomain_codeset("postgres", target_codeset) == NULL)
  			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
#3Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Noname (#1)
Re: pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hi.

It was comfortable in my environment. Then, as for me, to be contained
in the next release is desire.

BTW, I was wondering that it did not look at a problem in any countries
other than Japan. Probably, the environment with the runs locale which
can't be specified to be SERVER_ENCODING will be required.

Regards,
Hiroshi Saito

----- Original Message -----
From: "Hiroshi Inoue" <inoue@tpf.co.jp>

Hi Magnus and all,

Magnus Hagander wrote:

Log Message:
-----------
Explicitly bind gettext() to the UTF8 locale when in use.
This is required on Windows due to the special locale
handling for UTF8 that doesn't change the full environment.

Thanks to this change UTF-8 case was solved but Japanese users
are still unhappy with Windows databases with EUC_JP encoding.
Shift_JIS which is a Japanese encoding under Windows doesn't
match any server encoding and causes a crash with the use of
gettext. So Saito-san removed ja message catalog just before
the 8.3 release.

Attached is a simple patch to avoid the crash and enable the
use of Japanese message catalog.
Please apply the patch if there's no problem.

regards,
Hiroshi Inoue

Fixes crash with translated error messages per bugs 4180
and 4196.

Tom Lane

Modified Files:
--------------
pgsql/src/backend/utils/mb:
mbutils.c (r1.70 -> r1.71)

(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/mb/mbutils.c?r1=1.70&amp;r2=1.71)

--------------------------------------------------------------------------------

*** mbutils.c.orig Sun Nov 23 08:42:57 2008
--- mbutils.c Sun Nov 23 13:52:41 2008
***************
*** 825,830 ****
--- 825,832 ----
void
SetDatabaseEncoding(int encoding)
{
+ const char *target_codeset = NULL;
+
if (!PG_VALID_BE_ENCODING(encoding))
elog(ERROR, "invalid database encoding: %d", encoding);
***************
*** 846,852 ****
*/
#ifdef ENABLE_NLS
if (encoding == PG_UTF8)
! if (bind_textdomain_codeset("postgres", "UTF-8") == NULL)
elog(LOG, "bind_textdomain_codeset failed");
#endif
}
--- 848,860 ----
*/
#ifdef ENABLE_NLS
if (encoding == PG_UTF8)
! target_codeset = "UTF-8";
! #ifdef WIN32
! else if (encoding == PG_EUC_JP)
! target_codeset = "EUC_JP";
! #endif /* WIN32 */
! if (NULL != target_codeset)
! if (bind_textdomain_codeset("postgres", target_codeset) == NULL)
elog(LOG, "bind_textdomain_codeset failed");
#endif
}

--------------------------------------------------------------------------------

Show quoted text

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

#4Magnus Hagander
magnus@hagander.net
In reply to: Hiroshi Inoue (#2)
Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hiroshi Inoue wrote:

Hi Magnus and all,

Magnus Hagander wrote:

Log Message:
-----------
Explicitly bind gettext() to the UTF8 locale when in use.
This is required on Windows due to the special locale
handling for UTF8 that doesn't change the full environment.

Thanks to this change UTF-8 case was solved but Japanese users
are still unhappy with Windows databases with EUC_JP encoding.
Shift_JIS which is a Japanese encoding under Windows doesn't
match any server encoding and causes a crash with the use of
gettext. So Saito-san removed ja message catalog just before
the 8.3 release.

Attached is a simple patch to avoid the crash and enable the
use of Japanese message catalog.
Please apply the patch if there's no problem.

Hi!

It will clearly also need an update to the comment, but I can take care
of that.

I assume you have tested this? The comment says that it works because we
are handling UTF8 on a special way on Windows, but AFAIK we *don't*
handle EUC_JP in a special way there?

If your database is in EUC_JP, I don't see why gettext() isn't picking
it up properly in the first place..

And why do we need that on Windows only, and not on other platforms?

//Magnus

#5Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Magnus Hagander (#4)
Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Magnus Hagander wrote:

Hiroshi Inoue wrote:

Hi Magnus and all,

Magnus Hagander wrote:

Log Message:
-----------
Explicitly bind gettext() to the UTF8 locale when in use.
This is required on Windows due to the special locale
handling for UTF8 that doesn't change the full environment.

Thanks to this change UTF-8 case was solved but Japanese users
are still unhappy with Windows databases with EUC_JP encoding.
Shift_JIS which is a Japanese encoding under Windows doesn't
match any server encoding and causes a crash with the use of
gettext. So Saito-san removed ja message catalog just before
the 8.3 release.

Attached is a simple patch to avoid the crash and enable the
use of Japanese message catalog.
Please apply the patch if there's no problem.

Hi!

It will clearly also need an update to the comment, but I can take care
of that.

I assume you have tested this?

Though I myself didn't test it, Saito-san tested it.

The comment says that it works because we
are handling UTF8 on a special way on Windows,

ISTM UTF-8 isn't a special case.
In fact the comment also mentions the following.

* In future we might want to call bind_textdomain_codeset
* unconditionally, but that....

If your database is in EUC_JP, I don't see why gettext() isn't picking
it up properly in the first place..

In Japan 2 encodings (EUC_JP and Shift_JIS) are often used.
EUC_JP is mainly used on *nix and on Windows Shift_JIS is
used. We use EUC_JP not Shift_JIS as the server encoding

And why do we need that on Windows only, and not on other platforms?

because Shift_JIS isn't allowed as a server encoding. So
the Japanese Windows native message encoding Shift_JIS never
matches the server encoding EUC_JP and a conversion between
Shitt_jis and EUC_JP is necessarily needed.

regards,
Hiroshi Inoue

#6Magnus Hagander
magnus@hagander.net
In reply to: Hiroshi Inoue (#5)
Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hiroshi Inoue wrote:

Magnus Hagander wrote:

Hiroshi Inoue wrote:

Hi Magnus and all,

Magnus Hagander wrote:

Log Message:
-----------
Explicitly bind gettext() to the UTF8 locale when in use.
This is required on Windows due to the special locale
handling for UTF8 that doesn't change the full environment.

Thanks to this change UTF-8 case was solved but Japanese users
are still unhappy with Windows databases with EUC_JP encoding.
Shift_JIS which is a Japanese encoding under Windows doesn't
match any server encoding and causes a crash with the use of
gettext. So Saito-san removed ja message catalog just before
the 8.3 release.

Attached is a simple patch to avoid the crash and enable the
use of Japanese message catalog.
Please apply the patch if there's no problem.

Hi!

It will clearly also need an update to the comment, but I can take care
of that.

I assume you have tested this?

Though I myself didn't test it, Saito-san tested it.

Ok, good.

The comment says that it works because we
are handling UTF8 on a special way on Windows,

ISTM UTF-8 isn't a special case.
In fact the comment also mentions the following.

* In future we might want to call bind_textdomain_codeset
* unconditionally, but that....

I think that's partially unrelated. UTF8 is special in that the
environment that the backend runs in is different from SERVER_ENCODING
in this case. For other encodings, we have setlocale():d to the same
encoding.

If your database is in EUC_JP, I don't see why gettext() isn't picking
it up properly in the first place..

In Japan 2 encodings (EUC_JP and Shift_JIS) are often used.
EUC_JP is mainly used on *nix and on Windows Shift_JIS is
used. We use EUC_JP not Shift_JIS as the server encoding

And why do we need that on Windows only, and not on other platforms?

because Shift_JIS isn't allowed as a server encoding. So
the Japanese Windows native message encoding Shift_JIS never
matches the server encoding EUC_JP and a conversion between
Shitt_jis and EUC_JP is necessarily needed.

Ah, so we're basically hardcoding that information? The system will go
up in SJIS, but since we can't deal with it, we switch it to EUC_JP?

Ok, I think I understand. I've made some minor stylistic changes (we
don't normally use if (NULL != <whatever>) in the pg sources), and will
apply with those.

This is for HEAD only, correct? Or is it something we should backpatch?

//Magnus

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#6)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Magnus Hagander <magnus@hagander.net> writes:

Hiroshi Inoue wrote:

because Shift_JIS isn't allowed as a server encoding. So
the Japanese Windows native message encoding Shift_JIS never
matches the server encoding EUC_JP and a conversion between
Shitt_jis and EUC_JP is necessarily needed.

Ah, so we're basically hardcoding that information? The system will go
up in SJIS, but since we can't deal with it, we switch it to EUC_JP?

I'm not following this either. If the patch is really necessary then it
seems it must be working around a bug in the Windows version of gettext,
ie failure to distinguish CP932 from CP20932. Is that correct?

Ok, I think I understand. I've made some minor stylistic changes (we
don't normally use if (NULL != <whatever>) in the pg sources), and will
apply with those.

It definitely needs a comment explaining why this is needed.

regards, tom lane

#8Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Tom Lane (#7)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Hiroshi Inoue wrote:

because Shift_JIS isn't allowed as a server encoding. So
the Japanese Windows native message encoding Shift_JIS never
matches the server encoding EUC_JP and a conversion between
Shitt_jis and EUC_JP is necessarily needed.

Ah, so we're basically hardcoding that information? The system will go
up in SJIS, but since we can't deal with it, we switch it to EUC_JP?

I'm not following this either. If the patch is really necessary then it
seems it must be working around a bug in the Windows version of gettext,
ie failure to distinguish CP932 from CP20932. Is that correct?

I'm afraid I don't understand what you mean exactly.
AFAIK the output encoding of Windows gettext is detemined by the
ANSI system code page which is usualy CP932(Shift_JIS) in Japan and
unrelated to the locale settings.
In addition CP20932 is rarely used in Japan IIRC. I've never used it
and don't know what it is correctly (maybe a kind of EUC_JP).

regards,
Hiroshi Inoue

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#8)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hiroshi Inoue <inoue@tpf.co.jp> writes:

Tom Lane wrote:

I'm not following this either. If the patch is really necessary then it
seems it must be working around a bug in the Windows version of gettext,
ie failure to distinguish CP932 from CP20932. Is that correct?

I'm afraid I don't understand what you mean exactly.
AFAIK the output encoding of Windows gettext is detemined by the
ANSI system code page which is usualy CP932(Shift_JIS) in Japan and
unrelated to the locale settings.

If that's true then this code is presently broken for *every* locale
under Windows, not only Japanese.

To my mind the really correct thing to be doing here would be to call
bind_textdomain_codeset in all cases, rather than trusting gettext to
guess correctly about which encoding we want. As the comment notes,
we have not attempted that because the codeset names aren't well
standardized. But it seems to me that we could certainly find out what
codeset names are used on Windows, and apply bind_textdomain_codeset
all the time on Windows. That would make a lot more sense than ad-hoc
treatment of UTF-8 and EUC-JP if you ask me ...

regards, tom lane

#10Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Tom Lane (#9)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Tom Lane wrote:

Hiroshi Inoue <inoue@tpf.co.jp> writes:

Tom Lane wrote:

I'm not following this either. If the patch is really necessary then it
seems it must be working around a bug in the Windows version of gettext,
ie failure to distinguish CP932 from CP20932. Is that correct?

I'm afraid I don't understand what you mean exactly.
AFAIK the output encoding of Windows gettext is detemined by the
ANSI system code page which is usualy CP932(Shift_JIS) in Japan and
unrelated to the locale settings.

If that's true then this code is presently broken for *every* locale
under Windows, not only Japanese.

Maybe there are a few languages/countires where 2 encodings are
widely used.

To my mind the really correct thing to be doing here would be to call
bind_textdomain_codeset in all cases, rather than trusting gettext to
guess correctly about which encoding we want. As the comment notes,
we have not attempted that because the codeset names aren't well
standardized. But it seems to me that we could certainly find out what
codeset names are used on Windows, and apply bind_textdomain_codeset
all the time on Windows. That would make a lot more sense than ad-hoc
treatment of UTF-8 and EUC-JP if you ask me ...

I fundamentally agree with you.
What we hope is to enable the use of Japanese message catalog which
we gave up in 8.3 Windows-version release.

regards,
Hiroshi Inoue

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#10)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hiroshi Inoue <inoue@tpf.co.jp> writes:

Tom Lane wrote:

If that's true then this code is presently broken for *every* locale
under Windows, not only Japanese.

Maybe there are a few languages/countires where 2 encodings are
widely used.

UTF8 vs Latin-N? In any case I think the problem is that gettext is
looking at a setting that is not what we are looking at. Particularly
with the 8.4 changes to allow per-database locale settings, this has
got to be fixed in a bulletproof way.

regards, tom lane

#12Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#11)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

On 25 nov 2008, at 05.00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hiroshi Inoue <inoue@tpf.co.jp> writes:

Tom Lane wrote:

If that's true then this code is presently broken for *every* locale
under Windows, not only Japanese.

Maybe there are a few languages/countires where 2 encodings are
widely used.

UTF8 vs Latin-N?

We already special-cases utf8...

I think the thing us that as long as the encodings are compatible
(latin1 with different names for example) it worked fine.

In any case I think the problem is that gettext is
looking at a setting that is not what we are looking at. Particularly
with the 8.4 changes to allow per-database locale settings, this has
got to be fixed in a bulletproof way.

Agreed.

/Magnus

#13Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Magnus Hagander (#12)
1 attachment(s)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Magnus Hagander wrote:

On 25 nov 2008, at 05.00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hiroshi Inoue <inoue@tpf.co.jp> writes:

Tom Lane wrote:

If that's true then this code is presently broken for *every* locale
under Windows, not only Japanese.

Maybe there are a few languages/countires where 2 encodings are
widely used.

UTF8 vs Latin-N?

We already special-cases utf8...

I think the thing us that as long as the encodings are compatible
(latin1 with different names for example) it worked fine.

In any case I think the problem is that gettext is
looking at a setting that is not what we are looking at. Particularly
with the 8.4 changes to allow per-database locale settings, this has
got to be fixed in a bulletproof way.

Attached is a new patch to apply bind_textdomain_codeset() to most
server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.

Unfortunately it's hard for Saito-san and me to check encodings
other than EUC-JP.

regards,
Hiroshi Inoue

Attachments:

mbutils.patchtext/plain; name=mbutils.patchDownload
*** mbutils.c.orig	Sun Nov 23 08:42:57 2008
--- mbutils.c	Wed Nov 26 12:17:12 2008
***************
*** 822,830 ****
--- 822,870 ----
  	return clen;
  }
  
+ #ifdef	WIN32
+ static const struct codeset_map {
+ 	int	encoding;
+ 	const char *codeset;
+ } codeset_map_array[] = {
+ 		{PG_UTF8, "UTF-8"},
+ 		{PG_LATIN1, "LATIN1"}, 
+ 		{PG_LATIN2, "LATIN2"},
+ 		{PG_LATIN3, "LATIN3"},
+ 		{PG_LATIN4, "LATIN4"},
+ 		{PG_ISO_8859_5, "ISO-8859-5"},
+ 		{PG_ISO_8859_6, "ISO_8859-6"},
+ 		{PG_ISO_8859_7, "ISO-8859-7"},
+ 		{PG_ISO_8859_8, "ISO-8859-8"},
+ 		{PG_LATIN5, "LATIN5"},
+ 		{PG_LATIN6, "LATIN6"},
+ 		{PG_LATIN7, "LATIN7"},
+ 		{PG_LATIN8, "LATIN8"},
+ 		{PG_LATIN9, "LATIN-9"},
+ 		{PG_LATIN10, "LATIN10"},
+ 		{PG_KOI8R, "KOI8-R"},
+ 		{PG_WIN1250, "CP1250"},
+ 		{PG_WIN1251, "CP1251"},
+ 		{PG_WIN1252, "CP1252"},
+ 		{PG_WIN1253, "CP1253"},
+ 		{PG_WIN1254, "CP1254"},
+ 		{PG_WIN1255, "CP1255"},
+ 		{PG_WIN1256, "CP1256"},
+ 		{PG_WIN1257, "CP1257"},
+ 		{PG_WIN1258, "CP1258"},
+ 		{PG_WIN866, "CP866"},
+ 		{PG_WIN874, "CP874"},
+ 		{PG_EUC_CN, "EUC-CN"},
+ 		{PG_EUC_JP, "EUC-JP"},
+ 		{PG_EUC_KR, "EUC-KR"},
+ 		{PG_EUC_TW, "EUC-TW"}};
+ #endif /* WIN32 */
+ 
  void
  SetDatabaseEncoding(int encoding)
  {
+ 	const char *target_codeset = NULL;
+ 
  	if (!PG_VALID_BE_ENCODING(encoding))
  		elog(ERROR, "invalid database encoding: %d", encoding);
  
***************
*** 846,852 ****
  	 */
  #ifdef ENABLE_NLS
  	if (encoding == PG_UTF8)
! 		if (bind_textdomain_codeset("postgres", "UTF-8") == NULL)
  			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
--- 886,907 ----
  	 */
  #ifdef ENABLE_NLS
  	if (encoding == PG_UTF8)
! 		target_codeset = "UTF-8";
! #ifdef	WIN32
! 	else
! 	{
! 		int	i;
! 
! 		for (i = 0; i < sizeof(codeset_map_array) / sizeof(struct codeset_map); i++)
! 			if (codeset_map_array[i].encoding == encoding)
! 			{
! 				target_codeset = codeset_map_array[i].codeset;
! 				break;
! 			}
! 	}
! #endif /* WIN32 */
! 	if (target_codeset != NULL)
! 		if (bind_textdomain_codeset("postgres", target_codeset) == NULL)
  			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
#14Magnus Hagander
magnus@hagander.net
In reply to: Hiroshi Inoue (#13)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hiroshi Inoue wrote:

I think the thing us that as long as the encodings are compatible
(latin1 with different names for example) it worked fine.

In any case I think the problem is that gettext is
looking at a setting that is not what we are looking at. Particularly
with the 8.4 changes to allow per-database locale settings, this has
got to be fixed in a bulletproof way.

Attached is a new patch to apply bind_textdomain_codeset() to most
server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.

Unfortunately it's hard for Saito-san and me to check encodings
other than EUC-JP.

In principle this looks good, I think, but I'm a bit worried around the
lack of testing. I can do some testing under LATIN1 which is what we use
in Sweden (just need to get gettext working *at all* in my dev
environment again - I've somehow managed to break it), and perhaps we
can find someone to do a test in an eastern-european locale to get some
more datapoints?

Can you outline the steps one needs to go through to show the problem,
so we can confirm it's fixed in these locales?

//Magnus

#15Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Magnus Hagander (#14)
1 attachment(s)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Magnus Hagander wrote:

Hiroshi Inoue wrote:

I think the thing us that as long as the encodings are compatible
(latin1 with different names for example) it worked fine.

In any case I think the problem is that gettext is
looking at a setting that is not what we are looking at. Particularly
with the 8.4 changes to allow per-database locale settings, this has
got to be fixed in a bulletproof way.

Attached is a new patch to apply bind_textdomain_codeset() to most
server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.

Unfortunately it's hard for Saito-san and me to check encodings
other than EUC-JP.

In principle this looks good, I think, but I'm a bit worried around the
lack of testing.

Thanks and I agree with you.

I can do some testing under LATIN1 which is what we use
in Sweden (just need to get gettext working *at all* in my dev
environment again - I've somehow managed to break it), and perhaps we
can find someone to do a test in an eastern-european locale to get some
more datapoints?

Can you outline the steps one needs to go through to show the problem,
so we can confirm it's fixed in these locales?

Saito-san and I have been working on another related problem about
changing LC_MESSAGES locale properly under Windows and would be able
to provide a patch in a few days. It seems preferable for us to apply
the patch also so as to change the message catalog easily.

Attached is an example in which LC_MESSAGES is cht_twn(zh_TW) and
the server encoding is EUC-TW. You can see it as a UTF-8 text
because the client_encoding is set to UTF-8 first.

BTW you can see another problem at line 4 in the text.
At the point the LC_MESSAGES is still japanese and postgres fails
to convert a Japanese error message to EUC_TW encoding. There's
no wonder but it doesn't seem preferable.

regards,
Hiroshi Inoue

Attachments:

euctw.txttext/plain; name=euctw.txtDownload
#16Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#13)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hiroshi, is this patch still needed?

---------------------------------------------------------------------------

Hiroshi Inoue wrote:

Magnus Hagander wrote:

On 25 nov 2008, at 05.00, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hiroshi Inoue <inoue@tpf.co.jp> writes:

Tom Lane wrote:

If that's true then this code is presently broken for *every* locale
under Windows, not only Japanese.

Maybe there are a few languages/countires where 2 encodings are
widely used.

UTF8 vs Latin-N?

We already special-cases utf8...

I think the thing us that as long as the encodings are compatible
(latin1 with different names for example) it worked fine.

In any case I think the problem is that gettext is
looking at a setting that is not what we are looking at. Particularly
with the 8.4 changes to allow per-database locale settings, this has
got to be fixed in a bulletproof way.

Attached is a new patch to apply bind_textdomain_codeset() to most
server encodings. Exceptions are PG_SQL_ASCII, PG_MULE_INTERNAL
and PG_EUC_JIS_2004. "EUC-JP" may be OK for EUC_JIS_2004.

Unfortunately it's hard for Saito-san and me to check encodings
other than EUC-JP.

regards,
Hiroshi Inoue

*** mbutils.c.orig	Sun Nov 23 08:42:57 2008
--- mbutils.c	Wed Nov 26 12:17:12 2008
***************
*** 822,830 ****
--- 822,870 ----
return clen;
}
+ #ifdef	WIN32
+ static const struct codeset_map {
+ 	int	encoding;
+ 	const char *codeset;
+ } codeset_map_array[] = {
+ 		{PG_UTF8, "UTF-8"},
+ 		{PG_LATIN1, "LATIN1"}, 
+ 		{PG_LATIN2, "LATIN2"},
+ 		{PG_LATIN3, "LATIN3"},
+ 		{PG_LATIN4, "LATIN4"},
+ 		{PG_ISO_8859_5, "ISO-8859-5"},
+ 		{PG_ISO_8859_6, "ISO_8859-6"},
+ 		{PG_ISO_8859_7, "ISO-8859-7"},
+ 		{PG_ISO_8859_8, "ISO-8859-8"},
+ 		{PG_LATIN5, "LATIN5"},
+ 		{PG_LATIN6, "LATIN6"},
+ 		{PG_LATIN7, "LATIN7"},
+ 		{PG_LATIN8, "LATIN8"},
+ 		{PG_LATIN9, "LATIN-9"},
+ 		{PG_LATIN10, "LATIN10"},
+ 		{PG_KOI8R, "KOI8-R"},
+ 		{PG_WIN1250, "CP1250"},
+ 		{PG_WIN1251, "CP1251"},
+ 		{PG_WIN1252, "CP1252"},
+ 		{PG_WIN1253, "CP1253"},
+ 		{PG_WIN1254, "CP1254"},
+ 		{PG_WIN1255, "CP1255"},
+ 		{PG_WIN1256, "CP1256"},
+ 		{PG_WIN1257, "CP1257"},
+ 		{PG_WIN1258, "CP1258"},
+ 		{PG_WIN866, "CP866"},
+ 		{PG_WIN874, "CP874"},
+ 		{PG_EUC_CN, "EUC-CN"},
+ 		{PG_EUC_JP, "EUC-JP"},
+ 		{PG_EUC_KR, "EUC-KR"},
+ 		{PG_EUC_TW, "EUC-TW"}};
+ #endif /* WIN32 */
+ 
void
SetDatabaseEncoding(int encoding)
{
+ 	const char *target_codeset = NULL;
+ 
if (!PG_VALID_BE_ENCODING(encoding))
elog(ERROR, "invalid database encoding: %d", encoding);
***************
*** 846,852 ****
*/
#ifdef ENABLE_NLS
if (encoding == PG_UTF8)
! 		if (bind_textdomain_codeset("postgres", "UTF-8") == NULL)
elog(LOG, "bind_textdomain_codeset failed");
#endif
}
--- 886,907 ----
*/
#ifdef ENABLE_NLS
if (encoding == PG_UTF8)
! 		target_codeset = "UTF-8";
! #ifdef	WIN32
! 	else
! 	{
! 		int	i;
! 
! 		for (i = 0; i < sizeof(codeset_map_array) / sizeof(struct codeset_map); i++)
! 			if (codeset_map_array[i].encoding == encoding)
! 			{
! 				target_codeset = codeset_map_array[i].codeset;
! 				break;
! 			}
! 	}
! #endif /* WIN32 */
! 	if (target_codeset != NULL)
! 		if (bind_textdomain_codeset("postgres", target_codeset) == NULL)
elog(LOG, "bind_textdomain_codeset failed");
#endif
}

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#16)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Bruce Momjian <bruce@momjian.us> writes:

Hiroshi, is this patch still needed?

This patch is for a problem that's entirely separate from the LC_TIME
issue, if that's what you were wondering.

regards, tom lane

#18Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Bruce Momjian (#16)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Bruce Momjian wrote:

Hiroshi, is this patch still needed?

Yes though it should be slightly changed now.
*set lc_messages does not work* issue isn't directly related to this
issue.

regards,
Hiroshi Inoue

#19Hiroshi Saito
z-saito@guitar.ocn.ne.jp
In reply to: Bruce Momjian (#16)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hi.

My swift attack test was the MinGW environment.
But, Inoue-san suggestion.

1. MinGW+gcc build
HIROSHI=# set LC_TIME=Ja;
SET
HIROSHI=# select to_char(now(),'TMDay');
to_char
---------
日曜日
(1 行)
HIROSHI=# set LC_TIME='Japan';
SET
HIROSHI=# select to_char(Now(),'TMDay');
to_char
---------
日曜日
(1 行)
HIROSHI=# set LC_TIME='Japanese';
SET
HIROSHI=# select to_char(Now(),'TMDay');
to_char
---------
日曜日
(1 行)

However, A setup of 'Ja' was strange.?_?
http://msdn.microsoft.com/en-us/library/aa246450(VS.60).aspx

2. MSVC build
HIROSHI=# set LC_TIME='Ja';
ERROR: invalid value for parameter "lc_time": "Ja"
STATEMENT: set LC_TIME='Ja';
ERROR: invalid value for parameter "lc_time": "Ja"
HIROSHI=# set LC_TIME='Japan';
ERROR: invalid value for parameter "lc_time": "Japan"
STATEMENT: set LC_TIME='Japan';
ERROR: invalid value for parameter "lc_time": "Japan"
HIROSHI=# set LC_TIME=Japanese;
SET
HIROSHI=# select to_char(Now(),'TMDay');
to_char
---------
日曜日
(1 行)

Umm, Re-investigation is required for this. :-(
However, If reasonable clear, it will be good for a document at suggestion.

Regards,
Hiroshi Saito

#20Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Hiroshi Inoue (#18)
1 attachment(s)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hiroshi Inoue wrote:

Bruce Momjian wrote:

Hiroshi, is this patch still needed?

Yes though it should be slightly changed now.
*set lc_messages does not work* issue isn't directly related to this
issue.

Though I'm not sure how we can test it, I can provide test results
like the attached one.

The attached is a test result in case LC_MESSAGES=fr causing
the following 3 errors in databases with various encodings.

select a; ==> column "a" does not exist
1; ==> syntax error at or near "1"
select * from a; ==> relation "a" does not exist

Comments?
Please note the encoding of the attached file is utf-8.

regards,
Hiroshi Inoue

Attachments:

fr.txttext/plain; name=fr.txtDownload
#21Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#2)
Re: pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Is this patch still needed?

---------------------------------------------------------------------------

Hiroshi Inoue wrote:

Hi Magnus and all,

Magnus Hagander wrote:

Log Message:
-----------
Explicitly bind gettext() to the UTF8 locale when in use.
This is required on Windows due to the special locale
handling for UTF8 that doesn't change the full environment.

Thanks to this change UTF-8 case was solved but Japanese users
are still unhappy with Windows databases with EUC_JP encoding.
Shift_JIS which is a Japanese encoding under Windows doesn't
match any server encoding and causes a crash with the use of
gettext. So Saito-san removed ja message catalog just before
the 8.3 release.

Attached is a simple patch to avoid the crash and enable the
use of Japanese message catalog.
Please apply the patch if there's no problem.

regards,
Hiroshi Inoue

Fixes crash with translated error messages per bugs 4180
and 4196.

Tom Lane

Modified Files:
--------------
pgsql/src/backend/utils/mb:
mbutils.c (r1.70 -> r1.71)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/mb/mbutils.c?r1=1.70&amp;r2=1.71)

*** mbutils.c.orig	Sun Nov 23 08:42:57 2008
--- mbutils.c	Sun Nov 23 13:52:41 2008
***************
*** 825,830 ****
--- 825,832 ----
void
SetDatabaseEncoding(int encoding)
{
+ 	const char *target_codeset = NULL;
+ 
if (!PG_VALID_BE_ENCODING(encoding))
elog(ERROR, "invalid database encoding: %d", encoding);
***************
*** 846,852 ****
*/
#ifdef ENABLE_NLS
if (encoding == PG_UTF8)
! 		if (bind_textdomain_codeset("postgres", "UTF-8") == NULL)
elog(LOG, "bind_textdomain_codeset failed");
#endif
}
--- 848,860 ----
*/
#ifdef ENABLE_NLS
if (encoding == PG_UTF8)
! 		target_codeset = "UTF-8";
! #ifdef	WIN32
! 	else if (encoding == PG_EUC_JP)
! 		target_codeset = "EUC_JP";
! #endif /* WIN32 */
! 	if (NULL != target_codeset)
! 		if (bind_textdomain_codeset("postgres", target_codeset) == NULL)
elog(LOG, "bind_textdomain_codeset failed");
#endif
}

--
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#21)
Re: pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Bruce Momjian <bruce@momjian.us> writes:

Is this patch still needed?

No; this is the beginning of the still-continuing thread about
lc_messages and Windows. I'm not sure what we'll end up applying,
but not this.

regards, tom lane

#23Magnus Hagander
magnus@hagander.net
In reply to: Hiroshi Inoue (#20)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hiroshi Inoue wrote:

Hiroshi Inoue wrote:

Bruce Momjian wrote:

Hiroshi, is this patch still needed?

Yes though it should be slightly changed now.

In what way should it be changed?

//Magnus

#24Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Magnus Hagander (#23)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Magnus Hagander wrote:

Hiroshi Inoue wrote:

Hiroshi Inoue wrote:

Bruce Momjian wrote:

Hiroshi, is this patch still needed?

Yes though it should be slightly changed now.

In what way should it be changed?

One is already committed by you.
[COMMITTERS] pgsql: Use the new text domain names

Another is to bind the codeset "EUC-JP" for
PG_EUC_JIS_2004 server encoding.
Though EUC_JP and EUC_JIS_2004 aren't completely
compatible, it seems OK in most cases.

regards,
Hiroshi Inoue

#25Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Hiroshi Inoue (#24)
1 attachment(s)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hiroshi Inoue wrote:

Magnus Hagander wrote:

Hiroshi Inoue wrote:

Hiroshi Inoue wrote:

Bruce Momjian wrote:

Hiroshi, is this patch still needed?

Yes though it should be slightly changed now.

In what way should it be changed?

One is already committed by you.
[COMMITTERS] pgsql: Use the new text domain names

Another is to bind the codeset "EUC-JP" for
PG_EUC_JIS_2004 server encoding.

The attached is an updated patch.

regards,
Hiroshi Inoue

Attachments:

mbutils.patchtext/plain; name=mbutils.patchDownload
Index: mbutils.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v
retrieving revision 1.77
diff -c -c -r1.77 mbutils.c
*** mbutils.c	19 Jan 2009 15:34:23 -0000	1.77
--- mbutils.c	20 Jan 2009 12:54:33 -0000
***************
*** 837,842 ****
--- 837,881 ----
  	return clen;
  }
  
+ #ifdef	WIN32
+ static const struct codeset_map {
+ 	int	encoding;
+ 	const char *codeset;
+ } codeset_map_array[] = {
+ 		{PG_UTF8, "UTF-8"},
+ 		{PG_LATIN1, "LATIN1"}, 
+ 		{PG_LATIN2, "LATIN2"},
+ 		{PG_LATIN3, "LATIN3"},
+ 		{PG_LATIN4, "LATIN4"},
+ 		{PG_ISO_8859_5, "ISO-8859-5"},
+ 		{PG_ISO_8859_6, "ISO_8859-6"},
+ 		{PG_ISO_8859_7, "ISO-8859-7"},
+ 		{PG_ISO_8859_8, "ISO-8859-8"},
+ 		{PG_LATIN5, "LATIN5"},
+ 		{PG_LATIN6, "LATIN6"},
+ 		{PG_LATIN7, "LATIN7"},
+ 		{PG_LATIN8, "LATIN8"},
+ 		{PG_LATIN9, "LATIN-9"},
+ 		{PG_LATIN10, "LATIN10"},
+ 		{PG_KOI8R, "KOI8-R"},
+ 		{PG_WIN1250, "CP1250"},
+ 		{PG_WIN1251, "CP1251"},
+ 		{PG_WIN1252, "CP1252"},
+ 		{PG_WIN1253, "CP1253"},
+ 		{PG_WIN1254, "CP1254"},
+ 		{PG_WIN1255, "CP1255"},
+ 		{PG_WIN1256, "CP1256"},
+ 		{PG_WIN1257, "CP1257"},
+ 		{PG_WIN1258, "CP1258"},
+ 		{PG_WIN866, "CP866"},
+ 		{PG_WIN874, "CP874"},
+ 		{PG_EUC_CN, "EUC-CN"},
+ 		{PG_EUC_JP, "EUC-JP"},
+ 		{PG_EUC_KR, "EUC-KR"},
+ 		{PG_EUC_TW, "EUC-TW"},
+ 		{PG_EUC_JIS_2004, "EUC-JP"}};
+ #endif /* WIN32 */
+ 
  /* mbcliplen for any single-byte encoding */
  static int
  cliplen(const char *str, int len, int limit)
***************
*** 852,857 ****
--- 891,898 ----
  void
  SetDatabaseEncoding(int encoding)
  {
+ 	const char *target_codeset = NULL;
+ 
  	if (!PG_VALID_BE_ENCODING(encoding))
  		elog(ERROR, "invalid database encoding: %d", encoding);
  
***************
*** 873,879 ****
  	 */
  #ifdef ENABLE_NLS
  	if (encoding == PG_UTF8)
! 		if (bind_textdomain_codeset(textdomain(NULL), "UTF-8") == NULL)
  			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
--- 914,935 ----
  	 */
  #ifdef ENABLE_NLS
  	if (encoding == PG_UTF8)
! 		target_codeset = "UTF-8";
! #ifdef	WIN32
! 	else
! 	{
! 		int	i;
! 
! 		for (i = 0; i < sizeof(codeset_map_array) / sizeof(struct codeset_map); i++)
! 			if (codeset_map_array[i].encoding == encoding)
! 			{
! 				target_codeset = codeset_map_array[i].codeset;
! 				break;
! 			}
! 	}
! #endif /* WIN32 */
! 	if (target_codeset != NULL)
! 		if (bind_textdomain_codeset(textdomain(NULL), target_codeset) == NULL)
  			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
#26Magnus Hagander
magnus@hagander.net
In reply to: Hiroshi Inoue (#25)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hiroshi Inoue wrote:

Hiroshi Inoue wrote:

Magnus Hagander wrote:

Hiroshi Inoue wrote:

Hiroshi Inoue wrote:

Bruce Momjian wrote:

Hiroshi, is this patch still needed?

Yes though it should be slightly changed now.

In what way should it be changed?

One is already committed by you.
[COMMITTERS] pgsql: Use the new text domain names

Another is to bind the codeset "EUC-JP" for
PG_EUC_JIS_2004 server encoding.

The attached is an updated patch.

Thanks.

Looking at it, the comment clearly needs updating - I'll do that.

However, one question: The comment currently says it's harmless to do
this on non-windows platforms. Does this still hold true? In that case,
this whole thing shouldn't be #ifdef:ed to WIN32 and can be simplified.
Or does the "middle part" of the comment come into play, in that the
codeset names can be different on different platforms?

Peter, can you comment on that?

If we do keep the thing win32 only, I think we should just wrap the
whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
Unix - that'll make for cleaner code.

//Magnus

#27Peter Eisentraut
peter_e@gmx.net
In reply to: Magnus Hagander (#26)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Magnus Hagander wrote:

However, one question: The comment currently says it's harmless to do
this on non-windows platforms. Does this still hold true?

Yes, the non-WIN32 code path appears to be the same, still. But the
ifdef WIN32 part we don't want, because that presumes something about
the spelling of encoding names in the local iconv library.

If we do keep the thing win32 only, I think we should just wrap the
whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
Unix - that'll make for cleaner code.

Yes, that would be much better.

#28Magnus Hagander
magnus@hagander.net
In reply to: Peter Eisentraut (#27)
1 attachment(s)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Peter Eisentraut wrote:

Magnus Hagander wrote:

However, one question: The comment currently says it's harmless to do
this on non-windows platforms. Does this still hold true?

Yes, the non-WIN32 code path appears to be the same, still. But the
ifdef WIN32 part we don't want, because that presumes something about
the spelling of encoding names in the local iconv library.

If we do keep the thing win32 only, I think we should just wrap the
whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
Unix - that'll make for cleaner code.

Yes, that would be much better.

Something like this then?

//Magnus

Attachments:

gettext_win32.patchtext/x-diff; name=gettext_win32.patchDownload
*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***************
*** 849,854 **** cliplen(const char *str, int len, int limit)
--- 849,894 ----
  	return l;
  }
  
+ #if defined(ENABLE_NLS) && defined(WIN32)
+ static const struct codeset_map {
+ 	int	encoding;
+ 	const char *codeset;
+ } codeset_map_array[] = {
+ 	{PG_UTF8, "UTF-8"},
+ 	{PG_LATIN1, "LATIN1"},
+ 	{PG_LATIN2, "LATIN2"},
+ 	{PG_LATIN3, "LATIN3"},
+ 	{PG_LATIN4, "LATIN4"},
+ 	{PG_ISO_8859_5, "ISO-8859-5"},
+ 	{PG_ISO_8859_6, "ISO_8859-6"},
+ 	{PG_ISO_8859_7, "ISO-8859-7"},
+ 	{PG_ISO_8859_8, "ISO-8859-8"},
+ 	{PG_LATIN5, "LATIN5"},
+ 	{PG_LATIN6, "LATIN6"},
+ 	{PG_LATIN7, "LATIN7"},
+ 	{PG_LATIN8, "LATIN8"},
+ 	{PG_LATIN9, "LATIN-9"},
+ 	{PG_LATIN10, "LATIN10"},
+ 	{PG_KOI8R, "KOI8-R"},
+ 	{PG_WIN1250, "CP1250"},
+ 	{PG_WIN1251, "CP1251"},
+ 	{PG_WIN1252, "CP1252"},
+ 	{PG_WIN1253, "CP1253"},
+ 	{PG_WIN1254, "CP1254"},
+ 	{PG_WIN1255, "CP1255"},
+ 	{PG_WIN1256, "CP1256"},
+ 	{PG_WIN1257, "CP1257"},
+ 	{PG_WIN1258, "CP1258"},
+ 	{PG_WIN866, "CP866"},
+ 	{PG_WIN874, "CP874"},
+ 	{PG_EUC_CN, "EUC-CN"},
+ 	{PG_EUC_JP, "EUC-JP"},
+ 	{PG_EUC_KR, "EUC-KR"},
+ 	{PG_EUC_TW, "EUC-TW"},
+ 	{PG_EUC_JIS_2004, "EUC-JP"}
+ };
+ #endif /* WIN32 */
+ 
  void
  SetDatabaseEncoding(int encoding)
  {
***************
*** 859,880 **** SetDatabaseEncoding(int encoding)
  	Assert(DatabaseEncoding->encoding == encoding);
  
  	/*
! 	 * On Windows, we allow UTF-8 database encoding to be used with any
! 	 * locale setting, because UTF-8 requires special handling anyway.
! 	 * But this means that gettext() might be misled about what output
! 	 * encoding it should use, so we have to tell it explicitly.
! 	 *
! 	 * In future we might want to call bind_textdomain_codeset
! 	 * unconditionally, but that requires knowing how to spell the codeset
! 	 * name properly for all encodings on all platforms, which might be
! 	 * problematic.
! 	 *
! 	 * This is presently unnecessary, but harmless, on non-Windows platforms.
  	 */
! #ifdef ENABLE_NLS
! 	if (encoding == PG_UTF8)
! 		if (bind_textdomain_codeset(textdomain(NULL), "UTF-8") == NULL)
! 			elog(LOG, "bind_textdomain_codeset failed");
  #endif
  }
  
--- 899,921 ----
  	Assert(DatabaseEncoding->encoding == encoding);
  
  	/*
! 	 * On Windows, we need to explicitly bind gettext to the correct
! 	 * encoding, because gettext() tends to get confused.
  	 */
! #if defined(ENABLE_NLS) && defined(WIN32)
! 	{
! 		int	i;
! 
! 		for (i = 0; i < sizeof(codeset_map_array) / sizeof(codeset_map_array[0]); i++)
! 		{
! 			if (codeset_map_array[i].encoding == encoding)
! 			{
! 				if (bind_textdomain_codeset(textdomain(NULL), codeset_map_array[i].codeset) == NULL)
! 					elog(LOG, "bind_textdomain_codeset failed");
! 				break;
! 			}
! 		}
! 	}
  #endif
  }
  
#29Peter Eisentraut
peter_e@gmx.net
In reply to: Magnus Hagander (#28)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Magnus Hagander wrote:

Peter Eisentraut wrote:

Magnus Hagander wrote:

However, one question: The comment currently says it's harmless to do
this on non-windows platforms. Does this still hold true?

Yes, the non-WIN32 code path appears to be the same, still. But the
ifdef WIN32 part we don't want, because that presumes something about
the spelling of encoding names in the local iconv library.

If we do keep the thing win32 only, I think we should just wrap the
whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
Unix - that'll make for cleaner code.

Yes, that would be much better.

Something like this then?

Looks OK to me.

#30Hiroshi Inoue
inoue@tpf.co.jp
In reply to: Magnus Hagander (#28)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Magnus Hagander wrote:

Peter Eisentraut wrote:

Magnus Hagander wrote:

However, one question: The comment currently says it's harmless to do
this on non-windows platforms. Does this still hold true?

Yes, the non-WIN32 code path appears to be the same, still. But the
ifdef WIN32 part we don't want, because that presumes something about
the spelling of encoding names in the local iconv library.

If we do keep the thing win32 only, I think we should just wrap the
whole thing in #ifdef WIN32 and no longer do the codeset stuff at all on
Unix - that'll make for cleaner code.

Yes, that would be much better.

Something like this then?

It seems OK to me.

regards,
Hiroshi Inoue

#31Magnus Hagander
magnus@hagander.net
In reply to: Hiroshi Inoue (#30)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext() to the UTF8 locale when in use.

Hiroshi Inoue wrote:

Magnus Hagander wrote:

Peter Eisentraut wrote:

Magnus Hagander wrote:

However, one question: The comment currently says it's harmless to do
this on non-windows platforms. Does this still hold true?

Yes, the non-WIN32 code path appears to be the same, still. But the
ifdef WIN32 part we don't want, because that presumes something about
the spelling of encoding names in the local iconv library.

If we do keep the thing win32 only, I think we should just wrap the
whole thing in #ifdef WIN32 and no longer do the codeset stuff at
all on
Unix - that'll make for cleaner code.

Yes, that would be much better.

Something like this then?

It seems OK to me.

Applied.

//Magnus