pgsql: Explicitly bind gettext to the correct encoding on Windows.

Started by Noname almost 17 years ago, 11 messages
#1Noname
mha@postgresql.org

Log Message:
-----------
Explicitly bind gettext to the correct encoding on Windows.

Original patch from Hiroshi Inoue.

Modified Files:
--------------
pgsql/src/backend/utils/mb:
mbutils.c (r1.77 -> r1.78)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/mb/mbutils.c?r1=1.77&r2=1.78)

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Noname (#1)
Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.

mha@postgresql.org (Magnus Hagander) writes:

Explicitly bind gettext to the correct encoding on Windows.

I have a couple of objections to this patch. First, what happens if
it fails to find a matching table entry? (The existing answer is
"nothing", but that doesn't seem right.) Second and more critical,
it adds still another data structure that has to be maintained when
the list of encodings changes, and it doesn't even live in the same
file as any existing encoding-information table.

What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.

regards, tom lane
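
One possible shape for such a per-encoding table is sketched below, purely as an illustration of the maintenance concern. The `demo_` names and the shortened enum are hypothetical stand-ins and are not PostgreSQL's actual `pg_enc` definitions. Indexing the array by the enum value means any encoding without an entry reads back as NULL, so a missing entry is detectable rather than silently skipped.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real encoding enum; only a few members. */
typedef enum demo_enc
{
    DEMO_SQL_ASCII = 0,
    DEMO_UTF8,
    DEMO_LATIN1,
    DEMO_EUC_JP,
    DEMO_ENC_COUNT              /* must stay last */
} demo_enc;

/*
 * One gettext codeset name per encoding, indexed by the enum value.
 * Designated initializers (C99) leave every unlisted slot as NULL.
 */
static const char *const demo_gettext_names[DEMO_ENC_COUNT] = {
    [DEMO_UTF8] = "UTF-8",
    [DEMO_LATIN1] = "LATIN1",
    [DEMO_EUC_JP] = "EUC-JP"
    /* DEMO_SQL_ASCII is intentionally omitted: there is no codeset to bind */
};

/* Returns the gettext name, or NULL if out of range or not listed. */
static const char *
demo_gettext_name(int encoding)
{
    if (encoding < 0 || encoding >= DEMO_ENC_COUNT)
        return NULL;
    return demo_gettext_names[encoding];
}
```

With this layout a caller can treat a NULL result as "no matching entry" and decide explicitly what to do, instead of falling out of a search loop with nothing done.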

#3Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#2)
Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.

Tom Lane wrote:

mha@postgresql.org (Magnus Hagander) writes:

Explicitly bind gettext to the correct encoding on Windows.

I have a couple of objections to this patch. First, what happens if
it fails to find a matching table entry? (The existing answer is
"nothing", but that doesn't seem right.) Second and more critical,
it adds still another data structure that has to be maintained when
the list of encodings changes, and it doesn't even live in the same
file as any existing encoding-information table.

What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.

Would someone please comment on Tom's questions above.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#4Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#2)
Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.

Tom Lane wrote:

mha@postgresql.org (Magnus Hagander) writes:

Explicitly bind gettext to the correct encoding on Windows.

I have a couple of objections to this patch. First, what happens if
it fails to find a matching table entry? (The existing answer is
"nothing", but that doesn't seem right.) Second and more critical,
it adds still another data structure that has to be maintained when
the list of encodings changes, and it doesn't even live in the same
file as any existing encoding-information table.

What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.

Do you mean a separate table there, or should we add a new column to one
of the existing tables?

//Magnus

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#4)
Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.

Magnus Hagander <magnus@hagander.net> writes:

Tom Lane wrote:

What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.

Do you mean a separate table there, or should we add a new column to one
of the existing tables?

Whichever seems to make more sense is fine with me. I just don't want
add-an-encoding maintenance requirements spread across N different
source files.

regards, tom lane

#6Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#5)
Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Tom Lane wrote:

What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.

Do you mean a separate table there, or should we add a new column to one
of the existing tables?

Whichever seems to make more sense is fine with me. I just don't want
add-an-encoding maintenance requirements spread across N different
source files.

I was about to start looking at this when that other thread
(http://archives.postgresql.org//pgsql-hackers/2009-03/msg01270.php)
started about related issues on other platforms. Seems we should have a
"coordinated fix" for this, so I'm going to wait and see what comes out
of that one. Unless I'm misunderstanding things and they're not related?

//Magnus

#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Magnus Hagander (#6)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.

Magnus Hagander wrote:

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Tom Lane wrote:

What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.

Do you mean a separate table there, or should we add a new column to one
of the existing tables?

Whichever seems to make more sense is fine with me. I just don't want
add-an-encoding maintenance requirements spread across N different
source files.

I was about to start looking at this when that other thread
(http://archives.postgresql.org//pgsql-hackers/2009-03/msg01270.php)
started about related issues on other platforms. Seems we should have a
"coordinated fix" for this, so I'm going to wait and see what comes out
of that one. Unless I'm misunderstanding things and they're not related?

I've committed a fairly trivial patch per Peter's suggestion to fix the
other thread's issue. I left the table as is, so whatever refactorings
were planned can now be applied.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#8Magnus Hagander
magnus@hagander.net
In reply to: Heikki Linnakangas (#7)
1 attachment(s)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.

Heikki Linnakangas wrote:

Magnus Hagander wrote:

Tom Lane wrote:

Magnus Hagander <magnus@hagander.net> writes:

Tom Lane wrote:

What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.

Do you mean a separate table there, or should we add a new column to one
of the existing tables?

Whichever seems to make more sense is fine with me. I just don't want
add-an-encoding maintenance requirements spread across N different
source files.

I was about to start looking at this when that other thread
(http://archives.postgresql.org//pgsql-hackers/2009-03/msg01270.php)
started about related issues on other platforms. Seems we should have a
"coordinated fix" for this, so I'm going to wait and see what comes out
of that one. Unless I'm misunderstanding things and they're not related?

I've committed a fairly trivial patch per Peter's suggestion to fix the
other thread's issue. I left the table as is, so whatever refactorings
were planned can now be applied.

Here's a patch that moves the table over to encnames.c, and renames it
to look like the others.

I don't know what it should be doing if it can't find a match, so I
haven't changed that behavior.

Comments?

//Magnus

Attachments:

enctable.patch (text/x-diff)
*** a/src/backend/utils/mb/encnames.c
--- b/src/backend/utils/mb/encnames.c
***************
*** 431,436 **** pg_enc2name pg_enc2name_tbl[] =
--- 431,478 ----
  };
  
  /* ----------
+  * These are encoding names for gettext.
+  * ----------
+  */
+ pg_enc2gettext pg_enc2gettext_tbl[] =
+ {
+ 	{PG_UTF8, "UTF-8"},
+ 	{PG_LATIN1, "LATIN1"},
+ 	{PG_LATIN2, "LATIN2"},
+ 	{PG_LATIN3, "LATIN3"},
+ 	{PG_LATIN4, "LATIN4"},
+ 	{PG_ISO_8859_5, "ISO-8859-5"},
+ 	{PG_ISO_8859_6, "ISO_8859-6"},
+ 	{PG_ISO_8859_7, "ISO-8859-7"},
+ 	{PG_ISO_8859_8, "ISO-8859-8"},
+ 	{PG_LATIN5, "LATIN5"},
+ 	{PG_LATIN6, "LATIN6"},
+ 	{PG_LATIN7, "LATIN7"},
+ 	{PG_LATIN8, "LATIN8"},
+ 	{PG_LATIN9, "LATIN-9"},
+ 	{PG_LATIN10, "LATIN10"},
+ 	{PG_KOI8R, "KOI8-R"},
+ 	{PG_KOI8U, "KOI8-U"},
+ 	{PG_WIN1250, "CP1250"},
+ 	{PG_WIN1251, "CP1251"},
+ 	{PG_WIN1252, "CP1252"},
+ 	{PG_WIN1253, "CP1253"},
+ 	{PG_WIN1254, "CP1254"},
+ 	{PG_WIN1255, "CP1255"},
+ 	{PG_WIN1256, "CP1256"},
+ 	{PG_WIN1257, "CP1257"},
+ 	{PG_WIN1258, "CP1258"},
+ 	{PG_WIN866, "CP866"},
+ 	{PG_WIN874, "CP874"},
+ 	{PG_EUC_CN, "EUC-CN"},
+ 	{PG_EUC_JP, "EUC-JP"},
+ 	{PG_EUC_KR, "EUC-KR"},
+ 	{PG_EUC_TW, "EUC-TW"},
+ 	{PG_EUC_JIS_2004, "EUC-JP"}
+ };
+ 
+ 
+ /* ----------
   * Encoding checks, for error returns -1 else encoding id
   * ----------
   */
*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***************
*** 890,936 **** cliplen(const char *str, int len, int limit)
  	return l;
  }
  
- #if defined(ENABLE_NLS)
- static const struct codeset_map {
- 	int	encoding;
- 	const char *codeset;
- } codeset_map_array[] = {
- 	{PG_UTF8, "UTF-8"},
- 	{PG_LATIN1, "LATIN1"},
- 	{PG_LATIN2, "LATIN2"},
- 	{PG_LATIN3, "LATIN3"},
- 	{PG_LATIN4, "LATIN4"},
- 	{PG_ISO_8859_5, "ISO-8859-5"},
- 	{PG_ISO_8859_6, "ISO_8859-6"},
- 	{PG_ISO_8859_7, "ISO-8859-7"},
- 	{PG_ISO_8859_8, "ISO-8859-8"},
- 	{PG_LATIN5, "LATIN5"},
- 	{PG_LATIN6, "LATIN6"},
- 	{PG_LATIN7, "LATIN7"},
- 	{PG_LATIN8, "LATIN8"},
- 	{PG_LATIN9, "LATIN-9"},
- 	{PG_LATIN10, "LATIN10"},
- 	{PG_KOI8R, "KOI8-R"},
- 	{PG_KOI8U, "KOI8-U"},
- 	{PG_WIN1250, "CP1250"},
- 	{PG_WIN1251, "CP1251"},
- 	{PG_WIN1252, "CP1252"},
- 	{PG_WIN1253, "CP1253"},
- 	{PG_WIN1254, "CP1254"},
- 	{PG_WIN1255, "CP1255"},
- 	{PG_WIN1256, "CP1256"},
- 	{PG_WIN1257, "CP1257"},
- 	{PG_WIN1258, "CP1258"},
- 	{PG_WIN866, "CP866"},
- 	{PG_WIN874, "CP874"},
- 	{PG_EUC_CN, "EUC-CN"},
- 	{PG_EUC_JP, "EUC-JP"},
- 	{PG_EUC_KR, "EUC-KR"},
- 	{PG_EUC_TW, "EUC-TW"},
- 	{PG_EUC_JIS_2004, "EUC-JP"}
- };
- #endif /* ENABLE_NLS */
- 
  void
  SetDatabaseEncoding(int encoding)
  {
--- 890,895 ----
***************
*** 969,980 **** pg_bind_textdomain_codeset(const char *domainname)
  		return;
  #endif
  
! 	for (i = 0; i < lengthof(codeset_map_array); i++)
  	{
! 		if (codeset_map_array[i].encoding == encoding)
  		{
  			if (bind_textdomain_codeset(domainname,
! 										codeset_map_array[i].codeset) == NULL)
  				elog(LOG, "bind_textdomain_codeset failed");
  			break;
  		}
--- 928,939 ----
  		return;
  #endif
  
! 	for (i = 0; pg_enc2gettext_tbl[i].name != NULL; i++)
  	{
! 		if (pg_enc2gettext_tbl[i].encoding == encoding)
  		{
  			if (bind_textdomain_codeset(domainname,
! 										pg_enc2gettext_tbl[i].name) == NULL)
  				elog(LOG, "bind_textdomain_codeset failed");
  			break;
  		}
*** a/src/include/mb/pg_wchar.h
--- b/src/include/mb/pg_wchar.h
***************
*** 262,267 **** typedef struct pg_enc2name
--- 262,278 ----
  extern pg_enc2name pg_enc2name_tbl[];
  
  /*
+  * Encoding names for gettext
+  */
+ typedef struct pg_enc2gettext
+ {
+ 	pg_enc		encoding;
+ 	const char *name;
+ } pg_enc2gettext;
+ 
+ extern pg_enc2gettext pg_enc2gettext_tbl[];
+ 
+ /*
   * pg_wchar stuff
   */
  typedef int (*mb2wchar_with_len_converter) (const unsigned char *from,
#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Magnus Hagander (#8)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.

Magnus Hagander <magnus@hagander.net> writes:

Tom Lane wrote:

What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.

Here's a patch that moves the table over to encnames.c, and renames it
to look like the others.

I think you forgot to include the NULL terminating entry that the
loop seems to be expecting. Also, why isn't the array "const"?

I don't know what it should be doing if it can't find a match, so I
haven't changed that behavior.

As things stand, it should throw an error, except in the case of SQL_ASCII;
there is no excuse for any other database encoding to not be in the
table. However, what seems more worrisome to me is the prospect already
discussed that the codeset name we have in the table is not actually
recognized by gettext/iconv. Did we have a solution for that?

Anyway, this fixes my immediate concern about where the info is located,
so you may as well apply it with the array-terminator fix.

regards, tom lane
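
The review points above (a `const` array, a NULL terminating entry for the loop, and an error for any unmatched encoding other than SQL_ASCII) can be sketched as follows. This is a minimal standalone illustration and not the committed code: the `demo_` names, the shortened enum, and the three-way result type are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real encoding enum. */
typedef enum demo_enc
{
    DEMO_SQL_ASCII = 0,
    DEMO_UTF8,
    DEMO_LATIN1,
    DEMO_KOI8R,
    DEMO_MULE                   /* deliberately has no table entry below */
} demo_enc;

typedef struct demo_enc2gettext
{
    demo_enc    encoding;
    const char *name;
} demo_enc2gettext;

/* const: the table is read-only data and is never modified at run time. */
static const demo_enc2gettext demo_enc2gettext_tbl[] = {
    {DEMO_UTF8, "UTF-8"},
    {DEMO_LATIN1, "LATIN1"},
    {DEMO_KOI8R, "KOI8-R"},
    {0, NULL}                   /* terminator: the loop stops at name == NULL */
};

typedef enum demo_lookup_result
{
    DEMO_FOUND,                 /* *codeset points at the gettext name */
    DEMO_SKIPPED,               /* SQL_ASCII: nothing to bind, not an error */
    DEMO_MISSING                /* table is incomplete: caller should report */
} demo_lookup_result;

static demo_lookup_result
demo_find_codeset(demo_enc encoding, const char **codeset)
{
    int         i;

    *codeset = NULL;
    if (encoding == DEMO_SQL_ASCII)
        return DEMO_SKIPPED;

    /* Without the terminator entry this loop would run past the array. */
    for (i = 0; demo_enc2gettext_tbl[i].name != NULL; i++)
    {
        if (demo_enc2gettext_tbl[i].encoding == encoding)
        {
            *codeset = demo_enc2gettext_tbl[i].name;
            return DEMO_FOUND;
        }
    }
    return DEMO_MISSING;
}
```

A caller would pass the returned name to `bind_textdomain_codeset()` on `DEMO_FOUND`, do nothing on `DEMO_SKIPPED`, and raise an error on `DEMO_MISSING`.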

#10Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#9)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.

Tom Lane wrote:

However, what seems more worrisome to me is the prospect already
discussed that the codeset name we have in the table is not actually
recognized by gettext/iconv. Did we have a solution for that?

You get English.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#11Magnus Hagander
magnus@hagander.net
In reply to: Tom Lane (#9)
Re: Re: [COMMITTERS] pgsql: Explicitly bind gettext to the correct encoding on Windows.

Tom Lane wrote:

I don't know what it should be doing if it can't find a match, so I
haven't changed that behavior.

As things stand, it should throw an error, except in the case of SQL_ASCII;
there is no excuse for any other database encoding to not be in the
table. However, what seems more worrisome to me is the prospect already
discussed that the codeset name we have in the table is not actually
recognized by gettext/iconv. Did we have a solution for that?

Anyway, this fixes my immediate concern about where the info is located,
so you may as well apply it with the array-terminator fix.

Done.

//Magnus