pgsql: Explicitly bind gettext to the correct encoding on Windows.
Log Message:
-----------
Explicitly bind gettext to the correct encoding on Windows.
Original patch from Hiroshi Inoue.
Modified Files:
--------------
pgsql/src/backend/utils/mb:
mbutils.c (r1.77 -> r1.78)
(http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/mb/mbutils.c?r1=1.77&r2=1.78)
mha@postgresql.org (Magnus Hagander) writes:
Explicitly bind gettext to the correct encoding on Windows.
I have a couple of objections to this patch. First, what happens if
it fails to find a matching table entry? (The existing answer is
"nothing", but that doesn't seem right.) Second and more critical,
it adds still another data structure that has to be maintained when
the list of encodings changes, and it doesn't even live in the same
file as any existing encoding-information table.
What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.
regards, tom lane
Tom Lane wrote:
mha@postgresql.org (Magnus Hagander) writes:
Explicitly bind gettext to the correct encoding on Windows.
I have a couple of objections to this patch. First, what happens if
it fails to find a matching table entry? (The existing answer is
"nothing", but that doesn't seem right.) Second and more critical,
it adds still another data structure that has to be maintained when
the list of encodings changes, and it doesn't even live in the same
file as any existing encoding-information table.
What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.
Would someone please comment on Tom's questions above.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote:
mha@postgresql.org (Magnus Hagander) writes:
Explicitly bind gettext to the correct encoding on Windows.
I have a couple of objections to this patch. First, what happens if
it fails to find a matching table entry? (The existing answer is
"nothing", but that doesn't seem right.) Second and more critical,
it adds still another data structure that has to be maintained when
the list of encodings changes, and it doesn't even live in the same
file as any existing encoding-information table.
What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.
Do you mean a separate table there, or should we add a new column to one
of the existing tables?
//Magnus
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.
Do you mean a separate table there, or should we add a new column to one
of the existing tables?
Whichever seems to make more sense is fine with me. I just don't want
add-an-encoding maintenance requirements spread across N different
source files.
regards, tom lane
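To make the "new column" option concrete, here is a minimal sketch, not the actual PostgreSQL layout. The enum `demo_enc`, the struct `demo_enc_info`, the table `demo_enc_tbl` and the function `demo_gettext_name` are all hypothetical names invented for illustration. The point is only that each encoding gets a single row, so adding an encoding touches one table in one file.

```c
#include <stddef.h>

/* Hypothetical stand-in for the server's encoding enum; three members only. */
typedef enum { DEMO_SQL_ASCII, DEMO_UTF8, DEMO_LATIN1 } demo_enc;

/* One row per encoding, with the gettext name as an extra column. */
typedef struct demo_enc_info
{
    demo_enc    encoding;
    const char *pg_name;        /* name used inside the server */
    const char *gettext_name;   /* name handed to gettext, NULL if none */
} demo_enc_info;

static const demo_enc_info demo_enc_tbl[] = {
    {DEMO_SQL_ASCII, "SQL_ASCII", NULL},
    {DEMO_UTF8, "UTF8", "UTF-8"},
    {DEMO_LATIN1, "LATIN1", "LATIN1"},
};

/* Return the gettext name for enc, or NULL if there is none. */
const char *
demo_gettext_name(demo_enc enc)
{
    size_t      i;

    for (i = 0; i < sizeof(demo_enc_tbl) / sizeof(demo_enc_tbl[0]); i++)
    {
        if (demo_enc_tbl[i].encoding == enc)
            return demo_enc_tbl[i].gettext_name;
    }
    return NULL;
}
```

A separate table in the same file would satisfy the same requirement; the sketch only shows what the single-table variant might look like.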
Tom Lane wrote:
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.
Do you mean a separate table there, or should we add a new column to one
of the existing tables?
Whichever seems to make more sense is fine with me. I just don't want
add-an-encoding maintenance requirements spread across N different
source files.
I was about to start looking at this when that other thread
(http://archives.postgresql.org//pgsql-hackers/2009-03/msg01270.php)
started about related issues on other platforms. Seems we should have a
"coordinated fix" for this, so I'm going to wait and see what comes out
of that one. Unless I'm misunderstanding things and they're not related?
//Magnus
Magnus Hagander wrote:
Tom Lane wrote:
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.
Do you mean a separate table there, or should we add a new column to one
of the existing tables?
Whichever seems to make more sense is fine with me. I just don't want
add-an-encoding maintenance requirements spread across N different
source files.
I was about to start looking at this when that other thread
(http://archives.postgresql.org//pgsql-hackers/2009-03/msg01270.php)
started about related issues on other platforms. Seems we should have a
"coordinated fix" for this, so I'm going to wait and see what comes out
of that one. Unless I'm misunderstanding things and they're not related?
I've committed a fairly trivial patch per Peter's suggestion to fix the
other thread's issue. I left the table as is, so whatever refactorings
were planned can now be applied.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote:
Magnus Hagander wrote:
Tom Lane wrote:
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.
Do you mean a separate table there, or should we add a new column to one
of the existing tables?
Whichever seems to make more sense is fine with me. I just don't want
add-an-encoding maintenance requirements spread across N different
source files.
I was about to start looking at this when that other thread
(http://archives.postgresql.org//pgsql-hackers/2009-03/msg01270.php)
started about related issues on other platforms. Seems we should have a
"coordinated fix" for this, so I'm going to wait and see what comes out
of that one. Unless I'm misunderstanding things and they're not related?
I've committed a fairly trivial patch per Peter's suggestion to fix the
other thread's issue. I left the table as is, so whatever refactorings
were planned can now be applied.
Here's a patch that moves the table over to encnames.c, and renames it
to look like the others.
I don't know what it should be doing if it can't find a match, so I
haven't changed that behavior.
Comments?
//Magnus
Attachments:
enctable.patch (text/x-diff)
*** a/src/backend/utils/mb/encnames.c
--- b/src/backend/utils/mb/encnames.c
***************
*** 431,436 **** pg_enc2name pg_enc2name_tbl[] =
--- 431,478 ----
};
/* ----------
+ * These are encoding names for gettext.
+ * ----------
+ */
+ pg_enc2gettext pg_enc2gettext_tbl[] =
+ {
+ {PG_UTF8, "UTF-8"},
+ {PG_LATIN1, "LATIN1"},
+ {PG_LATIN2, "LATIN2"},
+ {PG_LATIN3, "LATIN3"},
+ {PG_LATIN4, "LATIN4"},
+ {PG_ISO_8859_5, "ISO-8859-5"},
+ {PG_ISO_8859_6, "ISO_8859-6"},
+ {PG_ISO_8859_7, "ISO-8859-7"},
+ {PG_ISO_8859_8, "ISO-8859-8"},
+ {PG_LATIN5, "LATIN5"},
+ {PG_LATIN6, "LATIN6"},
+ {PG_LATIN7, "LATIN7"},
+ {PG_LATIN8, "LATIN8"},
+ {PG_LATIN9, "LATIN-9"},
+ {PG_LATIN10, "LATIN10"},
+ {PG_KOI8R, "KOI8-R"},
+ {PG_KOI8U, "KOI8-U"},
+ {PG_WIN1250, "CP1250"},
+ {PG_WIN1251, "CP1251"},
+ {PG_WIN1252, "CP1252"},
+ {PG_WIN1253, "CP1253"},
+ {PG_WIN1254, "CP1254"},
+ {PG_WIN1255, "CP1255"},
+ {PG_WIN1256, "CP1256"},
+ {PG_WIN1257, "CP1257"},
+ {PG_WIN1258, "CP1258"},
+ {PG_WIN866, "CP866"},
+ {PG_WIN874, "CP874"},
+ {PG_EUC_CN, "EUC-CN"},
+ {PG_EUC_JP, "EUC-JP"},
+ {PG_EUC_KR, "EUC-KR"},
+ {PG_EUC_TW, "EUC-TW"},
+ {PG_EUC_JIS_2004, "EUC-JP"}
+ };
+
+
+ /* ----------
* Encoding checks, for error returns -1 else encoding id
* ----------
*/
*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***************
*** 890,936 **** cliplen(const char *str, int len, int limit)
return l;
}
- #if defined(ENABLE_NLS)
- static const struct codeset_map {
- int encoding;
- const char *codeset;
- } codeset_map_array[] = {
- {PG_UTF8, "UTF-8"},
- {PG_LATIN1, "LATIN1"},
- {PG_LATIN2, "LATIN2"},
- {PG_LATIN3, "LATIN3"},
- {PG_LATIN4, "LATIN4"},
- {PG_ISO_8859_5, "ISO-8859-5"},
- {PG_ISO_8859_6, "ISO_8859-6"},
- {PG_ISO_8859_7, "ISO-8859-7"},
- {PG_ISO_8859_8, "ISO-8859-8"},
- {PG_LATIN5, "LATIN5"},
- {PG_LATIN6, "LATIN6"},
- {PG_LATIN7, "LATIN7"},
- {PG_LATIN8, "LATIN8"},
- {PG_LATIN9, "LATIN-9"},
- {PG_LATIN10, "LATIN10"},
- {PG_KOI8R, "KOI8-R"},
- {PG_KOI8U, "KOI8-U"},
- {PG_WIN1250, "CP1250"},
- {PG_WIN1251, "CP1251"},
- {PG_WIN1252, "CP1252"},
- {PG_WIN1253, "CP1253"},
- {PG_WIN1254, "CP1254"},
- {PG_WIN1255, "CP1255"},
- {PG_WIN1256, "CP1256"},
- {PG_WIN1257, "CP1257"},
- {PG_WIN1258, "CP1258"},
- {PG_WIN866, "CP866"},
- {PG_WIN874, "CP874"},
- {PG_EUC_CN, "EUC-CN"},
- {PG_EUC_JP, "EUC-JP"},
- {PG_EUC_KR, "EUC-KR"},
- {PG_EUC_TW, "EUC-TW"},
- {PG_EUC_JIS_2004, "EUC-JP"}
- };
- #endif /* ENABLE_NLS */
-
void
SetDatabaseEncoding(int encoding)
{
--- 890,895 ----
***************
*** 969,980 **** pg_bind_textdomain_codeset(const char *domainname)
return;
#endif
! for (i = 0; i < lengthof(codeset_map_array); i++)
{
! if (codeset_map_array[i].encoding == encoding)
{
if (bind_textdomain_codeset(domainname,
! codeset_map_array[i].codeset) == NULL)
elog(LOG, "bind_textdomain_codeset failed");
break;
}
--- 928,939 ----
return;
#endif
! for (i = 0; pg_enc2gettext_tbl[i].name != NULL; i++)
{
! if (pg_enc2gettext_tbl[i].encoding == encoding)
{
if (bind_textdomain_codeset(domainname,
! pg_enc2gettext_tbl[i].name) == NULL)
elog(LOG, "bind_textdomain_codeset failed");
break;
}
*** a/src/include/mb/pg_wchar.h
--- b/src/include/mb/pg_wchar.h
***************
*** 262,267 **** typedef struct pg_enc2name
--- 262,278 ----
extern pg_enc2name pg_enc2name_tbl[];
/*
+ * Encoding names for gettext
+ */
+ typedef struct pg_enc2gettext
+ {
+ pg_enc encoding;
+ const char *name;
+ } pg_enc2gettext;
+
+ extern pg_enc2gettext pg_enc2gettext_tbl[];
+
+ /*
* pg_wchar stuff
*/
typedef int (*mb2wchar_with_len_converter) (const unsigned char *from,
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
What makes more sense to me is to add a table to encnames.c that
provides the gettext name of every encoding that we support.
Here's a patch that moves the table over to encnames.c, and renames it
to look like the others.
I think you forgot to include the NULL terminating entry that the
loop seems to be expecting. Also, why isn't the array "const"?
I don't know what it should be doing if it can't find a match, so I
haven't changed that behavior.
As things stand, it should throw error, except in the case of SQL_ASCII;
there is no excuse for any other database encoding to not be in the
table. However, what seems more worrisome to me is the prospect already
discussed that the codeset name we have in the table is not actually
recognized by gettext/iconv. Did we have a solution for that?
Anyway, this fixes my immediate concern about where the info is located,
so you may as well apply it with the array-terminator fix.
regards, tom lane
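The points raised above (a NULL terminating entry, a const array, and an error for any encoding other than SQL_ASCII that is missing from the table) can be sketched as follows. This is an illustration under stated assumptions, not the committed code: the `demo_` enum, struct, table and function are hypothetical stand-ins, and the sketch returns a status code where the backend would presumably call elog.

```c
#include <stddef.h>

/* Hypothetical stand-in for pg_enc; DEMO_KOI8R is deliberately not in the table. */
typedef enum { DEMO_SQL_ASCII, DEMO_UTF8, DEMO_LATIN1, DEMO_KOI8R } demo_enc;

typedef struct demo_enc2gettext
{
    demo_enc    encoding;
    const char *name;
} demo_enc2gettext;

/* const, and closed by a NULL-name row so the lookup loop can stop there. */
static const demo_enc2gettext demo_enc2gettext_tbl[] = {
    {DEMO_UTF8, "UTF-8"},
    {DEMO_LATIN1, "LATIN1"},
    {(demo_enc) 0, NULL}        /* terminator */
};

typedef enum { DEMO_FOUND, DEMO_SKIPPED, DEMO_MISSING } demo_lookup_result;

/*
 * Look up the gettext codeset name for an encoding.
 * SQL_ASCII is skipped on purpose; any other encoding that is absent
 * from the table is reported as DEMO_MISSING so the caller can raise an error.
 */
demo_lookup_result
demo_find_codeset(demo_enc encoding, const char **codeset)
{
    int         i;

    *codeset = NULL;
    if (encoding == DEMO_SQL_ASCII)
        return DEMO_SKIPPED;

    for (i = 0; demo_enc2gettext_tbl[i].name != NULL; i++)
    {
        if (demo_enc2gettext_tbl[i].encoding == encoding)
        {
            *codeset = demo_enc2gettext_tbl[i].name;
            return DEMO_FOUND;
        }
    }
    return DEMO_MISSING;
}
```

Without the terminator row, a loop that tests `name != NULL` would read past the end of the array, which is why the terminator matters once the loop no longer uses lengthof().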
Tom Lane wrote:
However, what seems more worrisome to me is the prospect already
discussed that the codeset name we have in the table is not actually
recognized by gettext/iconv. Did we have a solution for that?
You get English.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote:
I don't know what it should be doing if it can't find a match, so I
haven't changed that behavior.
As things stand, it should throw error, except in the case of SQL_ASCII;
there is no excuse for any other database encoding to not be in the
table. However, what seems more worrisome to me is the prospect already
discussed that the codeset name we have in the table is not actually
recognized by gettext/iconv. Did we have a solution for that?
Anyway, this fixes my immediate concern about where the info is located,
so you may as well apply it with the array-terminator fix.
Done.
//Magnus