trouble with to_char('L')
Hi,
my database has UTF8 encoding and Finnish locale, the client_encoding
and the console is set to WIN1252. I created a table with a single
NUMERIC(5,2) column and inserted a few values. Running a query 'SELECT
to_char(money, '999D99L') FROM table' through psql gives the following
error message:
ERROR: invalid byte sequence for encoding "UTF8": 0x80
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".
The graphical Query tool returns a set of empty rows. The query works
ok without the 'L'.
Thanks in advance,
Mikko
Mikko wrote:
my database has UTF8 encoding and Finnish locale, the client_encoding
and the console is set to WIN1252. I created a table with a single
NUMERIC(5,2) column and inserted a few values. Running a query 'SELECT
to_char(money, '999D99L') FROM table' through psql gives the following
error message:ERROR: invalid byte sequence for encoding "UTF8": 0x80
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".The graphical Query tool returns a set of empty rows. The query works
ok without the 'L'.
That is strange.
What is your psql version?
What is the output of the following commands:
SHOW server_version;
SHOW server_encoding;
SHOW client_encoding;
SHOW lc_numeric;
SHOW lc_monetary;
SELECT to_char(3.1415::numeric(5,2), '999D99L');
Yours,
Laurenz Albe
psql (PostgreSQL) 8.3.7
server_version 8.3.7
server_encoding UTF8
client_encoding win1252
lc_numeric Finnish, Finland
lc_monetary Finnish, Finland
testdb=# SELECT to_char(3.1415::numeric(5,2), '999D99L');
ERROR: invalid byte sequence for encoding "UTF8": 0x80
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".
If connected to postgres database the query returns 3,14.
Mikko
Mikko escribi�:
psql (PostgreSQL) 8.3.7
server_version 8.3.7
server_encoding UTF8
client_encoding win1252
lc_numeric Finnish, Finland
lc_monetary Finnish, Finlandtestdb=# SELECT to_char(3.1415::numeric(5,2), '999D99L');
ERROR: invalid byte sequence for encoding "UTF8": 0x80
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".
FWIW 0x80 is the Euro symbol in Win1252 according to
http://en.wikipedia.org/wiki/Windows-1252
Maybe the problem here is that the chosen locales are not UTF8. Does it
work if you set lc_numeric and lc_monetary to "Finnish_Finland.65001"
instead? Those should match the server_encoding.
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Tue, Apr 21, 2009 at 8:13 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Maybe the problem here is that the chosen locales are not UTF8. Does it
work if you set lc_numeric and lc_monetary to "Finnish_Finland.65001"
instead? Those should match the server_encoding.
alter database testdb set lc_monetary(or numeric) to
'Finnish_Finland.65001' returns:
ERROR: invalid value for parameter "lc_monetary": "Finnish_Finland.65001"
However, I noticed that both lc_collate and lc_ctype are set to
Finnish_Finland.1252 by the installer. Should I have just run initdb
with --locale fi_FI.UTF8 at the very start? The to_char('L') works
fine with a database with win1252 encoding.
Mikko
Mikko escribi�:
On Tue, Apr 21, 2009 at 8:13 PM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:Maybe the problem here is that the chosen locales are not UTF8. �Does it
work if you set lc_numeric and lc_monetary to "Finnish_Finland.65001"
instead? �Those should match the server_encoding.alter database testdb set lc_monetary(or numeric) to
'Finnish_Finland.65001' returns:
ERROR: invalid value for parameter "lc_monetary": "Finnish_Finland.65001"
Ouch ... I thought that was the way that Windows designated UTF8
locales, but maybe I am wrong.
However, I noticed that both lc_collate and lc_ctype are set to
Finnish_Finland.1252 by the installer. Should I have just run initdb
with --locale fi_FI.UTF8 at the very start? The to_char('L') works
fine with a database with win1252 encoding.
Hmm, it should have disallowed the creation of an UTF8 database then.
Maybe that part is what is broken here.
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Wed, Apr 22, 2009 at 2:13 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Ouch ... I thought that was the way that Windows designated UTF8
locales, but maybe I am wrong.
Ok, now I found out that Windows doesn't support locales with encoding
using more than two bytes per character and initdb falls back to 1252.
http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
I guess I'll have to manage with win1252 encoded dbs for the moment.
Thanks for the answers!
Mikko
Mikko escribi�:
On Wed, Apr 22, 2009 at 2:13 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:Ouch ... I thought that was the way that Windows designated UTF8
locales, but maybe I am wrong.Ok, now I found out that Windows doesn't support locales with encoding
using more than two bytes per character and initdb falls back to 1252.
Hmm.
Does this imply that we shouldn't allow UTF8 database on Windows at all?
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
Does this imply that we shouldn't allow UTF8 database on Windows at all?
That would be pretty unfortunate :-(
I think what this suggests is that there probably needs to be some
encoding conversion logic near the places we examine localeconv()
output.
regards, tom lane
Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Does this imply that we shouldn't allow UTF8 database on Windows at all?
That would be pretty unfortunate :-(
I think what this suggests is that there probably needs to be some
encoding conversion logic near the places we examine localeconv()
output.
Attached is a patch to the current CVS.
It uses a similar way like LC_TIME stuff does.
regards,
Hiroshi Inoue
Attachments:
pg_locale_20090423.patchtext/plain; name=pg_locale_20090423.patchDownload+82-0
Hiroshi Inoue <inoue@tpf.co.jp> writes:
Tom Lane wrote:
I think what this suggests is that there probably needs to be some
encoding conversion logic near the places we examine localeconv()
output.
Attached is a patch to the current CVS.
It uses a similar way like LC_TIME stuff does.
I'm not really in a position to test/commit this, since I don't have a
Windows machine. However, since no one else is stepping up to deal with
it, here's a quick review:
* This seems to be assuming that the user has set LC_MONETARY and
LC_NUMERIC the same. What if they're different?
* What if the selected locale corresponds to Unicode (ie UTF16)
encoding?
* #define'ing strdup() to do something rather different from strdup
seems pretty horrid from the standpoint of code readability and
maintainability, especially with nary a comment explaining it.
* Code will dump core on malloc failure.
* Since this code is surely not performance critical, I wouldn't bother
with trying to optimize it; hence drop the special case for all-ASCII.
* Surely we already have a symbol somewhere that can be used in
place of this:
#define MAX_BYTES_PER_CHARACTER 4
regards, tom lane
Tom Lane wrote:
Hiroshi Inoue <inoue@tpf.co.jp> writes:
Tom Lane wrote:
I think what this suggests is that there probably needs to be some
encoding conversion logic near the places we examine localeconv()
output.Attached is a patch to the current CVS.
It uses a similar way like LC_TIME stuff does.I'm not really in a position to test/commit this, since I don't have a
Windows machine. However, since no one else is stepping up to deal with
it, here's a quick review:
Thanks for the review.
I've forgotten the patch because Japanese doesn't have trouble with
this issue (the currency symbol is ascii \). If this is really
expected to be fixed, I would update the patch according to your
suggestion.
* This seems to be assuming that the user has set LC_MONETARY and
LC_NUMERIC the same. What if they're different?
Strictky speaking they should be handled individually.
* What if the selected locale corresponds to Unicode (ie UTF16)
encoding?
As far as I tested set_locale(LC_MONETARY, xxx.65001) causes an error.
* #define'ing strdup() to do something rather different from strdup
seems pretty horrid from the standpoint of code readability and
maintainability, especially with nary a comment explaining it.
Maybe using a function instead of strdup() which calls dbstr_win32()
in case of Windows would be better.
BTW grouping and money_grouping seem to be out of encoding conversion.
Are they guaranteed to be null terminated?
* Code will dump core on malloc failure.
I can take care of it.
* Since this code is surely not performance critical, I wouldn't bother
with trying to optimize it; hence drop the special case for all-ASCII.
I can take care of it.
* Surely we already have a symbol somewhere that can be used in
place of this:
#define MAX_BYTES_PER_CHARACTER 4
I can't find it.
max(pg_encoding_max_length(encoding), pg_encoding_max_length(PG_UTF8))
may be better.
regards,
Hiroshi Inoue
Hiroshi Inoue <inoue@tpf.co.jp> writes:
Tom Lane wrote:
* This seems to be assuming that the user has set LC_MONETARY and
LC_NUMERIC the same. What if they're different?
Strictky speaking they should be handled individually.
I thought about this some more, and I wonder why you did it like this at
all. The patch claimed to be copying the LC_TIME code, but the LC_TIME
code isn't trying to temporarily change any locale settings. What we
are doing in that code is assuming that the system will give us back
the localized strings in the encoding identified by CP_ACP; so all we
have to do is convert CP_ACP to wide chars and then to UTF8. Can't we
use a similar approach for the output of localeconv?
regards, tom lane
Tom Lane wrote:
Hiroshi Inoue <inoue@tpf.co.jp> writes:
Tom Lane wrote:
* This seems to be assuming that the user has set LC_MONETARY and
LC_NUMERIC the same. What if they're different?Strictky speaking they should be handled individually.
I thought about this some more, and I wonder why you did it like this at
all. The patch claimed to be copying the LC_TIME code, but the LC_TIME
code isn't trying to temporarily change any locale settings.
LC_TIME and LC_CTYPE (on Windows) settings are changed temporarily
in cache_locale_time() in pg_locale.c.
What we
are doing in that code is assuming that the system will give us back
the localized strings in the encoding identified by CP_ACP;
AFAIK it's not right. LC_TIME, LC_MONETARY or LC_NUMERIC related output
is encoded using LC_CTYPE setting.
so all we
have to do is convert CP_ACP to wide chars and then to UTF8. Can't we
use a similar approach for the output of localeconv?
What LC_CTIME code and my patch intend is setting LC_CTYPE to an
appropriate value so that related output is converted correctly.
If we can set LC_CTYPE to xxx_xxx.65001(UTF8), we can eliminate
two steps but it causes an error on Windows.
regards,
HIroshi Inoue
Where are we on this issue?
---------------------------------------------------------------------------
Hiroshi Inoue wrote:
Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Does this imply that we shouldn't allow UTF8 database on Windows at all?
That would be pretty unfortunate :-(
I think what this suggests is that there probably needs to be some
encoding conversion logic near the places we examine localeconv()
output.Attached is a patch to the current CVS.
It uses a similar way like LC_TIME stuff does.regards,
Hiroshi Inoue
Index: pg_locale.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/pg_locale.c,v retrieving revision 1.49 diff -c -c -r1.49 pg_locale.c *** pg_locale.c 1 Apr 2009 09:17:32 -0000 1.49 --- pg_locale.c 22 Apr 2009 21:08:33 -0000 *************** *** 386,391 **** --- 386,449 ---- free(s->positive_sign); }+ #ifdef WIN32 + #define MAX_BYTES_PER_CHARACTER 4 + static char *dbstr_win32(bool matchenc, const char *str) + { + int encoding = GetDatabaseEncoding(); + bool is_ascii = true; + size_t len, ilen, wclen, dstlen; + wchar_t *wbuf; + char *dst, *ibuf; + + if (matchenc) + return strdup(str); + /* Is the str an ascii string ? */ + for (ibuf = str; *ibuf; ibuf++) + { + if (!isascii(*ibuf)) + { + is_ascii = false; + break; + } + } + /* Simply returns the strdup()ed ascii string */ + if (is_ascii) + return strdup(str); + + ilen = strlen(str) + 1; + wclen = ilen * sizeof(wchar_t); + wbuf = (wchar_t *) palloc(wclen); + len = mbstowcs(wbuf, str, ilen); + if (len == -1) + elog(ERROR, + "could not convert string to Wide characters:error %lu", GetLastError()); + + dstlen = len * MAX_BYTES_PER_CHARACTER + 1; + dst = malloc(dstlen); + + len = WideCharToMultiByte(CP_UTF8, 0, wbuf, len, dst, dstlen, NULL, NULL); + pfree(wbuf); + if (len == 0) + elog(ERROR, + "could not convert string to UTF-8:error %lu", GetLastError()); + + dst[len] = '\0'; + if (encoding != PG_UTF8) + { + char *convstr = pg_do_encoding_conversion(dst, len, PG_UTF8, encoding); + if (dst != convstr) + { + strlcpy(dst, convstr, dstlen); + pfree(convstr); + } + } + + return dst; + } + + #define strdup(str) dbstr_win32(is_encoding_match, str) + #endif /* WIN32 *//* * Return the POSIX lconv struct (contains number/money formatting *************** *** 398,403 **** --- 456,466 ---- struct lconv *extlconv; char *save_lc_monetary; char *save_lc_numeric; + #ifdef WIN32 + char *save_lc_ctype = NULL; + bool lc_ctype_change = false, is_encoding_match; + #endif /* WIN32 */ +/* Did we do it already? */ if (CurrentLocaleConvValid) *************** *** 413,418 **** --- 476,492 ---- if (save_lc_numeric) save_lc_numeric = pstrdup(save_lc_numeric);+ #ifdef WIN32 + save_lc_ctype = setlocale(LC_CTYPE, NULL); + if (save_lc_ctype && stricmp(locale_monetary, save_lc_ctype) != 0) + { + lc_ctype_change = true; + save_lc_ctype = pstrdup(save_lc_ctype); + setlocale(LC_CTYPE, locale_monetary); + } + is_encoding_match = (pg_get_encoding_from_locale(locale_monetary) == GetDatabaseEncoding()); + #endif + setlocale(LC_MONETARY, locale_monetary); setlocale(LC_NUMERIC, locale_numeric);*************** *** 437,442 **** --- 511,524 ---- CurrentLocaleConv.n_sign_posn = extlconv->n_sign_posn;/* Try to restore internal settings */ + #ifdef WIN32 + #undef strdup + if (lc_ctype_change) + { + setlocale(LC_CTYPE, save_lc_ctype); + pfree(save_lc_ctype); + } + #endif /* WIN32 */ if (save_lc_monetary) { setlocale(LC_MONETARY, save_lc_monetary);--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
Where are we on this issue?
According to my files, I complained about the extreme ugliness of the
patch (redefining strdup for pete's sake) and the fact that it did not
actually do things anything like the LC_TIME code as was claimed.
Hiroshi rejected those criticisms. I don't know where we are, but
I don't want to see this patch applied in this form.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Where are we on this issue?
According to my files, I complained about the extreme ugliness of the
patch (redefining strdup for pete's sake) and the fact that it did not
actually do things anything like the LC_TIME code as was claimed.
Hiroshi rejected those criticisms. I don't know where we are, but
I don't want to see this patch applied in this form.
Right, but you are saying it is still an open issue, which says we
should look at it.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
Right, but you are saying it is still an open issue, which says we
should look at it.
Sure. Maybe put it on TODO?
regards, tom lane
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Right, but you are saying it is still an open issue, which says we
should look at it.Sure. Maybe put it on TODO?
OK, TODO is:
Fix locale-aware handling (e.g. monetary) for specific
server/client encoding combinations
* http://archives.postgresql.org/pgsql-general/2009-04/msg00799.php
If someone wants to work on it, go ahead.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
Where are we on this issue?
Oops I forgot it completely.
I have a little improved version and would post it tonight.
regards,
Hiroshi Inoue
Show quoted text
---------------------------------------------------------------------------
Hiroshi Inoue wrote:
Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
Does this imply that we shouldn't allow UTF8 database on Windows at all?
That would be pretty unfortunate :-(
I think what this suggests is that there probably needs to be some
encoding conversion logic near the places we examine localeconv()
output.Attached is a patch to the current CVS.
It uses a similar way like LC_TIME stuff does.regards,
Hiroshi Inoue