Strange query results with invalid multibyte character
Hi,
I have a strange problem.
I have 2 database, both created with:
CREATE DATABASE muntdev WITH OWNER = postgres ENCODING = 'UTF8' TABLESPACE = pg_default;
Running the queries below on the first database is OK and do what I expect.
If I create a backup of the first datase and restore that backup in the second database the following happens:
The first query (see below) returns 17 records, all containing 'Boonk%'.
The second query (see below)returns 'ERROR: invalid multibyte character for locale'
Query 1:
select lastname from salesordercustomer where lower(lastname) like 'boonk%'
Query 2:
select lastname from salesordercustomer where upper(lastname) like 'BOONK%'
Both database are running *exactly* the same version of Debian Etch AMD64 and *exactly* the same version of PostgreSQL (8.1.4 for AMD64) ....
Can anyone explain this to me?
TIA
Joost
Hi Joost. (hoe gaat het ermee?)
I would like to test this too. Could you please provide the data you are
inserting into the database.
Regards,
Gevik
On Wed, 2006-09-27 at 13:13 +0200, Joost Kraaijeveld wrote:
Hi,
I have a strange problem.
I have 2 database, both created with:
CREATE DATABASE muntdev WITH OWNER = postgres ENCODING = 'UTF8' TABLESPACE = pg_default;Running the queries below on the first database is OK and do what I expect.
If I create a backup of the first datase and restore that backup in the second database the following happens:
The first query (see below) returns 17 records, all containing 'Boonk%'.
The second query (see below)returns 'ERROR: invalid multibyte character for locale'Query 1:
select lastname from salesordercustomer where lower(lastname) like 'boonk%'Query 2:
select lastname from salesordercustomer where upper(lastname) like 'BOONK%'Both database are running *exactly* the same version of Debian Etch AMD64 and *exactly* the same version of PostgreSQL (8.1.4 for AMD64) ....
Can anyone explain this to me?
TIA
Joost
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
--
Regards,
Gevik Babakhani
"Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> writes:
I have 2 database, both created with:
CREATE DATABASE muntdev WITH OWNER = postgres ENCODING = 'UTF8' TABLESPACE = pg_default;
But have they got the same locale settings (lc_collate, lc_ctype)?
regards, tom lane
Hi Tom,
On Wed, 2006-09-27 at 11:00 -0400, Tom Lane wrote:
"Joost Kraaijeveld" <J.Kraaijeveld@Askesis.nl> writes:
I have 2 database, both created with:
CREATE DATABASE muntdev WITH OWNER = postgres ENCODING = 'UTF8' TABLESPACE = pg_default;But have they got the same locale settings (lc_collate, lc_ctype)?
According to the postgresql.conf of the *working* database the locales
are:
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
According to the other obe:
lc_messages = 'en_US'
lc_monetary = 'en_US'
lc_numeric = 'en_US'
lc_time = 'en_US'
Could this be the difference? Is there any documentation available
somewhere, on how these locale settings work and interact (in
combination with the server and/or client platform???), besides the
explanation in the PostgreSQL manual
(http://www.postgresql.org/docs/8.1/interactive/charset.html#LOCALE)
(which is to terse for me to understand)?
TIA
--
Groeten,
Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes:
On Wed, 2006-09-27 at 11:00 -0400, Tom Lane wrote:
But have they got the same locale settings (lc_collate, lc_ctype)?
According to the postgresql.conf of the *working* database the locales
are:
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
According to the other obe:
lc_messages = 'en_US'
lc_monetary = 'en_US'
lc_numeric = 'en_US'
lc_time = 'en_US'
You did not show us the settings that actually count, but based on this
I'm guessing they are en_US also. What you need to find out next is
what character set encoding that locale implies on your machine. I'm
betting it's not utf8 though :-(
regards, tom lane
On Wed, 2006-09-27 at 12:10 -0400, Tom Lane wrote:
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes:
On Wed, 2006-09-27 at 11:00 -0400, Tom Lane wrote:
But have they got the same locale settings (lc_collate, lc_ctype)?
According to the postgresql.conf of the *working* database the locales
are:
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'According to the other obe:
lc_messages = 'en_US'
lc_monetary = 'en_US'
lc_numeric = 'en_US'
lc_time = 'en_US'You did not show us the settings that actually count, but based on this
I'm guessing they are en_US also
Ah, sorry: "show lc_collate" and "show lc_ctype" confirm your guess.
What you need to find out next is
what character set encoding that locale implies on your machine. I'm
betting it's not utf8 though :-(
I am not sure what you mean by that but maybe this helps:
the output of "locale" on the working machine is:
LANG=
LANGUAGE=en_US:en_GB
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
The output of the same command on the non-working machine:
LANG=en_US
LANGUAGE=en_NL:en_US:en_GB:en
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=
If this is not what you mean, could you help me in the right direction?
--
Groeten,
Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes:
On Wed, 2006-09-27 at 12:10 -0400, Tom Lane wrote:
What you need to find out next is
what character set encoding that locale implies on your machine. I'm
betting it's not utf8 though :-(
If this is not what you mean, could you help me in the right direction?
Try this command:
LANG=en_US locale charmap
regards, tom lane
Hi Tom,
Thanks for putting up with the questions.
On Wed, 2006-09-27 at 14:58 -0400, Tom Lane wrote:
LANG=en_US locale charmap
I have done this on both machines:
The working machine says: ISO-8859-1
The not working machine says: ISO-8859-1
I still do not understand what is happening and maybe more important,
how I can solve this (without re-installing the OS / database).
A better understanding of how the server OS, database, codepages, client
OS, charmaps etc work would be nice....
--
Groeten,
Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> writes:
On Wed, 2006-09-27 at 14:58 -0400, Tom Lane wrote:
LANG=en_US locale charmap
The working machine says: ISO-8859-1
The not working machine says: ISO-8859-1
OK, so the problem is that you have a locale that expects ISO-8859-1
encoding, but the database is set to UTF8 encoding, and that means
it may contain characters that the locale can't deal with.
Locale-dependent operations ... like upper() ... will fail when that
happens.
If we had a more reliable way of telling what encoding the locale
expects, we'd probably forbid creation of databases with incompatible
encodings.
I still do not understand what is happening and maybe more important,
how I can solve this (without re-installing the OS / database).
If you want to use UTF8 then you're stuck with a re-initdb I'm afraid.
regards, tom lane