BUG #3932: utf-8 and upper()/lower(): PANIC: ERRORDATA_STACK_SIZE exceeded

Started by Florian Wunderlich about 18 years ago · 3 messages · bugs
#1 Florian Wunderlich
fwunderlich@factor3.de

The following bug has been logged online:

Bug reference: 3932
Logged by: Florian Wunderlich
Email address: fwunderlich@factor3.de
PostgreSQL version: 8.2.6
Operating system: Debian unstable
Description: utf-8 and upper()/lower(): PANIC: ERRORDATA_STACK_SIZE
exceeded
Details:

- input file in encoding iso-8859-1:

set client_encoding='iso-8859-1';
select upper('ä'), lower('Ä');

(note: the argument to upper is a lower case a umlaut, while the argument to
lower is an upper case a umlaut)

- database "iso" with encoding iso-8859-1,
database "utf" with encoding utf-8,
both in a cluster with locale=de_DE

The command

psql iso < input

yields the correct output (upper case a umlaut, lower case a umlaut).

The command

psql utf < input

yields

PANIK: ERRORDATA_STACK_SIZE exceeded.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost

The log shows:

ERROR: invalid byte sequence for encoding "UTF8": 0xe384
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".

then the same error four times but with 0xfc.

Doing the exact same thing with an input file with encoding utf-8 (with
client_encoding replaced accordingly) again works fine with the iso
database, but yields a lower case a umlaut for upper() and nothing for the
lower() function for the utf database.

Thus, it would seem that the upper() and lower() functions do not work at
all for databases with encoding utf-8 and non-US-ASCII input.

#2 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Florian Wunderlich (#1)
Re: BUG #3932: utf-8 and upper()/lower(): PANIC: ERRORDATA_STACK_SIZE exceeded

Florian Wunderlich wrote:

- input file in encoding iso-8859-1:

set client_encoding='iso-8859-1';
select upper('ä'), lower('Ä');

(note: the argument to upper is a lower case a umlaut, while the argument to
lower is an upper case a umlaut)

- database "iso" with encoding iso-8859-1,
database "utf" with encoding utf-8,
both in a cluster with locale=de_DE

I think this is just a case of a misconfigured server. If you choose
locale de_DE, which supports only the iso-8859-1 encoding, it is an
error to build a database with utf8 encoding -- which is why 8.3 rejects
that combination.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#3 Florian Wunderlich
fwunderlich@factor3.de
In reply to: Alvaro Herrera (#2)
Re: BUG #3932: utf-8 and upper()/lower(): PANIC: ERRORDATA_STACK_SIZE exceeded

Alvaro Herrera wrote:

Florian Wunderlich wrote:

- input file in encoding iso-8859-1:

set client_encoding='iso-8859-1';
select upper('ä'), lower('Ä');

(note: the argument to upper is a lower case a umlaut, while the argument to
lower is an upper case a umlaut)

- database "iso" with encoding iso-8859-1,
database "utf" with encoding utf-8,
both in a cluster with locale=de_DE

I think this is just a case of a misconfigured server. If you choose
locale de_DE, which supports only the iso-8859-1 encoding, it is an
error to build a database with utf8 encoding -- which is why 8.3 rejects
that combination.

You are correct; if I use de_DE.UTF-8 for initdb, the database with
encoding utf-8 works fine (and the database with iso-8859-1 doesn't).

Because such an invalid combination cannot happen for 8.3 anymore, the
PANIC cannot occur anymore, and thus the bug can be closed.
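
A minimal sketch of why the locale/encoding mismatch produces exactly the symptoms reported above. It assumes that case conversion under the single-byte de_DE locale treats every byte as one ISO-8859-1 character; that per-byte model and the helper name bytewise_case are assumptions made for illustration, not PostgreSQL's actual code.

```python
def bytewise_case(data: bytes, to_upper: bool) -> bytes:
    """Hypothetical model: apply ISO-8859-1 case mapping to each byte separately."""
    out = bytearray()
    for b in data:
        ch = chr(b)  # code points 0..255 coincide with ISO-8859-1
        mapped = ch.upper() if to_upper else ch.lower()
        # Keep the byte unchanged if the mapping does not fit in one ISO-8859-1 byte
        # (for example 'ß' -> 'SS').
        if len(mapped) != 1 or ord(mapped) > 0xFF:
            mapped = ch
        out.append(ord(mapped))
    return bytes(out)


lower_a_umlaut = "ä"
upper_a_umlaut = "Ä"

# ISO-8859-1 database: one byte per character, so the mapping works.
iso_upper = bytewise_case(lower_a_umlaut.encode("iso-8859-1"), to_upper=True)
iso_lower = bytewise_case(upper_a_umlaut.encode("iso-8859-1"), to_upper=False)

# UTF-8 database: each umlaut is two bytes, and each byte is mapped on its own.
utf_upper = bytewise_case(lower_a_umlaut.encode("utf-8"), to_upper=True)
utf_lower = bytewise_case(upper_a_umlaut.encode("utf-8"), to_upper=False)

print(iso_upper.hex(), iso_lower.hex())  # c4 e4
print(utf_upper.hex(), utf_lower.hex())  # c3a4 e384

try:
    utf_lower.decode("utf-8")
except UnicodeDecodeError as exc:
    print("invalid UTF-8:", exc.reason)
```

Under this model, upper() leaves the UTF-8 bytes of the lower case a umlaut unchanged (c3 a4), which matches the lower case a umlaut returned by the utf database, and lower() turns the upper case a umlaut (c3 84) into e3 84, the same invalid sequence 0xe384 that appears in the server log.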