Composite Type with Domain

Started by 维姜 about 20 years ago · 11 messages · bugs

jw.pgsql@sduept.com

about 20 years ago

# pg8.1.3

=> CREATE DOMAIN d_1 integer CHECK (VALUE < 10);
=> CREATE TYPE t_1 AS (m d_1);
=> SELECT '(100)':: t_1;
t_1
-------
(100)
(1 row)

=> SELECT row(100):: t_1;
ERROR: value for domain d_1 violates check constraint "d_1_check"

=> \encoding ISO_8859_1
=> SELECT row(100):: t_1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Tom Lane

tgl@sss.pgh.pa.us

about 20 years ago

In reply to: 维姜 (#1)

Re: Composite Type with Domain

维姜 <jw.pgsql@sduept.com> writes:

=> \encoding ISO_8859_1
=> SELECT row(100):: t_1;
server closed the connection unexpectedly

Works for me:

regression=# SELECT row(100):: t_1;
ERROR: value for domain d_1 violates check constraint "d_1_check"
regression=# \encoding ISO_8859_1
regression=# SELECT row(100):: t_1;
ERROR: value for domain d_1 violates check constraint "d_1_check"

Please provide more details, like your locale and encoding settings.

regards, tom lane

维姜

jw.pgsql@sduept.com

about 20 years ago

In reply to: Tom Lane (#2)

Re: Composite Type with Domain

* BUG #1:

=> SELECT '(100)':: t_1;
t_1
-------
(100)
(1 row)

-------------------------------------------------------------------

* BUG #2:

=> \encoding
UTF8
=> show server_encoding ;
server_encoding
-----------------
UTF8

[jw@dell ~]$ locale
LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=

On Tue, 2006-04-04 at 01:46 -0400, Tom Lane wrote:


维姜 <jw.pgsql@sduept.com> writes:

=> \encoding ISO_8859_1
=> SELECT row(100):: t_1;
server closed the connection unexpectedly

Works for me:

regression=# SELECT row(100):: t_1;
ERROR: value for domain d_1 violates check constraint "d_1_check"
regression=# \encoding ISO_8859_1
regression=# SELECT row(100):: t_1;
ERROR: value for domain d_1 violates check constraint "d_1_check"

Please provide more details, like your locale and encoding settings.

regards, tom lane

Tom Lane

tgl@sss.pgh.pa.us

about 20 years ago

In reply to: 维姜 (#3)

NLS vs error processing, again (was Re: Composite Type with Domain)

JiangWei <jw.pgsql@sduept.com> writes:

LANG=zh_CN.UTF-8
[ set client_encoding to LATIN1 and provoke an error ]

OK, I can reproduce the crash after initdb'ing with that LANG setting
(in an nls-enabled build). The postmaster log fills with a whole lot
of occurrences of

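WARNING: ignoring unconvertible UTF-8 character 0x00e9
WARNING: ignoring unconvertible UTF-8 character 0x00e8
WARNING: ignoring unconvertible UTF-8 character 0x00e8
WARNING: ignoring unconvertible UTF-8 character 0x00e8
PANIC: ERRORDATA_STACK_SIZE exceeded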

Tracing through the dump shows that the error-handling code is
recursively producing this warning while trying to translate the word
WARNING to LATIN1. The zh_CN.po file shows the translation as

#: utils/error/elog.c:1909
msgid "WARNING"
msgstr "警告"

(which apparently is GB2312?) and what's actually getting passed to
utf8_to_iso8859_1() is

(gdb) x/6o str
0x8b89d8: 0350 0255 0246 0345 0221 0212

I have no idea if this is a correct UTF8 transliteration of the GB2312
phrase --- can anyone confirm? But anyway, if this is Chinese then it's
hardly surprising that there would be no LATIN1 equivalent. And then
trying to report the problem gets us into a new instance of the same
problem. Even the code that's supposed to stop error recursion doesn't
get us out of it.
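Tom's question about the gdb dump can be checked mechanically. The sketch below uses Python purely for illustration (the backend does this conversion in C, in utf8_to_iso8859_1()), and the helper names `decode_dump` and `latin1_equivalent` are hypothetical. It decodes the six octal bytes as UTF-8 and then tries to map the result to LATIN1:

```python
# The six bytes printed by "x/6o str" in gdb, written as octal literals.
GDB_BYTES = bytes([0o350, 0o255, 0o246, 0o345, 0o221, 0o212])


def decode_dump(raw: bytes) -> str:
    """Decode the dumped buffer as UTF-8 (raises UnicodeDecodeError if invalid)."""
    return raw.decode("utf-8")


def latin1_equivalent(text: str):
    """Return the LATIN1 (ISO 8859-1) bytes for text, or None if some
    character has no LATIN1 code point."""
    try:
        return text.encode("latin-1")
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    word = decode_dump(GDB_BYTES)
    print([hex(ord(c)) for c in word])  # ['0x8b66', '0x544a']
    print(latin1_equivalent(word))      # None
```

The bytes decode cleanly to U+8B66 U+544A (警告, "warning"), so they are well-formed UTF-8, and neither character has a LATIN1 code point, which is exactly the conversion the backend cannot perform.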

It seems to me that there basically is no graceful solution to this sort
of mismatch. It might be possible to kluge things so that we disable
NLS once we've recursed too many times in error processing, but that's
surely pretty ugly. What would be a lot more user-friendly would be to
refuse the attempt to set client_encoding to something that can't handle
our error message encoding, but I don't know what a reasonable set of
restrictions would be.

Comments?

regards, tom lane
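The recursion Tom describes, and the "disable NLS once we've recursed too many times" kluge he floats, can be modeled in a few lines. This is a hypothetical Python model, not PostgreSQL's elog.c: the class `ErrorReporter`, its one-entry message catalog, and the `max_depth` threshold are invented for illustration, and Python's RecursionError stands in for "ERRORDATA_STACK_SIZE exceeded".

```python
class ErrorReporter:
    """Hypothetical model of translated error reporting (not elog.c)."""

    def __init__(self, catalog, client_encoding, max_depth=None):
        self.catalog = catalog              # msgid -> translated text
        self.client_encoding = client_encoding
        self.max_depth = max_depth          # None = no NLS cut-off (the bug)
        self.depth = 0                      # current error-recursion depth
        self.sent = []                      # bytes delivered to the client

    def translate(self, msgid):
        # The kluge: past max_depth, stop consulting the catalog entirely.
        if self.max_depth is not None and self.depth > self.max_depth:
            return msgid
        return self.catalog.get(msgid, msgid)

    def report(self, severity, message):
        self.depth += 1
        try:
            text = f"{self.translate(severity)}:  {self.translate(message)}"
            try:
                self.sent.append(text.encode(self.client_encoding))
            except UnicodeEncodeError as exc:
                bad = ord(text[exc.start])
                # Reporting the failure needs the translated word for
                # WARNING again, which fails again: this is the recursion.
                self.report("WARNING",
                            f"ignoring unconvertible UTF-8 character 0x{bad:04x}")
                # Then send whatever survives the conversion.
                self.sent.append(text.encode(self.client_encoding, errors="ignore"))
        finally:
            self.depth -= 1


ZH_CN = {"WARNING": "\u8b66\u544a"}  # 警告

if __name__ == "__main__":
    guarded = ErrorReporter(ZH_CN, "latin-1", max_depth=3)
    guarded.report("WARNING", "some warning")
    print(guarded.sent[0])  # b'WARNING:  ignoring unconvertible UTF-8 character 0x8b66'

    unguarded = ErrorReporter(ZH_CN, "latin-1")
    try:
        unguarded.report("WARNING", "some warning")
    except RecursionError:
        print("recursion limit hit")  # analogue of ERRORDATA_STACK_SIZE exceeded
```

With `max_depth=3`, the fourth nested report falls back to the untranslated English text, which LATIN1 can carry, and the chain unwinds; without the cut-off the recursion only ends at the interpreter's stack limit. As Tom says, it is ugly: the outer levels still lose their translated severity word.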

Euler Taveira de Oliveira

euler@timbira.com

about 20 years ago

In reply to: Tom Lane (#4)

Re: NLS vs error processing, again (was Re: Composite Type with Domain)

Tom Lane wrote:

It seems to me that there basically is no graceful solution to this sort
of mismatch. It might be possible to kluge things so that we disable
NLS once we've recursed too many times in error processing, but that's
surely pretty ugly. What would be a lot more user-friendly would be to
refuse the attempt to set client_encoding to something that can't handle
our error message encoding, but I don't know what a reasonable set of
restrictions would be.

Maybe it's time to convert all PO files to UTF-8. I'm in the process of
converting the pt_BR ones.

--
Euler Taveira de Oliveira

Tom Lane

tgl@sss.pgh.pa.us

about 20 years ago

In reply to: Euler Taveira de Oliveira (#5)

Re: NLS vs error processing, again (was Re: Composite Type with Domain)

Euler Taveira de Oliveira <euler@timbira.com> writes:

Tom Lane wrote:

It seems to me that there basically is no graceful solution to this sort
of mismatch. It might be possible to kluge things so that we disable
NLS once we've recursed too many times in error processing, but that's
surely pretty ugly. What would be a lot more user-friendly would be to
refuse the attempt to set client_encoding to something that can't handle
our error message encoding, but I don't know what a reasonable set of
restrictions would be.

Maybe it's time to convert all PO files to UTF-8. I'm in the process of
converting the pt_BR ones.

What does that have to do with it?

regards, tom lane

Tatsuo Ishii

t-ishii@sra.co.jp

about 20 years ago

In reply to: Tom Lane (#4)

Re: NLS vs error processing, again

JiangWei <jw.pgsql@sduept.com> writes:

LANG=zh_CN.UTF-8
[ set client_encoding to LATIN1 and provoke an error ]

OK, I can reproduce the crash after initdb'ing with that LANG setting
(in an nls-enabled build). The postmaster log fills with a whole lot
of occurrences of

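WARNING: ignoring unconvertible UTF-8 character 0x00e9
WARNING: ignoring unconvertible UTF-8 character 0x00e8
WARNING: ignoring unconvertible UTF-8 character 0x00e8
WARNING: ignoring unconvertible UTF-8 character 0x00e8
PANIC: ERRORDATA_STACK_SIZE exceeded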

Tracing through the dump shows that the error-handling code is
recursively producing this warning while trying to translate the word
WARNING to LATIN1. The zh_CN.po file shows the translation as

#: utils/error/elog.c:1909
msgid "WARNING"
msgstr "警告"

(which apparently is GB2312?)

It seems so. zh_CN.po has the line:

"Content-Type: text/plain; charset=GB2312\n"

This means that at least whoever wrote the file intended it to be
GB2312. However, please note that GB2312 is a character set, not an
encoding. In reality the file seems to be encoded in EUC-CN; I
confirmed this simply by checking that the bytes above
(警告) are valid EUC-CN byte sequences. It is possible
that the file is not written in EUC-CN, but I think that is very
unlikely.

and what's actually getting passed to
utf8_to_iso8859_1() is

(gdb) x/6o str
0x8b89d8: 0350 0255 0246 0345 0221 0212

I have no idea if this is a correct UTF8 transliteration of the GB2312
phrase --- can anyone confirm?

As far as I can tell from utils/mb/Unicode/euc_cn_to_utf8.map, the
conversion above seems to be correct. BTW, who does the conversion
from EUC-CN to UTF-8? Maybe gettext()?
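Both of Tatsuo's observations can be cross-checked with a short Python sketch. It relies on Python's own "gb2312" codec, which produces the EUC-CN byte form of GB2312; that codec's tables are independent of PostgreSQL's euc_cn_to_utf8.map and of gettext, so agreement is a consistency check, not proof of what gettext did at run time. The helper names `is_euc_cn_hanzi` and `euc_cn_to_utf8` are hypothetical.

```python
# The zh_CN msgstr for "WARNING" as Unicode text (警告).
MSGSTR = "\u8b66\u544a"

# The six bytes gdb showed being passed to utf8_to_iso8859_1().
GDB_BYTES = bytes([0o350, 0o255, 0o246, 0o345, 0o221, 0o212])


def is_euc_cn_hanzi(raw: bytes) -> bool:
    """True if raw decodes as EUC-CN and consists only of two-byte
    characters, i.e. every byte lies in the 0xA1..0xFE range."""
    try:
        raw.decode("gb2312")  # Python's "gb2312" codec is the EUC-CN encoding
    except UnicodeDecodeError:
        return False
    return len(raw) % 2 == 0 and all(0xA1 <= b <= 0xFE for b in raw)


def euc_cn_to_utf8(raw: bytes) -> bytes:
    """Convert EUC-CN bytes to UTF-8 by way of Unicode code points."""
    return raw.decode("gb2312").encode("utf-8")


if __name__ == "__main__":
    po_bytes = MSGSTR.encode("gb2312")           # the bytes as stored in zh_CN.po
    print(is_euc_cn_hanzi(po_bytes))             # True
    print(euc_cn_to_utf8(po_bytes) == GDB_BYTES)  # True
```

The GB2312 text is stored as four EUC-CN bytes, and converting them to UTF-8 reproduces the six bytes from Tom's gdb session, consistent with gettext having recoded the catalog into the UTF-8 codeset implied by LC_CTYPE.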
--
Tatsuo Ishii
SRA OSS, Inc. Japan


But anyway, if this is Chinese then it's
hardly surprising that there would be no LATIN1 equivalent. And then
trying to report the problem gets us into a new instance of the same
problem. Even the code that's supposed to stop error recursion doesn't
get us out of it.

It seems to me that there basically is no graceful solution to this sort
of mismatch. It might be possible to kluge things so that we disable
NLS once we've recursed too many times in error processing, but that's
surely pretty ugly. What would be a lot more user-friendly would be to
refuse the attempt to set client_encoding to something that can't handle
our error message encoding, but I don't know what a reasonable set of
restrictions would be.

Comments?

regards, tom lane


Tom Lane

tgl@sss.pgh.pa.us

about 20 years ago

In reply to: Tatsuo Ishii (#7)

Re: NLS vs error processing, again

Tatsuo Ishii <ishii@sraoss.co.jp> writes:

As far as I can tell from utils/mb/Unicode/euc_cn_to_utf8.map, the
conversion above seems to be correct. BTW, who does the conversion
from EUC-CN to UTF-8? Maybe gettext()?

I'm far from an expert on this, but the gettext documentation indicates
that it tries to translate the .po file contents into whatever encoding
is implied by LC_CTYPE. The fact that the string passed to
utf8_to_iso8859_1 is not identical to the .po file contents indicates
that gettext is doing *something*. I'm a bit worried that this
translation could be out of step with what we will expect the
server_encoding to be --- but there's not any immediate evidence of
that.

Anyway, the real problem seems to be what to do if translation of an
error message to the client_encoding fails. That's clearly a risk even
if gettext has behaved perfectly.

regards, tom lane

Alvaro Herrera

alvherre@2ndquadrant.com

about 20 years ago

In reply to: Euler Taveira de Oliveira (#5)

Re: NLS vs error processing, again (was Re: Composite Type with Domain)

Euler Taveira de Oliveira wrote:

Tom Lane wrote:

It seems to me that there basically is no graceful solution to this sort
of mismatch. It might be possible to kluge things so that we disable
NLS once we've recursed too many times in error processing, but that's
surely pretty ugly. What would be a lot more user-friendly would be to
refuse the attempt to set client_encoding to something that can't handle
our error message encoding, but I don't know what a reasonable set of
restrictions would be.

Maybe it's time to convert all PO files to UTF-8. I'm in the process of
converting the pt_BR ones.

I don't understand what you think would be gained by doing that. If
the message has Chinese characters, recoding it from UTF8 to Latin1 is as bad
as recoding from GB2312 to Latin1.

What needs to be done for this to work is to refuse to attempt the recoding,
as Tom proposes above. We would need to determine which recodings are "safe";
for example, (I think) the valid source encodings for Latin1 (ISO 8859-1) are
Latin9 (ISO 8859-15), Unicode, Win1252, and ASCII. If the server
encoding or the encoding of the message files is a Chinese encoding,
setting client_encoding to latin1 would raise an error.

The problem, I think, would be in determining what recodings are sane.
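Alvaro's proposal amounts to a lookup table consulted when client_encoding is set. A minimal Python sketch, assuming a hypothetical table `SAFE_SOURCES` that holds only the LATIN1 row from his tentative list (with SQL_ASCII standing in for "ASCII") and a hypothetical function `check_client_encoding`; PostgreSQL's actual conversion machinery is not modeled:

```python
# Hypothetical: for each client encoding, the server/message-file encodings
# whose text is assumed to be representable in it (Alvaro's tentative list).
SAFE_SOURCES = {
    "LATIN1": {"LATIN1", "LATIN9", "UTF8", "WIN1252", "SQL_ASCII"},
}


def check_client_encoding(client, server_encoding, catalog_encoding):
    """Raise ValueError if client cannot safely receive messages produced in
    server_encoding or read from a catalog in catalog_encoding.
    Client encodings without a table entry are not restricted."""
    allowed = SAFE_SOURCES.get(client)
    if allowed is None:
        return
    for source in (server_encoding, catalog_encoding):
        if source not in allowed:
            raise ValueError(
                f"cannot set client_encoding to {client}: "
                f"messages in {source} may not be convertible")


if __name__ == "__main__":
    check_client_encoding("UTF8", "UTF8", "EUC_CN")  # accepted
    try:
        check_client_encoding("LATIN1", "UTF8", "EUC_CN")
    except ValueError as e:
        print(e)  # cannot set client_encoding to LATIN1: messages in EUC_CN may not be convertible
```

The sketch also shows why the table is hard to get right: UTF8 has to be listed as a safe source for LATIN1, yet a UTF8 server can hold Chinese text, so a table like this can only reject the obviously incompatible pairs.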

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#10

Peter Eisentraut

peter_e@gmx.net

about 20 years ago

In reply to: Tom Lane (#8)

Re: NLS vs error processing, again

Tom Lane wrote:

I'm far from an expert on this, but the gettext documentation
indicates that it tries to translate the .po file contents into
whatever encoding is implied by LC_CTYPE.

Correct. That is just one more reason to have server encoding,
LC_COLLATE, and LC_CTYPE matching. In practice, there is hardly a
reason to have LC_COLLATE and LC_CTYPE be different, so the problem
should not be that big.
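Peter's rule of thumb can be written down as a consistency check. The sketch below is hypothetical (none of these helpers exist in PostgreSQL): it takes the codeset suffix of each locale name, normalizes spelling differences such as "UTF-8" versus "UTF8", and treats the C and POSIX locales as compatible with any encoding. Aliases between locale codesets and PostgreSQL encoding names beyond spelling (for example GB2312 versus EUC_CN) are deliberately not handled.

```python
def codeset(locale_name):
    """Return the codeset part of a locale name such as "zh_CN.UTF-8",
    or None for names without one."""
    base = locale_name.split("@", 1)[0]  # drop any @modifier
    return base.split(".", 1)[1] if "." in base else None


def normalize(encoding):
    """Compare encoding names loosely: "UTF-8", "utf8" and "UTF8" match."""
    return encoding.upper().replace("-", "").replace("_", "")


def settings_match(server_encoding, lc_collate, lc_ctype):
    """True if every non-C locale's codeset matches server_encoding."""
    for loc in (lc_collate, lc_ctype):
        if loc in ("C", "POSIX"):
            continue  # C/POSIX assumed compatible with any server encoding
        cs = codeset(loc)
        if cs is None or normalize(cs) != normalize(server_encoding):
            return False
    return True


if __name__ == "__main__":
    # The reporter's settings: all zh_CN.UTF-8 with a UTF8 server.
    print(settings_match("UTF8", "zh_CN.UTF-8", "zh_CN.UTF-8"))   # True
    print(settings_match("UTF8", "zh_CN.UTF-8", "zh_CN.GB2312"))  # False
```

Note that the reporter's own settings pass this check; consistent server-side settings keep gettext's output in the server encoding, but they do not by themselves prevent the failure when the client asks for LATIN1.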

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

#11

Bruce Momjian

bruce@momjian.us

almost 20 years ago

In reply to: Peter Eisentraut (#10)

Re: NLS vs error processing, again

Peter Eisentraut wrote:

Tom Lane wrote:

I'm far from an expert on this, but the gettext documentation
indicates that it tries to translate the .po file contents into
whatever encoding is implied by LC_CTYPE.

Correct. That is just one more reason to have server encoding,
LC_COLLATE, and LC_CTYPE matching. In practice, there is hardly a
reason to have LC_COLLATE and LC_CTYPE be different, so the problem
should not be that big.

Is there any TODO here?

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +