Multibyte char encoding atttypmod weirdness
Version: PostgreSQL 7.2.1 (7.3 not tested)
Summary:
When the locale is set to a language with a multibyte character
encoding, such as ja_JP.eucjp, and the character encoding is set to
EUC_JP, then for char(20) columns (attributes) the libpq value
((PGresult *)res)->attDescs[0].atttypmod returned by PQfmod(res, 0) is
not correct. It is neither 20 nor 20+4, as reported on the hackers'
mailing list [1], but something that varies (in a way I failed to
figure out). In my specific case, it is 25.
Is it a bug, or a feature that needs special care and is not documented
in the PostgreSQL documentation? Is this extra byte overhead reflected
by VARHDRSZ? But a simple fgrep -r VARHDRSZ in the header files showed:
internal/c.h:#define VARHDRSZ ((int32) sizeof(int32))
internal/c.h: * always VARSIZE(ptr) - VARHDRSZ.
server/access/tuptoaster.h: VARHDRSZ))
server/utils/varbit.h:/* Header overhead *in addition to* VARHDRSZ */
server/utils/varbit.h:#define VARBITBYTES(PTR) (VARSIZE(PTR) - VARHDRSZ - VARBITHDRSZ)
server/utils/varbit.h: VARHDRSZ + VARBITHDRSZ)
server/c.h:#define VARHDRSZ ((int32) sizeof(int32))
server/c.h: * always VARSIZE(ptr) - VARHDRSZ.
which means VARHDRSZ is sizeof(int32), which is always a constant 4
bytes. Is VARBITHDRSZ relevant to this problem? But VARBITHDRSZ is not
defined in any of the header files that "make install-all-headers"
installed.
BTW, if it is not a bug, an implementation this inconsistent with
common sense is ugly and a potential source of buggy code.
[1]: http://archives.postgresql.org/pgsql-hackers/1998-03/msg00430.php
"Huaxin WANG" <wanghx@netspeed-tech.com> writes:
When locale is set to multibyte char encoding languages,
such as ja_JP.eucjp, and char encoding set to EUC_JP, for the char(20)
columns (attributes), the libpq ((PGresult *)res)->attDescs[0].atttypmod
returned by PQfmod(res, 0) is not correct. It's neither 20, nor 20+4 as
reported in the hackers' mail list [1], but something varying (which I
failed to figure out). In my specific case, it's 25.
I don't think so. A column declared as char(N) *will* have an atttypmod
of N+4. The actual physical length in bytes of a column entry might
be more, though, since we measure N in terms of characters not bytes.
regards, tom lane
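The relationship described above can be sketched in a few lines of C. This is only an illustration: the helper names declared_char_length and max_value_bytes are hypothetical, the constant 4 mirrors the VARHDRSZ definition shown in the grep output, and the figure of 3 bytes per character is the maximum width of an EUC_JP character. In a real client the atttypmod argument would come from PQfmod(res, col).

```c
/* Header overhead added to the declared length in atttypmod
 * (matches VARHDRSZ == sizeof(int32) == 4 from the headers above). */
#define TYPMOD_HEADER_BYTES 4

/* Widest character in EUC_JP (SS3 prefix plus two bytes). */
#define EUC_JP_MAX_BYTES_PER_CHAR 3

/* Hypothetical helper: recover N from the atttypmod of a char(N) column.
 * Returns -1 when the typmod carries no length (for example -1). */
int declared_char_length(int atttypmod)
{
    if (atttypmod < TYPMOD_HEADER_BYTES)
        return -1;
    return atttypmod - TYPMOD_HEADER_BYTES;
}

/* Hypothetical helper: upper bound on the byte length of a value,
 * given the widest character the encoding can produce. */
int max_value_bytes(int atttypmod, int max_bytes_per_char)
{
    int n = declared_char_length(atttypmod);
    return (n < 0) ? -1 : n * max_bytes_per_char;
}
```

For a char(20) column, declared_char_length(24) gives 20 characters, while max_value_bytes(24, EUC_JP_MAX_BYTES_PER_CHAR) gives an upper bound of 60 bytes, so a byte length above 24 does not contradict the typmod.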
Sorry, but I made a mistake in describing the problem.
PQfmod(...) returns 20 + 4, but strlen(PQgetvalue(...)) returns
something that varies and is more than 24.
Since you said atttypmod is the character length + 4 and "The actual
physical length in bytes of a column entry might be more", the byte
length depends on the current locale settings, and multibyte/wide-char
functions should be used to calculate it. Is there a simple and direct
way to get this byte length through the libpq API? I will try to figure
it out.
Thank you very much for your informative and helpful reply.
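On the byte-length question: libpq's PQgetlength(res, row, col) reports the actual length in bytes of a field value, which for text results matches strlen(PQgetvalue(...)). Going the other way, from bytes to characters, needs an encoding-aware scan. The sketch below is a minimal, standalone illustration for EUC_JP only; the function names eucjp_char_bytes and eucjp_char_count are hypothetical and it assumes a NUL-terminated, well-formed value. libpq also provides PQmblen for the same per-character step, which is not used here so that the example compiles without libpq.

```c
#include <stddef.h>
#include <string.h>

/* Byte length of the EUC_JP character whose first byte is *s:
 * 0x8E (SS2) starts a 2-byte character, 0x8F (SS3) a 3-byte one,
 * any other byte with the high bit set a 2-byte one, ASCII is 1 byte. */
size_t eucjp_char_bytes(const unsigned char *s)
{
    if (*s == 0x8E)
        return 2;
    if (*s == 0x8F)
        return 3;
    if (*s & 0x80)
        return 2;
    return 1;
}

/* Count characters (not bytes) in a NUL-terminated EUC_JP string. */
size_t eucjp_char_count(const char *value)
{
    const unsigned char *p = (const unsigned char *) value;
    size_t remaining = strlen(value);
    size_t chars = 0;

    while (remaining > 0) {
        size_t n = eucjp_char_bytes(p);
        if (n > remaining)      /* truncated trailing character */
            n = remaining;
        p += n;
        remaining -= n;
        chars++;
    }
    return chars;
}
```

As an illustration, a char(20) value made of five 2-byte characters followed by fifteen padding spaces is 20 characters but 25 bytes long, so strlen() reports 25 while eucjp_char_count() reports 20.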
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Huaxin WANG" <wanghx@netspeed-tech.com>
Cc: <pgsql-bugs@postgresql.org>
Sent: Monday, February 24, 2003 11:07 PM
Subject: Re: [BUGS] Multibyte char encoding atttypmod weirdness
"Huaxin WANG" <wanghx@netspeed-tech.com> writes:
When locale is set to multibyte char encoding languages,
such as ja_JP.eucjp, and char encoding set to EUC_JP, for the char(20)
columns (attributes), the libpq ((PGresult *)res)->attDescs[0].atttypmod
returned by PQfmod(res, 0) is not correct. It's neither 20, nor 20+4 as
reported in the hackers' mail list [1], but something varying (which I
failed to figure out). In my specific case, it's 25.
I don't think so. A column declared as char(N) *will* have an atttypmod
of N+4. The actual physical length in bytes of a column entry might
be more, though, since we measure N in terms of characters not bytes.
regards, tom lane