Multibyte char encoding atttypmod weirdness

Started by Huaxin WANG, over 23 years ago, 3 messages, bugs

wanghx@netspeed-tech.com

over 23 years ago

Version: PostgreSQL 7.2.1 (7.3 not tested)

Summary:

When the locale is set to a multibyte-encoded language such as
ja_JP.eucjp and the character encoding is set to EUC_JP, the atttypmod
that libpq reports for char(20) columns (attributes), that is
((PGresult *)res)->attDescs[0].atttypmod as returned by PQfmod(res, 0),
is not correct. It is neither 20 nor 20+4, as reported on the hackers
mailing list [1], but something that varies (I failed to figure out
how). In my specific case, it's 25.

Is it a bug, or a feature that needs special care and is not documented
in the PostgreSQL documentation? Is this extra byte of overhead
reflected by VARHDRSZ? A simple fgrep -r VARHDRSZ in the header files
showed:

internal/c.h:#define VARHDRSZ ((int32) sizeof(int32))
internal/c.h: * always VARSIZE(ptr) - VARHDRSZ.
server/access/tuptoaster.h: VARHDRSZ))
server/utils/varbit.h:/* Header overhead *in addition to* VARHDRSZ */
server/utils/varbit.h:#define VARBITBYTES(PTR) (VARSIZE(PTR) - VARHDRSZ - VARBITHDRSZ)
server/utils/varbit.h: VARHDRSZ + VARBITHDRSZ)
server/c.h:#define VARHDRSZ ((int32) sizeof(int32))
server/c.h: * always VARSIZE(ptr) - VARHDRSZ.

which means VARHDRSZ is sizeof(int32), always a constant 4 bytes. Is
VARBITHDRSZ relevant to this problem? VARBITHDRSZ is not defined in any
of the header files installed by "make install-all-headers".

BTW, if it's not a bug, an implementation this inconsistent with common
sense is ugly and a potential source of buggy code.

[1]: http://archives.postgresql.org/pgsql-hackers/1998-03/msg00430.php

Tom Lane

tgl@sss.pgh.pa.us

over 23 years ago

In reply to: Huaxin WANG (#1)

Re: Multibyte char encoding atttypmod weirdness

"Huaxin WANG" <wanghx@netspeed-tech.com> writes:

When locale is set to multibyte char encoding languages,
such as ja_JP.eucjp, and char encoding set to EUC_JP, for the char(20)
columns (attributes), the libpq ((PGresult *)res)->attDescs[0].atttypmod
returned by PQfmod(res, 0) is not correct. It's neither 20, nor 20+4 as
reported in the hackers' mail list [1], but something varying (which I
failed to figure out). In my specific case, it's 25.

I don't think so. A column declared as char(N) *will* have an atttypmod
of N+4. The actual physical length in bytes of a column entry might
be more, though, since we measure N in terms of characters not bytes.

regards, tom lane

Huaxin WANG

wanghx@netspeed-tech.com

over 23 years ago

In reply to: Huaxin WANG (#1)

Re: Multibyte char encoding atttypmod weirdness

Sorry, but I made a mistake in describing the problem.

PQfmod(...) returns 20 + 4, but strlen(PQgetvalue(...)) returns
something that varies and is more than 24.

Since you said atttypmod is char len + 4 and "The actual physical length
in bytes of a column entry might be more", the byte length depends on
the current locale settings, and multibyte/wide char related functions
should be used to calculate it. Is there a simple and direct way to
know this byte length through the libpq API? I will try to figure it
out.

Thank you very much for your informative and helpful reply.
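
The character-versus-byte distinction discussed above can be
illustrated with a minimal sketch in C. It assumes the value is valid
EUC-JP (lead bytes 0xA1-0xFE start a two-byte character, 0x8E a
two-byte one, 0x8F a three-byte one). The function names
eucjp_char_bytes and eucjp_char_count are hypothetical helpers written
for this illustration; they are not part of libpq, and this is not
claimed to be how the PostgreSQL server counts characters.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the EUC-JP character that starts with lead byte c.
 * Hypothetical helper for illustration; assumes valid EUC-JP input. */
static size_t eucjp_char_bytes(unsigned char c)
{
    if (c == 0x8F)
        return 3;               /* SS3 prefix: three-byte character */
    if (c == 0x8E)
        return 2;               /* SS2 prefix: half-width katakana */
    if (c >= 0xA1 && c <= 0xFE)
        return 2;               /* two-byte character */
    return 1;                   /* ASCII, or an unrecognized byte */
}

/* Count characters (not bytes) in a NUL-terminated EUC-JP string. */
size_t eucjp_char_count(const char *s)
{
    assert(s != NULL);

    size_t chars = 0;
    while (*s != '\0') {
        size_t n = eucjp_char_bytes((unsigned char) *s);
        /* Skip the bytes of this character, stopping early if the
         * string ends in the middle of a sequence. */
        while (n > 0 && *s != '\0') {
            s++;
            n--;
        }
        chars++;
    }
    return chars;
}
```

For a value made of five two-byte characters followed by fifteen
spaces, eucjp_char_count() gives 20 while strlen() gives 25, so a byte
length above the declared character length is consistent with a
char(20) column holding multibyte data. On the client side, libpq's
PQgetlength(res, row, col) returns the byte length of a field value
directly, without calling strlen() on PQgetvalue().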

----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Huaxin WANG" <wanghx@netspeed-tech.com>
Cc: <pgsql-bugs@postgresql.org>
Sent: Monday, February 24, 2003 11:07 PM
Subject: Re: [BUGS] Multibyte char encoding atttypmod weirdness

"Huaxin WANG" <wanghx@netspeed-tech.com> writes:

When locale is set to multibyte char encoding languages,
such as ja_JP.eucjp, and char encoding set to EUC_JP, for the char(20)
columns (attributes), the libpq ((PGresult *)res)->attDescs[0].atttypmod
returned by PQfmod(res, 0) is not correct. It's neither 20, nor 20+4 as
reported in the hackers' mail list [1], but something varying (which I
failed to figure out). In my specific case, it's 25.

I don't think so. A column declared as char(N) *will* have an atttypmod
of N+4. The actual physical length in bytes of a column entry might
be more, though, since we measure N in terms of characters not bytes.

regards, tom lane