multi-byte aware char_length() etc.
I'm planning to modify some string functions so that they are
aware of multi-byte strings when compiled with the multi-byte
capability. The functions I'm going to modify are listed below. I
would like to hear your opinions if you have any.
o character_length()
It seems that the function is implemented as textlen() in
utils/adt/varlena.c or as varcharlen() in varchar.c. The current
implementation returns an octet length rather than a character
length, so I will change them. However, some applications may need
to get an octet length. Maybe this is a good chance to add
SQL92's octet_length().
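To illustrate the difference between the two lengths, here is a minimal sketch, not the actual change to textlen() or varcharlen(). It assumes the string is valid UTF-8, which is only one of the possible multi-byte encodings, and mb_char_length() is a hypothetical name used for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an existing backend function): count the
 * characters in a buffer of `octets` bytes.
 * ASSUMPTION: the buffer holds valid UTF-8.
 */
static size_t
mb_char_length(const char *s, size_t octets)
{
    size_t chars = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < octets; i++)
    {
        /* UTF-8 continuation bytes look like 10xxxxxx; count every other byte */
        if (((unsigned char) s[i] & 0xC0) != 0x80)
            chars++;
    }
    return chars;
}
```

For a buffer holding two 3-byte characters, the octet length is 6 while mb_char_length() returns 2. That gap is what character_length() and octet_length() would each report.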
o lower()/upper()
Implemented in oracle_compat.c. One thing I have noticed is that it
uses toupper()/tolower(). For ASCII, they are fine. But on some
platforms (I guess SysV) they might have some problems:
    char c;      /* c is an 8-bit letter and this platform uses
                  * char as signed char */
    toupper(c);  /* may cause segfault or any other bad thing */
So I will change it to:
    toupper((unsigned char) c);
o position()
Implemented as textpos() in varlena.c.
o substring()
Implemented as text_substr() in varlena.c.
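For position() and substring(), the main change would be to count in characters rather than octets. A minimal sketch of mapping a character index to a byte offset follows. It assumes valid UTF-8 input, and mb_char_offset() is a hypothetical name, not part of textpos() or text_substr().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the byte offset of the n-th character
 * (0-based) in a buffer of `octets` bytes, or `octets` if the buffer
 * has n or fewer characters.
 * ASSUMPTION: the buffer holds valid UTF-8.
 */
static size_t
mb_char_offset(const char *s, size_t octets, size_t n)
{
    size_t seen = 0;
    size_t i;

    assert(s != NULL);
    for (i = 0; i < octets; i++)
    {
        /* every byte that is not a 10xxxxxx continuation starts a character */
        if (((unsigned char) s[i] & 0xC0) != 0x80)
        {
            if (seen == n)
                return i;
            seen++;
        }
    }
    return octets;
}
```

A character-based substring could then copy the bytes between mb_char_offset(s, len, start) and mb_char_offset(s, len, start + count).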
--
Tatsuo Ishii
t-ishii@sra.co.jp
> I'm planning to modify some string functions so that they are
> aware of multi-byte strings when compiled with the multi-byte
> capability. The functions I'm going to modify are listed below. I
> would like to hear your opinions if you have any.
>
> o character_length()
> It seems that the function is implemented as textlen() in
> utils/adt/varlena.c or as varcharlen() in varchar.c. The current
> implementation returns an octet length rather than a character
> length, so I will change them. However, some applications may need
> to get an octet length. Maybe this is a good chance to add
> SQL92's octet_length().
Yes.
> o lower()/upper()
> Implemented in oracle_compat.c. One thing I have noticed is that it
> uses toupper()/tolower(). For ASCII, they are fine. But on some
> platforms (I guess SysV) they might have some problems:
>     char c;      /* c is an 8-bit letter and this platform uses
>                   * char as signed char */
>     toupper(c);  /* may cause segfault or any other bad thing */
> So I will change it to:
>     toupper((unsigned char) c);
I would like to move these routines, as you clean them up, to varlena.c
or whatever Postgres-specific source file is appropriate. Let's leave
oracle_compat.c for non-standard, Oracle-specific functions. Perhaps
eventually we can move any of those which remain to the contrib
directory, assuming that there are good equivalent functions available
in SQL92.
Sort of annoying having oracle_compat when Oracle doesn't return the
favor by having a "postgres_compat". Well, maybe DataBlades are the same
thing?? :)
> o position()
> Implemented as textpos() in varlena.c.
> o substring()
> Implemented as text_substr() in varlena.c.
These two are OK. I'm not yet clear on where in the parser these varlena
functions are matched up with both text and varchar() types. We may need
to do something different as we keep working on getting the
text/varchar/char behavior improved.