multibyte support by default

Started by Tatsuo Ishii over 23 years ago, 7 messages
#1 Tatsuo Ishii
t-ishii@sra.co.jp

In my understanding, our consensus was enabling multibyte support by
default for 7.3. Any objection?
--
Tatsuo Ishii

#2 Peter Eisentraut
peter_e@gmx.net
In reply to: Tatsuo Ishii (#1)
Re: multibyte support by default

Tatsuo Ishii writes:

In my understanding, our consensus was enabling multibyte support by
default for 7.3. Any objection?

It was my understanding (or if I was mistaken, then it is my suggestion)
that the build-time option would be removed altogether and certain
performance-critical places (if any) would be wrapped into

if (encoding_is_single_byte(current_encoding)) { }

That's basically what I did with the locale support.
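
A minimal sketch of the kind of guard described above, not actual PostgreSQL code. The `encoding_t` enum, `utf8_seq_len()`, `char_length()`, and this particular definition of `encoding_is_single_byte()` are hypothetical names invented for illustration. It assumes valid UTF-8 input for the multibyte case. The single-byte branch skips the per-character scan entirely, which is the point of wrapping a hot path this way:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding identifiers, for illustration only. */
typedef enum { ENC_LATIN1, ENC_UTF8 } encoding_t;

/* Hypothetical predicate: true when every character occupies one byte. */
static int encoding_is_single_byte(encoding_t enc)
{
    return enc == ENC_LATIN1;
}

/* Byte length of a UTF-8 sequence given its lead byte (assumes valid input). */
static size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: count it as one byte */
}

/* Number of characters in the first nbytes bytes of s. */
static size_t char_length(const char *s, size_t nbytes, encoding_t enc)
{
    size_t nchars = 0;
    size_t i = 0;

    assert(s != NULL);

    if (encoding_is_single_byte(enc))
        return nbytes;          /* fast path: one byte per character */

    /* General path: step over one multibyte sequence per character. */
    while (i < nbytes)
    {
        i += utf8_seq_len((unsigned char) s[i]);
        nchars++;
    }
    return nchars;
}
```

With this shape, a single binary serves both kinds of database, and only the multibyte case pays for the scan.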

--
Peter Eisentraut peter_e@gmx.net

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#1)
Re: multibyte support by default

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

In my understanding, our consensus was enabling multibyte support by
default for 7.3. Any objection?

Uh, was it? I don't recall that. Do we have any numbers on the
performance overhead?

regards, tom lane

#4 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Tom Lane (#3)
Re: multibyte support by default

In my understanding, our consensus was enabling multibyte support by
default for 7.3. Any objection?

Uh, was it? I don't recall that. Do we have any numbers on the
performance overhead?

regards, tom lane

See below.

Subject: Re: [HACKERS] Unicode combining characters
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Tatsuo Ishii <t-ishii@sra.co.jp>
cc: ZeugswetterA@spardat.at, pgman@candle.pha.pa.us, phede-ml@islande.org,
pgsql-hackers@postgresql.org
Date: Wed, 03 Oct 2001 23:05:16 -0400
Comments: In-reply-to Tatsuo Ishii <t-ishii@sra.co.jp> message dated "Thu, 04 Oct 2001 11:16:42 +0900"

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

To accomplish this, I moved MatchText etc. to a separate file and now
like.c includes it *twice* (a similar technique is used in regexec()). This
makes like.o a little bit larger, but I believe it is worth it for the
optimization.

That sounds great.

What's your feeling now about the original question: whether to enable
multibyte by default now, or not? I'm still thinking that Peter's
counsel is the wisest: plan to do it in 7.3, not today. But this fix
seems to eliminate the only hard reason we have not to do it today ...

regards, tom lane

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tatsuo Ishii (#4)
Re: multibyte support by default

Tatsuo Ishii <t-ishii@sra.co.jp> writes:

In my understanding, our consensus was enabling multibyte support by
default for 7.3. Any objection?

Uh, was it? I don't recall that. Do we have any numbers on the
performance overhead?

See below.

Oh, okay, now I recall that thread. You're right, we did agree.

regards, tom lane

#6 Hannu Krosing
hannu@tm.ee
In reply to: Tatsuo Ishii (#1)
Re: multibyte support by default

On Tue, 2002-04-16 at 03:20, Tatsuo Ishii wrote:

In my understanding, our consensus was enabling multibyte support by
default for 7.3. Any objection?

Is there currently some agreed plan for introducing standard
NCHAR/NVARCHAR types?

What does ISO/ANSI say about multibyteness of simple CHAR types?

--------------
Hannu

#7 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Hannu Krosing (#6)
Re: multibyte support by default

On Tue, 2002-04-16 at 03:20, Tatsuo Ishii wrote:

In my understanding, our consensus was enabling multibyte support by
default for 7.3. Any objection?

Is there currently some agreed plan for introducing standard
NCHAR/NVARCHAR types?

I do have a *personal* plan of that kind, maybe for 7.4 rather than 7.3,
given how limited my free time is.

BTW, NCHAR/NVARCHAR is just an abbreviation of "CHAR(n) CHARACTER SET
foo" (where foo is an implementation-defined charset). So I'm not too
impressed by the idea of implementing NCHAR/NVARCHAR alone.

What does ISO/ANSI say about multibyteness of simple CHAR types?

There's no such idea as "multibyteness" in the standard. In my
understanding the standard does not restrict "normal" CHAR types to
ASCII only (more precisely, "SQL_CHARACTER"). Moreover, CHAR types
without a CHARSET specification will have a default charset of SQL_TEXT,
and its actual charset will be defined by the implementation.

In summary, allowing any characters, including multibyte ones, in CHAR
types is not against the standard at all, IMO.
--
Tatsuo Ishii