Re: Proposal for fixing numeric type-resolution issues

Started by Thomas Lockhart · over 25 years ago · 5 messages
#1 Thomas Lockhart
lockhart@alumni.caltech.edu

Here is a proposal for fixing these problems.

Sounds good. We would be looking up this info in a table, right? So we
can integrate this type hierarchy fully into our type extensibility
system.

Another 7.1 project is to work on alternate languages and character
sets, to decouple multibyte and locale from the default SQL_TEXT
character set. This will probably bring up issues similar to the
numeric problems, and since these character sets will be added as
user-defined types it will be important for the backend to understand
how to convert them for comparison operations, for example.

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

#2 Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Lockhart (#1)

Tom Lane writes:

I expect what you are after is the ability to produce a 0-or-1
numeric value from a bool field, so that you could do things like
sum(boolfield::int) to count the number of true values in a column.
I agree that we need such an operator (and I'm surprised no one's
gotten round to contributing one).

Let's contribute one now ...

select count(*) from test4 where a = true;

select sum( case when a then 1 else 0 end ) from test4;
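Until such an operator is actually contributed, the CASE form above works everywhere. A minimal sketch of what the contributed conversion might look like is below; the function name int4(bool) and the cast convention are assumptions for illustration, not a committed implementation:

```sql
-- Sketch of a bool -> int4 conversion written as a plain SQL function.
-- Naming the function after the target type is what lets the parser
-- resolve boolfield::int (an assumption about cast resolution).
CREATE FUNCTION int4(bool) RETURNS int4 AS '
    SELECT CASE WHEN $1 THEN 1 ELSE 0 END
' LANGUAGE 'sql';

-- With that in place, counting the true values becomes:
SELECT sum(int4(a)) FROM test4;
```

Because the cast must be written explicitly, this keeps the error-checking value of a separate bool type that Tom argues for below.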

But I don't agree that there
should be an implicit, automatic conversion from bool to int; that
defeats half of the error-checking value of having a separate type for
truth values in the first place.

Definitely.

--
Peter Eisentraut Sernanders väg 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden

#3 Peter Eisentraut
peter_e@gmx.net
In reply to: Thomas Lockhart (#1)

Thomas Lockhart writes:

Another 7.1 project is to work on alternate languages and character
sets, to decouple multibyte and locale from the default SQL_TEXT
character set. This will probably bring up issues similar to the
numeric problems, and since these character sets will be added as
user-defined types it will be important for the backend to understand
how to convert them for comparison operations, for example.

Really? I always thought the character set would be some separate entity
and perhaps an oid reference would be stored with every character string
and attribute. That would get you around any type conversion as long as
the functions acting on character types take this "header" field into
account.

If you want to go the data type way then you'd need to have some sort of
most general character set to cast to. That could be Unicode but that
would require that every user-defined character set be a subset of
Unicode, which is perhaps not a good assumption to make. Also, I wonder
how collations would fit in there. Collations definitely can't be ordered
at all, so casting can't be done in a controlled fashion.

Just wondering...

--
Peter Eisentraut Sernanders väg 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden

#4 Tatsuo Ishii
t-ishii@sra.co.jp
In reply to: Peter Eisentraut (#3)

Thomas Lockhart writes:

Another 7.1 project is to work on alternate languages and character
sets, to decouple multibyte and locale from the default SQL_TEXT
character set. This will probably bring up issues similar to the
numeric problems, and since these character sets will be added as
user-defined types it will be important for the backend to understand
how to convert them for comparison operations, for example.

Really? I always thought the character set would be some separate entity
and perhaps an oid reference would be stored with every character string
and attribute. That would get you around any type conversion as long as
the functions acting on character types take this "header" field into
account.

I think that way too. If what Thomas is suggesting is that to make a
user-defined character set, one needs to create everything such as
operators, a charset, and functions to work with indexes, etc. (like
defining a new data type), that would be too painful.

If you want to go the data type way then you'd need to have some sort of
most general character set to cast to. That could be Unicode but that
would require that every user-defined character set be a subset of
Unicode, which is perhaps not a good assumption to make.

Right. But the problem is that SQL92 actually requires such a charset,
called "SQL_TEXT". For me, the only candidate for SQL_TEXT at this
point seems to be the "mule internal code". Basically it is a variant
of ISO-2022 and has the capability to adapt to most of the charsets
defined in ISO-2022. I think we could expand it so that it could become
a superset even of Unicode. Of course the problem is that mule internal
code is an "internal code" and is not widely used in the world. Even
so, we could use it for purely internal purposes (for the parse tree
etc.).

Also, I wonder
how collations would fit in there. Collations definitely can't be ordered
at all, so casting can't be done in a controlled fashion.

Hmm... Collations seem to be a different issue. I don't think there is
any such concept as "collation casting" in SQL92.
--
Tatsuo Ishii

#5 Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Thomas Lockhart (#1)

All good ideas and thoughts. I have been thinking that essentially
separate types per character set is the right thing, but we'll have
plenty of time to talk about it.

One point is that SQL92 assigns a specific character set and collation
sequence to every character string and every column definition; if we
embedded this "type" identification into every string then we would be
replicating the existing Postgres type system one layer down (at least
for argument's sake ;)

There also need to be well-defined conversions between character
sets/collations, and some or most combinations will be illegal (e.g.
how do you collate American English against Japanese?). The Postgres
type system can enforce this simply by not providing conversion or
comparison functions for the relevant mixture of types.
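Thomas's enforcement idea can be sketched in SQL; the per-charset type names below (text_en, text_ja) are invented purely for illustration and do not exist in Postgres:

```sql
-- Hypothetical per-charset string types, one Postgres type per
-- character set, as in the separate-types approach described above.
CREATE TABLE phrases (en text_en, ja text_ja);

-- Legal: an '=' operator is defined within a single character set.
SELECT * FROM phrases WHERE en = 'hello'::text_en;

-- Illegal by construction: no '=' operator or conversion function is
-- provided between text_en and text_ja, so the parser rejects this.
SELECT * FROM phrases WHERE en = ja;  -- ERROR: no such operator
```

The illegal combinations thus fall out of the ordinary operator-resolution machinery, with no extra enforcement code in the backend.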

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California