Re: Casting, again

Started by Thomas Lockhartover 25 years ago8 messages
#1Thomas Lockhart
lockhart@alumni.caltech.edu

We've talked about examples like this before. I'm inclined to think
that when we are unable to resolve an operator involving unknown-type
inputs, we should try again assuming that the unknowns are of type
'text'. Comments?

Yes please. SQL and the rest of the world assumes the 'xxx' is a character
constant, only PostgreSQL doesn't.

And only Postgres is trying to have a properly built extensible type
system with fewer legacy "SQL80" holdovers. So don't start throwing
things out without having a solid alternative that considers these
cases.

In the case of length():

pre-7.0 used length() to give a "length" of several data types,
including strings and geometric types. This led to more instances of
parser confusion when using untyped strings, since there were more
possible matches of types and the function.

For 7.0, I changed the implementation to decouple string types and
other types by natively supporting char_length() (and
character_length()), the SQL92-defined length function(s) for strings.
I left length() for the other types.

I believe that this is mentioned in the release notes.

btw, what were we hoping to accomplish with length(755)? Why isn't "3"
a good answer??

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Thomas Lockhart (#1)

For 7.0, I changed the implementation to decouple string types and
other types by natively supporting char_length() (and
character_length()), the SQL92-defined length function(s) for strings.
I left length() for the other types.

I believe that this is mentioned in the release notes.

btw, what were we hoping to accomplish with length(755)? Why isn't "3"
a good answer??

Three is the right answer. I was just wondering why it used to fail,
but now it works.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#1)

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:

btw, what were we hoping to accomplish with length(755)? Why isn't "3"
a good answer??

If you believe it should have an answer at all, then 3 is probably
the right answer. But it used to be rejected, and I tend to think
that that's the right behavior. I don't like the idea of silent
conversions from numeric-looking things into text. It might be
merely amusing in this case but in other cases it could be very
confusing if not outright wrong. Why was this change put in?

regards, tom lane

#4Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Thomas Lockhart (#1)

btw, what were we hoping to accomplish with length(755)? Why isn't "3"
a good answer??

If you believe it should have an answer at all, then 3 is probably
the right answer. But it used to be rejected, and I tend to think
that that's the right behavior. I don't like the idea of silent
conversions from numeric-looking things into text. It might be
merely amusing in this case but in other cases it could be very
confusing if not outright wrong. Why was this change put in?

Actually, I'm not sure a change *was* put in! I haven't yet looked,
but it may be that this is a result of my adding a "number to text"
conversion function. The type conversion code took that and ran!

Remember that for v7.0, "length" for character strings should be
"char_length". Maybe some of the trouble here is from leftover
attempts to get strings and other "length" types to play together in
an underspecified query.

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thomas Lockhart (#4)

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:

Actually, I'm not sure a change *was* put in! I haven't yet looked,
but it may be that this is a result of my adding a "number to text"
conversion function. The type conversion code took that and ran!

Ah, I think you are right --- I was through the type-resolution code
not too long ago, and I don't recall seeing any special cases for
numeric->text either. It must be as you say, that the addition of
this apparently harmless conversion function caused an unexpected
change in the overall behavior.

After reflecting on this example for a little bit, I like my proposal
for explicit "promotability" links between types even better. The
example illustrates that it's dangerous to have promotability decisions
made on the basis of whether there happens to be a conversion function
available or not. Offering a text(int4) function that users can call
when they want to force a conversion is fine, but that should not
automatically mean that the system can *silently* call it to cause an
implicit conversion. Another example is that if we were to offer an
int(bool) conversion function, as was suggested for the Nth time in
a nearby thread, the current system would not allow us to control
whether that conversion can happen implicitly --- it would, period.
If implicit conversions can only follow specific "promotability" links
then we don't have this risk.

regards, tom lane

#6Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#5)

Tom Lane writes:

I've noticed in other cases that it's willing to do implicit
conversion to text from practically anything. That strikes me as a
bug, or at least pretty darn unintuitive. Is this behavior
intentional, and if so what's the rationale?

Well, I'm sure you know my stance on these things by now, but let me add
one last thing: What I'm missing from the type system is a governing
strategy. On the one hand there is strong typing, on the other hand
implicit casting across the board, but not when you would actually want
it. This is usually where programming language paradigms clash, but note
that programming language design in the last two decades has clearly been
moving to one of two things: either strong and strict typing or no typing
at all. I'm not saying I like strict like Java were you can't pass a
literal `5' as a `short int' argument without complaints but strict as in
a string is a string, a date is a date, a point is a point, a number is a
number.

I would go as far as saying that if you try to insert a 5 in a text field
then this should be an error, you must write '5'. Surely some might claim
that this is an inconvenience. Indeed, this is inconveniencing me because
possible errors in string processing or even system logic are silently
dropped under the table. Nobody ever got carpal tunnel syndrome because of
two extra quotes, and if the SQL is machine-generated then fixing the
program is the best thing in the long run anyway.

The previous paragraph agrees with SQL (see section 9.2).

--
Peter Eisentraut Sernanders v�g 10:115
peter_e@gmx.net 75262 Uppsala
http://yi.org/peter-e/ Sweden

#7Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Thomas Lockhart (#1)

If implicit conversions can only follow specific "promotability" links
then we don't have this risk.

We've had some evolution in how we do type coersion and conversion,
and istm that we are about ready to take the next step. Before, there
were only a few folks thinking about it, and we implemented some steps
to test out new ideas without making fundamental changes outside the
parser. Now, we can make some more improvements based on experience
with the current system.

I like the idea of having some table-driven rules for implicit type
coersion, since that technique can extend to user-defined types. The
issues we have regarding string types and numeric types need to be
thought through in the context of having *more* string types, which
afaik is how we are going to integrate multibyte character sets into
basic Postgres.

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

#8Raul Chirea
raul@brahms.ro
In reply to: Peter Eisentraut (#6)

Your pg_excelencies,

Peter Eisentraut wrote:

I would go as far as saying that if you try to insert a 5 in a text field
then this should be an error, you must write '5'. Surely some might claim
that this is an inconvenience. Indeed, this is inconveniencing me because
possible errors in string processing or even system logic are silently
dropped under the table. Nobody ever got carpal tunnel syndrome because of
two extra quotes, and if the SQL is machine-generated then fixing the
program is the best thing in the long run anyway.

Did you think to a system with some abstract types (like numeric, char, date,
etc.) and fizical
types within the abstract types (i.e. abstract numeric type holds physical
integer type, physical
float type, phisical numeric type), and the only implicit cast that parser
does to be
between phisical types whithin the same abstract type ?

Raul.