Proposal for resolving casting issues
We've been discussing this stuff in fits and starts for months now, but
nothing satisfactory has been arrived at. I've concluded that part of
the problem is that we are trying to force the system's behavior into
a model that is too limiting: we need more than an implicit/explicit cast
distinction. Accordingly, I suggest we bite the bullet and make it happen.
(Note that I've resigned myself to having to do an initdb for 7.3beta2.)
I think we must extend pg_cast's castimplicit column to a three-way value:
* okay as implicit cast in expression (or in assignment)
* okay as implicit cast in assignment only
* okay only as explicit cast
"In expression" refers to cases where we have (or potentially have) multiple
possible interpretations; essentially, anytime a value is being fed to a
function or operator, there can be ambiguity due to overloading, and so we
need to restrict the set of possible implicit casts to limit ambiguity and
ensure a reasonable choice of function is made.
"In assignment only" actually means any case where the destination datatype
is known with certainty. For example CoerceTargetExpr is currently used to
coerce an array subscript expression to integer, and I think it's okay to
treat that context like store assignment.
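A sketch of how the three levels would behave, assuming a hypothetical table and assuming float8 -> int4 gets marked assignment-only while int4 -> float8 stays fully implicit:

```sql
-- Hypothetical example; the cast markings shown are assumptions.
CREATE TABLE t (i4 int4, f8 float8);

-- Assignment context: the destination type is known with certainty,
-- so the assignment-only cast float8 -> int4 applies implicitly.
UPDATE t SET i4 = f8;          -- OK (rounds at runtime, may overflow)

-- Expression context: only fully-implicit casts are considered, so the
-- parser promotes i4 to float8 and picks float8 + float8; it will not
-- coerce f8 down to int4 here.
SELECT i4 + f8 FROM t;

-- Explicit casts remain available everywhere.
SELECT f8::int4 FROM t;
```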
Question: what shall we call these alternatives in CREATE CAST? The SQL99
phrase AS ASSIGNMENT looks like it should mean the second, but I think
the spec semantics require it to mean the first. Ugh. Perhaps AS
ASSIGNMENT ONLY for the second case?
Also, I think we should allow cast functions to take an optional boolean
second argument "isExplicit", so that explicit casts can be distinguished
from implicit at runtime. We'll use this to get spec-compliant semantics
for char/varchar truncation (there shouldn't be an error if you explicitly
cast to a shorter length).
We'll need to add fields to Func and RelabelType nodes so that we can tell
whether a node was generated due to an explicit function call, implicit
cast, or explicit cast; we'll use these for better reverse-listing. (In
particular this will let us hide the boolean second argument from being
reverse-listed, when present.)
Now, as to just what to do with it --- Peter posted a list of questions
awhile back that weren't ever resolved, but I think we can make some
progress with this scheme in mind:
From looking at the set of implicit or not casts, I think there are two
major issues to discuss:
1. Should truncating/rounding casts be implicit? (e.g., float4 -> int4)
I think there's a good argument for "no", but for some reason SQL99 says
"yes", at least for the family of numerical types.
We can make this work cleanly if "down" casts are assignment-only while
"up" casts are fully implicit. I think that the spec requires implicit
casting only in the context of store assignment.
2. Should casts from non-character types to text be implicit? (e.g., date -> text)
I think this should be "no", for the same reason that the other direction
is already disallowed. It's just sloppy programming.
I agree with this in principle, but in practice we probably have to allow
implicit casts to text, at least for awhile yet. Seems that too many
people depend on stuff like
SELECT 'Meeting time is ' || timestamp_var
Since this is an expression context we don't get any help from the notion
of store assignment :-(
I also have a few individual cases that look worthy of consideration:
abstime <-> int4: I think these should not be implicit because they
represent different "kinds" of data. (These are binary compatible casts,
so changing them to not implicit probably won't have any effect. I'd have
to check this.)
I believe that as of current sources we can mark a binary cast non-implicit,
and I agree with marking these two explicit-only.
date -> timestamp[tz]: I'm suspicious of this one, but it's hard to
explain. The definition to fill in the time component with zeros is
reasonable, but it's not the same thing as casting integers to floats
because dates really represent a time span of 24 hours and timestamps an
indivisible point in time. I suggest making this non-implicit, for
conformance with SQL and for general consistency between the date/time
types.
I disagree here; promoting date to timestamp seems perfectly reasonable,
and I think it's something a lot of people rely on.
time -> interval: I'm not even sure this cast should exist at all.
Proper arithmetic would be IntervalValue = TimeValue - TIME 'midnight'.
At least make it non-implicit.
I'd go along with marking it assignment-only.
timestamp -> abstime: This can be implicit AFAICS.
This is lossy (abstime doesn't preserve fractional seconds) so I'd vote
for making it assignment-only.
In a later message Peter wrote:
Since almost every cast to "text" is implicit, then I believe so should
inet -> text
macaddr -> text
int4 -> varchar
int8 -> varchar
which are currently not.
I'd go along with making the inet->text and macaddr->text cases implicit,
since as you note all the other casts to text are. However, those two
casts to varchar must not be implicit (or at most assignment-only) else
they will create ambiguity against the implicit casts to text for the same
source datatype.
In summary: I haven't yet gone through the existing casts in detail, but
I propose the following general rules for deciding how to mark casts:
* Casts across datatype categories should be explicit-only, with the
exception of casts to text, which we will allow implicitly for backward
compatibility's sake.
* Within a category, "up" (lossless) conversions are implicit, "down"
(potentially lossy) conversions should be assignment-only.
Comments?
regards, tom lane
abstime <-> int4: I think these should not be implicit because they
represent different "kinds" of data. (These are binary compatible casts,
so changing them to not implicit probably won't have any effect. I'd have
to check this.)
I believe that as of current sources we can mark a binary cast non-implicit,
and I agree with marking these two explicit-only.
Everything in this proposal looks pretty good. With regards to the above
abstime<->int4 thing - what about the 'magic' values in that conversion
(e.g. -infinity, etc.)?
Chris
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
Everything in this proposal looks pretty good. With regards to the above
abstime<->int4 thing - what about the 'magic' values in that conversion.
(eg. -infinity, etc.)
They map to some magic int4 values, same as it ever was. I'm not
interested in trying to improve the semantics of any specific conversion
at the moment...
regards, tom lane
I think we must extend pg_cast's castimplicit column to a three-way value:
* okay as implicit cast in expression (or in assignment)
* okay as implicit cast in assignment only
* okay only as explicit cast
In summary: I haven't yet gone through the existing casts in detail, but
I propose the following general rules for deciding how to mark casts:
* Casts across datatype categories should be explicit-only, with the
exception of casts to text, which we will allow implicitly for backward
compatibility's sake.
* Within a category, "up" (lossless) conversions are implicit, "down"
(potentially lossy) conversions should be assignment-only.
First of all, thank you for taking this on !
I think the following three states may enable a closer match to an actually
desired (Peter said mandated by SQL99) behavior.
1. okay as implicit cast in expression or assignment
2. okay as implicit cast in expression or assignment but needs runtime check
(precision loss possible)
3. okay only as explicit cast (precision loss possible)
Now, since we probably can't do this all in beta, I think it would be okay to
fold my state 2 into yours for this release, and add expressions with
runtime checks later.
Regarding the "isExplicit": I think a closer match would be an output argument
"PrecisionInfo" enum(ok, precision loss, [conversion failed ?]). With this,
the caller can differentiate whether to raise a warning (note that char truncation
is actually supposed to raise a warning in sqlca.sqlwarn).
Maybe make it in/out so you can tell the function to not abort on conversion error,
since what I think we need for constants is a "try convert" that does not even abort
on wrong input.
For numbers there is probably only the solution to invent an "anynumber" generic type.
Examples that should imho succeed (and do succeed in other db's):
select int2col = 1000000; conversion fails (loses precision ?) --> return false
this case could probably behave better if it were defined
that is_a_number but doesn't convert is a precision loss,
maybe with this anynumeric would not be necessary
select char6col = '123456789'; conversion loses precision --> return false for eq
select int2col = 10.0; conversion ok --> use index on int2col (same for '10', '10.0')
Andreas
PS: pg snapshot 09/11 does not compile on AIX. The culprits are large files
(I don't want _LARGE_FILES) and the mb conversions (pg_ascii2mic and
pg_mic2ascii not found in the postmaster and not included from elsewhere).
make check hangs on without_oid's vacuum after removing the conversions from
the Makefile and undefining _LARGE_FILES to make it compile.
There are also tons of "unsigned char vs signed char" warnings in current mb sources :-(
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
I think the following three states may enable a closer match to an actually
desired (Peter said mandated by SQL99) behavior.
1. okay as implicit cast in expression or assignment
2. okay as implicit cast in expression or assignment but needs runtime check
(precision loss possible)
3. okay only as explicit cast (precision loss possible)
The runtime checks are there already, eg
regression=# select 123456789::int4::int2;
ERROR: i4toi2: '123456789' causes int2 overflow
However this does not help us much; the critical point is that if we
want function overloading to work in a sane fashion, we have to prefer
up-conversions to down-conversions *at parse time*, at least for the
operands of functions and operators (which is what I meant by "in
expressions"). Runtime checks are irrelevant to this problem.
Regarding the "isExplicit": I think a closer match would be an output
argument "PrecisionInfo" enum(ok, precision loss, [conversion failed
?]).
I'm not planning to add output arguments to fix this problem ;-)
For numbers there is probably only the solution to invent an
"anynumber" generic type.
Actually, I had been toying with the notion of doing the following:
1. A numeric literal is initially typed as the smallest type that will
hold it in the series int2, int4, int8, numeric (notice NOT float8).
2. Allow implicit up-coercion int2->int4->int8->numeric->float4->float8,
but down-coercions aren't implicit except for assignment.
3. Eliminate most or all of the cross-numeric-type operators (eg, there
is no reason to support int2+int4 as a separate operator).
With this approach, an expression like "int4var = 42" would be initially
typed as int4 and int2, but then the constant would be coerced to int4
because int4=int4 is the closest-match operator. (int2=int2 would not
be considered because down-coercion isn't implicitly invokable.) Also
we get more nearly SQL-standard behavior in expressions that combine
numeric with float4/float8: the preferred type will be float, which
accords with the spec's notions of exact numeric vs. approximate numeric.
I think this solves most or all of our problems with poor type choices
for numeric literals, but it still needs thought --- I'm not suggesting
we shoehorn it into 7.3. In particular, I'm not sure whether the
current preferred-type arrangement (namely, numeric and float8 are both
preferred types for the numeric category) would need to change.
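To make the literal-typing rule concrete, here is how some constants would come out under points 1 and 2 (a sketch of the proposed, not current, behavior; the table and column names are made up):

```sql
-- Initial typing: smallest of int2, int4, int8, numeric that holds it.
--   42          -> int2
--   100000      -> int4
--   3000000000  -> int8
--   3.14159     -> numeric   (note: NOT float8)

SELECT int4var = 42 FROM t;          -- int2 constant up-coerced to int4
SELECT numericcol * 3.14159 FROM t;  -- stays numeric * numeric
SELECT float4col * 3.14159 FROM t;   -- numeric constant up-coerced to float4
```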
There are also tons of "unsigned char vs signed char" warnings in
current mb sources :-(
Yeah, I know :-( ... I see 'em too when using HPUX' vendor compiler.
We ought to clean that up someday.
regards, tom lane
I said:
Also, I think we should allow cast functions to take an optional boolean
second argument "isExplicit", so that explicit casts can be distinguished
from implicit at runtime. We'll use this to get spec-compliant semantics
for char/varchar truncation (there shouldn't be an error if you explicitly
cast to a shorter length).
After looking closely at SQL92 sections 6.10 (cast specification) and
9.2 (store assignment), it seems that the only places where the spec
demands different behavior for an explicit cast than for an implicit
assignment cast are for length coercions of char, varchar, bit, and
varbit types.
Accordingly, the places where we actually *need* the extra isExplicit
argument are not in the type-coercion functions per se, but in the
length-coercion functions associated with these four datatypes.
While we could still add the extra argument for the type-coercion
functions, I'm inclined not to do so; there is no need for it for spec
compliance of the standard types, and I don't think we should encourage
user-defined types to behave differently for explicit and implicit
casts.
What I will do instead is adjust parse_coerce.c so that a
length-coercion function can have either of the signatures
foo(foo,int4) returns foo
or
foo(foo,int4,bool) returns foo
and then modify the above-mentioned length coercion functions to provide
the desired behavior. This has no direct impact on pg_cast because we
do not use pg_cast for length-coercion functions.
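For instance, varchar's length coercion could then be declared with the three-argument signature; the behavior sketched below is the intended spec semantics, not a description of current behavior:

```sql
-- Two accepted signatures for a length-coercion function, using
-- varchar as the example:
--   varchar(varchar, int4)        returns varchar
--   varchar(varchar, int4, bool)  returns varchar   -- bool = isExplicit
--
-- With the bool argument the function can distinguish the two cases:
SELECT 'abcdef'::varchar(3);            -- explicit: truncates to 'abc'
INSERT INTO t (vc3) VALUES ('abcdef');  -- implicit (assignment): error,
                                        -- value too long for varchar(3)
```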
regards, tom lane
Tom Lane writes:
I think we must extend pg_cast's castimplicit column to a three-way value:
* okay as implicit cast in expression (or in assignment)
* okay as implicit cast in assignment only
* okay only as explicit cast
Viewed in isolation this looks entirely reasonable, but I think we would
be adding a lot of infrastructure for the benefit of a relatively small
number of cases.
As the writer of a cast, this presents me with at least one more option
than I can really manage.
As the user of a cast, these options make the whole system nearly
unpredictable because in any non-trivial expression each of these
behaviors could take effect somehow (possibly even depending on how the
inner expressions turned out).
I am not aware of any programming language that has more than three
castability levels (never/explicit/implicit).
Finally, I believe this paints over the real problems, namely the
inadequate and hardcoded type category preferences and the inadequate
handling of numerical constants. Both of these issues have had adequate
approaches proposed in the past that would solve this and a number of other
issues.
--
Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes:
Tom Lane writes:
I think we must extend pg_cast's castimplicit column to a three-way value:
* okay as implicit cast in expression (or in assignment)
* okay as implicit cast in assignment only
* okay only as explicit cast
As the user of a cast, these options make the whole system nearly
unpredictable because in any non-trivial expression each of these
behaviors could take effect somehow (possibly even depending on how the
inner expressions turned out).
How so? Only the first set of casts applies inside an expression.
It seems to me that this proposal actually *reduces* the number of casts
that might apply in any given context, and thus makes the behavior more
predictable not less so. Certainly it is more predictable than
any-cast-can-be-applied-implicitly, which I seem to remember you arguing
for (at least for the numeric types).
I am not aware of any programming language that has more than three
castability levels (never/explicit/implicit).
Actually I think that this scheme would allow us to model typical
programming-language behavior quite accurately. C for example will let
you assign a float to an integer (with appropriate runtime behavior) ---
but if you add a float and an integer, you get a float addition; there's
no possibility that the system will choose to coerce the float to int
and do an int addition. So the set of available implicit casts is
different in an assignment context than it is in an expression context.
Seems pretty close to what I'm suggesting.
Finally, I believe this paints over the real problems, namely the
inadequate and hardcoded type category preferences and the inadequate
handling of numerical constants. Both of these issues have had adequate
approaches proposed in the past that would solve this and a number of other
issues.
If they were adequate they would have gotten implemented; we had issues
with all the proposals so far. See my later response to Andreas for a
possible solution to the numerical-constant issue based on this
mechanism.
regards, tom lane
Tom Lane wrote:
We've been discussing this stuff in fits and starts for months now, but
nothing satisfactory has been arrived at. I've concluded that part of
the problem is that we are trying to force the system's behavior into
a model that is too limiting: we need more than an implicit/explicit cast
distinction. Accordingly, I suggest we bite the bullet and make it happen.
(Note that I've resigned myself to having to do an initdb for 7.3beta2.)
I was reading my backlog of email and thinking, "Oh, things are shaping
up well", then I hit this message. Let me try to collect open items
tomorrow and get a plan together. I have caught up on my email. I am
heading to bed.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Tom Lane wrote:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
I think the following three states may enable a closer match to an actually
desired (Peter said mandated by SQL99) behavior.
1. okay as implicit cast in expression or assignment
2. okay as implicit cast in expression or assignment but needs runtime check
(precision loss possible)
3. okay only as explicit cast (precision loss possible)
The runtime checks are there already, eg
regression=# select 123456789::int4::int2;
ERROR: i4toi2: '123456789' causes int2 overflow
However this does not help us much; the critical point is that if we
want function overloading to work in a sane fashion, we have to prefer
up-conversions to down-conversions *at parse time*, at least for the
operands of functions and operators (which is what I meant by "in
expressions"). Runtime checks are irrelevant to this problem.
I think there is some confusion here. The runtime check Andreas was
talking about was allowing a double of 64.0 to cast to an int4 while
disallowing 64.1 from being cast to an int4 because it is not a whole
number.
I am not sure doubles have enough precision to make such comparisons
functional (NUMERIC certainly does) but that was his proposal, and he
stated he thought the standard required it.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
I think there is some confusion here. The runtime check Andreas was
talking about was allowing a double of 64.0 to cast to an int4 while
disallowing 64.1 from being cast to an int4 because it is not a whole
number.
I am not sure doubles have enough precision to make such comparisons
functional (NUMERIC certainly does) but that was his proposal, and he
stated he thought the standard required it.
It seems clear to me that the standard requires us NOT to reject that.
In the explicit-cast case, SQL92 6.10 <cast specification> saith:
3) If TD is exact numeric, then
Case:
a) If SD is exact numeric or approximate numeric, then
Case:
i) If there is a representation of SV in the data type TD
that does not lose any leading significant digits after
rounding or truncating if necessary, then TV is that rep-
resentation. The choice of whether to round or truncate is
implementation-defined.
ii) Otherwise, an exception condition is raised: data exception-
numeric value out of range.
So we are *only* allowed to throw an error for overflow; having to round
is not an error condition.
In the implicit-cast case, section 9.2 Store assignment has
k) If the data type of T is numeric and there is an approxi-
mation obtained by rounding or truncation of the numerical
value of V for the data type of T, then the value of T is set
to such an approximation.
If there is no such approximation, then an exception condi-
tion is raised: data exception-numeric value out of range.
If the data type of T is exact numeric, then it is implementation-
defined whether the approximation is obtained by rounding or
by truncation.
which is different wording but seems to boil down to the same thing: the
only error condition is out-of-range.
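In other words (assuming int4/int2 targets):

```sql
-- Per the spec text above, rounding is not an error; only overflow is.
SELECT 64.1::int4;        -- must succeed: rounds (or truncates) to 64
SELECT 123456789::int2;   -- may raise: numeric value out of range
```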
regards, tom lane
What I will do instead is adjust parse_coerce.c so that a
length-coercion function can have either of the signatures
foo(foo,int4) returns foo
or
foo(foo,int4,bool) returns foo
and then modify the above-mentioned length coercion functions to provide
the desired behavior. This has no direct impact on pg_cast because we
do not use pg_cast for length-coercion functions.
Sounds good to me.
When those are really truncated ESQL/C needs to set a warning in sqlca.sqlwarn
though, thus I think the second signature should also have an output flag to tell
whether truncation actually occurred.
Maybe this should be kept for a protocol change though, since I would not think
a NOTICE would be suitable here.
Andreas
I think there is some confusion here. The runtime check Andreas was
talking about was allowing a double of 64.0 to cast to an int4 while
disallowing 64.1 from being cast to an int4 because it is not a whole
number.
Yes, and Tom's proposal for numbers is sufficient for constants, since the 64.0
will initially be an int2 and thus do the correct thing together with an int4,
and the 64.1 constant will be a numeric, and thus also do the correct thing with
all other types.
It is not sufficient for the optimizer for joins though, since it cannot use the
int4 index when confronted with "where tab1.int4col = tab2.numericcol".
Here only a runtime (non-aborting) check would help.
Maybe this could be overcome if the index access (or something in between) would allow
a "numeric" constant for an int4 index (If the "numeric" value does not cleanly convert
to int4, return no rows).
Andreas
For numbers there is probably only the solution to invent an
"anynumber" generic type.
Actually, I had been toying with the notion of doing the following:
1. A numeric literal is initially typed as the smallest type that will
hold it in the series int2, int4, int8, numeric (notice NOT float8).
Yes, that sounds like a good plan for all scenarios that can follow !
2. Allow implicit up-coercion int2->int4->int8->numeric->float4->float8,
but down-coercions aren't implicit except for assignment.
How about int2->int4->int8->numeric->float4->float8->numeric ?
That would also allow an upward path from float8.
3. Eliminate most or all of the cross-numeric-type operators
(eg, there is no reason to support int2+int4 as a separate operator).
Yes.
With this approach, an expression like "int4var = 42" would be initially
typed as int4 and int2, but then the constant would be coerced to int4
because int4=int4 is the closest-match operator. (int2=int2 would not
be considered because down-coercion isn't implicitly invokable.)
It would fix the constants issue, yes. How about "where int2col = int4col"
and the indexability of int2col though?
Also
we get more nearly SQL-standard behavior in expressions that combine
numeric with float4/float8: the preferred type will be float, which
accords with the spec's notions of exact numeric vs.
approximate numeric.
I do not understand the standard here.
Especially the following would seem awkward if that would switch to approximate:
set numericcol = numericcol * float4col;
Andreas
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
When those are really truncated ESQL/C needs to set a warning in sqlca.sqlwarn
though, thus I think the second signature should also have an output flag to tell
whether truncation actually occurred.
Maybe this should be kept for a protocol change though, since I would not think
a NOTICE would be suitable here.
Again, I don't want to invent output arguments for functions today ;-).
I agree that a NOTICE would be overkill, and that we need a protocol
change to implement completion conditions (sqlca.sqlwarn) properly.
When that happens, I think the explicit-cast paths in the coercion
routines can easily call the "set a completion condition" routine for
themselves; I see no reason to pass back the condition one level
before doing so.
regards, tom lane
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
It is not sufficient for the optimizer for joins though, since it
cannot use the int4 index when confronted with "where tab1.int4col =
tab2.numericcol".
For cross-datatype joins, the proposal as I sketched it would result in
the parser producing, eg,
where tab1.int4col::numeric = tab2.numericcol
that is, we'd have a single-datatype operator and a runtime cast in the
expression.
The optimizer is today capable of producing a nested loop with inner
indexscan join from this --- so long as the inner indexscan is on the
uncasted column (numericcol in this case). It won't consider an int4
index on int4col for this. This seems okay to me, actually. It's
better than what you get now with a cross-datatype comparison operator
(neither side can be indexscanned since the operator matches neither
index opclass).
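Sketched as a plan (assuming an index on tab2.numericcol; index name is made up):

```sql
-- Parser output:
--   WHERE tab1.int4col::numeric = tab2.numericcol
-- Usable plan today (inner indexscan on the uncasted side):
--   Nested Loop
--     -> Seq Scan on tab1
--     -> Index Scan using tab2_numericcol_idx on tab2
--          Index Cond: (numericcol = (tab1.int4col)::numeric)
-- An int4 index on tab1.int4col is not considered for this qual.
```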
The major failing that needs to be rectified is that merge and hash
joins won't even be considered, because that code only works with
quals that are unadorned "Var = Var". I don't believe there is any
fundamental reason for this restriction. As long as the top operator
is merge/hashjoinable, any expression should work on either side.
It's just a matter of cleaning up a few unwarranted shortcuts in the
planner.
But that work does need to be done before we can rip out all the
cross-datatype operators ... so this is definitely not happening
for 7.3 ...
regards, tom lane
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
2. Allow implicit up-coercion int2->int4->int8->numeric->float4->float8,
but down-coercions aren't implicit except for assignment.
How about int2->int4->int8->numeric->float4->float8->numeric ?
That would also allow an upward path from float8.
Uh, what? That seems logically impossible to me ... or at least it
would reintroduce exactly the problem we need to get away from: casts
between float4, float8, numeric would be considered equally good in
either direction, creating ambiguity about which operator to use.
How are you envisioning it would work exactly?
Perhaps I should clarify what I had in mind: because the parser only
considers one level of type coercion when choosing a function or
assigning to a result column, it's actually necessary to have all thirty
cast combinations between the six numeric types available in pg_cast.
My notation "int2->int4->int8->numeric->float4->float8" is intended to
imply that of the thirty, these would be marked as implicitly coercible:
int2->int4
int2->int8
int2->numeric
int2->float4
int2->float8
int4->int8
int4->numeric
int4->float4
int4->float8
int8->numeric
int8->float4
int8->float8
numeric->float4
numeric->float8
float4->float8
while the fifteen reverse coercions would be assignment-only.
If we allow any circularity then we will have pairs of types with both
cast pathways marked as implicit, which will leave the parser unable to
choose which operator to use. This is exactly why "numeric = float8"
has failed in past versions: there are two alternatives that are equally
easy to reach.
It would fix the constants issue, yes. How about where int2col=int4col
and the indexability of int2col though?
See my other response. The current scheme of using a cross-datatype
operator isn't helpful for indexing such cases anyway...
Also
we get more nearly SQL-standard behavior in expressions that combine
numeric with float4/float8: the preferred type will be float, which
accords with the spec's notions of exact numeric vs.
approximate numeric.
I do not understand the standard here.
Especially the following would seem awkward if that would switch to
approximate:
set numericcol = numericcol * float4col;
Well, the spec's notion is that combining an "exact" number and an
"approximate" number must yield an "approximate" result. This logic
is hard to argue with, even though in our implementation it would
seem to make more sense for numeric to be the top of the hierarchy
on range and precision grounds.
Note that if you write, say,
set numericcol = numericcol * 3.14159;
my proposal would do the "right thing" since the constant would be typed
as numeric to start with and would stay that way. To do what you want
with a float variable, it'd be necessary to write
set numericcol = numericcol * float4col::numeric;
which is sort of ugly; but no uglier than
set float4col = float4col * numericcol::float4;
which is what you'd have to write if the system preferred numeric and
you wanted the other behavior.
I too have been thinking for a long time that I didn't like following
the spec's lead on this point; but I am now starting to think that it's
not all that bad. This approach to handling constants is *much* cleaner
than what we've done in the past, or even any of the unimplemented
proposals that I can recall. The behavior you'd get with combinations
of float and numeric variables is, well, debatable; from an
implementor's point of view preferring a numeric result makes sense,
but it's much less clear that users would automatically think the same.
Given the spec's position, I am starting to think that preferring float
is the right thing to do.
BTW, I am thinking that we don't need the notion of "preferred type" at
all in the numeric category if we use this approach. I have not worked
through the details for the other type categories, but perhaps if we
adopt similar systems of one-way implicit promotions in each category,
we could retire "preferred types" altogether --- which would let us get
rid of hardwired type categories, too.
regards, tom lane
I wrote:
I think we must extend pg_cast's castimplicit column to a three-way value:
* okay as implicit cast in expression (or in assignment)
* okay as implicit cast in assignment only
* okay only as explicit cast
Question: what shall we call these alternatives in CREATE CAST? The SQL99
phrase AS ASSIGNMENT looks like it should mean the second, but I think
the spec semantics require it to mean the first. Ugh. Perhaps AS
ASSIGNMENT ONLY for the second case?
On looking more closely, SQL99 appears to define user-defined casts as
invocable *only* in explicit cast and assignment contexts. Part 2 sez:
4.13 Data conversions
Explicit data conversions can be specified by a CAST operator.
A CAST operator defines how values of a source data type are
converted into a value of a target data type according to
the Syntax Rules and General Rules of Subclause 6.22, "<cast
specification>". Data conversions between predefined data types
and between constructed types are defined by the rules of this part
of ISO/IEC 9075. Data conversions between one or more user-defined
types are defined by a user-defined cast.
A user-defined cast identifies an SQL-invoked function, called the
cast function, that has one SQL parameter whose declared type is
the same as the source data type and a result data type that is the
target data type. A cast function may optionally be specified to
be implicitly invoked whenever values are assigned to targets of
its result data type. Such a cast function is called an implicitly
invocable cast function.
This seems to mean that we can get away with defining AS ASSIGNMENT to
mean my second category (implicit in assignment only), and then picking
some more natural term for my first category (implicit anywhere).
I favor using IMPLICIT, which would make the syntax of CREATE CAST be
CREATE CAST (sourcetype AS targettype)
WITH FUNCTION funcname (argtype)
[ AS ASSIGNMENT | IMPLICIT ]
CREATE CAST (sourcetype AS targettype)
WITHOUT FUNCTION
[ AS ASSIGNMENT | IMPLICIT ]
Or possibly it should be AS IMPLICIT?
Comments?
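Under either spelling, usage would look something like this (illustrative types and function names; the markings shown follow the proposed rules):

```sql
CREATE CAST (int8 AS int4)
    WITH FUNCTION int4(int8)
    AS ASSIGNMENT;           -- implicit in assignment contexts only

CREATE CAST (int2 AS int4)
    WITH FUNCTION int4(int2)
    AS IMPLICIT;             -- implicit anywhere (if we pick AS IMPLICIT)

CREATE CAST (abstime AS int4)
    WITHOUT FUNCTION;        -- binary-compatible, explicit-only
```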
regards, tom lane
Tom Lane wrote:
I favor using IMPLICIT, which would make the syntax of CREATE CAST be
CREATE CAST (sourcetype AS targettype)
WITH FUNCTION funcname (argtype)
[ AS ASSIGNMENT | IMPLICIT ]
CREATE CAST (sourcetype AS targettype)
WITHOUT FUNCTION
[ AS ASSIGNMENT | IMPLICIT ]
Or possibly it should be AS IMPLICIT?
I think AS IMPLICIT would be better because we have other AS [var]
clauses.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Tom Lane wrote:
Note that if you write, say,
set numericcol = numericcol * 3.14159;
my proposal would do the "right thing" since the constant would be typed
as numeric to start with and would stay that way. To do what you want
with a float variable, it'd be necessary to write
set numericcol = numericcol * float4col::numeric;
which is sort of ugly; but no uglier than
set float4col = float4col * numericcol::float4;
which is what you'd have to write if the system preferred numeric and
you wanted the other behavior.
I need a clarification. In the non-assignment case, does:
WHERE numericcol = numericcol * 3.14159
evaluate "numericcol * 3.14159" as a numeric?
And does:
WHERE 5.55 = numericcol * 3.14159
evaluate "numericcol * 3.14159" as a numeric too?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073