Re: FWD: overlaps() bug?
Note the third row in the query result below is in error. The four hour
interval (2300UTC - 0300UTC) does not overlap the interval 1530UTC-1627UTC).
Is this a bug?
No. It conforms to (my reading of) the SQL99 spec. So it is a feature,
even if I misread the spec. Which I think I didn't ;) But if I did, then
we can change the implementation of course.
I've included the relevant part of the spec below. It seems clause (3)
requires that we reorder the arguments to OVERLAPS, though perhaps
someone would like to research whether TIME is allowed to be used with
OVERLAPS at all (if not, then we could make up the rules ourselves).
It would be cool if timetz (or time) datatypes were to wrap properly
across day boundaries (i.e. if start time < stop time then assume start time
is day before) but at the very least, the overlaps functions should not lie
to you!
Some parts of the spec aren't cool, or interfer with coolness. This may
be one of them. If everything conforms to the standard, then we can
start discussing whether that part of the standard is so brain-dead as
to be useless or likely to directly cause damage.
But in your case, choosing to record only times but then expecting the
code to respect a day boundary seems to be an assumption which could
bite you in other ways later. What happens when an interval happens to
be longer than a day??
hth
- Thomas
(omit some text defining the input as "(D1, E1) OVERLAPS (D2, E2)" as
the input to the OVERLAPS operator)
3) If D1 is the null value or if E1 < D1, then let S1 = E1 and let
T1 = D1. Otherwise, let S1 = D1 and let T1 = E1.
4) Case:
a) If the most specific type of the second field of <row value
expression 2> is a datetime data type, then let E2 be the
value of the second field of <row value expression 2>.
b) If the most specific type of the second field of <row value
expression 2> is INTERVAL, then let I2 be the value of the
second field of <row value expression 2>. Let E2 = D2 + I2.
5) If D2 is the null value or if E2 < D2, then let S2 = E2 and let
T2 = D2. Otherwise, let S2 = D2 and let T2 = E2.
6) The result of the <overlaps predicate> is the result of the
following expression:
( S1 > S2 AND NOT ( S1 >= T2 AND T1 >= T2 ) )
OR
( S2 > S1 AND NOT ( S2 >= T1 AND T2 >= T1 ) )
OR
( S1 = S2 AND ( T1 <> T2 OR T1 = T2 ) )
Import Notes
Reference msg id not found: 1549.1013642873@sss.pgh.pa.us
(back on list; it is an interesting discussion imho)
Thanks for the enlightening reply. It seems self-evident to me that if
following a specification results in a mis-assertion (as demonstrated in my
test case) then either the specification contains an error in logic or the
application logic is in violation of a predicating assumption inherent in
the specification. In this case, I think the latter applies. The "swap
inputs if E1 < D1" logic is predicated on the assumption of an Euclidean
space while time-of-day data points form an essentially cylindrical space.
The correct
logic for this type of space is:
if E1 < D1
return (D2, E2) not overlap (D1,E1)
else
return (D1,E1) overlap (D2,E2)
where the overlap function above is the Euclidean overlap as currently
defined.
Yup. But correct and intuitive may not be the same in this case, given
the guidance of the SQL99 spec as I understand it.
"But in your case, choosing to record only times but then expecting the
code to respect a day boundary ..."
The problem I am trying to solve is completely generic. Periodic schedules
are quite common and are usually _not_ associated with specific dates.
Examples are television broadcast schedules, shipping and routing schedules,
maintenance schedules and the like.
Sure, I agree. And changing the second argument to an interval does not
help, since the spec seems to call for some implicit math which turns it
into exactly the case you already see.
The problem you allude to (intervals that exceed 24 hours) would seem to me
to abuse of the time (of day) data type. As a programmer I would expect to
have to handle such corner cases as exceptions (although the overlap
function could easily handle this case since _any_ time interval overlaps an
interval that exceeds 24 hours!) What I don't expect is for a built-in
Boolean function to lie to me when used according to the published API!
Violating a specification's underlying assumption is the same as violating
the specification. One should either re-write the overlap function to
prperly handle time/timetz data points or eliminate the overlap function for
the time data altogether. As it stands, it is broken and dangerous.
Sorry, I haven't yet made the leap from taking the spec literally (as I
think we have done) to somehow violating the spec's underlying
assumption. Clearly the spec puts TIME and TIME WITH TIME ZONE into the
same "datetime data type" category discussed in the OVERLAPS definition.
What "underlying assumption" are you referring to? I *know* that this
particular case seems to lead to non-intuitive behavior, and I've made
the argument before that we should violate a spec if it is sufficiently
damaged, but I'm not sure that we should make that leap here. I'm not
actually arguing against it, other than we should be inclined by default
to follow the spec.
Comments?
- Thomas
Show quoted text
Note the third row in the query result below is in error. The four hour
interval (2300UTC - 0300UTC) does not overlap the interval1530UTC-1627UTC).
Is this a bug?
No. It conforms to (my reading of) the SQL99 spec. So it is a feature,
even if I misread the spec. Which I think I didn't ;) But if I did, then
we can change the implementation of course.I've included the relevant part of the spec below. It seems clause (3)
requires that we reorder the arguments to OVERLAPS, though perhaps
someone would like to research whether TIME is allowed to be used with
OVERLAPS at all (if not, then we could make up the rules ourselves).It would be cool if timetz (or time) datatypes were to wrap properly
across day boundaries (i.e. if start time < stop time then assume starttime
is day before) but at the very least, the overlaps functions should not
lie
to you!
Some parts of the spec aren't cool, or interfer with coolness. This may
be one of them. If everything conforms to the standard, then we can
start discussing whether that part of the standard is so brain-dead as
to be useless or likely to directly cause damage.But in your case, choosing to record only times but then expecting the
code to respect a day boundary seems to be an assumption which could
bite you in other ways later. What happens when an interval happens to
be longer than a day??hth
- Thomas
(omit some text defining the input as "(D1, E1) OVERLAPS (D2, E2)" as
the input to the OVERLAPS operator)3) If D1 is the null value or if E1 < D1, then let S1 = E1 and let
T1 = D1. Otherwise, let S1 = D1 and let T1 = E1.
4) Case:
a) If the most specific type of the second field of <row value
expression 2> is a datetime data type, then let E2 be the
value of the second field of <row value expression 2>.
b) If the most specific type of the second field of <row value
expression 2> is INTERVAL, then let I2 be the value of the
second field of <row value expression 2>. Let E2 = D2 + I2.
5) If D2 is the null value or if E2 < D2, then let S2 = E2 and let
T2 = D2. Otherwise, let S2 = D2 and let T2 = E2.
6) The result of the <overlaps predicate> is the result of the
following expression:
( S1 > S2 AND NOT ( S1 >= T2 AND T1 >= T2 ) )
OR
( S2 > S1 AND NOT ( S2 >= T1 AND T2 >= T1 ) )
OR( S1 = S2 AND ( T1 <> T2 OR T1 = T2 ) )
Import Notes
Reference msg id not found: 08E9E30FCA2CD5119BEA0090274066DF01D31305@axcs16.cos.agilent.com | Resolved by subject fallback
Subject: Re: FWD: overlaps() bug?
(back on list; it is an interesting discussion imho)
What I don't expect is for a built-in
Boolean function to lie to me when used according to the published API!
Violating a specification's underlying assumption is the same as
violating
the specification. One should either re-write the overlap function to
properly handle time/timetz data points or eliminate the overlap function
for the time data altogether. As it stands, it is broken and dangerous.
Sorry, I haven't yet made the leap from taking the spec literally (as I
think we have done) to somehow violating the spec's underlying
assumption. Clearly the spec puts TIME and TIME WITH TIME ZONE into the
same "datetime data type" category discussed in the OVERLAPS definition.
I have to disagree. The datetime data points form a non-periodic, Euclidean
space, extending forward and backward to what passes for forever. This is
_not_ the case with time/timetz data points. They form a periodic, wrapped
space which require different operators, much the same way that
trigonometric functions differ from their Euclidean counterparts.
What "underlying assumption" are you referring to?
..the assumption of a Euclidean space. It is not specifically spelled out in
the specification but the logic (swap inputs if end_point < start_point)is
only valid for a non-wrapping space. As mentioned above, TIME and TIME WITH
TIME ZONE data points are periodic and form a cylindrical, wrapped space.
I *know* that this
particular case seems to lead to non-intuitive behavior,
You mean non-intuitive as in incorrect??
and I've made
the argument before that we should violate a spec if it is sufficiently
damaged,
IMHO the spec is not damaged. It just doesn't cover the type of data we are
attempting to apply it to in this case.
but I'm not sure that we should make that leap here. I'm not
actually arguing against it, other than we should be inclined by default
to follow the spec.
Comments?
- Thomas
Specs are a good thing and should be adhered to. We should not however
blindly
follow them off a cliff. If a function can not be implemented that both
follows the spec and gives the right answer then IMHO the function should
not be implemented. At least this way the user knows he/she has to implement
thier own. The way it stands the result is the programmers worst enemy, the
silent error. However, by my reading of the spec, it is silent on the
correct implementation of overlap for TIME data and therefore we should be
free to do the right thing.
Jeff
Show quoted text
Note the third row in the query result below is in error. The four hour
interval (2300UTC - 0300UTC) does not overlap the interval1530UTC-1627UTC).
Is this a bug?
No. It conforms to (my reading of) the SQL99 spec. So it is a feature,
even if I misread the spec. Which I think I didn't ;) But if I did, then
we can change the implementation of course.I've included the relevant part of the spec below. It seems clause (3)
requires that we reorder the arguments to OVERLAPS, though perhaps
someone would like to research whether TIME is allowed to be used with
OVERLAPS at all (if not, then we could make up the rules ourselves).It would be cool if timetz (or time) datatypes were to wrap properly
across day boundaries (i.e. if start time < stop time then assume starttime
is day before) but at the very least, the overlaps functions should not
lie
to you!
Some parts of the spec aren't cool, or interfer with coolness. This may
be one of them. If everything conforms to the standard, then we can
start discussing whether that part of the standard is so brain-dead as
to be useless or likely to directly cause damage.But in your case, choosing to record only times but then expecting the
code to respect a day boundary seems to be an assumption which could
bite you in other ways later. What happens when an interval happens to
be longer than a day??hth
- Thomas
(omit some text defining the input as "(D1, E1) OVERLAPS (D2, E2)" as
the input to the OVERLAPS operator)3) If D1 is the null value or if E1 < D1, then let S1 = E1 and let
T1 = D1. Otherwise, let S1 = D1 and let T1 = E1.
4) Case:
a) If the most specific type of the second field of <row value
expression 2> is a datetime data type, then let E2 be the
value of the second field of <row value expression 2>.
b) If the most specific type of the second field of <row value
expression 2> is INTERVAL, then let I2 be the value of the
second field of <row value expression 2>. Let E2 = D2 + I2.
5) If D2 is the null value or if E2 < D2, then let S2 = E2 and let
T2 = D2. Otherwise, let S2 = D2 and let T2 = E2.
6) The result of the <overlaps predicate> is the result of the
following expression:
( S1 > S2 AND NOT ( S1 >= T2 AND T1 >= T2 ) )
OR
( S2 > S1 AND NOT ( S2 >= T1 AND T2 >= T1 ) )
OR( S1 = S2 AND ( T1 <> T2 OR T1 = T2 ) )
Attachments:
Import Notes
Resolved by subject fallback
...
Specs are a good thing and should be adhered to. We should not however
blindly follow them off a cliff.
Right.
If a function can not be implemented that both
follows the spec and gives the right answer then IMHO the function should
not be implemented. At least this way the user knows he/she has to implement
thier own. The way it stands the result is the programmers worst enemy, the
silent error. However, by my reading of the spec, it is silent on the
correct implementation of overlap for TIME data and therefore we should be
free to do the right thing.
Hmm. We are discussing this so that all viewpoints and interpretations
are uncovered. But afaict the spec is very clear on this by the fact
that it does *not* call for exceptions or differences in the
implementation for the "datetime data types". And in the spec it does
not specify only TIMESTAMP variants for use in OVERLAPS. If the spec
calls for a particular behavior (which it seems to in this case) then
*someone* is going to be disappointed here; either you because it does
not do what you want or someone else who knows the spec and finds that
it does not do what they expect.
Extra pairs of eyes are helpful here; can anyone see that TIME is
excluded from the types defined for OVERLAPS (which would free us to Do
It Our Way) or if the spec calls for an implementation different from
the part of the spec I found (which might be The Right Way)?
If we end up agreeing as a group on what the spec calls for, and if it
turns out that it doesn't call for a wrapping behavior on time
boundaries, then you *could* fix this with your own function which
checks for wrapping behavior and acts accordingly.
We could also implement a separate function for time types which Does
The Right Thing.
- Thomas
Thomas Lockhart writes:
Extra pairs of eyes are helpful here; can anyone see that TIME is
excluded from the types defined for OVERLAPS (which would free us to Do
It Our Way) or if the spec calls for an implementation different from
the part of the spec I found (which might be The Right Way)?
No, the current implementation is correct.
The drawback with redefining the time data type to be a circular number
line is that it leads to definitional problems in other areas of the
arithmetic. For example, what would the result of
time '3:00' - time '23:00'
have to be?
A wrapping time type would probably be useful, but not when it shadows the
standard type.
--
Peter Eisentraut peter_e@gmx.net