Absolute value of intervals
I couldn't find the operator '@' for intervals and found this thread
from over six years ago:
http://archives.postgresql.org/pgsql-general/2003-09/msg00292.php
| "Claudio Lapidus" <clapidus@hotmail.com> writes:
| > Bruce Momjian wrote:
| >> Why would you want an absolute value of a negative interval?
|
| > Because I'm trying to match pairs of records that satisfy certain criteria,
|
| Given that we have a unary-minus operator for intervals, I see no
| conceptual objection to having an absolute-value operator (and \do shows
| that interval is the only standard datatype that has the former but not
| the latter).
|
| However, given that it doesn't seem to be a really widely useful
| operator, I think this is the kind of itch that you'll have to scratch
| yourself. Send us a patch and it'll get into the next release ...
|
| regards, tom lane
Is this still the case now? I have some data that is related but requires
fuzzy joining on timestamps within a time interval.
I'd like to be able to do this:
select * from enviados e, recibidos r where @ (e.fecha - r.fecha) <
interval '1 second'
rather than this:
select * from enviados e, recibidos r where (e.fecha - r.fecha) <
interval '1 second' AND (r.fecha - e.fecha) < interval '1 second'
or this:
select * from enviados e, recibidos r where
(r.fecha + interval '1 seconds', r.fecha - interval '1 seconds')
OVERLAPS (e.fecha, e.fecha);
If such an operator doesn't exist yet, I'm keen to try to generate a
patch and tests; but I could use some pointers as to which project
files would be involved in such a change.
Regards,
-Joshua Berry
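The condition in the first query amounts to |e.fecha - r.fecha| < 1 second. As a rough illustration of that pairing logic outside the database, here is a minimal Python sketch. The sample timestamps and the helper name `fuzzy_pairs` are made up for the example and do not come from the thread.

```python
from datetime import datetime, timedelta

def fuzzy_pairs(sent, received, tolerance=timedelta(seconds=1)):
    """Return (s, r) pairs whose timestamps differ by less than tolerance."""
    # abs() is unambiguous here because a timedelta is one signed duration,
    # unlike a PostgreSQL interval with separate month/day/time fields.
    return [(s, r) for s in sent for r in received if abs(s - r) < tolerance]

# Hypothetical sample rows standing in for the two tables.
enviados = [datetime(2009, 10, 27, 11, 0, 0), datetime(2009, 10, 27, 11, 5, 0)]
recibidos = [datetime(2009, 10, 27, 11, 0, 0, 400000),
             datetime(2009, 10, 27, 11, 4, 58)]

pairs = fuzzy_pairs(enviados, recibidos)
print(pairs)
```

Only the first row of each list pairs up: they are 0.4 seconds apart, while every other combination differs by 2 seconds or more.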
On Tue, Oct 27, 2009 at 11:27:17AM -0300, Joshua Berry wrote:
I couldn't find the operator '@' for intervals
A simple SQL implementation would look like:
CREATE FUNCTION absinterval(interval) RETURNS interval
IMMUTABLE LANGUAGE sql AS 'SELECT greatest($1,-$1)';
CREATE OPERATOR @ ( PROCEDURE = absinterval, RIGHTARG = interval );
or is a C version really needed?
--
Sam http://samason.me.uk/
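To make the behaviour of greatest($1,-$1) concrete, here is a small Python model. It is only a sketch under stated assumptions: an interval is treated as three signed fields (months, days, microseconds), ordering uses a fixed 30-day month and 24-hour day, and ties keep the first argument. The class and function names are invented for the illustration.

```python
from dataclasses import dataclass

USECS_PER_DAY = 86_400_000_000
DAYS_PER_MONTH = 30  # fixed value assumed for ordering intervals
HOUR = 3_600_000_000

@dataclass(frozen=True)
class Interval:
    months: int = 0
    days: int = 0
    usecs: int = 0

    def __neg__(self):
        # Unary minus flips the sign of every field.
        return Interval(-self.months, -self.days, -self.usecs)

    def span(self):
        # Collapse the three fields into one number used only for ordering.
        return (self.months * DAYS_PER_MONTH + self.days) * USECS_PER_DAY + self.usecs

def abs_interval(iv):
    # Model of greatest(iv, -iv): keep whichever sorts higher,
    # preferring the first argument when both compare equal (an assumption).
    neg = -iv
    return iv if iv.span() >= neg.span() else neg

print(abs_interval(Interval(days=-1, usecs=HOUR)))        # input: '-1 day 1 hour'
print(abs_interval(Interval(days=1, usecs=-25 * HOUR)))   # input: '1 day -25 hours'
```

Under this model '-1 day 1 hour' sorts below zero, so its absolute value is the negated form '1 day -1 hour'; no individual sign is stripped.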
Sam Mason <sam@samason.me.uk> writes:
On Tue, Oct 27, 2009 at 11:27:17AM -0300, Joshua Berry wrote:
I couldn't find the operator '@' for intervals
A simple SQL implementation would look like:
CREATE FUNCTION absinterval(interval) RETURNS interval
IMMUTABLE LANGUAGE sql AS 'SELECT greatest($1,-$1)';
CREATE OPERATOR @ ( PROCEDURE = absinterval, RIGHTARG = interval );
or is a C version really needed?
I think this came up again recently and somebody pointed out that the
correct definition isn't as obvious as all that. The components of
an interval can have different signs, so should abs('-1 day 1 hour') be
'1 day -1 hour' or '1 day 1 hour'? Or what about corner cases like
'1 day -25 hours'?
regards, tom lane
On Tue, Oct 27, 2009 at 10:55:31AM -0400, Tom Lane wrote:
Sam Mason <sam@samason.me.uk> writes:
On Tue, Oct 27, 2009 at 11:27:17AM -0300, Joshua Berry wrote:
I couldn't find the operator '@' for intervals
A simple SQL implementation would look like:
CREATE FUNCTION absinterval(interval) RETURNS interval
IMMUTABLE LANGUAGE sql AS 'SELECT greatest($1,-$1)';
CREATE OPERATOR @ ( PROCEDURE = absinterval, RIGHTARG = interval );
I think this came up again recently and somebody pointed out that the
correct definition isn't as obvious as all that.
Hum, I think it is! :)
The components of
an interval can have different signs, so should abs('-1 day 1 hour') be
'1 day -1 hour' or '1 day 1 hour'? Or what about corner cases like
'1 day -25 hours'?
Funny, I used exactly that example when playing---although I spelled it
'-1 day 25:00:00'!
It all comes down to how you define things. I'd say my quick hack does
the "right" thing, but yes I should have pointed out that the interval
type has substructure that makes its behavior non-obvious. My
intuition as to why it's correct worked along these lines:
1) '10' can be defined as '1 hundred -90 units'.
2.1) negating '10' gives '-10'.
2.2) negating the other gives '-1 hundred 90 units'.
3) give 'hundred' the value of '100' and 'units' the value '1' and
check if things sum up.
If the absolute value of an interval was defined to strip out all the
negation signs you'd get the "wrong" answers out. The awkward thing
with intervals is that the components are not all of the same units, but
I think the argument stands.
--
Sam http://samason.me.uk/
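The hundreds-and-units argument above can be checked with a few lines of Python. This is a sketch only: the tuple representation, the weights, and the function names are invented for the illustration, and the day/hour case assumes a fixed 24-hour day.

```python
def value(parts, weights):
    # Sum each component multiplied by the size of its unit.
    return sum(p * w for p, w in zip(parts, weights))

def negate(parts):
    # Negation flips every component, like unary minus on an interval.
    return tuple(-p for p in parts)

def abs_greatest(parts, weights):
    # Absolute value as "the greater of a value and its own negation".
    neg = negate(parts)
    return parts if value(parts, weights) >= value(neg, weights) else neg

def abs_strip_signs(parts):
    # The alternative definition: drop every minus sign.
    return tuple(abs(p) for p in parts)

DECIMAL = (100, 1)        # 'hundred' and 'units'
ten = (1, -90)            # '1 hundred -90 units'
minus_ten = negate(ten)   # '-1 hundred 90 units'

DAY_HOUR = (24, 1)        # assumes a day is always 24 hours
iv = (-1, 25)             # '-1 day 25:00:00'

print(value(abs_greatest(minus_ten, DECIMAL), DECIMAL))
print(value(abs_strip_signs(minus_ten), DECIMAL))
```

The greatest-based definition maps -10 back to 10, whereas stripping signs turns it into 190. The same happens with '-1 day 25:00:00', which is worth 1 hour under these weights but becomes 49 hours once the signs are stripped.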
On Tue, Oct 27, 2009 at 03:25:02PM +0000, Sam Mason wrote:
If the absolute value of an interval was defined to strip out all the
negation signs you'd get the "wrong" answers out.
Oops, forgot another reason! For maths to work (n) and (-(-n)) should
evaluate to the same value. Inverting all the signs, as negation does,
will ensure that these semantics remain.
--
Sam http://samason.me.uk/
Joshua Berry wrote:
I couldn't find the operator '@' for intervals and found this thread
from over six years ago:
http://archives.postgresql.org/pgsql-general/2003-09/msg00292.php
| "Claudio Lapidus" <clapidus@hotmail.com> writes:
| > Bruce Momjian wrote:
| >> Why would you want an absolute value of a negative interval?
|
| > Because I'm trying to match pairs of records that satisfy certain criteria,
|
| Given that we have a unary-minus operator for intervals, I see no
| conceptual objection to having an absolute-value operator (and \do shows
| that interval is the only standard datatype that has the former but not
| the latter).
|
| However, given that it doesn't seem to be a really widely useful
| operator, I think this is the kind of itch that you'll have to scratch
| yourself. Send us a patch and it'll get into the next release ...
|
| regards, tom lane
Is this still the case now? I have some data that is related but requires
fuzzy joining on timestamps within a time interval.
I'd like to be able to do this:
select * from enviados e, recibidos r where @ (e.fecha - r.fecha) <
interval '1 second'
rather than this:
select * from enviados e, recibidos r where (e.fecha - r.fecha) <
interval '1 second' AND (r.fecha - e.fecha) < interval '1 second'
or this:
select * from enviados e, recibidos r where
(r.fecha + interval '1 seconds', r.fecha - interval '1 seconds')
OVERLAPS (e.fecha, e.fecha);
If such an operator doesn't exist yet, I'm keen to try to generate a
patch and tests; but I could use some pointers as to which project
files would be involved in such a change.
Regards,
-Joshua Berry
You should test for a positive or negative interval against INTERVAL '0
seconds' because you can have a positive interval that is a fraction of
a second.
But we've got two projects that implement a period data type, pgTemporal
and Chronos.
http://pgfoundry.org/projects/temporal/
http://pgfoundry.org/projects/timespan/
Scott Bailey
I think this came up again recently and somebody pointed out that the
correct definition isn't as obvious as all that. The components of
an interval can have different signs, so should abs('-1 day 1 hour') be
'1 day -1 hour' or '1 day 1 hour'? Or what about corner cases like
'1 day -25 hours'?
I agree with Sam. The absolute value of a negative interval should be
equidistant from zero, not the removal of negative signs. So abs('-1 day
1 hour') should be ('1 day -1 hour'). I don't think your corner case is
any different. So his function and operator should be perfectly valid.
But there is some ambiguity around the length of a month. So INTERVAL '1
month -30 days' = INTERVAL '0 days' = INTERVAL '-1 month +30 days'.
But when '1 month -30 days' is added to a date, it makes no change for
months with 30 days, adds 1 day for months with 31 days and subtracts
2 days for February (1 day in a leap year).
Scott Bailey
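The month-length ambiguity described above is easy to reproduce with plain calendar arithmetic. The following Python sketch assumes months are applied before days and that the day of month is clamped to the end of the target month. The helper names are invented, and this models the behaviour rather than calling PostgreSQL.

```python
import calendar
from datetime import date, timedelta

def add_months(d, months):
    # Shift by whole calendar months, clamping the day to the month's end.
    idx = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(idx, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

def add_interval(d, months, days):
    # Assumed order of application: months first, then days.
    return add_months(d, months) + timedelta(days=days)

# Add '1 month -30 days' to a date in a 30-day month, a 31-day month,
# and February of a non-leap year.
for start in (date(2009, 4, 15), date(2009, 1, 15), date(2009, 2, 15)):
    shifted = add_interval(start, 1, -30)
    print(start, "->", shifted, "shift in days:", (shifted - start).days)
```

The same interval, which compares equal to zero, moves the three dates by 0, +1 and -2 days respectively.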
On Thu, 2009-10-29 at 16:39 -0700, Scott Bailey wrote:
But there is some ambiguity around the length of a month. So INTERVAL '1
month - 30 days' = INTERVAL '0 days' = INTERVAL '-1 month +30 days'.
But when added to a date, it makes no change for months with 30 days,
adds 1 day for months with 31 days and subtracts 2 days for February.
Yes, that is a strange case. When you can't tell if an interval is
positive or negative, how do you define the absolute value?
I think that's a strong argument not to provide an absolute value
function for INTERVALs.
Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes:
Yes, that is a strange case. When you can't tell if an interval is
positive or negative, how do you define the absolute value?
That was the point of my '1 day -25 hours' example. Whether you
consider that positive or negative seems mighty arbitrary.
regards, tom lane
On Fri, Oct 30, 2009 at 12:55:51AM -0400, Tom Lane wrote:
Jeff Davis <pgsql@j-davis.com> writes:
Yes, that is a strange case. When you can't tell if an interval is
positive or negative, how do you define the absolute value?
That was the point of my '1 day -25 hours' example. Whether you
consider that positive or negative seems mighty arbitrary.
My personal feeling is that when you provide any ordering operator and
negation you can easily provide an absolute value operator. We've
already (somewhat arbitrarily) decided that one of '1month -30days' and
'-1month 30days' is "greater" than the other, so why not provide an
operator that returns the "greater" of an interval value and its own
negation?
--
Sam http://samason.me.uk/
On 10/30/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Davis <pgsql@j-davis.com> writes:
Yes, that is a strange case. When you can't tell if an interval is
positive or negative, how do you define the absolute value?
That was the point of my '1 day -25 hours' example. Whether you
consider that positive or negative seems mighty arbitrary.
If I can add it to a timestamp and get a deterministic result,
then we already have decided how to interpret the arbitrariness.
Might as well be consistent then.
--
marko
On Fri, Oct 30, 2009 at 01:45:24PM +0200, Marko Kreen wrote:
On 10/30/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
That was the point of my '1 day -25 hours' example. Whether you
consider that positive or negative seems mighty arbitrary.
If I can add it to a timestamp and get a deterministic result,
then we already have decided how to interpret the arbitrariness.
The point is that only in relation to a specific timestamp do the
components of an interval actually receive comparable values. For
example, a day can be a varying number of hours, a month a varying
number of days and a year can also vary in its number of days. Once
you add this to a date then these components get fixed, but without any
specific timestamp to work with these components remain undefined.
I'd argue that it doesn't help to know that we can add it to a timestamp
and get out a deterministic result. It's the fact that we have a
deterministic comparison operator that helps us.
--
Sam http://samason.me.uk/
On 10/30/09, Sam Mason <sam@samason.me.uk> wrote:
On Fri, Oct 30, 2009 at 01:45:24PM +0200, Marko Kreen wrote:
On 10/30/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
That was the point of my '1 day -25 hours' example. Whether you
consider that positive or negative seems mighty arbitrary.
If I can add it to a timestamp and get a deterministic result,
then we already have decided how to interpret the arbitrariness.
The point is that it's only in relation to a specific timestamp do the
components of an interval actually receive comparable values. For
example, a day can be a varying number of hours, a month a varying
number of days and a year can also vary in its number of days. Once
you add this to a date then these components get fixed, but without any
specific timestamp to work with these components remain undefined.
I'd argue that it doesn't help to know that we can add it to a timestamp
and get out a deterministic result. It's the fact that we have a
deterministic comparison operator that helps us.
That makes sense, but only slightly. We know for certain that we
don't have a specific timestamp, so we need to use some default
values. We already have a case that does that:
extract(epoch from interval)
Yes, in some cases the value returned is not the same value that would
be added to a specific timestamp, but so what? How is the current
situation, in which we force users to manually create potentially buggy
equivalent functionality, any better?
--
marko
On Fri, Oct 30, 2009 at 02:14:31PM +0200, Marko Kreen wrote:
That makes sense, but only slightly. We know for certain that we
don't have a specific timestamp, so we need to use some default
values. We already have a case that does that:
extract(epoch from interval)
You're arguing the same point as me. Your extract code and my
comparison operator use exactly the same values as defaults when
normalizing their respective intervals. Neither of them has anything
to do with timestamps.
Yes, in some cases the value returned is not the same value that would
be added to a specific timestamp, but so what? How is the current
situation, in which we force users to manually create potentially buggy
equivalent functionality, any better?
Tom was arguing that it's fundamentally inappropriate to ask for the
absolute value of an interval. I was saying that we've already chosen
arbitrary values for the components of an interval for comparison and
you've just pointed out that we use the same values elsewhere. Once
we've chosen them I don't see why we shouldn't extend them to all the
places that they seem to fit, such as this absolute value operator.
I think the attached trivial bit of code should do the right thing, however
I don't know what else is needed to hook everything up.
--
Sam http://samason.me.uk/
Attachments:
interval_abs.patch (text/x-diff; charset=us-ascii; +14 -0)
Sam Mason wrote:
+ Datum
+ interval_abs(PG_FUNCTION_ARGS)
+ {
+     Interval *interval1 = PG_GETARG_INTERVAL_P(0);
+     Interval *interval2 = PG_GETARG_INTERVAL_P(1);
Surely it must receive a single argument?
--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Fri, Oct 30, 2009 at 11:39:26AM -0300, Alvaro Herrera wrote:
Sam Mason wrote:
+ Datum
+ interval_abs(PG_FUNCTION_ARGS)
+ {
+     Interval *interval1 = PG_GETARG_INTERVAL_P(0);
+     Interval *interval2 = PG_GETARG_INTERVAL_P(1);
Surely it must receive a single argument?
Indeed it must; trying to write other code at the same time is a good
recipe for getting myself in a mess!
--
Sam http://samason.me.uk/
Attachments:
interval_abs.patch (text/x-diff; charset=us-ascii; +18 -0)
Sam Mason wrote:
My personal feeling is that when you provide any ordering operator and
negation you can easily provide an absolute value operator. We've
already (somewhat arbitrarily) decided that one of '1month -30days' and
'-1month 30days' is "greater" than the other, so why not provide an
operator that returns the "greater" of an interval value and its own
negation?
Technically, greater doesn't arbitrarily decide one is greater than the
other. It determines the two are equivalent and (correctly) chooses the
leftmost one.
I think it is important to separate the concept of an interval from the
addition of an interval to a timestamp. By (the interval type's)
definition a day is 24 hours, a month is 30 days, a year is 365.25 days.
And the user needs to understand that abs and extract epoch do their
calculations based on those definitions rather than what would happen
when applied to an arbitrary timestamp.
To say that extract epoch can determine the number of seconds in an
interval, while saying that you can not determine the absolute value of
an interval is not logical. Either you can do both or you can do neither.
Postgres intervals internally have an 8 byte microsecond part, a 4 byte
day part and a 4 byte month part. I would argue that there is no
ambiguity with the second (technically microsecond), and day parts of
intervals and that ambiguity is introduced with the month part. A day is
always 24 hours UTC. (However, sometimes our time zones change.) And we
ignore leap seconds. All intervals that result from timestamp subtraction
use ONLY the microsecond and day pieces in the resulting interval. This
is probably why most other databases have two interval types. One for
storing precise intervals (DAY TO SECOND) and one for fuzzy intervals
(YEAR TO MONTH).
Now I think that Postgres' interval implementation is much nicer to work
with than the others. But perhaps things like extract epoch and abs
should exhibit different behaviors when the month part is used.
Consider the following:
SELECT mos,
EXTRACT(EPOCH FROM INTERVAL '1 month' * mos) / 86400 AS days
FROM generate_series(9, 26) mos;
mos | days
-----+--------
9 | 270
10 | 300
11 | 330
12 | 365.25
13 | 395.25
14 | 425.25
15 | 455.25
16 | 485.25
17 | 515.25
18 | 545.25
19 | 575.25
20 | 605.25
21 | 635.25
22 | 665.25
23 | 695.25
24 | 730.5
25 | 760.5
26 | 790.5
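The jump in the table at 12 and 24 months can be reproduced with a short Python sketch. It assumes whole years count as 365.25 days and leftover months as 30 days each, which is inferred from the output above. The constant and function names are chosen for the illustration, and only non-negative month counts are handled.

```python
MONTHS_PER_YEAR = 12
DAYS_PER_MONTH = 30
DAYS_PER_YEAR = 365.25

def epoch_days(months):
    # Split into whole years and leftover months, then weight each part
    # with its fixed length in days (non-negative inputs only).
    years, rem = divmod(months, MONTHS_PER_YEAR)
    return years * DAYS_PER_YEAR + rem * DAYS_PER_MONTH

for mos in range(9, 27):
    print(mos, epoch_days(mos))
```

Going from 11 to 12 months adds 35.25 days instead of 30, because the twelfth month completes a 365.25-day year.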
On 30 Oct 2009, at 21:09, Scott Bailey wrote:
My personal feeling is that when you provide any ordering operator and
negation you can easily provide an absolute value operator. We've
already (somewhat arbitrarily) decided that one of '1month -30days' and
'-1month 30days' is "greater" than the other, so why not provide an
operator that returns the "greater" of an interval value and its own
negation?
Technically, greater doesn't arbitrarily decide one is greater than
the other. It determines the two are equivalent and (correctly)
chooses the leftmost one.
I think it is important to separate the concept of an interval with
addition of an interval with a timestamp. By (the interval type's)
definition a day is 24 hours, a month is 30 days, a year is 365.25
days. And the user needs to understand that abs and extract epoch do
their calculations based on those definitions rather than what would
happen when applied to an arbitrary timestamp.
There's a slight complication to this approach; what happens if you
ask for <timestamp> + abs(<interval>)?
You don't want to calculate the result of abs() based on a 24h day, a
30d month and a 365.25d year as there is a timestamp to base your
calculations on, but AFAIK you can't see that from within the abs()
function implementation. Unless you store that information in the
context somehow.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
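The complication raised here can be illustrated with a small Python model: the absolute value is computed from the interval alone, using a fixed 30-day month, and only afterwards is the result applied to a real calendar date. Everything in this sketch (the tuple representation, helper names, and the months-then-days order) is an assumption made for the illustration.

```python
import calendar
from datetime import date, timedelta

DAYS_PER_MONTH = 30  # fixed value used when no timestamp is available

def add_months(d, months):
    # Shift by whole calendar months, clamping the day to the month's end.
    idx = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(idx, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))

def add_interval(d, iv):
    months, days = iv
    return add_months(d, months) + timedelta(days=days)

def abs_interval(iv):
    # Sign is decided without any date: a month counts as 30 days.
    months, days = iv
    span = months * DAYS_PER_MONTH + days
    return iv if span >= 0 else (-months, -days)

iv = (-1, 29)  # '-1 month +29 days': one day below zero by the 30-day rule

# The "negative" interval moves this date forward by a day.
print(add_interval(date(2009, 3, 15), iv))
# Its absolute value moves this other date backward by a day.
print(add_interval(date(2009, 2, 15), abs_interval(iv)))
```

So the sign chosen by abs() without a timestamp need not match the direction in which the result moves a particular date.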
On Fri, Oct 30, 2009 at 01:09:30PM -0700, Scott Bailey wrote:
Sam Mason wrote:
My personal feeling is that when you provide any ordering operator and
negation you can easily provide an absolute value operator. We've
already (somewhat arbitrarily) decided that one of '1month -30days' and
'-1month 30days' is "greater" than the other, so why not provide an
operator that returns the "greater" of an interval value and its own
negation?
Technically, greater doesn't arbitrarily decide one is greater than the
other. It determines the two are equivalent and (correctly) chooses the
leftmost one.
where "correctly" has various provisos attached.
I think it is important to separate the concept of an interval from the
addition of an interval to a timestamp. By (the interval type's)
definition a day is 24 hours, a month is 30 days, a year is 365.25 days.
When I said "arbitrary" above, I was referring to the choice of these numbers.
They're reasonable defaults that do the right thing most of the time,
but it's possible to have other values that would give better results in
certain (rare) situations. I don't think we want to go changing things
though, the current values are what most people expect.
To say that extract epoch can determine the number of seconds in an
interval, while saying that you can not determine the absolute value of
an interval is not logical. Either you can do both or you can do neither.
Yes, I agree.
perhaps things like extract epoch and abs
should exhibit different behaviors when the month part is used.
mos | days
11 | 330
12 | 365.25
You mean that it should trunc() the result of the months part to
complete days? Instead of doing:
result += ((double) DAYS_PER_YEAR * SECS_PER_DAY) * (interval->month / MONTHS_PER_YEAR);
it should be doing:
result += trunc((interval->month / MONTHS_PER_YEAR) * DAYS_PER_YEAR) * SECS_PER_DAY;
? Not sure if a change such as this could be made though.
--
Sam http://samason.me.uk/
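To see what the proposed trunc() change would do to the year contribution, the two expressions above can be compared in Python. This sketch only models the year term, assumes the constants are 12, 365.25 and 86400, handles non-negative month counts only, and uses function names invented for the comparison.

```python
import math

MONTHS_PER_YEAR = 12
DAYS_PER_YEAR = 365.25
SECS_PER_DAY = 86400

def year_part_current(months):
    # Whole years (integer division) times 365.25 days, in seconds.
    return (DAYS_PER_YEAR * SECS_PER_DAY) * (months // MONTHS_PER_YEAR)

def year_part_truncated(months):
    # Same, but the day count is truncated to complete days first.
    return math.trunc((months // MONTHS_PER_YEAR) * DAYS_PER_YEAR) * SECS_PER_DAY

for months in (12, 24, 36, 48):
    print(months,
          year_part_current(months) / SECS_PER_DAY,
          year_part_truncated(months) / SECS_PER_DAY)
```

With truncation, 12 months would contribute 365 days instead of 365.25, and the two versions would agree again at 48 months (1461 days).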
Tom Lane wrote:
Sam Mason <sam@samason.me.uk> writes:
On Tue, Oct 27, 2009 at 11:27:17AM -0300, Joshua Berry wrote:
I couldn't find the operator '@' for intervals
A simple SQL implementation would look like:
CREATE FUNCTION absinterval(interval) RETURNS interval
IMMUTABLE LANGUAGE sql AS 'SELECT greatest($1,-$1)';
CREATE OPERATOR @ ( PROCEDURE = absinterval, RIGHTARG = interval );
or is a C version really needed?
I think this came up again recently and somebody pointed out that the
correct definition isn't as obvious as all that. The components of
an interval can have different signs, so should abs('-1 day 1 hour') be
'1 day -1 hour' or '1 day 1 hour'? Or what about corner cases like
'1 day -25 hours'?
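I'm writing this at about 8:35 p.m. New York time on October 31, 2009. Adding
interval '1 day -25 hours' to the current time yields the current time again,
New York time, because daylight saving time ends overnight and November 1 is
25 hours long.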
--
Lew
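The observation above can be checked with a small Python sketch. It assumes the day part of an interval moves the local wall clock while the hour part is added as elapsed time, and it hard-codes the single New York transition at 02:00 local time on 2009-11-01 instead of using a time zone database. The helper names are invented for the illustration.

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")
# Hard-coded assumption: only this one end-of-DST transition is modelled.
DST_END = datetime(2009, 11, 1, 2, 0)

def localize(naive):
    # Attach the New York offset in force at the given wall-clock time.
    return naive.replace(tzinfo=EDT if naive < DST_END else EST)

def add_interval(ts, days, hours):
    # Days shift the local wall clock; hours are added as elapsed time.
    wall = ts.replace(tzinfo=None) + timedelta(days=days)
    return localize(wall).astimezone(timezone.utc) + timedelta(hours=hours)

start = localize(datetime(2009, 10, 31, 20, 35))
result = add_interval(start, 1, -25)

print(start.isoformat())
print(result.astimezone(EDT).isoformat())
print(result == start)
```

Because the local day starting at that moment lasts 25 hours, '1 day -25 hours' behaves as a zero-length interval for this particular timestamp.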