Statistical Lacunae in Interval type
Kind people,
I just ran across this, and was wondering whether it's worth a
back-patch. The interval type has an aggregate for average (AVG), but
not one for standard deviation (STDDEV) or variance (VARIANCE).
Is this a bug? Is there some problem with defining variance over
intervals?
TIA for any pointers in the right direction...
Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778
Remember to vote!
David Fetter wrote:
I just ran across this, and was wondering whether it's worth a
back-patch.
New features are not back-patched.
The interval type has an aggregate for average (AVG),
but not one for standard deviation (STDDEV) or variance (VARIANCE).Is this a bug?
No, it's a missing feature. :-)
Is there some problem with defining variance over
intervals?
If all the operations that are used as part of the calculation of stddev
are available for intervals, then I don't see one.
David Fetter <david@fetter.org> writes:
I just ran across this, and was wondering whether it's worth a
back-patch. The interval type has an aggregate for average (AVG), but
not one for standard deviation (STDDEV) or variance (VARIANCE).
AFAICS, stddev/variance require the concept of multiplying two input
values together (square, and also square root, are in the formulas).
I don't know what it means to multiply two intervals --- there's no
such operator in Postgres, anyway.
You could possibly approximate the behavior you want with something
like
stddev(extract(epoch from interval_col))
which mashes the intervals down to seconds.
regards, tom lane
Tom Lane wrote:
AFAICS, stddev/variance require the concept of multiplying two input
values together (square, and also square root, are in the formulas).
I don't know what it means to multiply two intervals --- there's no
such operator in Postgres, anyway.
The problem is not much different than recording temperature
measurements in a numeric column and then taking the standard
deviation. Kelvin squared does not make much sense, but it's only an
intermediate quantity.
The problem is that an interval datum already implies the units, so in
order to allow interval * interval we would have to add a new type
"interval squared", which would probably be considered to be a bit
foolish.
Peter Eisentraut <peter_e@gmx.net> writes:
The problem is that an interval datum already implies the units, so in
order to allow interval * interval we would have to add a new type
"interval squared", which would probably be considered to be a bit
foolish.
Not only foolish but complicated. Remember that interval internally
is "N months plus X seconds" (where N is integral but X needn't be).
To avoid losing information, a product datatype would have to look
something like "N months-squared plus X months-seconds plus Y
seconds-squared", which offers no intuition whatever about how to
operate on it. I doubt there's even a unique way to define
square-rooting this.
Add on top the fact that we really need to change interval to be
"M months plus N days plus X seconds" to solve the ever-popular
daylight-savings-transition issues, and a product datatype would
get out of hand altogether.
When I said "mash it down to seconds first", I was speaking very
literally...
regards, tom lane
On Mon, Jul 12, 2004 at 11:10:34AM -0400, Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
The problem is that an interval datum already implies the units,
so in order to allow interval * interval we would have to add a
new type "interval squared", which would probably be considered to
be a bit foolish.Not only foolish but complicated. Remember that interval internally
is "N months plus X seconds" (where N is integral but X needn't be).
To avoid losing information, a product datatype would have to look
something like "N months-squared plus X months-seconds plus Y
seconds-squared", which offers no intuition whatever about how to
operate on it. I doubt there's even a unique way to define
square-rooting this.
That's kinda what I was afraid of. If an interval were defined
internally as a unique number of seconds, it would be easy.
Add on top the fact that we really need to change interval to be "M
months plus N days plus X seconds" to solve the ever-popular
daylight-savings-transition issues, and a product datatype would get
out of hand altogether.
Yeah.
When I said "mash it down to seconds first", I was speaking very
literally...
OK. So it looks like (oddly) interval can have a std. deviation,
which is measured in seconds, but not a variance. Is that pretty
close?
Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778
Remember to vote!