Interval aggregate regression failure (expected seems wrong)

Started by Michael Paesoldover 20 years ago49 messageshackers
Jump to latest
#1Michael Paesold
mpaesold@gmx.at

Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I get
a regression failure in the interval tests. I am no export for the
interval type, but the expected "9 days 28 hours" seem wrong, don't
they? The actual value seems to be the same.

Is it possible that this is broken on the platform where the expected
results were generated?

*** ./expected/interval.out Tue Oct 25 19:13:07 2005
--- ./results/interval.out  Mon Nov  7 09:11:27 2005
***************
*** 218,224 ****
   select avg(f1) from interval_tbl;
                          avg
   -------------------------------------------------
!  @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs
   (1 row)
   -- test long interval input
--- 218,224 ----
   select avg(f1) from interval_tbl;
                          avg
   -------------------------------------------------
!  @ 4 years 1 mon 10 days 4 hours 18 mins 23 secs
   (1 row)

-- test long interval input

The last commit to interval.out seems to be this one, and it changed
exactly this line.
revision 1.14
date: 2005/10/25 17:13:07; author: tgl; state: Exp; lines: +1 -1

Well, this is CVS tip, so there is a chance this is fixed in
REL_8_1_STABLE which has a 1.14.0.2. At least the release tarball should
be rebuilt, no?
Sorry, if this is just noise. Just wanted to be sure you know about it.

Best Regards,
Michael

#2Michael Glaesemann
grzm@seespotcode.net
In reply to: Michael Paesold (#1)
Re: Interval aggregate regression failure (expected seems wrong)

On Nov 7, 2005, at 17:24 , Michael Paesold wrote:

Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I
get a regression failure in the interval tests. I am no export for
the interval type, but the expected "9 days 28 hours" seem wrong,
don't they? The actual value seems to be the same.

Is it possible that this is broken on the platform where the
expected results were generated?

What platform are you testing on? With or without integer-datetimes?

I just ran make check on for PostgreSQL 8.1.0 on Mac OS X 10.4.3

test=# select version();

version
------------------------------------------------------------------------
----------------------------------------------------------------------
PostgreSQL 8.1.0 on powerpc-apple-darwin8.3.0, compiled by GCC
powerpc-apple-darwin8-gcc-4.0.0 (GCC) 4.0.0 (Apple Computer, Inc.
build 5026)

I didn't have any regression failures. I'd also expect we'd see a lot
more failures on the build farm if it were the case that it was
broken just on the platform that the expected results were generated
on. From a quick look at the current build farm failures on HEAD and
REL8_1_STABLE, it doesn't look like any of the failures are failing
here.

Michael Glaesemann
grzm myrealbox com

#3Michael Paesold
mpaesold@gmx.at
In reply to: Michael Glaesemann (#2)
Re: Interval aggregate regression failure (expected seems

Michael Glaesemann wrote:

On Nov 7, 2005, at 17:24 , Michael Paesold wrote:

Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I
get a regression failure in the interval tests. I am no export for
the interval type, but the expected "9 days 28 hours" seem wrong,
don't they? The actual value seems to be the same.

Is it possible that this is broken on the platform where the expected
results were generated?

What platform are you testing on? With or without integer-datetimes?

Ok, forgot. This is *without* integer-datetimes, RHEL 3 (Linux 2.4.21,
glibc 2.3.2, gcc 3.2.3 20030502) on i686 (Xeon without x86-64).

I just ran make check on for PostgreSQL 8.1.0 on Mac OS X 10.4.3

[snip]

I didn't have any regression failures. I'd also expect we'd see a lot
more failures on the build farm if it were the case that it was broken
just on the platform that the expected results were generated on. From
a quick look at the current build farm failures on HEAD and
REL8_1_STABLE, it doesn't look like any of the failures are failing here.

I just started to wonder about buildfarm, too, but found that most build
farm members have --enable-integer-datetimes. Could that be an
explanation? Is it possible that the code is wrong with
--enable-integer-datetimes?

So what do you have in results/interval.out?
@ 4 years 1 mon 9 days 28 hours 18 mins 23 secs seems wrong to me, no?

Tom wrote for that commit:
revision 1.14
date: 2005/10/25 17:13:07; author: tgl; state: Exp; lines: +1 -1
Remove justify_hours call from interval_mul and interval_div, and make
some small stylistic improvements in these functions. Also fix several
places where TMODULO() was being used with wrong-sized quotient argument,
creating a risk of overflow --- interval2tm was actually capable of going
into an infinite loop because of this.

Perhaps it is an intended behavior? If so, it still fails without
integer-datetimes.

Best Regards,
Michael Paesold

#4Michael Paesold
mpaesold@gmx.at
In reply to: Michael Paesold (#3)
Re: Interval aggregate regression failure (expected seems

Michael Paesold wrote:

On Nov 7, 2005, at 17:24 , Michael Paesold wrote:

Using both PostgreSQL 8.1.0 and CVS current of Nov 7, 9:00 am CET I
get a regression failure in the interval tests. I am no export for
the interval type, but the expected "9 days 28 hours" seem wrong,
don't they? The actual value seems to be the same.

Is it possible that this is broken on the platform where the
expected results were generated?

Perhaps it is an intended behavior? If so, it still fails without
integer-datetimes.

Well, no, it also fails with integer-datetimes for me in the same way.
pg_config output below. And yes, I did "cvs up; make distclean;
./configure... ; make ; make install ; make check".

Could this be DST-related? I thought plain interval was not affected by
DST changes.

BINDIR = /usr/local/postgresql-8cvs/bin
DOCDIR = /usr/local/postgresql-8cvs/doc
INCLUDEDIR = /usr/local/postgresql-8cvs/include
PKGINCLUDEDIR = /usr/local/postgresql-8cvs/include
INCLUDEDIR-SERVER = /usr/local/postgresql-8cvs/include/server
LIBDIR = /usr/local/postgresql-8cvs/lib
PKGLIBDIR = /usr/local/postgresql-8cvs/lib
LOCALEDIR =
MANDIR = /usr/local/postgresql-8cvs/man
SHAREDIR = /usr/local/postgresql-8cvs/share
SYSCONFDIR = /usr/local/postgresql-8cvs/etc
PGXS = /usr/local/postgresql-8cvs/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/usr/local/postgresql-8cvs' '--with-pgport=54321'
'--with-perl' 'CFLAGS=-O2 -mcpu=pentium4 -march=pentium4'
'--enable-casserts' '--enable-debug' '--enable-integer-datetimes'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -O2 -mcpu=pentium4 -march=pentium4 -Wall -Wmissing-prototypes
-Wpointer-arith -Winline -Wdeclaration-after-statement
-fno-strict-aliasing -g
CFLAGS_SL = -fpic
LDFLAGS = -Wl,-rpath,/usr/local/postgresql-8cvs/lib
LDFLAGS_SL =
LIBS = -lpgport -lz -lreadline -lncurses -lcrypt -lresolv -lnsl -ldl -lm
-lbsd
VERSION = PostgreSQL 8.2devel

Best Regards,
Michael Paesold

#5Michael Glaesemann
grzm@seespotcode.net
In reply to: Michael Paesold (#3)
Re: Interval aggregate regression failure (expected seems wrong)

On Nov 7, 2005, at 18:28 , Michael Paesold wrote:

Ok, forgot. This is *without* integer-datetimes, RHEL 3 (Linux
2.4.21, glibc 2.3.2, gcc 3.2.3 20030502) on i686 (Xeon without
x86-64).

I just ran make check on for PostgreSQL 8.1.0 on Mac OS X 10.4.3

Heh. I forgot, too ;) My test was also without integer-datetimes.

[snip]

I didn't have any regression failures. I'd also expect we'd see a
lot more failures on the build farm if it were the case that it
was broken just on the platform that the expected results were
generated on. From a quick look at the current build farm
failures on HEAD and REL8_1_STABLE, it doesn't look like any of
the failures are failing here.

I just started to wonder about buildfarm, too, but found that most
build farm members have --enable-integer-datetimes. Could that be
an explanation? Is it possible that the code is wrong with --enable-
integer-datetimes?

So what do you have in results/interval.out?
@ 4 years 1 mon 9 days 28 hours 18 mins 23 secs seems wrong to me, no?

select avg(f1) from interval_tbl;
avg
-------------------------------------------------
@ 4 years 1 mon 9 days 28 hours 18 mins 23 secs
(1 row)

The point of the change to the interval datatype in 8.1 is to keep
track of months, days, and seconds (which in turn are represented as
hours, minutes and seconds). Previous releases tracked only months
and seconds. This has advantages for using intervals with dates and
timestamps that involve daylight saving time changes. Admittedly, it
looks odd at first, but it falls out of the change in behavior of the
interval datatype. There are two new functions, justify_days and
justify_hours, that you can use to put intervals into more
traditional forms.

http://developer.postgresql.org/docs/postgres/functions-datetime.html

Doesn't explain why you're getting a regression failure though.

Michael Glaesemann
grzm myrealbox com

#6Michael Paesold
mpaesold@gmx.at
In reply to: Michael Glaesemann (#5)
Re: Interval aggregate regression failure (expected seems

Michael Glaesemann wrote:

So what do you have in results/interval.out?
@ 4 years 1 mon 9 days 28 hours 18 mins 23 secs seems wrong to me, no?

select avg(f1) from interval_tbl;
avg
-------------------------------------------------
@ 4 years 1 mon 9 days 28 hours 18 mins 23 secs
(1 row)

The point of the change to the interval datatype in 8.1 is to keep
track of months, days, and seconds (which in turn are represented as
hours, minutes and seconds). Previous releases tracked only months and
seconds. This has advantages for using intervals with dates and
timestamps that involve daylight saving time changes. Admittedly, it
looks odd at first, but it falls out of the change in behavior of the
interval datatype. There are two new functions, justify_days and
justify_hours, that you can use to put intervals into more traditional
forms.

http://developer.postgresql.org/docs/postgres/functions-datetime.html

Thank you very much for the insight.

Doesn't explain why you're getting a regression failure though.

Well, I have something now. It seems to be a compiler/optimization issue.

I wrote:

CFLAGS = -O2 -mcpu=pentium4 -march=pentium4 -Wall -Wmissing-prototypes
-Wpointer-arith -Winline -Wdeclaration-after-statement
-fno-strict-aliasing -g

I had set CFLAGS to -O2 -mcpu=pentium4 -march=pentium4. I have been
using these settings for testing PostgreSQL tip for some time now and
never had any problems.

Removing the cpu and architecture optimization part changes the behavior
of the interval aggrate, so the results/interval.out now also looks like
the expected output.
select avg(f1) from interval_tbl;
avg
-------------------------------------------------
@ 4 years 1 mon 9 days 28 hours 18 mins 23 secs
(1 row)

Switching -mcpu=pentium4 -march=pentium4 back on, results in wrong
output. This is 100% reproducable. Can somebody with more knowledge
explain why the compiler should stumble over just this? Pure luck?

I have tested these combination of CFLAGS:
-O2 OK
-O2 -mcpu=i686 -march=i686 OK (good, RPMS are built with these)
-O2 -mcpu=pentium4 -march=i686 OK
-O2 -mcpu=pentium4 -march=pentium4 fails

I am definatly not going to use -march=pentium4 in any production
system. Should I open a bug report with RedHat (gcc vendor)?

Best Regards,
Michael Paesold

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paesold (#6)
Re: Interval aggregate regression failure (expected seems

Michael Paesold <mpaesold@gmx.at> writes:

I have tested these combination of CFLAGS:
-O2 OK
-O2 -mcpu=i686 -march=i686 OK (good, RPMS are built with these)
-O2 -mcpu=pentium4 -march=i686 OK
-O2 -mcpu=pentium4 -march=pentium4 fails

I am definatly not going to use -march=pentium4 in any production
system. Should I open a bug report with RedHat (gcc vendor)?

Yeah, but they'll probably want a smaller test case than "Postgres fails
its regression tests" :-(

My guess is that the problem is misoptimization of the interval_div()
function (look in utils/adt/timestamp.c), and that you could probably
construct a nice small test case for them out of that function.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: Interval aggregate regression failure (expected seems

I wrote:

Michael Paesold <mpaesold@gmx.at> writes:

I am definatly not going to use -march=pentium4 in any production
system. Should I open a bug report with RedHat (gcc vendor)?

Yeah, but they'll probably want a smaller test case than "Postgres fails
its regression tests" :-(

I have just confirmed that the problem still exists in FC4's current
compiler (gcc 4.0.1, gcc-4.0.1-4.fc4), which probably will boost up the
priority of the complaint quite a long way in Red Hat's eyes.

I've also confirmed that the problem is in interval_div; you can
reproduce the failure with

select '41 years 1 mon 11 days'::interval / 10;

which should give '4 years 1 mon 9 days 26:24:00', but when
timestamp.o is compiled with "-mcpu=pentium4 -march=pentium4",
you get '4 years 1 mon 10 days 02:24:00'. --enable-integer-datetimes
is not relevant because the interesting part is all double/integer
arithmetic.

Looking at this, though, I wonder if the pentium4 answer isn't "more
correct". If I'm doing the math by hand correctly, what we end up
with is having to cascade 3/10ths of a month down into the days field,
and since the conversion factor is supposed to be 30 days/month, that
should be exactly 9 days. Plus the one full day from the 11/10 days
gives 10 days. I think what is happening on all the non-Pentium
platforms is that (3.0/10.0)*30.0 is producing something just a shade
less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly
due to rearrangement of the calculation. I think we can still file this
as a compiler bug, because I'm pretty sure the C spec does not allow
rearrangement of floating-point calculations ... but we might want to
think about tweaking the code's roundoff behavior just a bit.

An example that's a little easier to look at is

select '41 years 1 mon'::interval / 10;

I get '4 years 1 mon 9 days' with the pentium4 optimization, and
'4 years 1 mon 8 days 24:00:00' without, and the former seems more
correct...

regards, tom lane

#9Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: Interval aggregate regression failure (expected seems

Tom Lane <tgl@sss.pgh.pa.us> writes:

I think we can still file this as a compiler bug, because I'm pretty sure
the C spec does not allow rearrangement of floating-point calculations ...

It may have more to do with whether the floating point value can stay in a
floating point register long enough to complete the calculation.

IIRC, floating point registers are actually longer than a double so if the
entire calculation is done in registers and then the result rounded off to
store in memory it may get the right answer. Whereas if it loses the extra
bits on the intermediate values (the infinite repeating fractions) that might
be where you get the imprecise results.

It makes some sense that -mcpu and -march give the compiler enough information
to keep the intermediate results in registers more effectively.

--
greg

#10Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: Interval aggregate regression failure (expected seems

c.f.:

`-ffloat-store'
Do not store floating point variables in registers, and inhibit
other options that might change whether a floating point value is
taken from a register or memory.

This option prevents undesirable excess precision on machines such
as the 68000 where the floating registers (of the 68881) keep more
precision than a `double' is supposed to have. Similarly for the
x86 architecture. For most programs, the excess precision does
only good, but a few programs rely on the precise definition of
IEEE floating point. Use `-ffloat-store' for such programs, after
modifying them to store all pertinent intermediate computations
into variables.

--
greg

#11Gregory Maxwell
gmaxwell@gmail.com
In reply to: Bruce Momjian (#9)
Re: Interval aggregate regression failure (expected seems

On 07 Nov 2005 14:22:37 -0500, Greg Stark <gsstark@mit.edu> wrote:

IIRC, floating point registers are actually longer than a double so if the
entire calculation is done in registers and then the result rounded off to
store in memory it may get the right answer. Whereas if it loses the extra
bits on the intermediate values (the infinite repeating fractions) that might
be where you get the imprecise results.

Hm. I thought -march=pentium4 -mcpu=pentium4 implies -mfpmath=sse.
SSE is a much better choice on P4 for performance reasons, and never
has excess precision. I'm guessing from the above that I'm incorrect,
in which case we should always be compiled with -mfpmath=sse -msse2
when we are complied -march=pentium4, this should remove problems
caused by excess precision. The same behavior can be had on non sse
platforms with -ffloat-store.

#12Michael Paesold
mpaesold@gmx.at
In reply to: Gregory Maxwell (#11)
Re: Interval aggregate regression failure (expected seems

Gregory Maxwell wrote:

On 07 Nov 2005 14:22:37 -0500, Greg Stark <gsstark@mit.edu> wrote:

IIRC, floating point registers are actually longer than a double so if the
entire calculation is done in registers and then the result rounded off to
store in memory it may get the right answer. Whereas if it loses the extra
bits on the intermediate values (the infinite repeating fractions) that might
be where you get the imprecise results.

Hm. I thought -march=pentium4 -mcpu=pentium4 implies -mfpmath=sse.
SSE is a much better choice on P4 for performance reasons, and never
has excess precision. I'm guessing from the above that I'm incorrect,
in which case we should always be compiled with -mfpmath=sse -msse2
when we are complied -march=pentium4, this should remove problems
caused by excess precision. The same behavior can be had on non sse
platforms with -ffloat-store.

Just for the record (and those interested): using 'CFLAGS=-O2
-mcpu=pentium4 -march=pentium4 -mfpmath=sse -msse2' actually passes the
regression tests.

Best Regards,
Michael Paesold

#13Michael Paesold
mpaesold@gmx.at
In reply to: Tom Lane (#8)
Re: Interval aggregate regression failure (expected seems

Tom Lane wrote:

I wrote:

Michael Paesold <mpaesold@gmx.at> writes:

I am definatly not going to use -march=pentium4 in any production
system. Should I open a bug report with RedHat (gcc vendor)?

Yeah, but they'll probably want a smaller test case than "Postgres fails
its regression tests" :-(

I have just confirmed that the problem still exists in FC4's current
compiler (gcc 4.0.1, gcc-4.0.1-4.fc4), which probably will boost up the
priority of the complaint quite a long way in Red Hat's eyes.

I've also confirmed that the problem is in interval_div; you can
reproduce the failure with

select '41 years 1 mon 11 days'::interval / 10;

[snip]

Would you mind reporting this to RedHat Bugzilla? I believe a bug report
from you would have more weight then mine, because you actually
understand what's going on here. :-)

Otherwise I am going to do do my best...

Best Regards,
Michael Paesold

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paesold (#13)
Re: Interval aggregate regression failure (expected seems

Michael Paesold <mpaesold@gmx.at> writes:

Would you mind reporting this to RedHat Bugzilla? I believe a bug report
from you would have more weight then mine, because you actually
understand what's going on here. :-)

Actually, given the thought that this may be an artifact of keeping an
intermediate value in a wider-than-normal register rather than genuinely
rearranging the computation, I'm not certain it is a compiler bug.
We'd have to study it a lot more closely before filing it as one, anyway.

If you accept the idea that the pentium4 answer is the right one,
then what we really need to do is focus on a better rounding rule than
"strict truncation". I was toying with the notion of adding the
equivalent of half a microsecond to the fractional-day value before
truncating it to integer. But I'm not certain that that wouldn't have
some bad effects in other cases.

regards, tom lane

#15Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#14)
Re: Interval aggregate regression failure (expected seems

Tom Lane wrote:

Michael Paesold <mpaesold@gmx.at> writes:

Would you mind reporting this to RedHat Bugzilla? I believe a bug report
from you would have more weight then mine, because you actually
understand what's going on here. :-)

Actually, given the thought that this may be an artifact of keeping an
intermediate value in a wider-than-normal register rather than genuinely
rearranging the computation, I'm not certain it is a compiler bug.
We'd have to study it a lot more closely before filing it as one, anyway.

If you accept the idea that the pentium4 answer is the right one,
then what we really need to do is focus on a better rounding rule than
"strict truncation". I was toying with the notion of adding the
equivalent of half a microsecond to the fractional-day value before
truncating it to integer. But I'm not certain that that wouldn't have
some bad effects in other cases.

Looking at the code, do we need additional rint() calls in there, or
rint(x + 0.5)? Frankly, I am confused why interval_div() has caused
such problems for us? Are we going at this the right way?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#16Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#8)
Re: Interval aggregate regression failure (expected seems

I assume no more progress has been made on this?

---------------------------------------------------------------------------

Tom Lane wrote:

I wrote:

Michael Paesold <mpaesold@gmx.at> writes:

I am definatly not going to use -march=pentium4 in any production
system. Should I open a bug report with RedHat (gcc vendor)?

Yeah, but they'll probably want a smaller test case than "Postgres fails
its regression tests" :-(

I have just confirmed that the problem still exists in FC4's current
compiler (gcc 4.0.1, gcc-4.0.1-4.fc4), which probably will boost up the
priority of the complaint quite a long way in Red Hat's eyes.

I've also confirmed that the problem is in interval_div; you can
reproduce the failure with

select '41 years 1 mon 11 days'::interval / 10;

which should give '4 years 1 mon 9 days 26:24:00', but when
timestamp.o is compiled with "-mcpu=pentium4 -march=pentium4",
you get '4 years 1 mon 10 days 02:24:00'. --enable-integer-datetimes
is not relevant because the interesting part is all double/integer
arithmetic.

Looking at this, though, I wonder if the pentium4 answer isn't "more
correct". If I'm doing the math by hand correctly, what we end up
with is having to cascade 3/10ths of a month down into the days field,
and since the conversion factor is supposed to be 30 days/month, that
should be exactly 9 days. Plus the one full day from the 11/10 days
gives 10 days. I think what is happening on all the non-Pentium
platforms is that (3.0/10.0)*30.0 is producing something just a shade
less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly
due to rearrangement of the calculation. I think we can still file this
as a compiler bug, because I'm pretty sure the C spec does not allow
rearrangement of floating-point calculations ... but we might want to
think about tweaking the code's roundoff behavior just a bit.

An example that's a little easier to look at is

select '41 years 1 mon'::interval / 10;

I get '4 years 1 mon 9 days' with the pentium4 optimization, and
'4 years 1 mon 8 days 24:00:00' without, and the former seems more
correct...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

#17Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruce Momjian (#16)
Re: Interval aggregate regression failure (expected seems

Tom Lane wrote:

I've also confirmed that the problem is in interval_div; you can
reproduce the failure with

select '41 years 1 mon 11 days'::interval / 10;

which should give '4 years 1 mon 9 days 26:24:00', but when
timestamp.o is compiled with "-mcpu=pentium4 -march=pentium4",
you get '4 years 1 mon 10 days 02:24:00'. --enable-integer-datetimes
is not relevant because the interesting part is all double/integer
arithmetic.

Looking at this, though, I wonder if the pentium4 answer isn't "more
correct". If I'm doing the math by hand correctly, what we end up
with is having to cascade 3/10ths of a month down into the days field,
and since the conversion factor is supposed to be 30 days/month, that
should be exactly 9 days. Plus the one full day from the 11/10 days
gives 10 days. I think what is happening on all the non-Pentium
platforms is that (3.0/10.0)*30.0 is producing something just a shade
less than 9.0, whereas the Pentium gives 9.0 or a shade over, possibly
due to rearrangement of the calculation. I think we can still file
this
as a compiler bug, because I'm pretty sure the C spec does not allow
rearrangement of floating-point calculations ... but we might want to
think about tweaking the code's roundoff behavior just a bit.

I took a naive look at this, but have been unable to come up with a
satisfactory solution. Here's an even simpler example that exhibits
the behavior:

# select '41 mon'::interval / 10;
?column?
------------------------
4 mons 2 days 24:00:00

And one using a non-integer divisor:

# select '2 mon'::interval / 1.5;
?column?
-----------------------
1 mon 9 days 24:00:00

Here's the relevant code in interval_div (timestamp.c c2573).
month_remainder holds the fractional part of the months field of the
original interval (41) divided by the divisor (10) as a double.

/* fractional months full days into days */
month_remainder_days = month_remainder * DAYS_PER_MONTH;
result->day += (int32) month_remainder_days;
/* fractional months partial days into time */
day_remainder += month_remainder_days - (int32) month_remainder_days;

#ifdef HAVE_INT64_TIMESTAMP
result->time = rint(span->time / factor + day_remainder *
USECS_PER_DAY);
#else
result->time = span->time / factor + day_remainder * SECS_PER_DAY;
#endif

My understanding is that as month_remainder is a float (as is
month_remainder_days), month_remainder_days may be equal to 24 hours
after rounding. As we're converting from months to days, and from
days to time, rather than from months to time directly, we're
assuming that we should only have time less than 24 hours remaining
in the month_remainder_days when it's added to day_remainder. If
month_remainder_days is equal to 24 hours, we should increment result-

day rather than carrying that down into result->time.

With that in mind, I produced this hack:

#ifdef HAVE_INT64_TIMESTAMP
month_remainder_seconds = month_remainder_day_fraction *
USECS_PER_DAY;

if ( USECS_PER_DAY == rint(month_remainder_seconds) )
result->day++;
else if ( USECS_PER_DAY == - rint(month_remainder_seconds) )
result->day--;
else day_remainder += month_remainder_day_fraction;

result->time = rint(span->time / factor + day_remainder *
USECS_PER_DAY);
#else
month_remainder_seconds = month_remainder_day_fraction *
SECS_PER_DAY;

if ( SECS_PER_DAY == (int32) month_remainder_seconds )
result->day++;
else if ( SECS_PER_DAY == - (int32) month_remainder_seconds)
result->day--;
else day_remainder += month_remainder_day_fraction;

result->time = span->time / factor + day_remainder * SECS_PER_DAY;
#endif

It works okay with --enable-integer-datetimes

postgres=# select '41 mon'::interval/10;
?column?
---------------
4 mons 3 days
(1 row)

postgres=# select '2 mon'::interval/1.5;
?column?
---------------
1 mon 10 days
(1 row)

It also changes the result of the aggregate test for intervals, but I
think that's to be expected.

*** ./expected/interval.out	Tue Mar  7 07:49:17 2006
--- ./results/interval.out	Thu Jun 22 22:52:41 2006
***************
*** 218,224 ****
   select avg(f1) from interval_tbl;
                          avg
   -------------------------------------------------
!  @ 4 years 1 mon 9 days 28 hours 18 mins 23 secs
   (1 row)
   -- test long interval input
--- 218,224 ----
   select avg(f1) from interval_tbl;
                          avg
   -------------------------------------------------
!  @ 4 years 1 mon 10 days 4 hours 18 mins 23 secs
   (1 row)

-- test long interval input

But doesn't work without --enable-integer-datetimes. It still gives
the same answer as before. I don't have a firm grasp of floating
point math so I'm fully aware this might be entirely the wrong way to
go about this. It definitely feels kludgy (especially my sign-
handling), but I submit this in the hope it moves the discussion
forward a little. If someone wanted to point me in the right
direction, I'll try to finish this up, updating the regression test
and adding a few more to test this more thoroughly.

Thanks!

Michael Glaesemann
grzm myrealbox com

#18Michael Glaesemann
grzm@seespotcode.net
In reply to: Michael Glaesemann (#17)
Re: Interval aggregate regression failure (expected seems

On Jun 23, 2006, at 9:47 , Michael Glaesemann wrote:

It also changes the result of the aggregate test for intervals, but
I think that's to be expected.

My goodness. Of course it changes the aggregate test results. That
was what brought this up in the first place. (*kicks self for not
reading the subject line when responding*)

Michael Glaesemann
grzm seespotcode net

#19Michael Glaesemann
grzm@seespotcode.net
In reply to: Michael Glaesemann (#17)
Re: Interval aggregate regression failure (expected seems

On Jun 23, 2006, at 9:47 , Michael Glaesemann wrote:

# select '41 mon'::interval / 10;
?column?
------------------------
4 mons 2 days 24:00:00

My understanding is that as month_remainder is a float (as is
month_remainder_days), month_remainder_days may be equal to 24
hours after rounding. As we're converting from months to days, and
from days to time, rather than from months to time directly, we're
assuming that we should only have time less than 24 hours remaining
in the month_remainder_days when it's added to day_remainder.

This behavior is the same as applying justify_hours before adding the
days and time components to the result. With this in mind, I rewrote
interval_div to call interval_justify_hours. Good news is that --
enable-integer-datetimes works as expected. Bad news is that without
--enable-integer-datetimes, still get the current behavior.

I also came across something I think is odd:

# select version();

version
------------------------------------------------------------------------
----------------------------------------------------------------------
PostgreSQL 8.1.4 on powerpc-apple-darwin8.7.0, compiled by GCC
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.
build 5341)
(1 row)

select justify_hours(a/10) as divided_and_justified
, justify_hours(b) as justified
, a/10 = b as are_equal
from (select '41 mon'::interval,'4 mons 2 days 24:00:00'::interval)
as s(a,b);

without --enable-integer-datetimes:

divided_and_justified | justified | are_equal
------------------------+---------------+-----------
4 mons 2 days 24:00:00 | 4 mons 3 days | t
(1 row)

with --enable-integer-datetimes:

divided_and_justified | justified | are_equal
-----------------------+---------------+-----------
4 mons 3 days | 4 mons 3 days | t

I think this just confirms that there is some kind of rounding (or
lack of) in interval_div. Kind of frustrating that it's not visible
in the result.

Anyway, there's another data point.

Michael Glaesemann
grzm seespotcode net

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Glaesemann (#19)
Re: Interval aggregate regression failure (expected seems

Michael Glaesemann <grzm@seespotcode.net> writes:

... I think this just confirms that there is some kind of rounding (or
lack of) in interval_div. Kind of frustrating that it's not visible
in the result.

I think the fundamental problem is that the float8 results of division
are inaccurate, and yet we're assuming that we can (for instance) coerce
them to integer and get exactly the right answer. For instance, in the
'41 months'/10 example, I get month_remainder_days being computed as

(gdb) p month_remainder
$19 = 0.099999999999999645
(gdb) s
2575 result->day += (int32) month_remainder_days;
(gdb) p month_remainder_days
$20 = 2.9999999999999893

The only way we can really fix this is to be willing to round off
the numbers, and I think the only principled way to do that is to
settle on a specific target accuracy, probably 1 microsecond.
Then the thing to do would be to scale up all the intermediate
float results to microseconds and apply rint(). Something like
(untested)

month_remainder = rint(span->month * USECS_PER_MONTH / factor);
day_remainder = rint(span->day * USECS_PER_DAY / factor);
result->month = (int32) (month_remainder / USECS_PER_MONTH);
result->day = (int32) (day_remainder / USECS_PER_DAY);
month_remainder -= result->month * USECS_PER_MONTH;
day_remainder -= result->day * USECS_PER_DAY;

/*
* Handle any fractional parts the same way as in interval_mul.
*/

/* fractional months full days into days */
month_remainder_days = month_remainder * DAYS_PER_MONTH;
extra_days = (int32) (month_remainder_days / USECS_PER_DAY);
result->day += extra_days;
/* fractional months partial days into time */
day_remainder += month_remainder_days - extra_days * USECS_PER_DAY;

#ifdef HAVE_INT64_TIMESTAMP
result->time = rint(span->time / factor + day_remainder);
#else
result->time = rint(span->time * 1.0e6 / factor + day_remainder) / 1.0e6;
#endif

This might need a few more rint() calls --- I'm assuming that float ops
with exact integral inputs will be OK, which is an assumption used
pretty widely in the datetime code, but ...

Per the comment, if we do this here we probably want to make
interval_mul work similarly.

regards, tom lane

#21Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#20)
#22Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruce Momjian (#21)
#23Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#20)
#24Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruce Momjian (#23)
#25Michael Glaesemann
grzm@seespotcode.net
In reply to: Michael Glaesemann (#24)
#26Michael Glaesemann
grzm@seespotcode.net
In reply to: Michael Glaesemann (#25)
#27Bruce Momjian
bruce@momjian.us
In reply to: Michael Glaesemann (#26)
#28Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruce Momjian (#27)
#29Bruce Momjian
bruce@momjian.us
In reply to: Michael Glaesemann (#28)
#30Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruce Momjian (#29)
#31Bruce Momjian
bruce@momjian.us
In reply to: Michael Glaesemann (#30)
#32Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruce Momjian (#31)
#33Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruce Momjian (#31)
#34Bruce Momjian
bruce@momjian.us
In reply to: Michael Glaesemann (#32)
#35Bruce Momjian
bruce@momjian.us
In reply to: Michael Glaesemann (#33)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#34)
#37Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#36)
#38Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#37)
#39Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#38)
#40Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruce Momjian (#39)
#41Bruce Momjian
bruce@momjian.us
In reply to: Michael Glaesemann (#40)
#42Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#41)
#43Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#42)
#44Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruce Momjian (#43)
#45Michael Glaesemann
grzm@seespotcode.net
In reply to: Bruce Momjian (#41)
#46Michael Glaesemann
grzm@seespotcode.net
In reply to: Michael Glaesemann (#44)
#47Michael Glaesemann
grzm@seespotcode.net
In reply to: Michael Glaesemann (#45)
#48Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Glaesemann (#45)
#49Michael Glaesemann
grzm@seespotcode.net
In reply to: Michael Glaesemann (#46)