Followup on the UnixWare Optimizer bug.

Started by Larry Rosenmanover 20 years ago6 messages
#1Larry Rosenman
ler@lerctr.org

The following is from my SCO Internal contact about the bug. It's
definitely their bug. Towards the end of the
Exact diagnosis, is a suggested work-around for now, as well as a (possible)
memory leak.

-----
Dave and I were convinced that the CSE optimization was correct and
manufactured data that we pushed through the interval_div() functions with
and without the CSE optimization was producing the same results.

It was only when we identified the exact input values from the interval
regression test were we able to see the problem

With span->month = 493
span->day = 11
factor = 10;

The CSE optimization problemd occures with the following code from
interval_div() and I am assuming also in interval_mul()

/* Cascade fractions to lower units */
/* fractional months full days into days */
result->day += month_remainder * DAYS_PER_MONTH;
/* fractional months partial days into time */
day_remainder += (month_remainder * DAYS_PER_MONTH) -
(int)(month_remainder *
DAYS_PER_MONTH);

At the point of failure:

month_remainder = 49.3 - 49 = .3 (INEXACT) and represented in the
80 bit FP register as .29999999999999999

month_remainder * 30 = 8.99999999999999997 (also inexact)

That results in result->day = 8 + 1 = 9, but day_remainder is
.999999999999997 + the .1 left from the earlier division.

The later call to interval_justify_hours subtracts the 1 days worth of
seconds from time ( day_mainder * SECS_PER_DAY + time portion)......
and bumps result->day by 1 ==> 10.

The FAILURE is because the compiler is trying to reduce the 3 FP multiples
to 1 multiply; using the value 3 times. The problem occurs because the
8.999999999999997 is inexact and when the CSE values is stored as a
"double", it is rounded to 9.0.

On the 2nd and 3rd uses of the value, the 9.0 (64-bit FP data) is used; but
the 1st use is still using the 80-bit (8.9999999999999) value. So the inter
truncation still gets 8, but the calculation day_remainder now is (9.0 - 9)
+ .1 (previous day_remainder contents).

Our bug is that either:

- the CSE value should have been preserved as an 80-bit long double
since that is how the internal calculations are being done.....

This is the "correct" fix and will take us some time to make
certain that we haven't broken anything.

- the CSE value have been rounded (to 64-bit precision) before use
in all 3 points in the code.

This would have resulted in result->day = 9 + 1 = 10 and the
resuilt->time would have been correctly less than 1 day. The
interval_justify_hours() would make not adjustments and result
would be identical - as expected.

As I said, this will take us some time to work up the fix and revalidate the
compiler. Since you have release coming up, I want to suggest the follow
work-around for a Common Subexpression Elimination (CSE) bug in "some"
compiler.......

For both interval_div() and interval_mul()

double CSE;

/* Cascade fractions to lower units */
/* fractional months full days into days */
CSE = month_remainder * DAYS_PER_MONTH;
result->day += CSE;
/* fractional months partial days into time */
day_remainder += (CSE) - (int)(CSE);

Also note that there appears to be a memory leak in the interval_****
routines. For example interval_div() allocates a "result" Interval.
It eventually passes this result through to interval_justify_hours() which
allocates another Interval "result" and that "result" is what gets passed
back to caller on interval_div(). The 1st Interval allocated appears to be
left around.......
------

I will get a pre-release copy of the compiler to test, but it will take a
while, since they have to revalidate it.

Comments?

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: ler@lerctr.org
US Mail: 3535 Gaspar Drive, Dallas, TX 75220-3611 US

#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Larry Rosenman (#1)
1 attachment(s)
Re: [HACKERS] Followup on the UnixWare Optimizer bug.

Larry Rosenman wrote:

The following is from my SCO Internal contact about the bug. It's
definitely their bug. Towards the end of the
Exact diagnosis, is a suggested work-around for now, as well as a (possible)
memory leak.

...

Also note that there appears to be a memory leak in the interval_****
routines. For example interval_div() allocates a "result" Interval.
It eventually passes this result through to interval_justify_hours() which
allocates another Interval "result" and that "result" is what gets passed
back to caller on interval_div(). The 1st Interval allocated appears to be
left around.......

Good catch on the memory leak. I have applied the following fix.
Thanks to SCO for the report.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Attachments:

/bjm/difftext/plainDownload
Index: doc/src/sgml/func.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.282
diff -c -c -r1.282 func.sgml
*** doc/src/sgml/func.sgml	24 Aug 2005 20:49:35 -0000	1.282
--- doc/src/sgml/func.sgml	25 Aug 2005 01:26:36 -0000
***************
*** 5189,5194 ****
--- 5189,5200 ----
      </table>
  
     <para>
+     If you are using both <function>justify_hours</> and <function>justify_days</>,
+     it is best to use <function>justify_hours</> first so any additional days will
+     justified by <function>justify_days</>.
+    </para>
+ 
+    <para>
      In addition to these functions, the SQL <literal>OVERLAPS</> operator is
      supported:
  <synopsis>
Index: src/backend/utils/adt/timestamp.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/timestamp.c,v
retrieving revision 1.148
diff -c -c -r1.148 timestamp.c
*** src/backend/utils/adt/timestamp.c	12 Aug 2005 18:23:54 -0000	1.148
--- src/backend/utils/adt/timestamp.c	25 Aug 2005 01:26:39 -0000
***************
*** 1892,1898 ****
  {
  	Timestamp	dt1 = PG_GETARG_TIMESTAMP(0);
  	Timestamp	dt2 = PG_GETARG_TIMESTAMP(1);
! 	Interval   *result;
  
  	result = (Interval *) palloc(sizeof(Interval));
  
--- 1892,1898 ----
  {
  	Timestamp	dt1 = PG_GETARG_TIMESTAMP(0);
  	Timestamp	dt2 = PG_GETARG_TIMESTAMP(1);
! 	Interval   *result, *result2;
  
  	result = (Interval *) palloc(sizeof(Interval));
  
***************
*** 1914,1922 ****
  	result->month = 0;
  	result->day = 0;
  
! 	result = DatumGetIntervalP(DirectFunctionCall1(interval_justify_hours,
  												IntervalPGetDatum(result)));
! 	PG_RETURN_INTERVAL_P(result);
  }
  
  /*	interval_justify_hours()
--- 1914,1923 ----
  	result->month = 0;
  	result->day = 0;
  
! 	result2 = DatumGetIntervalP(DirectFunctionCall1(interval_justify_hours,
  												IntervalPGetDatum(result)));
! 	pfree(result);
! 	PG_RETURN_INTERVAL_P(result2);
  }
  
  /*	interval_justify_hours()
***************
*** 2263,2269 ****
  	Interval   *span = PG_GETARG_INTERVAL_P(0);
  	float8		factor = PG_GETARG_FLOAT8(1);
  	double		month_remainder, day_remainder;
! 	Interval   *result;
  
  	result = (Interval *) palloc(sizeof(Interval));
  
--- 2264,2270 ----
  	Interval   *span = PG_GETARG_INTERVAL_P(0);
  	float8		factor = PG_GETARG_FLOAT8(1);
  	double		month_remainder, day_remainder;
! 	Interval   *result, *result2;
  
  	result = (Interval *) palloc(sizeof(Interval));
  
***************
*** 2289,2297 ****
  					day_remainder * SECS_PER_DAY);
  #endif
  
! 	result = DatumGetIntervalP(DirectFunctionCall1(interval_justify_hours,
  												IntervalPGetDatum(result)));
! 	PG_RETURN_INTERVAL_P(result);
  }
  
  Datum
--- 2290,2299 ----
  					day_remainder * SECS_PER_DAY);
  #endif
  
! 	result2 = DatumGetIntervalP(DirectFunctionCall1(interval_justify_hours,
  												IntervalPGetDatum(result)));
! 	pfree(result);
! 	PG_RETURN_INTERVAL_P(result2);
  }
  
  Datum
***************
*** 2310,2316 ****
  	Interval   *span = PG_GETARG_INTERVAL_P(0);
  	float8		factor = PG_GETARG_FLOAT8(1);
  	double		month_remainder, day_remainder;
! 	Interval   *result;
  
  	result = (Interval *) palloc(sizeof(Interval));
  
--- 2312,2318 ----
  	Interval   *span = PG_GETARG_INTERVAL_P(0);
  	float8		factor = PG_GETARG_FLOAT8(1);
  	double		month_remainder, day_remainder;
! 	Interval   *result, *result2;
  
  	result = (Interval *) palloc(sizeof(Interval));
  
***************
*** 2341,2349 ****
  	result->time = JROUND(result->time);
  #endif
  
! 	result = DatumGetIntervalP(DirectFunctionCall1(interval_justify_hours,
  												IntervalPGetDatum(result)));
! 	PG_RETURN_INTERVAL_P(result);
  }
  
  /*
--- 2343,2352 ----
  	result->time = JROUND(result->time);
  #endif
  
! 	result2 = DatumGetIntervalP(DirectFunctionCall1(interval_justify_hours,
  												IntervalPGetDatum(result)));
! 	pfree(result);
! 	PG_RETURN_INTERVAL_P(result2);
  }
  
  /*
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#2)
Re: [HACKERS] Followup on the UnixWare Optimizer bug.

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Good catch on the memory leak. I have applied the following fix.

These explicit pfrees are a waste of time, and probably actually
counterproductive as far as speed goes, because these functions
will always be invoked in relatively short-lived memory contexts.

I wouldn't object except that they seem to make the code noticeably
more obscure.

regards, tom lane

#4Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#3)
Re: [HACKERS] Followup on the UnixWare Optimizer bug.

Tom Lane wrote:

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Good catch on the memory leak. I have applied the following fix.

These explicit pfrees are a waste of time, and probably actually
counterproductive as far as speed goes, because these functions
will always be invoked in relatively short-lived memory contexts.

I wouldn't object except that they seem to make the code noticeably
more obscure.

OK, patch reverted. I am often unsure where we do quick cleanup and
were we don't.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Larry Rosenman (#1)
Re: Followup on the UnixWare Optimizer bug.

"Larry Rosenman" <ler@lerctr.org> forwards:

Also note that there appears to be a memory leak in the interval_****
routines. For example interval_div() allocates a "result" Interval.
It eventually passes this result through to interval_justify_hours() which
allocates another Interval "result" and that "result" is what gets passed
back to caller on interval_div(). The 1st Interval allocated appears to be
left around.......

Just to clarify my discussion with Bruce: this is a leak from the point
of view of these specific routines, but we do not care, because the
memory is leaked in a short-lived palloc context that will soon be reset
("soon" being "the next tuple cycle" in most cases). We rely on this
behavior in a lot of places --- for example, detoasting a toasted input
datum leaks memory that in most places isn't explicitly cleaned up.
Cleaning up just a dozen or so bytes is really not worth the cycles
needed to do it. Despite that, I wouldn't have objected to Bruce's
patch if it hadn't made the code noticeably more obscure.

As a general rule, datatype-specific functions only need to be paranoid
about not leaking memory if they may be invoked by index operations;
btree indexes, at least, call such functions in a query-lifespan memory
context. (I think GIST was recently fixed to not do this, but btree
still does.)

regards, tom lane

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Larry Rosenman (#1)
Re: Followup on the UnixWare Optimizer bug.

"Larry Rosenman" <ler@lerctr.org> forwards:

As I said, this will take us some time to work up the fix and revalidate the
compiler. Since you have release coming up, I want to suggest the follow
work-around for a Common Subexpression Elimination (CSE) bug in "some"
compiler.......

Done. I think the code is more legible this way anyway.

regards, tom lane