6.5.0 - Overflow bug in AVG( )

Started by Gene Sokolovover 26 years ago16 messages

hook@aktrad.ru

over 26 years ago

% psql test1
Welcome to the POSTGRESQL interactive sql monitor:
Please read the file COPYRIGHT for copyright terms of POSTGRESQL
[PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]

type \? for help on slash commands
type \q to quit
type \g or terminate with semicolon to execute query
You are currently connected to the database: test1

test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";
count| max| min| avg
------+-------+-----+----
677719|3075717|61854|-251
(1 row)

Overflow, perhaps?

Gene Sokolov.

Thomas Lockhart

lockhart@alumni.caltech.edu

over 26 years ago

In reply to: Gene Sokolov (#1)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

[PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";
count| max| min| avg
------+-------+-----+----
677719|3075717|61854|-251
(1 row)
Overflow, perhaps?

Of course. These are integer fields? I've been considering changing
all accumulators (and results) for integer aggregate functions to
float8, but have not done so yet. I was sort of waiting for a v7.0
release, but am not sure why...

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

Gene Sokolov

hook@aktrad.ru

over 26 years ago

In reply to: Gene Sokolov (#1)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

[PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
test1=> select count(*), max("ID"), min("ID"), avg("ID") from

"ItemsBars";

count| max| min| avg
------+-------+-----+----
677719|3075717|61854|-251
(1 row)
Overflow, perhaps?

Of course. These are integer fields? I've been considering changing

Yes, the fields are int4

all accumulators (and results) for integer aggregate functions to
float8, but have not done so yet. I was sort of waiting for a v7.0
release, but am not sure why...

Float8 accumulator seems to be a good solution if AVG is limited to
int/float types. I wonder if it could produce system dependency in AVG due
to rounding errors. Some broader solution should be considered though if you
want AVG to work on numeric/decimal as well.

Gene Sokolov.

Thomas Lockhart

lockhart@alumni.caltech.edu

over 26 years ago

In reply to: Gene Sokolov (#1)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

Float8 accumulator seems to be a good solution if AVG is limited to
int/float types. I wonder if it could produce system dependency in AVG due
to rounding errors. Some broader solution should be considered though if you
want AVG to work on numeric/decimal as well.

The implementation can be specified for each datatype individually, so
that's not a problem. afaik the way numeric/decimal work it would be
fine to use those types as their own accumulators. It's mostly the
int2/int4/int8 types which are the problem, since they silently
overflow (on most machines?).

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

Noname

wieck@debis.com

over 26 years ago

In reply to: Thomas Lockhart (#2)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

Thomas Lockhart wrote:

[PostgreSQL 6.5.0 on i386-unknown-freebsd3.2, compiled by gcc 2.7.2.1]
test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";
count| max| min| avg
------+-------+-----+----
677719|3075717|61854|-251
(1 row)
Overflow, perhaps?

Of course. These are integer fields? I've been considering changing
all accumulators (and results) for integer aggregate functions to
float8, but have not done so yet. I was sort of waiting for a v7.0
release, but am not sure why...

Wouldn't it be better to use NUMERIC for the avg(int) state
values? It will never loose any significant digit.

Jan

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#========================================= wieck@debis.com (Jan Wieck) #

Thomas Lockhart

lockhart@alumni.caltech.edu

over 26 years ago

In reply to: Noname (#5)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

Of course. These are integer fields? I've been considering changing
all accumulators (and results) for integer aggregate functions to
float8, but have not done so yet. I was sort of waiting for a v7.0
release, but am not sure why...

Wouldn't it be better to use NUMERIC for the avg(int) state
values? It will never loose any significant digit.

Sure. It would be fast, right? avg(int) is likely to be used a lot,
and should be as fast as possible.

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Thomas Lockhart (#6)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

"Gene Sokolov" <hook@aktrad.ru> writes:

test1=> select count(*), max("ID"), min("ID"), avg("ID") from "ItemsBars";
count| max| min| avg
------+-------+-----+----
677719|3075717|61854|-251

Overflow, perhaps?

sum() and avg() for int fields use int accumulators. You might want
to use avg(float8(field)) to get a less-likely-to-overflow result.

Someday it'd be a good idea to revise the sum() and avg() aggregates
to use float or numeric accumulators in all cases. This'd require
inventing a few more cross-data-type operators...

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofWed16Jun1999104139+040018a101beb7c34a0aa5700d8cdac3@aktrad.ru | Resolved by subject fallback

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Tom Lane (#7)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:

Some broader solution should be considered though if you
want AVG to work on numeric/decimal as well.

The implementation can be specified for each datatype individually,

In the current implementation, each datatype does use its own type as
the accumulator --- and also as the counter. float8 and numeric are
fine, float4 is sort of OK (a float8 accumulator would be better for
accuracy reasons), int4 loses, int2 loses *bad*.

To fix it we'd need to invent operators that do the appropriate cross-
data-type operations. For example, int4 avg using float8 accumulator
would need "float8 + int4 yielding float8" and "float8 / int4 yielding
int4", neither of which are to be found in pg_proc at the moment. But
it's a straightforward thing to do.

int8 is the only integer type that I wouldn't want to use a float8
accumulator for. Maybe numeric would be the appropriate thing here,
slow though it be.

Note that switching over to float accumulation would *not* be real
palatable until we have fixed the memory-leak issue. avg() on int4
doesn't leak memory currently, but it would with a float accumulator...

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofWed16Jun1999140338+00003767AEBA.E7C006B9@alumni.caltech.edu | Resolved by subject fallback

Jackson, DeJuan

djackson@cpsgroup.com

over 26 years ago

In reply to: Tom Lane (#8)

RE: [HACKERS] 6.5.0 - Overflow bug in AVG( )

What does the spec have to say? It bothers me somewhat that an AVG is
expected to return an integer result at all. Isn't the Average of 1 and 2,
1.5 not 1?

just my $0.02,
-DEJ

Show quoted text

-----Original Message-----
From: Tom Lane [SMTP:tgl@sss.pgh.pa.us]
Sent: Wednesday, June 16, 1999 9:52 AM
To: Thomas Lockhart
Cc: Gene Sokolov; pgsql-hackers@postgreSQL.org
Subject: Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:

Some broader solution should be considered though if you
want AVG to work on numeric/decimal as well.

The implementation can be specified for each datatype individually,

In the current implementation, each datatype does use its own type as
the accumulator --- and also as the counter. float8 and numeric are
fine, float4 is sort of OK (a float8 accumulator would be better for
accuracy reasons), int4 loses, int2 loses *bad*.

To fix it we'd need to invent operators that do the appropriate cross-
data-type operations. For example, int4 avg using float8 accumulator
would need "float8 + int4 yielding float8" and "float8 / int4 yielding
int4", neither of which are to be found in pg_proc at the moment. But
it's a straightforward thing to do.

int8 is the only integer type that I wouldn't want to use a float8
accumulator for. Maybe numeric would be the appropriate thing here,
slow though it be.

Note that switching over to float accumulation would *not* be real
palatable until we have fixed the memory-leak issue. avg() on int4
doesn't leak memory currently, but it would with a float accumulator...

regards, tom lane

Import Notes

Resolved by subject fallback

#10

Thomas Lockhart

lockhart@alumni.caltech.edu

over 26 years ago

In reply to: Jackson, DeJuan (#9)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

What does the spec have to say? It bothers me somewhat that an AVG is
expected to return an integer result at all. Isn't the Average of 1 and 2,
1.5 not 1?

Yeah, well, it's a holdover from the original Postgres code. We just
haven't made an effort to change it yet, but it seems a good candidate
for a makeover, no?

I'm pretty sure that the spec would suggest a float8 return value for
avg(int), but I haven't looked recently to refresh my memory.

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

#11

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Thomas Lockhart (#10)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

"Jackson, DeJuan" <djackson@cpsgroup.com> writes:

What does the spec have to say? It bothers me somewhat that an AVG is
expected to return an integer result at all. Isn't the Average of 1
and 2, 1.5 not 1?

That bothered me too. The draft spec that I have sez:

b) If SUM is specified and DT is exact numeric with scale
S, then the data type of the result is exact numeric with
implementation-defined precision and scale S.

c) If AVG is specified and DT is exact numeric, then the data
type of the result is exact numeric with implementation-
defined precision not less than the precision of DT and
implementation-defined scale not less than the scale of DT.

d) If DT is approximate numeric, then the data type of the
result is approximate numeric with implementation-defined
precision not less than the precision of DT.

65)Subclause 6.5, "<set function specification>": The precision of
the value derived from application of the SUM function to a data
type of exact numeric is implementation-defined.

66)Subclause 6.5, "<set function specification>": The precision and
scale of the value derived from application of the AVG function
to a data type of exact numeric is implementation-defined.

67)Subclause 6.5, "<set function specification>": The preci-
sion of the value derived from application of the SUM func-
tion or AVG function to a data type of approximate numeric is
implementation-defined.

This would seem to give license for the result of AVG() on an int4 field
to be NUMERIC with a fraction part, but not FLOAT. But I suspect we
could get away with making it be FLOAT anyway. Anyone know what other
databases do?

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofWed16Jun1999102128-0500D05EF808F2DFD211AE4A00105AA1B5D22656AC@cpsmail | Resolved by subject fallback

#12

Thomas Lockhart

lockhart@alumni.caltech.edu

over 26 years ago

In reply to: Tom Lane (#11)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

This would seem to give license for the result of AVG() on an int4 field
to be NUMERIC with a fraction part, but not FLOAT. But I suspect we
could get away with making it be FLOAT anyway.

Sure, that can't be worse in practice than what we do now. But it is
interesting that we are currently SQL92 conforming (except for that
nasty overflow business; they probably don't mention that ;).

For int2/int4, we could bump the accumulator to int8 (certainly faster
than our numeric implementation?), but there are a very few platforms
which don't support int8 and we shouldn't break the aggregates for
them. We could get around that by defining explicit routines to be
used in the aggregates, and then having some #ifdef alternate code if
int8 is not available...

Tom, do you think that a hack in the aggregate support code which
compares the pointer returned to the pointer input, then pfree'ing the
input area if they differ, would fix the major leakage? We could even
have a backend global variable which enables/disables the feature to
allow performance tuning.

- Thomas

--
Thomas Lockhart lockhart@alumni.caltech.edu
South Pasadena, California

#13

Tom Lane

tgl@sss.pgh.pa.us

over 26 years ago

In reply to: Thomas Lockhart (#12)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

Thomas Lockhart <lockhart@alumni.caltech.edu> writes:

For int2/int4, we could bump the accumulator to int8 (certainly faster
than our numeric implementation?), but there are a very few platforms
which don't support int8 and we shouldn't break the aggregates for
them.

Right, that's why I preferred the idea of using float8.

Note that any reasonable floating-point implementation will deliver an
exact result for the sum of integer inputs, up to the point at which the
sum exceeds the number of mantissa bits in a float (2^52 or so in IEEE
float8). After that you start to lose accuracy. Using int8 would give
an exact sum up to 2^63, but if we want to start delivering a fractional
average then float still looks like a better deal...

Tom, do you think that a hack in the aggregate support code which
compares the pointer returned to the pointer input, then pfree'ing the
input area if they differ, would fix the major leakage?

Yeah, that would probably work OK, although you'd have to be careful of
the initial condition --- is the initial value always safely pfreeable?

We could even have a backend global variable which enables/disables
the feature to allow performance tuning.

Seems unnecessary.

regards, tom lane

Import Notes

Reply to msg id not found: YourmessageofWed16Jun1999154704+00003767C6F8.2083C35A@alumni.caltech.edu | Resolved by subject fallback

#14

José Soares

jose@sferacarta.com

over 26 years ago

In reply to: Tom Lane (#11)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

PostgreSQL:
^^^^^^^^^^^
prova=> select min(a), max(a), avg(a) from aa;
min|max|avg
---+---+---
1| 2| 1
(1 row)

informix:----------- hygea@hygea ------------ Press CTRL-W for Help --------

^^^^^^^^^
(min) (max) (avg)

1 2 1.50000000000000

oracle:
^^^^^^^
SQL> select min(a), max(a), avg(a) from aa;

MIN(A) MAX(A) AVG(A)
---------- ---------- ----------
1 2 1.5

Tom Lane ha scritto:

"Jackson, DeJuan" <djackson@cpsgroup.com> writes:

What does the spec have to say? It bothers me somewhat that an AVG is
expected to return an integer result at all. Isn't the Average of 1
and 2, 1.5 not 1?

That bothered me too. The draft spec that I have sez:

b) If SUM is specified and DT is exact numeric with scale
S, then the data type of the result is exact numeric with
implementation-defined precision and scale S.

c) If AVG is specified and DT is exact numeric, then the data
type of the result is exact numeric with implementation-
defined precision not less than the precision of DT and
implementation-defined scale not less than the scale of DT.

d) If DT is approximate numeric, then the data type of the
result is approximate numeric with implementation-defined
precision not less than the precision of DT.

65)Subclause 6.5, "<set function specification>": The precision of
the value derived from application of the SUM function to a data
type of exact numeric is implementation-defined.

66)Subclause 6.5, "<set function specification>": The precision and
scale of the value derived from application of the AVG function
to a data type of exact numeric is implementation-defined.

67)Subclause 6.5, "<set function specification>": The preci-
sion of the value derived from application of the SUM func-
tion or AVG function to a data type of approximate numeric is
implementation-defined.

This would seem to give license for the result of AVG() on an int4 field
to be NUMERIC with a fraction part, but not FLOAT. But I suspect we
could get away with making it be FLOAT anyway. Anyone know what other
databases do?

regards, tom lane

______________________________________________________________
PostgreSQL 6.5.0 on i586-pc-linux-gnu, compiled by gcc 2.7.2.3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jose'

#15

Noname

wieck@debis.com

over 26 years ago

In reply to: Thomas Lockhart (#6)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

Thomas Lockhart wrote:

Of course. These are integer fields? I've been considering changing
all accumulators (and results) for integer aggregate functions to
float8, but have not done so yet. I was sort of waiting for a v7.0
release, but am not sure why...

Wouldn't it be better to use NUMERIC for the avg(int) state
values? It will never loose any significant digit.

Sure. It would be fast, right? avg(int) is likely to be used a lot,
and should be as fast as possible.

I think it would be fast enough, even if I have things in
mind how to speed it up. But that would result in a total
rewrite of NUMERIC from scratch.

The only math function of NUMERIC which is time critical for
AVG() is ADD. And even for int8 the number of digits it has
to perform is relatively small. I expect the time spent on
that is negligible compared to the heap scanning required to
get all the values.

Jan

#16

Vince Vielhaber

vev@michvhf.com

over 26 years ago

In reply to: José Soares (#14)

Re: [HACKERS] 6.5.0 - Overflow bug in AVG( )

On Wed, 16 Jun 1999, [iso-8859-1] Jos�� Soares wrote:

PostgreSQL:
^^^^^^^^^^^
prova=> select min(a), max(a), avg(a) from aa;
min|max|avg
---+---+---
1| 2| 1
(1 row)

Sybase - I'm guessing/ass-u-me ing it's around version 4.9

1> select min(a), max(a), avg(a) from aa
2> go

----------- ----------- -----------
1 2 1

(1 row affected)
1>

This would seem to give license for the result of AVG() on an int4 field
to be NUMERIC with a fraction part, but not FLOAT. But I suspect we
could get away with making it be FLOAT anyway. Anyone know what other
databases do?

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH email: vev@michvhf.com flame-mail: /dev/null
# include <std/disclaimers.h> TEAM-OS2
Online Campground Directory http://www.camping-usa.com
Online Giftshop Superstore http://www.cloudninegifts.com
==========================================================================