BUG #17329: Aggregate Functions Precision Error

Started by PG Bug reporting formover 4 years ago3 messagesbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 17329
Logged by: Max Neverov
Email address: neverov.max@gmail.com
PostgreSQL version: 13.3
Operating system: Alpine 10.3.1_git20210424
Description:

Aggregate functions (described here
https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE)
that are defined for double precision type suffer from loss of
significance.
Corresponding code see
https://github.com/postgres/postgres/blob/49407dc32a2931550e4ff1dea314b6a25afdfc35/src/backend/utils/adt/float.c#L3401.

Consider the following:
drop table if exists test;
create table test(y numeric, x numeric);
insert into test values
(1, 3),
(2, 3.0e+22),
(3, -3);
select covar_pop(y, x) from test;
covar_pop
-------------------
699050.6666666666
(1 row)

truncate table test;

insert into test values
(1, 3),
(3, -3),
(2, 3.0e+22);
select covar_pop(y, x) from test;
covar_pop
-----------
-2
(1 row)

truncate table test;

insert into test values
(2, 3.0e+22),
(3, -3),
(1, 3);
select covar_pop(y, x) from test;
covar_pop
--------------------
-699050.6666666666
(1 row)

The expected result is -2.
The result depends on the order of values although it shouldn't. This
happens because operations with 3.0e+22 lead to the loss of precision since
the type can hold only 15 decimal digits precision.
Even if the functions defined for double precision type I would expect
Postgres either to report an error or to return the correct result.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #17329: Aggregate Functions Precision Error

PG Bug reporting form <noreply@postgresql.org> writes:

Aggregate functions (described here
https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE)
that are defined for double precision type suffer from loss of
significance.

This is pretty much inherent in all uses of float arithmetic.
You might be happier using the numeric type (of course, that's
much slower).

Another possibility, for some aggregates, is to order the inputs
in a way that minimizes error accumulation. For example,

select sum(f1 order by abs(f1)) from ...

I don't know offhand what the best such incantation is for covar_pop;
it might depend on the problem.

regards, tom lane

#3Max Neverov
neverov.max@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #17329: Aggregate Functions Precision Error

You might be happier using the numeric type

Postgres defines aggregate functions for the numeric type only for 6
functions of 18.

Another possibility, for some aggregates, is to order the inputs
in a way that minimizes error accumulation.

That is not always possible. As I understand this thread
https://postgrespro.com/list/thread-id/1858486#CAKJS1f9L95TySOtBf0AgeZhiLf60BcrgXjOA4NtWptLGkNJFZw@mail.gmail.com
,
the parallel calculation for the aggregates was introduced, so the result
depends on the order of float8_regr_combine functions.

BR,
Max

On Thu, Dec 9, 2021 at 8:41 AM Max Neverov <neverov.max@gmail.com> wrote:

Show quoted text

You might be happier using the numeric type

Postgres defines aggregate functions for the numeric type only for 6
functions of 18.

Another possibility, for some aggregates, is to order the inputs
in a way that minimizes error accumulation.

That is not always possible. As I understand this thread
https://postgrespro.com/list/thread-id/1858486#CAKJS1f9L95TySOtBf0AgeZhiLf60BcrgXjOA4NtWptLGkNJFZw@mail.gmail.com
,
the parallel calculation for the aggregates was introduced, so the result
depends on the order of float8_regr_combine functions.

BR,
Max

On Wed, Dec 8, 2021 at 10:45 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

Aggregate functions (described here

https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE
)

that are defined for double precision type suffer from loss of
significance.

This is pretty much inherent in all uses of float arithmetic.
You might be happier using the numeric type (of course, that's
much slower).

Another possibility, for some aggregates, is to order the inputs
in a way that minimizes error accumulation. For example,

select sum(f1 order by abs(f1)) from ...

I don't know offhand what the best such incantation is for covar_pop;
it might depend on the problem.

regards, tom lane