Function trunc() behaves in unexpected manner with different data types

Started by Nathan M. Davalosover 15 years ago9 messagesbugs

Jump to latest

Nathan M. Davalos

n.davalos@sharedmarketing.com

over 15 years ago

I ran into something interesting with using trunc() and different data
types:

The following is a simplified from the statement we're using and
produces the same results:

select trunc( ((cast(2183.68 as numeric) - cast(1 as numeric)) )*100)
/100

yields 2184.68

select trunc(((cast(2183.68 as numeric) - cast(1 as double precision)
))*100) /100

yields 2184.67

select trunc(cast(2184.68 as double precision) *100)

yields 218467 instead of 218468

This only happens on certain ranges of numbers. Doing the same thing
with the number 3183.68 yields the same result in both cases. It only
appears to happen when a number is declared as a double and there is no
number past the last significant digit or the number past the last
significant digit is a zero AND falls within a certain range of numbers.
For instance select trunc(cast(2184.681 as double precision) *100)
yields 218468, but select trunc(cast(2184.680 as double precision) *100)
yields 218467.

I already made sure everything we're using is just defined as numeric to
avoid the issue.

Merlin Moncure

mmoncure@gmail.com

over 15 years ago

In reply to: Nathan M. Davalos (#1)

Re: Function trunc() behaves in unexpected manner with different data types

On Thu, Feb 24, 2011 at 1:01 PM, Nathan M. Davalos
<n.davalos@sharedmarketing.com> wrote:

I ran into something interesting with using trunc() and different data
types:

The following is a simplified from the statement we’re using and produces
the same results:

select trunc( ((cast(2183.68 as numeric) - cast(1 as numeric)) )*100) /100

yields 2184.68

the root issue I think here is that the string version of the double
precision math is approximated:
postgres=# create table v as select floor(2183.68::float8 * 100) as v;
postgres=# select * from v;
v
--------
218367
(1 row)

postgres=# select floor(v) from v;
floor
--------
218367

postgres=# insert into v select 218368;
INSERT 0 1
(1 row)

postgres=# select distinct v from v;
v
--------
218368
218368
(2 rows)

As you can see, even though the string versions are the same, the
internal representation is different. You could dump the data and
restore it and get different results. Also the text/binary protocols
would send different data to the client. I don't know if this is a
bug in postgresql floating point implementation or not: i think the
backend would either have to print 218367.999999999999999ish number or
spend the time to look for these cases and round them internally.
Floating point is a headache :-).

merlin

Bruce Momjian

bruce@momjian.us

over 15 years ago

In reply to: Merlin Moncure (#2)

Re: Function trunc() behaves in unexpected manner with different data types

On Thu, Feb 24, 2011 at 7:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

the root issue I think here is that the string version of the double
precision math is approximated:

No, it's simpler than that, all double precision math is approximated.
The root issue is that 2183.67 is not representable in a floating
point binary number. Just like 1/3 isn't representable in base 10
(decimal) numbers many fractions aren't representable in base 2
(binary) numbers. The result are repeated decimals like 0.3333... if you
multiply that by three you get 0.99999 and if you truncate that you
get 0 insted of 1.

It's the trunc() that's exposing the imprecision because like "=" it
depends on the precise value of the number down to the last digit.
Though depending on the arithmetic you can always make the precision
expand beyond the last digit anyways -- when you multiply by 100 you
magnify that imprecision too.

--
greg

Merlin Moncure

mmoncure@gmail.com

over 15 years ago

In reply to: Bruce Momjian (#3)

Re: Function trunc() behaves in unexpected manner with different data types

On Thu, Feb 24, 2011 at 8:03 PM, Greg Stark <gsstark@mit.edu> wrote:

On Thu, Feb 24, 2011 at 7:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

the root issue I think here is that the string version of the double
precision math is approximated:

No, it's simpler than that, all double precision math is approximated.
The root issue is that 2183.67 is not representable in a floating
point binary number. Just like 1/3 isn't representable in base 10
(decimal) numbers many fractions aren't representable in base 2
(binary) numbers. The result are repeated decimals like 0.3333... if you
multiply that by three you get 0.99999 and if you truncate that you
get 0 insted of 1.

It's the trunc() that's exposing the imprecision because like "=" it
depends on the precise value of the number down to the last digit.
Though depending on the arithmetic you can always make the precision
expand beyond the last digit anyways -- when you multiply by 100 you
magnify that imprecision too.

right -- in understand how floating point works -- but are you are
saying that you are ok with the fact that (for example) a table with a
floating point unique key could dump and not restore? more
specifically, a binary dump would restore but a text dump would not.
I think this is a problem with our implementation -- not all versions
of 2183.68 as outputted from the server are the same internally.

put another way, text output from the server should unambiguously
match what sourced the text. in the case of floating point, it does
not...there are N versions of internal data that can match particular
text output. I am speculating that the rounding is happening in the
wrong place maybe.

merlin

Tom Lane

tgl@sss.pgh.pa.us

over 15 years ago

In reply to: Merlin Moncure (#4)

Re: Function trunc() behaves in unexpected manner with different data types

Merlin Moncure <mmoncure@gmail.com> writes:

right -- in understand how floating point works -- but are you are
saying that you are ok with the fact that (for example) a table with a
floating point unique key could dump and not restore? more
specifically, a binary dump would restore but a text dump would not.

pg_dump takes measures against that (see extra_float_digits).

I think this is a problem with our implementation -- not all versions
of 2183.68 as outputted from the server are the same internally.

It's an inherent property of float math. Yes, we could set the default
value of extra_float_digits high enough that distinct internal values
always had distinct text representations, but trust me, you would not
like it.

regards, tom lane

Merlin Moncure

mmoncure@gmail.com

over 15 years ago

In reply to: Tom Lane (#5)

Re: Function trunc() behaves in unexpected manner with different data types

On Fri, Feb 25, 2011 at 9:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Merlin Moncure <mmoncure@gmail.com> writes:

right -- in understand how floating point works -- but are you are
saying that you are ok with the fact that (for example) a table with a
floating point unique key could dump and not restore? more
specifically, a binary dump would restore but a text dump would not.

pg_dump takes measures against that (see extra_float_digits).

I think this is a problem with our implementation -- not all versions
of 2183.68 as outputted from the server are the same internally.

It's an inherent property of float math. Yes, we could set the default
value of extra_float_digits high enough that distinct internal values
always had distinct text representations, but trust me, you would not
like it.

no I wouldn't, and the pg_dump extra_float_digits setting addresses my
primary concern. The client has a similar issue though -- suppose it
fetches a value from the server and updates it back -- which record
gets the update? You would get different results if the client was
using binary or text features of the protocol. Not saying this is
wrong or needs to be fixed, just pointing it out :-).

update foo set val=val + 1 where val = 2183.68;

merlin

Alvaro Herrera

alvherre@2ndquadrant.com

over 15 years ago

In reply to: Merlin Moncure (#6)

Re: Function trunc() behaves in unexpected manner with different data types

Excerpts from Merlin Moncure's message of vie feb 25 12:28:25 -0300 2011:

no I wouldn't, and the pg_dump extra_float_digits setting addresses my
primary concern. The client has a similar issue though -- suppose it
fetches a value from the server and updates it back -- which record
gets the update? You would get different results if the client was
using binary or text features of the protocol. Not saying this is
wrong or needs to be fixed, just pointing it out :-).

update foo set val=val + 1 where val = 2183.68;

I think the mere idea of using floating point equality on a
WHERE clause is bogus, regardless of text or binary format.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Merlin Moncure

mmoncure@gmail.com

over 15 years ago

In reply to: Alvaro Herrera (#7)

Re: Function trunc() behaves in unexpected manner with different data types

On Fri, Feb 25, 2011 at 9:40 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Merlin Moncure's message of vie feb 25 12:28:25 -0300 2011:

no I wouldn't, and the pg_dump extra_float_digits setting addresses my
primary concern. The client has a similar issue though -- suppose it
fetches a value from the server and updates it back -- which record
gets the update? You would get different results if the client was
using binary or text features of the protocol. Not saying this is
wrong or needs to be fixed, just pointing it out :-).

update foo set val=val + 1 where val = 2183.68;

I think the mere idea of using floating point equality on a
WHERE clause is bogus, regardless of text or binary format.

That's a bridge to far -- akin to saying floating point should not
support equality operator. select count(*) from foo where val >=
2183.68? you are ok getting different answers depending on method of
transmission of 2183.68 to the server?

merlin

Merlin Moncure

mmoncure@gmail.com

over 15 years ago

In reply to: Merlin Moncure (#8)

Re: Function trunc() behaves in unexpected manner with different data types

On Fri, Feb 25, 2011 at 9:48 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Fri, Feb 25, 2011 at 9:40 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Merlin Moncure's message of vie feb 25 12:28:25 -0300 2011:

no I wouldn't, and the pg_dump extra_float_digits setting addresses my
primary concern. The client has a similar issue though -- suppose it
fetches a value from the server and updates it back -- which record
gets the update? You would get different results if the client was
using binary or text features of the protocol. Not saying this is
wrong or needs to be fixed, just pointing it out :-).

update foo set val=val + 1 where val = 2183.68;

I think the mere idea of using floating point equality on a
WHERE clause is bogus, regardless of text or binary format.

That's a bridge to[sic] far -- akin to saying floating point should not
support equality operator. select count(*) from foo where val >=
2183.68? you are ok getting different answers depending on method of
transmission of 2183.68 to the server?

I stand corrected -- I did some digging and Postgres's handling of
this issue is afaict correct: you are supposed to round on
presentation only, and equality matching on floating point in sql
(just like in C) is capricious exercise at best, at least without some
defenses. So, we can definitely file under 'not a bug'.

merlin