Function trunc() behaves in unexpected manner with different data types
I ran into something interesting with using trunc() and different data
types:
The following is a simplified from the statement we're using and
produces the same results:
select trunc( ((cast(2183.68 as numeric) - cast(1 as numeric)) )*100)
/100
yields 2184.68
select trunc(((cast(2183.68 as numeric) - cast(1 as double precision)
))*100) /100
yields 2184.67
select trunc(cast(2184.68 as double precision) *100)
yields 218467 instead of 218468
This only happens on certain ranges of numbers. Doing the same thing
with the number 3183.68 yields the same result in both cases. It only
appears to happen when a number is declared as a double and there is no
number past the last significant digit or the number past the last
significant digit is a zero AND falls within a certain range of numbers.
For instance select trunc(cast(2184.681 as double precision) *100)
yields 218468, but select trunc(cast(2184.680 as double precision) *100)
yields 218467.
I already made sure everything we're using is just defined as numeric to
avoid the issue.
On Thu, Feb 24, 2011 at 1:01 PM, Nathan M. Davalos
<n.davalos@sharedmarketing.com> wrote:
I ran into something interesting with using trunc() and different data
types:The following is a simplified from the statement we’re using and produces
the same results:select trunc( ((cast(2183.68 as numeric) - cast(1 as numeric)) )*100) /100
yields 2184.68
the root issue I think here is that the string version of the double
precision math is approximated:
postgres=# create table v as select floor(2183.68::float8 * 100) as v;
postgres=# select * from v;
v
--------
218367
(1 row)
postgres=# select floor(v) from v;
floor
--------
218367
postgres=# insert into v select 218368;
INSERT 0 1
(1 row)
postgres=# select distinct v from v;
v
--------
218368
218368
(2 rows)
As you can see, even though the string versions are the same, the
internal representation is different. You could dump the data and
restore it and get different results. Also the text/binary protocols
would send different data to the client. I don't know if this is a
bug in postgresql floating point implementation or not: i think the
backend would either have to print 218367.999999999999999ish number or
spend the time to look for these cases and round them internally.
Floating point is a headache :-).
merlin
On Thu, Feb 24, 2011 at 7:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
the root issue I think here is that the string version of the double
precision math is approximated:
No, it's simpler than that, all double precision math is approximated.
The root issue is that 2183.67 is not representable in a floating
point binary number. Just like 1/3 isn't representable in base 10
(decimal) numbers many fractions aren't representable in base 2
(binary) numbers. The result are repeated decimals like 0.3333... if you
multiply that by three you get 0.99999 and if you truncate that you
get 0 insted of 1.
It's the trunc() that's exposing the imprecision because like "=" it
depends on the precise value of the number down to the last digit.
Though depending on the arithmetic you can always make the precision
expand beyond the last digit anyways -- when you multiply by 100 you
magnify that imprecision too.
--
greg
On Thu, Feb 24, 2011 at 8:03 PM, Greg Stark <gsstark@mit.edu> wrote:
On Thu, Feb 24, 2011 at 7:31 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
the root issue I think here is that the string version of the double
precision math is approximated:No, it's simpler than that, all double precision math is approximated.
The root issue is that 2183.67 is not representable in a floating
point binary number. Just like 1/3 isn't representable in base 10
(decimal) numbers many fractions aren't representable in base 2
(binary) numbers. The result are repeated decimals like 0.3333... if you
multiply that by three you get 0.99999 and if you truncate that you
get 0 insted of 1.It's the trunc() that's exposing the imprecision because like "=" it
depends on the precise value of the number down to the last digit.
Though depending on the arithmetic you can always make the precision
expand beyond the last digit anyways -- when you multiply by 100 you
magnify that imprecision too.
right -- in understand how floating point works -- but are you are
saying that you are ok with the fact that (for example) a table with a
floating point unique key could dump and not restore? more
specifically, a binary dump would restore but a text dump would not.
I think this is a problem with our implementation -- not all versions
of 2183.68 as outputted from the server are the same internally.
put another way, text output from the server should unambiguously
match what sourced the text. in the case of floating point, it does
not...there are N versions of internal data that can match particular
text output. I am speculating that the rounding is happening in the
wrong place maybe.
merlin
Merlin Moncure <mmoncure@gmail.com> writes:
right -- in understand how floating point works -- but are you are
saying that you are ok with the fact that (for example) a table with a
floating point unique key could dump and not restore? more
specifically, a binary dump would restore but a text dump would not.
pg_dump takes measures against that (see extra_float_digits).
I think this is a problem with our implementation -- not all versions
of 2183.68 as outputted from the server are the same internally.
It's an inherent property of float math. Yes, we could set the default
value of extra_float_digits high enough that distinct internal values
always had distinct text representations, but trust me, you would not
like it.
regards, tom lane
On Fri, Feb 25, 2011 at 9:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Merlin Moncure <mmoncure@gmail.com> writes:
right -- in understand how floating point works -- but are you are
saying that you are ok with the fact that (for example) a table with a
floating point unique key could dump and not restore? more
specifically, a binary dump would restore but a text dump would not.pg_dump takes measures against that (see extra_float_digits).
I think this is a problem with our implementation -- not all versions
of 2183.68 as outputted from the server are the same internally.It's an inherent property of float math. Yes, we could set the default
value of extra_float_digits high enough that distinct internal values
always had distinct text representations, but trust me, you would not
like it.
no I wouldn't, and the pg_dump extra_float_digits setting addresses my
primary concern. The client has a similar issue though -- suppose it
fetches a value from the server and updates it back -- which record
gets the update? You would get different results if the client was
using binary or text features of the protocol. Not saying this is
wrong or needs to be fixed, just pointing it out :-).
update foo set val=val + 1 where val = 2183.68;
merlin
Excerpts from Merlin Moncure's message of vie feb 25 12:28:25 -0300 2011:
no I wouldn't, and the pg_dump extra_float_digits setting addresses my
primary concern. The client has a similar issue though -- suppose it
fetches a value from the server and updates it back -- which record
gets the update? You would get different results if the client was
using binary or text features of the protocol. Not saying this is
wrong or needs to be fixed, just pointing it out :-).update foo set val=val + 1 where val = 2183.68;
I think the mere idea of using floating point equality on a
WHERE clause is bogus, regardless of text or binary format.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Feb 25, 2011 at 9:40 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
Excerpts from Merlin Moncure's message of vie feb 25 12:28:25 -0300 2011:
no I wouldn't, and the pg_dump extra_float_digits setting addresses my
primary concern. The client has a similar issue though -- suppose it
fetches a value from the server and updates it back -- which record
gets the update? You would get different results if the client was
using binary or text features of the protocol. Not saying this is
wrong or needs to be fixed, just pointing it out :-).update foo set val=val + 1 where val = 2183.68;
I think the mere idea of using floating point equality on a
WHERE clause is bogus, regardless of text or binary format.
That's a bridge to far -- akin to saying floating point should not
support equality operator. select count(*) from foo where val >=
2183.68? you are ok getting different answers depending on method of
transmission of 2183.68 to the server?
merlin
On Fri, Feb 25, 2011 at 9:48 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Fri, Feb 25, 2011 at 9:40 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:Excerpts from Merlin Moncure's message of vie feb 25 12:28:25 -0300 2011:
no I wouldn't, and the pg_dump extra_float_digits setting addresses my
primary concern. The client has a similar issue though -- suppose it
fetches a value from the server and updates it back -- which record
gets the update? You would get different results if the client was
using binary or text features of the protocol. Not saying this is
wrong or needs to be fixed, just pointing it out :-).update foo set val=val + 1 where val = 2183.68;
I think the mere idea of using floating point equality on a
WHERE clause is bogus, regardless of text or binary format.That's a bridge to[sic] far -- akin to saying floating point should not
support equality operator. select count(*) from foo where val >=
2183.68? you are ok getting different answers depending on method of
transmission of 2183.68 to the server?
I stand corrected -- I did some digging and Postgres's handling of
this issue is afaict correct: you are supposed to round on
presentation only, and equality matching on floating point in sql
(just like in C) is capricious exercise at best, at least without some
defenses. So, we can definitely file under 'not a bug'.
merlin