float output precision questions

Started by Pedro Miguel Frazao Fernandes Ferreiraabout 23 years ago33 messages

pfrazao@ualg.pt

about 23 years ago

Hi All,

I wrote a Matlab interface to PostgreSQL based on libpq. It is working
fine, supports multiple connections, supports all data types and arrays.
It is a C program to do the interface and some Matlab wrapper functions
around it to do the job at application level.

Matlab has an ODBC toolbox but we dont want to buy it since the
interface does the job and we have been using PostgreSQL for long time.
We want to use PostgreSQl to store numeric data from simulation,
computation and data acquisition sources. Basically a big bunch of float
numbers.

There is still one problem, regarding float output formatting in querys
and dumps, which essential for this type of application.

If I have a float8 field (named real8 below) in a table and insert the
following,

insert into test(real8) values (4503599627370496);
INSERT 21192 1

A select produces,

select real8 from test;
real8
---------------------
4.5035996273705e+15
(1 row)

This is the string I would get from libpq's PQgetvalue(). PQftype()
correctly returns float8. pg_dump produces the same result. If I convert
PQgetvalue() to a C double I wont get the same number I inserted. If I
do a restore from a dump I also wont have the same number. I can see
that the original number is correctly stored because,

select to_char(real8,'9999999999999999999.99999') from test;
to_char
----------------------
4503599627370496
(1 row)

This way PQftype wont tell this is a float8.

Is there a way to set query output precision to maximum precision ?
For the type of application I mentioned this is crucial. People want to
get the 'same' numbers, from querys or dumps, as they inserted them.

Matlab has a toolbox fucntion, claiming maximum precision, to convert
from its double type (PostgreSQL float8) to string which does a
sprintf('%25.18g',number).

Would it be possible to have a a parameter which could be changed by a
SET command, in order to control output precision ? I searched the docs
but could not find this.

--
----------------------------------------------------------------------
Pedro Miguel Frazao Fernandes Ferreira
Universidade do Algarve
Faculdade de Ciencias e Tecnologia
Campus de Gambelas
8000-117 Faro
Portugal
Tel./Fax: (+351) 289 800950 / 289 819403
http://w3.ualg.pt/~pfrazao

Pedro Miguel Frazao Fernandes Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Pedro Miguel Frazao Fernandes Ferreira (#1)

Re: float output precision questions

Sorry. I forgot to thank for any help from all of you in the previous
message. Thanks! :)

Just one more thing:

I now I can go to the source and change the code which converts floats
to strings, and have my problem solved. But this wont be general. Others
might need this kind of application.

I could post this interface to postgres interfaces site. Do you think
this is ok ?

Thanks again !

Pedro M. Ferreira

Pedro Miguel Frazao Fernandes Ferreira wrote:

Hi All,

I wrote a Matlab interface to PostgreSQL based on libpq. It is working
fine, supports multiple connections, supports all data types and arrays.
It is a C program to do the interface and some Matlab wrapper functions
around it to do the job at application level.

Matlab has an ODBC toolbox but we dont want to buy it since the
interface does the job and we have been using PostgreSQL for long time.
We want to use PostgreSQl to store numeric data from simulation,
computation and data acquisition sources. Basically a big bunch of float
numbers.

There is still one problem, regarding float output formatting in querys
and dumps, which essential for this type of application.

If I have a float8 field (named real8 below) in a table and insert the
following,

insert into test(real8) values (4503599627370496);
INSERT 21192 1

A select produces,

select real8 from test;
real8
---------------------
4.5035996273705e+15
(1 row)

This is the string I would get from libpq's PQgetvalue(). PQftype()
correctly returns float8. pg_dump produces the same result. If I convert
PQgetvalue() to a C double I wont get the same number I inserted. If I
do a restore from a dump I also wont have the same number. I can see
that the original number is correctly stored because,

select to_char(real8,'9999999999999999999.99999') from test;
to_char
----------------------
4503599627370496
(1 row)

This way PQftype wont tell this is a float8.

Is there a way to set query output precision to maximum precision ?
For the type of application I mentioned this is crucial. People want to
get the 'same' numbers, from querys or dumps, as they inserted them.

Matlab has a toolbox fucntion, claiming maximum precision, to convert
from its double type (PostgreSQL float8) to string which does a
sprintf('%25.18g',number).

Would it be possible to have a a parameter which could be changed by a
SET command, in order to control output precision ? I searched the docs
but could not find this.

Bruce Momjian

pgman@candle.pha.pa.us

about 23 years ago

In reply to: Pedro Miguel Frazao Fernandes Ferreira (#1)

Re: float output precision questions

TODO has:

o Add SET REAL_FORMAT and SET DOUBLE_PRECISION_FORMAT
using printf args

so we have not implemented it yet.

---------------------------------------------------------------------------

Pedro Miguel Frazao Fernandes Ferreira wrote:

Hi All,

I wrote a Matlab interface to PostgreSQL based on libpq. It is working
fine, supports multiple connections, supports all data types and arrays.
It is a C program to do the interface and some Matlab wrapper functions
around it to do the job at application level.

Matlab has an ODBC toolbox but we dont want to buy it since the
interface does the job and we have been using PostgreSQL for long time.
We want to use PostgreSQl to store numeric data from simulation,
computation and data acquisition sources. Basically a big bunch of float
numbers.

There is still one problem, regarding float output formatting in querys
and dumps, which essential for this type of application.

If I have a float8 field (named real8 below) in a table and insert the
following,

insert into test(real8) values (4503599627370496);
INSERT 21192 1

A select produces,

select real8 from test;
real8
---------------------
4.5035996273705e+15
(1 row)

This is the string I would get from libpq's PQgetvalue(). PQftype()
correctly returns float8. pg_dump produces the same result. If I convert
PQgetvalue() to a C double I wont get the same number I inserted. If I
do a restore from a dump I also wont have the same number. I can see
that the original number is correctly stored because,

select to_char(real8,'9999999999999999999.99999') from test;
to_char
----------------------
4503599627370496
(1 row)

This way PQftype wont tell this is a float8.

Is there a way to set query output precision to maximum precision ?
For the type of application I mentioned this is crucial. People want to
get the 'same' numbers, from querys or dumps, as they inserted them.

Matlab has a toolbox fucntion, claiming maximum precision, to convert
from its double type (PostgreSQL float8) to string which does a
sprintf('%25.18g',number).

Would it be possible to have a a parameter which could be changed by a
SET command, in order to control output precision ? I searched the docs
but could not find this.

--
----------------------------------------------------------------------
Pedro Miguel Frazao Fernandes Ferreira
Universidade do Algarve
Faculdade de Ciencias e Tecnologia
Campus de Gambelas
8000-117 Faro
Portugal
Tel./Fax: (+351) 289 800950 / 289 819403
http://w3.ualg.pt/~pfrazao

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Peter Eisentraut

peter_e@gmx.net

about 23 years ago

In reply to: Pedro Miguel Frazao Fernandes Ferreira (#1)

Re: float output precision questions

Pedro Miguel Frazao Fernandes Ferreira writes:

Is there a way to set query output precision to maximum precision ?
For the type of application I mentioned this is crucial. People want to
get the 'same' numbers, from querys or dumps, as they inserted them.

There isn't a way right now, but it's planned to be able to dump
floating-point numbers in some binary form (like printf("%A")) to be able
to restore them exactly. Not sure how that would satisfy the needs of
client interfaces, though.

Matlab has a toolbox fucntion, claiming maximum precision, to convert
from its double type (PostgreSQL float8) to string which does a
sprintf('%25.18g',number).

Do we have some mathematical guarantee that this is sufficient and
necessary? If so, then it might do.

--
Peter Eisentraut peter_e@gmx.net

Tom Lane

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Bruce Momjian (#3)

Re: float output precision questions

Bruce Momjian <pgman@candle.pha.pa.us> writes:

TODO has:
o Add SET REAL_FORMAT and SET DOUBLE_PRECISION_FORMAT
using printf args
so we have not implemented it yet.

IIRC, the last time it was discussed there was disagreement about how
it should work; check the pghackers archives for details. The feature
probably won't go anywhere until those issues are resolved.

regards, tom lane

Stephan Szabo

sszabo@megazone23.bigpanda.com

about 23 years ago

In reply to: Peter Eisentraut (#4)

Re: float output precision questions

On Tue, 29 Oct 2002, Peter Eisentraut wrote:

Pedro Miguel Frazao Fernandes Ferreira writes:

Is there a way to set query output precision to maximum precision ?
For the type of application I mentioned this is crucial. People want to
get the 'same' numbers, from querys or dumps, as they inserted them.

How do you define maximum precision and "same"? With simple test programs
in C, using two digits more than DBL_DIG for printf specifier, it's easy
for me to find numbers that "change" decimal string representation in the
decimal representation -> double -> decimal representation conversion(*).
The final double you get from the second conversion should be the same as
the first, but is that what you need or do you need a stronger guarantee
than that?

Bruno Wolff III

bruno@wolff.to

about 23 years ago

In reply to: Peter Eisentraut (#4)

Re: float output precision questions

On Tue, Oct 29, 2002 at 23:19:05 +0100,
Peter Eisentraut <peter_e@gmx.net> wrote:

There isn't a way right now, but it's planned to be able to dump
floating-point numbers in some binary form (like printf("%A")) to be able
to restore them exactly. Not sure how that would satisfy the needs of
client interfaces, though.

Why not print it as a floating binary number instead of a floating decimal
number? I would think that would give you better portability than a system
specific binary representation.

Pedro Miguel Frazao Fernandes Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Peter Eisentraut (#4)

Re: float output precision questions

Peter Eisentraut wrote:

Matlab has a toolbox fucntion, claiming maximum precision, to convert
from its double type (PostgreSQL float8) to string which does a
sprintf('%25.18g',number).

Do we have some mathematical guarantee that this is sufficient and
necessary? If so, then it might do.

It is necessary if you want to do this type of (huge amount of number
storage) application:

[Some client] (insert) [PostgreSQL] (query) [Some client]
(double number a)-------->(float8 number)------->(double number b)

In order for a=b this is necessary. With current float8 output this is
not allways true.

Here is the help for that particular Matlab function:

NUM2MSTR Convert number to string in maximum precision.
S = NUM2MSTR(N) converts real numbers of input
matrix N to string output vector S, in
maximum precision.

See also NUM2STR.

If you want I can try to contact the guys who coded this function to know
if this is sufficient.

Thanks,
Pedro M. Ferreira
--
----------------------------------------------------------------------
Pedro Miguel Frazao Fernandes Ferreira
Universidade do Algarve
Faculdade de Ciencias e Tecnologia
Campus de Gambelas
8000-117 Faro
Portugal
Tel./Fax: (+351) 289 800950 / 289 819403
http://w3.ualg.pt/~pfrazao

Pedro Miguel Frazao Fernandes Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Stephan Szabo (#6)

Re: float output precision questions

Stephan Szabo wrote:

On Tue, 29 Oct 2002, Peter Eisentraut wrote:

Pedro Miguel Frazao Fernandes Ferreira writes:

Is there a way to set query output precision to maximum precision ?
For the type of application I mentioned this is crucial. People want to
get the 'same' numbers, from querys or dumps, as they inserted them.

How do you define maximum precision and "same"? With simple test programs
in C, using two digits more than DBL_DIG for printf specifier, it's easy
for me to find numbers that "change" decimal string representation in the
decimal representation -> double -> decimal representation conversion(*).
The final double you get from the second conversion should be the same as
the first, but is that what you need or do you need a stronger guarantee
than that?

When I say "same" I am talking about having a number 'stored' in double
format in some client, inserting it in PostgreSQL float8 field and get
it to the client as it was before:

[Some client] (insert) [PostgreSQL] (query) [Some client]
(double number a)-------->(float8 number)------->(double number b)

"same" is so that a==b is true.
With current float8 output this is not allways true.

I believe this should allways be true for numbers which are originally
stored in double format.

Thanks,
Pedro M. Ferreira

#10

Pedro Miguel Frazao Fernandes Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Pedro Miguel Frazao Fernandes Ferreira (#1)

Re: float output precision questions

Bruno Wolff III wrote:

On Tue, Oct 29, 2002 at 23:19:05 +0100,
Peter Eisentraut <peter_e@gmx.net> wrote:

There isn't a way right now, but it's planned to be able to dump
floating-point numbers in some binary form (like printf("%A")) to be able
to restore them exactly. Not sure how that would satisfy the needs of
client interfaces, though.

Why not print it as a floating binary number instead of a floating decimal
number? I would think that would give you better portability than a system
specific binary representation.

Having a way to get the binary storage representation for float numbers
would be good and would suffice within the same float number standard.

Thanks,
Pedro M. Ferreira

#11

Stephan Szabo

sszabo@megazone23.bigpanda.com

about 23 years ago

In reply to: Pedro Miguel Frazao Fernandes Ferreira (#9)

Re: float output precision questions

On Wed, 30 Oct 2002, Pedro Miguel Frazao Fernandes Ferreira wrote:

Stephan Szabo wrote:

On Tue, 29 Oct 2002, Peter Eisentraut wrote:

Pedro Miguel Frazao Fernandes Ferreira writes:

Is there a way to set query output precision to maximum precision ?
For the type of application I mentioned this is crucial. People want to
get the 'same' numbers, from querys or dumps, as they inserted them.

How do you define maximum precision and "same"? With simple test programs
in C, using two digits more than DBL_DIG for printf specifier, it's easy
for me to find numbers that "change" decimal string representation in the
decimal representation -> double -> decimal representation conversion(*).
The final double you get from the second conversion should be the same as
the first, but is that what you need or do you need a stronger guarantee
than that?

When I say "same" I am talking about having a number 'stored' in double
format in some client, inserting it in PostgreSQL float8 field and get
it to the client as it was before:

[Some client] (insert) [PostgreSQL] (query) [Some client]
(double number a)-------->(float8 number)------->(double number b)

"same" is so that a==b is true.
With current float8 output this is not allways true.

I believe this should allways be true for numbers which are originally
stored in double format.

The problem is that there are two competing needs here. One is the above,
the other other is that you get something that has the same decimal
representation (within the float's ability to store the number). Right now
the system does the latter since for most people, that seems to be the
guarantee they want.

This would probably make sense as an option, so why don't you look at the
past discussions and see if you can come up with a solution that keeps
everyone happy (and preferably implement it, but...) :)

#12

Pedro Miguel Frazao Fernandes Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Stephan Szabo (#11)

Re: float output precision questions

Stephan Szabo wrote:

On Wed, 30 Oct 2002, Pedro Miguel Frazao Fernandes Ferreira wrote:

Stephan Szabo wrote:

On Tue, 29 Oct 2002, Peter Eisentraut wrote:

Pedro Miguel Frazao Fernandes Ferreira writes:

Is there a way to set query output precision to maximum precision ?
For the type of application I mentioned this is crucial. People want to
get the 'same' numbers, from querys or dumps, as they inserted them.

How do you define maximum precision and "same"? With simple test programs
in C, using two digits more than DBL_DIG for printf specifier, it's easy
for me to find numbers that "change" decimal string representation in the
decimal representation -> double -> decimal representation conversion(*).
The final double you get from the second conversion should be the same as
the first, but is that what you need or do you need a stronger guarantee
than that?

When I say "same" I am talking about having a number 'stored' in double
format in some client, inserting it in PostgreSQL float8 field and get
it to the client as it was before:

[Some client] (insert) [PostgreSQL] (query) [Some client]
(double number a)-------->(float8 number)------->(double number b)

"same" is so that a==b is true.
With current float8 output this is not allways true.

I believe this should allways be true for numbers which are originally
stored in double format.

The problem is that there are two competing needs here. One is the above,
the other other is that you get something that has the same decimal
representation (within the float's ability to store the number). Right now
the system does the latter since for most people, that seems to be the
guarantee they want.

Look at this example:

1.79769313486231571e+308 is the largest floating point number
representable by a C double in x86.

In C this is possible:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
double v;
char a[30];

v=1.79769313486231571e+308;

printf(" Stored double number: %25.18g\n",v);
sprintf(a,"%25.18g",v);
printf(" Converted to string: %s\n",a);
v=atof(a);
printf("Converted from string to double: %25.18g\n",v);
}

Using standard PostgreSQL query output it would not be possible to get
this number, which has representation as a double.

I fetched the PostgreSQL source from Debian, changed
src/backend/utils/adt/float.c to do sprintf(ascii, "%25.18g", num)
instead of sprintf(ascii, "%.*g", DBL_DIG, num), compiled and installed.
Now I can get the number as is.

I understand that if people insert a value of 1.1 in a double, they want
to get 1.1 without knowing that in fact the stored number is
1.10000000000000009. But do you understand that if some people insert,
for example, a value of 1.79769313486231571e+308 they dont want to get
1.79769313486232e+308 which does not compare equal (in Matlab or C) to
the first ? This is a bug.

This would probably make sense as an option, so why don't you look at the
past discussions and see if you can come up with a solution that keeps
everyone happy (and preferably implement it, but...) :)

but ???

I have a sugestion:

To have parameters, say DOUBLE_FORMAT and FLOAT_FORMAT, which could have
option values of 'SHORT' and 'LONG'.
Option 'SHORT' would be default and produce the standard sprintf(ascii,...
Option 'LONG' would produce sprintf(ascii, "%25.18g", num).

Other way would be to have number parameters to be used in the sprintf
calls, in place of 25 and 18, in the format string.

---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

#13

Peter Eisentraut

peter_e@gmx.net

about 23 years ago

In reply to: Bruno Wolff III (#7)

Re: float output precision questions

Bruno Wolff III writes:

On Tue, Oct 29, 2002 at 23:19:05 +0100,
Peter Eisentraut <peter_e@gmx.net> wrote:

There isn't a way right now, but it's planned to be able to dump
floating-point numbers in some binary form (like printf("%A")) to be able
to restore them exactly. Not sure how that would satisfy the needs of
client interfaces, though.

Why not print it as a floating binary number instead of a floating decimal
number? I would think that would give you better portability than a system
specific binary representation.

The printf("%A") output is not system-specific.

--
Peter Eisentraut peter_e@gmx.net

#14

Stephan Szabo

sszabo@megazone23.bigpanda.com

about 23 years ago

In reply to: Pedro Miguel Frazao Fernandes Ferreira (#12)

Re: float output precision questions

On Wed, 30 Oct 2002, Pedro Miguel Frazao Fernandes Ferreira wrote:

In C this is possible:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
double v;
char a[30];

v=1.79769313486231571e+308;

printf(" Stored double number: %25.18g\n",v);
sprintf(a,"%25.18g",v);
printf(" Converted to string: %s\n",a);
v=atof(a);
printf("Converted from string to double: %25.18g\n",v);
}

AFAICT, this is not guaranteed to give you the same representation that
you're using in the initializer however.

Using standard PostgreSQL query output it would not be possible to get
this number, which has representation as a double.

I fetched the PostgreSQL source from Debian, changed
src/backend/utils/adt/float.c to do sprintf(ascii, "%25.18g", num)
instead of sprintf(ascii, "%.*g", DBL_DIG, num), compiled and installed.
Now I can get the number as is.

I understand that if people insert a value of 1.1 in a double, they want
to get 1.1 without knowing that in fact the stored number is
1.10000000000000009. But do you understand that if some people insert,
for example, a value of 1.79769313486231571e+308 they dont want to get
1.79769313486232e+308 which does not compare equal (in Matlab or C) to
the first ? This is a bug.

I disagree to some extent. I'm not sure it's meaningful to expect that
(what if the database and the client are on different architectures) in
general. In any case, you're effectively going from decimal
representation to double to decimal representation (the string you used
to insert it -> internal representation -> string used to output it) and
that's only guaranteed to be correct up to DBL_DIG digits as far as I can
tell. I think it'd be nice to have an option to get more digits for those
sorts of applications, however.

This would probably make sense as an option, so why don't you look at the
past discussions and see if you can come up with a solution that keeps
everyone happy (and preferably implement it, but...) :)

but ???

, but I realize that you might not be interested in doing such. (I figured
the last part was implied)

I have a sugestion:

To have parameters, say DOUBLE_FORMAT and FLOAT_FORMAT, which could have
option values of 'SHORT' and 'LONG'.
Option 'SHORT' would be default and produce the standard sprintf(ascii,...
Option 'LONG' would produce sprintf(ascii, "%25.18g", num).

Other way would be to have number parameters to be used in the sprintf
calls, in place of 25 and 18, in the format string.

From what Tom said, something similar was suggested and there were issues
brought up. I don't know what they were, since I wasn't personally
terribly interested, but it should be in the archives. If there were any
concerns, you'll probably need to deal with those as well.

#15

Bruno Wolff III

bruno@wolff.to

about 23 years ago

In reply to: Peter Eisentraut (#13)

Re: float output precision questions

On Wed, Oct 30, 2002 at 19:27:57 +0100,
Peter Eisentraut <peter_e@gmx.net> wrote:

The printf("%A") output is not system-specific.

Just out of curiosity, can you tell me a web page or keywords to use
in a search to see what that format does? I tried using google, but
searching for printf with and "A" format didn't go to well. I only
found one correct reference and it didn't explain what %A did.
Thanks.

#16

Pedro M. Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Stephan Szabo (#14)

Re: float output precision questions

Stephan Szabo wrote:

On Wed, 30 Oct 2002, Pedro Miguel Frazao Fernandes Ferreira wrote:

I understand that if people insert a value of 1.1 in a double, they want
to get 1.1 without knowing that in fact the stored number is
1.10000000000000009. But do you understand that if some people insert,
for example, a value of 1.79769313486231571e+308 they dont want to get
1.79769313486232e+308 which does not compare equal (in Matlab or C) to
the first ? This is a bug.

I disagree to some extent. I'm not sure it's meaningful to expect that
(what if the database and the client are on different architectures) in
general. In any case, you're effectively going from decimal
representation to double to decimal representation (the string you used
to insert it -> internal representation -> string used to output it) and
that's only guaranteed to be correct up to DBL_DIG digits as far as I can
tell. I think it'd be nice to have an option to get more digits for those
sorts of applications, however.

In the previous email example, in C, I was going from decimal to double
and so on, but this is not the case when I do some simulation. In this
case it will allways be from Matlab double to PostgreSQL float8 and from
libpq PQgetvalue() string to Matlab double. The example was just a x86
number example where (got the string from Matlab double) query output
would fail.

This would probably make sense as an option, so why don't you look at the
past discussions and see if you can come up with a solution that keeps
everyone happy (and preferably implement it, but...) :)

but ???

, but I realize that you might not be interested in doing such. (I figured
the last part was implied)

ok! :)

No problem. I have seen the GUC thing in the source (guc.c etc...) and it
does not look too dificult. It has lots of examples in the code itself.
What I am saying is that I can do it if pg-people agree on the (some)
way to do it.

To have parameters, say DOUBLE_FORMAT and FLOAT_FORMAT, which could have
option values of 'SHORT' and 'LONG'.
Option 'SHORT' would be default and produce the standard sprintf(ascii,...
Option 'LONG' would produce sprintf(ascii, "%25.18g", num).

Other way would be to have number parameters to be used in the sprintf
calls, in place of 25 and 18, in the format string.

From what Tom said, something similar was suggested and there were issues

brought up. I don't know what they were, since I wasn't personally
terribly interested, but it should be in the archives. If there were any
concerns, you'll probably need to deal with those as well.

I looked at some of these emails and it seemed to me that the problem
was that Tom did'nt want a parameter that would force people to know
about printf number formatting. I think the first solution above (the
SHORT and LONG way) is simple, maintains usual output as default and
enables 'maximum' precision at request.

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

#17

Stephan Szabo

sszabo@megazone23.bigpanda.com

about 23 years ago

In reply to: Pedro M. Ferreira (#16)

Re: float output precision questions

On Wed, 30 Oct 2002, Pedro M. Ferreira wrote:

Stephan Szabo wrote:

On Wed, 30 Oct 2002, Pedro Miguel Frazao Fernandes Ferreira wrote:

I understand that if people insert a value of 1.1 in a double, they want
to get 1.1 without knowing that in fact the stored number is
1.10000000000000009. But do you understand that if some people insert,
for example, a value of 1.79769313486231571e+308 they dont want to get
1.79769313486232e+308 which does not compare equal (in Matlab or C) to
the first ? This is a bug.

I disagree to some extent. I'm not sure it's meaningful to expect that
(what if the database and the client are on different architectures) in
general. In any case, you're effectively going from decimal
representation to double to decimal representation (the string you used
to insert it -> internal representation -> string used to output it) and
that's only guaranteed to be correct up to DBL_DIG digits as far as I can
tell. I think it'd be nice to have an option to get more digits for those
sorts of applications, however.

In the previous email example, in C, I was going from decimal to double
and so on, but this is not the case when I do some simulation. In this
case it will allways be from Matlab double to PostgreSQL float8 and from
libpq PQgetvalue() string to Matlab double. The example was just a x86
number example where (got the string from Matlab double) query output
would fail.

I was just responding to it being a bug. I don't think that expecting
a float->db->float (double->db->double) giving the same double is always
safe when you assume that the PostgreSQL machine might be on a system with
different guarantees about precision. In practice, it's probably not a
big deal.

, but I realize that you might not be interested in doing such. (I figured
the last part was implied)

ok! :)

No problem. I have seen the GUC thing in the source (guc.c etc...) and it
does not look too dificult. It has lots of examples in the code itself.
What I am saying is that I can do it if pg-people agree on the (some)
way to do it.

Yeah, I didn't think it'd be hard, but sometimes people are unable or
unwilling to do C code for things.

I looked at some of these emails and it seemed to me that the problem
was that Tom did'nt want a parameter that would force people to know
about printf number formatting. I think the first solution above (the
SHORT and LONG way) is simple, maintains usual output as default and
enables 'maximum' precision at request.

That seems reasonable then, Tom'll probably give any other objections he
might have if he has any.

#18

Tom Lane

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Stephan Szabo (#17)

Re: float output precision questions

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

On Wed, 30 Oct 2002, Pedro M. Ferreira wrote:

I looked at some of these emails and it seemed to me that the problem
was that Tom did'nt want a parameter that would force people to know
about printf number formatting. I think the first solution above (the
SHORT and LONG way) is simple, maintains usual output as default and
enables 'maximum' precision at request.

That seems reasonable then, Tom'll probably give any other objections he
might have if he has any.

My recollection is that other people (perhaps Peter?) were the ones
objecting before. However I'd be somewhat unhappy with the proposal
as given:

Option 'SHORT' would be default and produce the standard sprintf(ascii,...
Option 'LONG' would produce sprintf(ascii, "%25.18g", num).

since this seems to me to hardwire inappropriate assumptions about the
number of significant digits in a double. (Yes, I know practically
everyone uses IEEE floats these days. But it's inappropriate for PG
to assume that.)

AFAICT the real issue here is that binary float representations will
have a fractional decimal digit of precision beyond what DBL_DIG claims.
I think I could support adding an option that switches between the
current output format:
sprintf(ascii, "%.*g", DBL_DIG, num);
and:
sprintf(ascii, "%.*g", DBL_DIG+1, num);
and similarly for float4. Given carefully written float I/O routines,
reading the latter output should reproduce the originally stored value.
(And if the I/O routines are not carefully written, you probably lose
anyway.) I don't see a need for allowing more flexibility than that.

Comments?

regards, tom lane

#19

Stephan Szabo

sszabo@megazone23.bigpanda.com

about 23 years ago

In reply to: Tom Lane (#18)

Re: float output precision questions

On Wed, 30 Oct 2002, Tom Lane wrote:

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

On Wed, 30 Oct 2002, Pedro M. Ferreira wrote:

I looked at some of these emails and it seemed to me that the problem
was that Tom did'nt want a parameter that would force people to know
about printf number formatting. I think the first solution above (the
SHORT and LONG way) is simple, maintains usual output as default and
enables 'maximum' precision at request.

That seems reasonable then, Tom'll probably give any other objections he
might have if he has any.

My recollection is that other people (perhaps Peter?) were the ones
objecting before. However I'd be somewhat unhappy with the proposal
as given:

Option 'SHORT' would be default and produce the standard sprintf(ascii,...
Option 'LONG' would produce sprintf(ascii, "%25.18g", num).

since this seems to me to hardwire inappropriate assumptions about the
number of significant digits in a double. (Yes, I know practically
everyone uses IEEE floats these days. But it's inappropriate for PG
to assume that.)

True (which I actually was trying to get at in my messages as well). I'll
admit to having not read the precise proposal. It's really pretty outside
what I work with in any case.

AFAICT the real issue here is that binary float representations will
have a fractional decimal digit of precision beyond what DBL_DIG claims.
I think I could support adding an option that switches between the
current output format:
sprintf(ascii, "%.*g", DBL_DIG, num);
and:
sprintf(ascii, "%.*g", DBL_DIG+1, num);
and similarly for float4. Given carefully written float I/O routines,
reading the latter output should reproduce the originally stored value.
(And if the I/O routines are not carefully written, you probably lose
anyway.) I don't see a need for allowing more flexibility than that.

Well, on my system, it doesn't look like doing the above sprintfs will
actually work for all numbers. I did a simple program using an arbitrary
big number and the DBL_DIG+1 output when stuck into another double
actually was a different double value. DBL_DIG+2 worked on my system,
but...

#20

Pedro M. Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Stephan Szabo (#17)

Re: float output precision questions

Tom Lane wrote:

My recollection is that other people (perhaps Peter?) were the ones
objecting before. However I'd be somewhat unhappy with the proposal
as given:

Option 'SHORT' would be default and produce the standard sprintf(ascii,...
Option 'LONG' would produce sprintf(ascii, "%25.18g", num).

since this seems to me to hardwire inappropriate assumptions about the
number of significant digits in a double. (Yes, I know practically
everyone uses IEEE floats these days. But it's inappropriate for PG
to assume that.)

I understand this. Unfortunately I only have IEEE compliant stuff.

AFAICT the real issue here is that binary float representations will
have a fractional decimal digit of precision beyond what DBL_DIG claims.

In fact, for some numbers I have been testing with, the double
representation can distinguish up to DBL_BIG+2.

I think I could support adding an option that switches between the
current output format:
sprintf(ascii, "%.*g", DBL_DIG, num);
and:
sprintf(ascii, "%.*g", DBL_DIG+1, num);

Easy to find numbers with double representation which would need DBL_BIG+2.

and similarly for float4. Given carefully written float I/O routines,
reading the latter output should reproduce the originally stored value.

For some numbers it does not. Not true as I said above.

(And if the I/O routines are not carefully written, you probably lose
anyway.) I don't see a need for allowing more flexibility than that.

Tests like a==b will fail for some numbers with DBL_BIG+1.
Its like I said before, the guys from matlab (in x86 IEEE float) go to
DBL_BIG+3 to have 'maximum precision'.

Comments?

Yes. I think there are several options.
I checked the sprintf(ascii, "%A", num) output format and all the
numbers that would fail because of DBL_DIG=15 are ok. After insertion on
a table and conversion to double after a query, comparison a==b holds.
AFAICT "%A" is system independent.

I would (if I may) propose the following:

Have two parameters, say DOUBLE_OUTPUT and EXTRA_DIGITS. DOUBLE_OUTPUT
would select from decimal output or normalized output. EXTRA_DIGITS
would add the required extra digits, from 0 (default) to 3, when output
is decimal.

EXTRA_DIGITS:
in the range [0:3]. 0 as defualt.

DOUBLE_OUTPUT:

'DECIMAL': sprintf(ascii, "%.*g", DBL_DIG+EXTRA_DIGITS, num); (default)
'NORMALIZED': sprintf(ascii, "%A", num);

The same could be done for floats (float4).

This way PG does not assume anything (DOUBLE_OUTPUT as 'NORMALIZED'), it
does not hardwire 'inappropriate' assumptions about the number of
significant digits in a double (default EXTRA_DIGITS=0), and it gives
flexibility (EXTRA_DIGITS!=0) if needed.
I think this is functional and reasonable.

Regards,
Pedro M. Ferreira

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

#21

Pedro M. Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Stephan Szabo (#17)

Re: float output precision questions

Just one more note,

Maybe it makes sense that in the proposal below the parameter
EXTRA_DIGITS could be SIGNIFICANT_DIGITS with a default value of 15 and
maximum 18.
Its more 'documentable' and maybe easy to understand in general.

Pedro M. Ferreira wrote:

Yes. I think there are several options.
I checked the sprintf(ascii, "%A", num) output format and all the
numbers that would fail because of DBL_DIG=15 are ok. After insertion on
a table and conversion to double after a query, comparison a==b holds.
AFAICT "%A" is system independent.

I would (if I may) propose the following:

Have two parameters, say DOUBLE_OUTPUT and EXTRA_DIGITS. DOUBLE_OUTPUT
would select from decimal output or normalized output. EXTRA_DIGITS
would add the required extra digits, from 0 (default) to 3, when output
is decimal.

EXTRA_DIGITS:
in the range [0:3]. 0 as defualt.

DOUBLE_OUTPUT:

'DECIMAL': sprintf(ascii, "%.*g", DBL_DIG+EXTRA_DIGITS, num); (default)
'NORMALIZED': sprintf(ascii, "%A", num);

The same could be done for floats (float4).

This way PG does not assume anything (DOUBLE_OUTPUT as 'NORMALIZED'), it
does not hardwire 'inappropriate' assumptions about the number of
significant digits in a double (default EXTRA_DIGITS=0), and it gives
flexibility (EXTRA_DIGITS!=0) if needed.
I think this is functional and reasonable.

Regards,
Pedro M. Ferreira

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

#22

Zeugswetter Andreas SB SD

ZeugswetterA@spardat.at

about 23 years ago

In reply to: Pedro M. Ferreira (#21)

Re: float output precision questions

Maybe it makes sense that in the proposal below the parameter
EXTRA_DIGITS could be SIGNIFICANT_DIGITS with a default value
of 15 and maximum 18.
Its more 'documentable' and maybe easy to understand in general.

Yes agree (or double_significant_digits or format_double_digits ?),
but default to DBL_DIG and allow range between 1 and DBL_DIG + 3.
format_* could be used for all future output format tweaks.

Unfortunately %A is not portable :-(

Andreas

Import Notes

Resolved by subject fallback

#23

Pedro M. Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Zeugswetter Andreas SB SD (#22)

Re: float output precision questions

Zeugswetter Andreas SB SD wrote:

Yes agree (or double_significant_digits or format_double_digits ?),
but default to DBL_DIG and allow range between 1 and DBL_DIG + 3.
format_* could be used for all future output format tweaks.

Unfortunately %A is not portable :-(

What do you mean ?
It is C99, introduced in glibc 2.1.

What are the requirements for PostgreSQL ?

Pedro

#24

Tom Lane

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Stephan Szabo (#19)

Re: float output precision questions

Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:

On Wed, 30 Oct 2002, Tom Lane wrote:

sprintf(ascii, "%.*g", DBL_DIG+1, num);
and similarly for float4. Given carefully written float I/O routines,
reading the latter output should reproduce the originally stored value.

Well, on my system, it doesn't look like doing the above sprintfs will
actually work for all numbers. I did a simple program using an arbitrary
big number and the DBL_DIG+1 output when stuck into another double
actually was a different double value. DBL_DIG+2 worked on my system,
but...

Oh, you're right; I had forgotten about the effects of scale.
DBL_DIG=15 means that the system claims to distinguish all 15-digit
values, but in a binary system there's more headroom at the bottom end
of a decimal order of magnitude. For example, 15-digit values are fine:

regression=# select 100000000000001::float8 - 100000000000000::float8;
?column?
----------
1
(1 row)

regression=# select 999999999999999::float8 - 999999999999998::float8;
?column?
----------
1
(1 row)

but the 9-etc values are over three binary orders of magnitude larger
than the 1-etc values, and so they have three less spare bits at the
right end. The system would be lying to claim DBL_DIG=16:

regression=# select 9999999999999999::float8 - 9999999999999998::float8;
?column?
----------
2
(1 row)

even though values a little over 1e15 are represented perfectly
accurately:

regression=# select 1000000000000001::float8 - 1000000000000000::float8;
?column?
----------
1
(1 row)

If you experiment with 17-digit values, you find that the representable
values are about 2 counts apart near 1e16:

regression=# select 10000000000000001::float8 - 10000000000000000::float8;
?column?
----------
0
(1 row)

regression=# select 10000000000000002::float8 - 10000000000000000::float8;
?column?
----------
2
(1 row)

but they're about 16 counts apart near 9e16:

regression=# select 99999999999999992::float8 - 99999999999999990::float8;
?column?
----------
16
(1 row)

regression=# select 99999999999999991::float8 - 99999999999999990::float8;
?column?
----------
0
(1 row)

which is exactly what you'd expect seeing that the values are about a
factor of 8 apart.

Bottom line: if DBL_DIG=15 and the float arithmetic is binary, then
there are some double values that require 17 displayed digits to
distinguish, even though not all 16-digit numbers are distinct.

So I retract my original proposal and instead suggest that we offer
a switch to display either DBL_DIG or DBL_DIG+2 significant digits
(and correspondingly increase the digits for float4). The DBL_DIG+2
case should handle the need for exact dump/reload.

regards, tom lane

#25

Tom Lane

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Pedro M. Ferreira (#23)

Re: float output precision questions

"Pedro M. Ferreira" <pfrazao@ualg.pt> writes:

Zeugswetter Andreas SB SD wrote:

Unfortunately %A is not portable :-(

What do you mean ?

Just what he said.

It is C99, introduced in glibc 2.1.

What are the requirements for PostgreSQL ?

glibc does not define the universe; nor are all platforms supporting C99
yet.

regards, tom lane

#26

Tom Lane

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Pedro M. Ferreira (#20)

Re: float output precision questions

"Pedro M. Ferreira" <pfrazao@ualg.pt> writes:

Its like I said before, the guys from matlab (in x86 IEEE float) go to
DBL_BIG+3 to have 'maximum precision'.

Apparently they have not read the canonical papers in the field.
[ googles for a moment... ] See

How to read floating point numbers accurately
William D. Clinger

How to print floating-point numbers accurately
Guy L. Steele, Jr., Jon L. White

both published at the 1990 ACM Conference on Programming Language Design
and Implementation and subsequently reprinted in ACM SIGPLAN Notices
Volume 25, Issue 6 (June 1990).

I was misremembering these papers to claim DBL_DIG+1 is enough, but
actually they prove that DBL_DIG+2 is necessary and sufficient (and
give code to do it correctly, too).

Printing DBL_DIG+3 is just producing an extra garbage digit; it won't
help matters. Any reasonably well-written C library is going to be
able to reproduce a double value with DBL_DIG+2 digits of I/O; and if
it's not well-written, I would have no confidence in its ability to do
so with DBL_DIG+3 digits...

regards, tom lane

#27

Pedro M. Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Stephan Szabo (#17)

Re: float output precision questions

Tom Lane wrote:

I was misremembering these papers to claim DBL_DIG+1 is enough, but
actually they prove that DBL_DIG+2 is necessary and sufficient (and
give code to do it correctly, too).

Yeahh! If there's a proof its safe to implement.
I also Googled a bit and found another paper saying that 17 is the
minimum number of significant digits guaranteed to distinguish among
IEEE double-precision floating point numbers:

"Robert G. Burger and R. Kent Dybvig. Printing floating-point numbers
quickly and accurately. In Proceedings of the ACM SIGPLAN '96 Conference
on Programming Language Design and Implementation, pages 108--116"
http://citeseer.nj.nec.com/28233.html

Printing DBL_DIG+3 is just producing an extra garbage digit; it won't
help matters. Any reasonably well-written C library is going to be
able to reproduce a double value with DBL_DIG+2 digits of I/O; and if
it's not well-written, I would have no confidence in its ability to do
so with DBL_DIG+3 digits...

Off course. This is also good in terms of dump storage for big float8
databases. Its one byte less for every float8.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

#28

Tom Lane

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Pedro M. Ferreira (#20)

Re: float output precision questions

"Pedro M. Ferreira" <pfrazao@ualg.pt> writes:

Have two parameters, say DOUBLE_OUTPUT and EXTRA_DIGITS. DOUBLE_OUTPUT
would select from decimal output or normalized output. EXTRA_DIGITS
would add the required extra digits, from 0 (default) to 3, when output
is decimal.

I'm not happy with adding the hex-output option, since it's not
very portable and doesn't seem necessary to solve the problem anyway.

But I think an EXTRA_DIGITS setting might be interesting. In
particular, suppose we allowed EXTRA_DIGITS to be negative? Setting
it to -1 or -2 would go a long way towards eliminating our problems
with platform variations in the geometry regression test.

Perhaps something like

extra_float_digits int range -2 to 2, default 0

extra_float_digits adjusts the number of digits displayed for float4 and
float8 output; the base value of 0 means we output FLT_DIG or DBL_DIG
digits respectively.

Per discussion, there's no reason to allow a value greater than 2, but
I'm not as sure what the lower limit should be --- maybe there's some
use in setting it less than -2?

regards, tom lane

#29

Pedro M. Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Stephan Szabo (#19)

Re: float output precision questions

Tom Lane wrote:

Bottom line: if DBL_DIG=15 and the float arithmetic is binary, then
there are some double values that require 17 displayed digits to
distinguish, even though not all 16-digit numbers are distinct.

So I retract my original proposal and instead suggest that we offer
a switch to display either DBL_DIG or DBL_DIG+2 significant digits
(and correspondingly increase the digits for float4). The DBL_DIG+2
case should handle the need for exact dump/reload.

Nice. This will be good for number storage purposes.

Shall it be done with two parameters, 'DOUBLE_FORMAT' and
'SINGLE_FORMAT', with options 'SHORT' and 'LONG' controlling how the
sprintf's are done ?

Will someone from pg-people do it or shall I do it for you ?
As I said previously, I have seen the GUC stuff and it seem's ok for me
to do it. I really do not know if there are any restrictions on who
implements what respecting PostgreSQL.
Tomorrow we have an holyday in Portugal and I shall leave for the whole
week-end, but I can do it on monday.

Best regards,
Pedro M. Ferreira

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

#30

Pedro M. Ferreira

pfrazao@ualg.pt

about 23 years ago

In reply to: Stephan Szabo (#17)

Re: float output precision questions

I sent an email before receiving the one below. I am happier also with
the extra_digits way (from the previous email I thought the options were
DBL_DIG or DBL_DIG+2).

I'm not happy with adding the hex-output option, since it's not
very portable and doesn't seem necessary to solve the problem anyway.

Agree.

Perhaps something like

extra_float_digits int range -2 to 2, default 0

extra_float_digits adjusts the number of digits displayed for float4 and
float8 output; the base value of 0 means we output FLT_DIG or DBL_DIG
digits respectively.

Agree.

Per discussion, there's no reason to allow a value greater than 2, but
I'm not as sure what the lower limit should be --- maybe there's some
use in setting it less than -2?

I could see some use. At least in my type of application. When people
are shure they only need p significant digits, they can set
extra_float_digits to an apropriate negative value and spare a lot in
storage for dumps and backups. In this case it would make sense to let
extra_float_digits go to -13.

Regards,
Pedro

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

#31

Bruno Wolff III

bruno@wolff.to

about 23 years ago

In reply to: Tom Lane (#28)

Re: float output precision questions

On Thu, Oct 31, 2002 at 12:58:21 -0500,
Tom Lane <tgl@sss.pgh.pa.us> wrote:

But I think an EXTRA_DIGITS setting might be interesting. In
particular, suppose we allowed EXTRA_DIGITS to be negative? Setting
it to -1 or -2 would go a long way towards eliminating our problems
with platform variations in the geometry regression test.

My attempt to avoid this problem with the earthdistance regression used
a cast to numeric to limit the number of digits to the right of the
decimal point.

If the normal number of digits displayed is different between two systems,
than displaying a fixed number less digits is still going to result in
differences.

#32

Tom Lane

tgl@sss.pgh.pa.us

about 23 years ago

In reply to: Bruno Wolff III (#31)

Re: float output precision questions

Bruno Wolff III <bruno@wolff.to> writes:

If the normal number of digits displayed is different between two systems,
than displaying a fixed number less digits is still going to result in
differences.

That is, however, not the problem we have with the geometry test.

regards, tom lane

#33

Bruce Momjian

pgman@candle.pha.pa.us

about 23 years ago

In reply to: Tom Lane (#28)

Re: float output precision questions

Added to TODO:

* Add GUC variables extra_float_digits and extra_double_digits
to control output digits

---------------------------------------------------------------------------

Tom Lane wrote:

"Pedro M. Ferreira" <pfrazao@ualg.pt> writes:

Have two parameters, say DOUBLE_OUTPUT and EXTRA_DIGITS. DOUBLE_OUTPUT
would select from decimal output or normalized output. EXTRA_DIGITS
would add the required extra digits, from 0 (default) to 3, when output
is decimal.

I'm not happy with adding the hex-output option, since it's not
very portable and doesn't seem necessary to solve the problem anyway.

But I think an EXTRA_DIGITS setting might be interesting. In
particular, suppose we allowed EXTRA_DIGITS to be negative? Setting
it to -1 or -2 would go a long way towards eliminating our problems
with platform variations in the geometry regression test.

Perhaps something like

extra_float_digits int range -2 to 2, default 0

extra_float_digits adjusts the number of digits displayed for float4 and
float8 output; the base value of 0 means we output FLT_DIG or DBL_DIG
digits respectively.

Per discussion, there's no reason to allow a value greater than 2, but
I'm not as sure what the lower limit should be --- maybe there's some
use in setting it less than -2?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073