floating point representation
Hi all,
I have a question of PostgreSQL's floating point
representation.
create table t (fl1 float4, fl2 float4, fl3 float4);
insert into t values (1.234567, 1.23457, 1.23457);
select * from t;
fl1 | fl2 | fl3
---------+---------+---------
1.23457 | 1.23457 | 1.23457
(1 row)
select * from t where fl1=fl2;
fl1 | fl2 | fl3
-----+-----+-----
(0 rows)
select * from t where t where fl2=fl3;
fl1 | fl2 | fl3
---------+---------+---------
1.23457 | 1.23457 | 1.23457
(1 row)
OK, fl1 != fl2 and fl2 == fl3 but
copy t to stdout;
1.23457 1.23457 1.23457
The output of pg_dump is same. Then
after restoring from the pg_dump
output, we would get a tuple such
that fl1==fl2==fl3.
Is it reasonable ?
In addtion this makes a client library like ODBC
driver very unhappy with the handlig of floating
point data. For example, once a floating point
data like fl1(1.234567) was stored, MS-Access
couldn't update the tuple any more.
Is there a way to change the precision of floating
point representation from clients ?
Regards,
Hiroshi Inoue
Hiroshi Inoue writes:
Is there a way to change the precision of floating
point representation from clients ?
Not currently, but I image it couldn't be too hard to introduce a
parameter that changes the format string used by float*out to something
else.
The GNU C library now offers a %a (and %A) format that prints floating
point numbers in a semi-internal form that is meant to be portable. (I
image this was done because of C99, but I'm speculating.) It might be
useful to offer this to preserve accurate data across dumps.
--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Peter Eisentraut <peter_e@gmx.net> writes:
The GNU C library now offers a %a (and %A) format that prints floating
point numbers in a semi-internal form that is meant to be portable. (I
image this was done because of C99, but I'm speculating.) It might be
useful to offer this to preserve accurate data across dumps.
Here's what I find in the C99 draft:
a,A A double argument representing a (finite) floating-
point number is converted in the style
[-]0xh.hhhhp�d, where there is one hexadecimal digit
^ � == "+/-" ... tgl
(which is nonzero if the argument is a normalized
floating-point number and is otherwise unspecified)
before the decimal-point character (219) and the
number of hexadecimal digits after it is equal to
the precision; if the precision is missing and
FLT_RADIX is a power of 2, then the precision is
sufficient for an exact representation of the value;
if the precision is missing and FLT_RADIX is not a
power of 2, then the precision is sufficient to
distinguish (220) values of type double, except that
trailing zeros may be omitted; if the precision is
zero and the # flag is not specified, no decimal-
point character appears. The letters abcdef are
used for a conversion and the letters ABCDEF for A
conversion. The A conversion specifier produces a
number with X and P instead of x and p. The
exponent always contains at least one digit, and
only as many more digits as necessary to represent
the decimal exponent of 2. If the value is zero,
the exponent is zero.
A double argument representing an infinity or NaN is
converted in the style of an f or F conversion
specifier.
____________________
219Binary implementations can choose the hexadecimal digit
to the left of the decimal-point character so that
subsequent digits align to nibble (4-bit) boundaries.
220The precision p is sufficient to distinguish values of
the source type if 16p-1>bn where b is FLT_RADIX and n is
the number of base-b digits in the significand of the
source type. A smaller p might suffice depending on the
implementation's scheme for determining the digit to the
left of the decimal-point character.
7.19.6.1 Library 7.19.6.1
314 Committee Draft -- August 3, 1998 WG14/N843
So, it looks like C99-compliant libc implementations will have this,
but I'd hesitate to rely on it for pg_dump purposes; it would certainly
not be very portable for awhile yet.
Peter's idea of a SET variable to control float display format might
not be a bad idea, but what if anything should pg_dump do with it?
Maybe just crank the precision up a couple digits from the current
defaults?
regards, tom lane
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]Peter Eisentraut <peter_e@gmx.net> writes:
The GNU C library now offers a %a (and %A) format that prints floating
point numbers in a semi-internal form that is meant to be portable. (I
image this was done because of C99, but I'm speculating.) It might be
useful to offer this to preserve accurate data across dumps.
[snip]
So, it looks like C99-compliant libc implementations will have this,
but I'd hesitate to rely on it for pg_dump purposes; it would certainly
not be very portable for awhile yet.
Agreed.
Peter's idea of a SET variable to control float display format might
not be a bad idea, but what if anything should pg_dump do with it?
Maybe just crank the precision up a couple digits from the current
defaults?
Currently the precision of float display format is FLT_DIG(DBL_DIG).
It's not sufficent to distinguish float values. As Peter already suggested,
the quickest solution would be to change XXX_DIG constants to variables
and provide a routine to SET the variables. Strictly speaking the precision
needed to distigush float values seems OS-dependent. It seems preferable
to have a symbol to specify the precision.
Regards,
Hiroshi Inoue
I wrote:
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]Peter Eisentraut <peter_e@gmx.net> writes:
[snip]
Peter's idea of a SET variable to control float display format might
not be a bad idea, but what if anything should pg_dump do with it?
Maybe just crank the precision up a couple digits from the current
defaults?Currently the precision of float display format is FLT_DIG(DBL_DIG).
It's not sufficent to distinguish float values. As Peter already suggested,
the quickest solution would be to change XXX_DIG constants to variables
and provide a routine to SET the variables. Strictly speaking the precision
needed to distigush float values seems OS-dependent. It seems preferable
to have a symbol to specify the precision.
The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..
Or must we postpone to fix it ?
Regards,
Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..
Or must we postpone to fix it ?
This seems a small enough change that I do not fear fixing it at this
late date. However, I do not like the idea of making the SET variables
be just number of digits precision. As long as we're going to have SET
variables, let's go for the full flexibility offered by sprintf: define
the SET variables as the sprintf format strings to use. The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using). This way, someone could select the C99
%a format if he knew that his libc supported it. Or he could force a
particular format like %7.3f if that's what he needed in a specific
application.
regards, tom lane
Tom Lane wrote:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..Or must we postpone to fix it ?
This seems a small enough change that I do not fear fixing it at this
late date. However, I do not like the idea of making the SET variables
be just number of digits precision. As long as we're going to have SET
variables, let's go for the full flexibility offered by sprintf: define
the SET variables as the sprintf format strings to use.
Agreed.
The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using).
Wouldn't changing current '%.6g','%.15g'(on many platforms)
cause the regression test failure ?
This way, someone could select the C99
%a format if he knew that his libc supported it. Or he could force a
particular format like %7.3f if that's what he needed in a specific
application.
Regards,
Hiroshi Inoue
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
Tom Lane wrote:
The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using).
Wouldn't changing current '%.6g','%.15g'(on many platforms)
cause the regression test failure ?
I didn't check my numbers. If the current behavior is '%.6g','%.15g'
then we should stay with that as the default.
Hmm, on looking at the code, this might mean we need some configure
pushups to extract FLT_DIG and DBL_DIG and put those into the default
strings. Do we support any platforms where these are not 6 & 15?
regards, tom lane
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..Or must we postpone to fix it ?
This seems a small enough change that I do not fear fixing it at this
late date. However, I do not like the idea of making the SET variables
be just number of digits precision. As long as we're going to have SET
variables, let's go for the full flexibility offered by sprintf: define
the SET variables as the sprintf format strings to use. The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using). This way, someone could select the C99
%a format if he knew that his libc supported it. Or he could force a
particular format like %7.3f if that's what he needed in a specific
application.
Added to TODO:
* Add SET FLOAT4_PRECISION and SET FLOAT8_PRECISION using printf args
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hiroshi Inoue writes:
The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..
I'd prefer names that go with the SQL type names:
REAL_FORMAT
DOUBLE_PRECISION_FORMAT
Seems a bit tacky, but a lot of work has been put in to make these names
more prominent.
--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Hiroshi Inoue writes:
The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..Or must we postpone to fix it ?
Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.
And what about the geometry types that are based on floats?
--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Bruce Momjian <pgman@candle.pha.pa.us> writes:
Added to TODO:
* Add SET FLOAT4_PRECISION and SET FLOAT8_PRECISION using printf args
foo_PRECISION is not the right name if these variables will contain
printf format strings. Perhaps foo_FORMAT? Anyone have a better idea?
After further thought I think that we ought to standardize on %.6g and
%.15g even if the local <float.h> offers slightly different values of
FLT_DIG and DBL_DIG. IEEE or near-IEEE float math is so close to
universal that I don't think it's worth worrying about the possibility
that different precisions would be more appropriate for some platforms.
Furthermore, having cross-platform consistency of display format seems
more useful than not.
Something else we should perhaps think about, though we are very late
in beta: once these variables exist, we could have the geometry regress
test set them to suppress a couple of digits, and eliminate most if not
all of the need for platform-specific geometry results. Doing this
would be a no-brainer at any other time in the development cycle, but
right now I am worried about whether we'd be able to reconfirm regress
results on all the currently-supported platforms before release.
Comments?
regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes:
Or must we postpone to fix it ?
Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.
Hmm... that wouldn't matter, except for this %a format. Maybe we'd
better not try to make this happen in the waning days of the 7.1 cycle.
And what about the geometry types that are based on floats?
They should track the float8 format, certainly.
regards, tom lane
Hiroshi Inoue writes:
The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..I'd prefer names that go with the SQL type names:
REAL_FORMAT
DOUBLE_PRECISION_FORMATSeems a bit tacky, but a lot of work has been put in to make these names
more prominent.
TODO updated:
* Add SET REAL_FORMAT and SET DOUBLE_PRECISION_FORMAT using printf args
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Peter Eisentraut <peter_e@gmx.net> writes:
Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.
Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).
However, it's becoming quite clear to me that this feature needs more
thought than first appeared. Accordingly, I now vote that we not try
to fit it into 7.1, but do it in a more considered fashion for 7.2.
regards, tom lane
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).
From the first I don't want to change the current default
output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.
However, it's becoming quite clear to me that this feature needs more
thought than first appeared. Accordingly, I now vote that we not try
to fit it into 7.1, but do it in a more considered fashion for 7.2.
The simplest way to fix it quickly would be to not provide
XXXX_IN_FORMAT and restrict XXXX_FORMAT to "%.*g" at present.
Regards,
Hiroshi Inoue
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).From the first I don't want to change the current default
output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.
But we run regress with the proper setting, right? How does giving
people the ability to change the defaults affect the regression tests?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).From the first I don't want to change the current default
output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.But we run regress with the proper setting, right?> How does giving
people the ability to change the defaults affect the regression tests?
Hmm I'm afraid I'm misunderstanding your point.
If the default float4(8) output format would be the
same as current output format then we would have no
problem with the current regress test. But there
could be a choise to change default output format
to have a large enough presision to distinguish
float4(8).
Regards,
Hiroshi Inoue
From the first I don't want to change the current default
output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.But we run regress with the proper setting, right?> How does giving
people the ability to change the defaults affect the regression tests?Hmm I'm afraid I'm misunderstanding your point.
If the default float4(8) output format would be the
same as current output format then we would have no
problem with the current regress test. But there
could be a choise to change default output format
to have a large enough presision to distinguish
float4(8).
But are they going to change the default to run the regression tests?
How do they change it? in ~/.psqlrc?
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hmm, on looking at the code, this might mean we need some configure
pushups to extract FLT_DIG and DBL_DIG and put those into the default
strings. Do we support any platforms where these are not 6 & 15?
In principle, yes. VAX does not use IEEE math (by default anyway) and
has less range and more precision. Most machines nowadays use the IEEE
definitions, but having at least one counterexample will help keep us
honest ;)
- Thomas