floating point representation

Started by Hiroshi Inoueabout 25 years ago38 messageshackers
Jump to latest
#1Hiroshi Inoue
Inoue@tpf.co.jp

Hi all,

I have a question of PostgreSQL's floating point
representation.

create table t (fl1 float4, fl2 float4, fl3 float4);
insert into t values (1.234567, 1.23457, 1.23457);
select * from t;
fl1 | fl2 | fl3
---------+---------+---------
1.23457 | 1.23457 | 1.23457
(1 row)

select * from t where fl1=fl2;
fl1 | fl2 | fl3
-----+-----+-----
(0 rows)

select * from t where t where fl2=fl3;
fl1 | fl2 | fl3
---------+---------+---------
1.23457 | 1.23457 | 1.23457
(1 row)

OK, fl1 != fl2 and fl2 == fl3 but

copy t to stdout;
1.23457 1.23457 1.23457

The output of pg_dump is same. Then
after restoring from the pg_dump
output, we would get a tuple such
that fl1==fl2==fl3.

Is it reasonable ?

In addtion this makes a client library like ODBC
driver very unhappy with the handlig of floating
point data. For example, once a floating point
data like fl1(1.234567) was stored, MS-Access
couldn't update the tuple any more.

Is there a way to change the precision of floating
point representation from clients ?

Regards,
Hiroshi Inoue

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Hiroshi Inoue (#1)
Re: floating point representation

Hiroshi Inoue writes:

Is there a way to change the precision of floating
point representation from clients ?

Not currently, but I image it couldn't be too hard to introduce a
parameter that changes the format string used by float*out to something
else.

The GNU C library now offers a %a (and %A) format that prints floating
point numbers in a semi-internal form that is meant to be portable. (I
image this was done because of C99, but I'm speculating.) It might be
useful to offer this to preserve accurate data across dumps.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#2)
Re: floating point representation

Peter Eisentraut <peter_e@gmx.net> writes:

The GNU C library now offers a %a (and %A) format that prints floating
point numbers in a semi-internal form that is meant to be portable. (I
image this was done because of C99, but I'm speculating.) It might be
useful to offer this to preserve accurate data across dumps.

Here's what I find in the C99 draft:

a,A A double argument representing a (finite) floating-
point number is converted in the style
[-]0xh.hhhhp�d, where there is one hexadecimal digit
^ � == "+/-" ... tgl
(which is nonzero if the argument is a normalized
floating-point number and is otherwise unspecified)
before the decimal-point character (219) and the
number of hexadecimal digits after it is equal to
the precision; if the precision is missing and
FLT_RADIX is a power of 2, then the precision is
sufficient for an exact representation of the value;
if the precision is missing and FLT_RADIX is not a
power of 2, then the precision is sufficient to
distinguish (220) values of type double, except that
trailing zeros may be omitted; if the precision is
zero and the # flag is not specified, no decimal-
point character appears. The letters abcdef are
used for a conversion and the letters ABCDEF for A
conversion. The A conversion specifier produces a
number with X and P instead of x and p. The
exponent always contains at least one digit, and
only as many more digits as necessary to represent
the decimal exponent of 2. If the value is zero,
the exponent is zero.

A double argument representing an infinity or NaN is
converted in the style of an f or F conversion
specifier.

____________________

219Binary implementations can choose the hexadecimal digit
to the left of the decimal-point character so that
subsequent digits align to nibble (4-bit) boundaries.

220The precision p is sufficient to distinguish values of
the source type if 16p-1>bn where b is FLT_RADIX and n is
the number of base-b digits in the significand of the
source type. A smaller p might suffice depending on the
implementation's scheme for determining the digit to the
left of the decimal-point character.

7.19.6.1 Library 7.19.6.1

314 Committee Draft -- August 3, 1998 WG14/N843

So, it looks like C99-compliant libc implementations will have this,
but I'd hesitate to rely on it for pg_dump purposes; it would certainly
not be very portable for awhile yet.

Peter's idea of a SET variable to control float display format might
not be a bad idea, but what if anything should pg_dump do with it?
Maybe just crank the precision up a couple digits from the current
defaults?

regards, tom lane

#4Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Tom Lane (#3)
RE: floating point representation

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

Peter Eisentraut <peter_e@gmx.net> writes:

The GNU C library now offers a %a (and %A) format that prints floating
point numbers in a semi-internal form that is meant to be portable. (I
image this was done because of C99, but I'm speculating.) It might be
useful to offer this to preserve accurate data across dumps.

[snip]

So, it looks like C99-compliant libc implementations will have this,
but I'd hesitate to rely on it for pg_dump purposes; it would certainly
not be very portable for awhile yet.

Agreed.

Peter's idea of a SET variable to control float display format might
not be a bad idea, but what if anything should pg_dump do with it?
Maybe just crank the precision up a couple digits from the current
defaults?

Currently the precision of float display format is FLT_DIG(DBL_DIG).
It's not sufficent to distinguish float values. As Peter already suggested,
the quickest solution would be to change XXX_DIG constants to variables
and provide a routine to SET the variables. Strictly speaking the precision
needed to distigush float values seems OS-dependent. It seems preferable
to have a symbol to specify the precision.

Regards,
Hiroshi Inoue

#5Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Hiroshi Inoue (#4)
Re: floating point representation

I wrote:

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

Peter Eisentraut <peter_e@gmx.net> writes:

[snip]

Peter's idea of a SET variable to control float display format might
not be a bad idea, but what if anything should pg_dump do with it?
Maybe just crank the precision up a couple digits from the current
defaults?

Currently the precision of float display format is FLT_DIG(DBL_DIG).
It's not sufficent to distinguish float values. As Peter already suggested,
the quickest solution would be to change XXX_DIG constants to variables
and provide a routine to SET the variables. Strictly speaking the precision
needed to distigush float values seems OS-dependent. It seems preferable
to have a symbol to specify the precision.

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

Or must we postpone to fix it ?

Regards,
Hiroshi Inoue

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#5)
Re: floating point representation

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

Or must we postpone to fix it ?

This seems a small enough change that I do not fear fixing it at this
late date. However, I do not like the idea of making the SET variables
be just number of digits precision. As long as we're going to have SET
variables, let's go for the full flexibility offered by sprintf: define
the SET variables as the sprintf format strings to use. The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using). This way, someone could select the C99
%a format if he knew that his libc supported it. Or he could force a
particular format like %7.3f if that's what he needed in a specific
application.

regards, tom lane

#7Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Hiroshi Inoue (#4)
Re: floating point representation

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

Or must we postpone to fix it ?

This seems a small enough change that I do not fear fixing it at this
late date. However, I do not like the idea of making the SET variables
be just number of digits precision. As long as we're going to have SET
variables, let's go for the full flexibility offered by sprintf: define
the SET variables as the sprintf format strings to use.

Agreed.

The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using).

Wouldn't changing current '%.6g','%.15g'(on many platforms)
cause the regression test failure ?

This way, someone could select the C99
%a format if he knew that his libc supported it. Or he could force a
particular format like %7.3f if that's what he needed in a specific
application.

Regards,
Hiroshi Inoue

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#7)
Re: floating point representation

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Tom Lane wrote:

The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using).

Wouldn't changing current '%.6g','%.15g'(on many platforms)
cause the regression test failure ?

I didn't check my numbers. If the current behavior is '%.6g','%.15g'
then we should stay with that as the default.

Hmm, on looking at the code, this might mean we need some configure
pushups to extract FLT_DIG and DBL_DIG and put those into the default
strings. Do we support any platforms where these are not 6 & 15?

regards, tom lane

#9Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#6)
Re: floating point representation

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

Or must we postpone to fix it ?

This seems a small enough change that I do not fear fixing it at this
late date. However, I do not like the idea of making the SET variables
be just number of digits precision. As long as we're going to have SET
variables, let's go for the full flexibility offered by sprintf: define
the SET variables as the sprintf format strings to use. The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using). This way, someone could select the C99
%a format if he knew that his libc supported it. Or he could force a
particular format like %7.3f if that's what he needed in a specific
application.

Added to TODO:

* Add SET FLOAT4_PRECISION and SET FLOAT8_PRECISION using printf args

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#10Peter Eisentraut
peter_e@gmx.net
In reply to: Hiroshi Inoue (#5)
Re: floating point representation

Hiroshi Inoue writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

I'd prefer names that go with the SQL type names:

REAL_FORMAT
DOUBLE_PRECISION_FORMAT

Seems a bit tacky, but a lot of work has been put in to make these names
more prominent.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Hiroshi Inoue (#5)
Re: floating point representation

Hiroshi Inoue writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

Or must we postpone to fix it ?

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

And what about the geometry types that are based on floats?

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: floating point representation

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Added to TODO:
* Add SET FLOAT4_PRECISION and SET FLOAT8_PRECISION using printf args

foo_PRECISION is not the right name if these variables will contain
printf format strings. Perhaps foo_FORMAT? Anyone have a better idea?

After further thought I think that we ought to standardize on %.6g and
%.15g even if the local <float.h> offers slightly different values of
FLT_DIG and DBL_DIG. IEEE or near-IEEE float math is so close to
universal that I don't think it's worth worrying about the possibility
that different precisions would be more appropriate for some platforms.
Furthermore, having cross-platform consistency of display format seems
more useful than not.

Something else we should perhaps think about, though we are very late
in beta: once these variables exist, we could have the geometry regress
test set them to suppress a couple of digits, and eliminate most if not
all of the need for platform-specific geometry results. Doing this
would be a no-brainer at any other time in the development cycle, but
right now I am worried about whether we'd be able to reconfirm regress
results on all the currently-supported platforms before release.
Comments?

regards, tom lane

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#11)
Re: floating point representation

Peter Eisentraut <peter_e@gmx.net> writes:

Or must we postpone to fix it ?

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

Hmm... that wouldn't matter, except for this %a format. Maybe we'd
better not try to make this happen in the waning days of the 7.1 cycle.

And what about the geometry types that are based on floats?

They should track the float8 format, certainly.

regards, tom lane

#14Bruce Momjian
bruce@momjian.us
In reply to: Peter Eisentraut (#10)
Re: floating point representation

Hiroshi Inoue writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

I'd prefer names that go with the SQL type names:

REAL_FORMAT
DOUBLE_PRECISION_FORMAT

Seems a bit tacky, but a lot of work has been put in to make these names
more prominent.

TODO updated:

* Add SET REAL_FORMAT and SET DOUBLE_PRECISION_FORMAT using printf args

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#11)
Re: floating point representation

Peter Eisentraut <peter_e@gmx.net> writes:

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).

However, it's becoming quite clear to me that this feature needs more
thought than first appeared. Accordingly, I now vote that we not try
to fit it into 7.1, but do it in a more considered fashion for 7.2.

regards, tom lane

#16Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Peter Eisentraut (#11)
Re: floating point representation

Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).

From the first I don't want to change the current default
output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.

However, it's becoming quite clear to me that this feature needs more
thought than first appeared. Accordingly, I now vote that we not try
to fit it into 7.1, but do it in a more considered fashion for 7.2.

The simplest way to fix it quickly would be to not provide
XXXX_IN_FORMAT and restrict XXXX_FORMAT to "%.*g" at present.

Regards,
Hiroshi Inoue

#17Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#16)
Re: floating point representationu

Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).

From the first I don't want to change the current default

output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.

But we run regress with the proper setting, right? How does giving
people the ability to change the defaults affect the regression tests?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#18Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#17)
Re: floating point representationu

Bruce Momjian wrote:

Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).

From the first I don't want to change the current default

output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.

But we run regress with the proper setting, right?> How does giving
people the ability to change the defaults affect the regression tests?

Hmm I'm afraid I'm misunderstanding your point.
If the default float4(8) output format would be the
same as current output format then we would have no
problem with the current regress test. But there
could be a choise to change default output format
to have a large enough presision to distinguish
float4(8).

Regards,
Hiroshi Inoue

#19Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#18)
Re: floating point representationu

From the first I don't want to change the current default

output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.

But we run regress with the proper setting, right?> How does giving
people the ability to change the defaults affect the regression tests?

Hmm I'm afraid I'm misunderstanding your point.
If the default float4(8) output format would be the
same as current output format then we would have no
problem with the current regress test. But there
could be a choise to change default output format
to have a large enough presision to distinguish
float4(8).

But are they going to change the default to run the regression tests?
How do they change it? in ~/.psqlrc?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#20Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Hiroshi Inoue (#4)
Re: floating point representation

Hmm, on looking at the code, this might mean we need some configure
pushups to extract FLT_DIG and DBL_DIG and put those into the default
strings. Do we support any platforms where these are not 6 & 15?

In principle, yes. VAX does not use IEEE math (by default anyway) and
has less range and more precision. Most machines nowadays use the IEEE
definitions, but having at least one counterexample will help keep us
honest ;)

- Thomas

#21Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#19)
#22Bruce Momjian
bruce@momjian.us
In reply to: Hiroshi Inoue (#21)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#19)
#24Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#9)
#25Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#23)
#26Bruce Momjian
bruce@momjian.us
In reply to: Thomas Lockhart (#24)
#27Pete Forman
pete.forman@westerngeco.com
In reply to: Tom Lane (#8)
#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pete Forman (#27)
#29Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#29)
#31Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#28)
#32Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#28)
#33Philip Warner
pjw@rhyme.com.au
In reply to: Peter Eisentraut (#31)
#34Robert Schrem
robert.schrem@WiredMinds.de
In reply to: Philip Warner (#33)
#35Philip Warner
pjw@rhyme.com.au
In reply to: Robert Schrem (#34)
#36Robert Schrem
robert.schrem@WiredMinds.de
In reply to: Philip Warner (#35)
#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#35)
#38Peter Eisentraut
peter_e@gmx.net
In reply to: Philip Warner (#33)