floating point representation

Started by Hiroshi Inouealmost 25 years ago38 messages
#1Hiroshi Inoue
Inoue@tpf.co.jp

Hi all,

I have a question of PostgreSQL's floating point
representation.

create table t (fl1 float4, fl2 float4, fl3 float4);
insert into t values (1.234567, 1.23457, 1.23457);
select * from t;
fl1 | fl2 | fl3
---------+---------+---------
1.23457 | 1.23457 | 1.23457
(1 row)

select * from t where fl1=fl2;
fl1 | fl2 | fl3
-----+-----+-----
(0 rows)

select * from t where t where fl2=fl3;
fl1 | fl2 | fl3
---------+---------+---------
1.23457 | 1.23457 | 1.23457
(1 row)

OK, fl1 != fl2 and fl2 == fl3 but

copy t to stdout;
1.23457 1.23457 1.23457

The output of pg_dump is same. Then
after restoring from the pg_dump
output, we would get a tuple such
that fl1==fl2==fl3.

Is it reasonable ?

In addtion this makes a client library like ODBC
driver very unhappy with the handlig of floating
point data. For example, once a floating point
data like fl1(1.234567) was stored, MS-Access
couldn't update the tuple any more.

Is there a way to change the precision of floating
point representation from clients ?

Regards,
Hiroshi Inoue

#2Peter Eisentraut
peter_e@gmx.net
In reply to: Hiroshi Inoue (#1)
Re: floating point representation

Hiroshi Inoue writes:

Is there a way to change the precision of floating
point representation from clients ?

Not currently, but I image it couldn't be too hard to introduce a
parameter that changes the format string used by float*out to something
else.

The GNU C library now offers a %a (and %A) format that prints floating
point numbers in a semi-internal form that is meant to be portable. (I
image this was done because of C99, but I'm speculating.) It might be
useful to offer this to preserve accurate data across dumps.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#2)
Re: floating point representation

Peter Eisentraut <peter_e@gmx.net> writes:

The GNU C library now offers a %a (and %A) format that prints floating
point numbers in a semi-internal form that is meant to be portable. (I
image this was done because of C99, but I'm speculating.) It might be
useful to offer this to preserve accurate data across dumps.

Here's what I find in the C99 draft:

a,A A double argument representing a (finite) floating-
point number is converted in the style
[-]0xh.hhhhp�d, where there is one hexadecimal digit
^ � == "+/-" ... tgl
(which is nonzero if the argument is a normalized
floating-point number and is otherwise unspecified)
before the decimal-point character (219) and the
number of hexadecimal digits after it is equal to
the precision; if the precision is missing and
FLT_RADIX is a power of 2, then the precision is
sufficient for an exact representation of the value;
if the precision is missing and FLT_RADIX is not a
power of 2, then the precision is sufficient to
distinguish (220) values of type double, except that
trailing zeros may be omitted; if the precision is
zero and the # flag is not specified, no decimal-
point character appears. The letters abcdef are
used for a conversion and the letters ABCDEF for A
conversion. The A conversion specifier produces a
number with X and P instead of x and p. The
exponent always contains at least one digit, and
only as many more digits as necessary to represent
the decimal exponent of 2. If the value is zero,
the exponent is zero.

A double argument representing an infinity or NaN is
converted in the style of an f or F conversion
specifier.

____________________

219Binary implementations can choose the hexadecimal digit
to the left of the decimal-point character so that
subsequent digits align to nibble (4-bit) boundaries.

220The precision p is sufficient to distinguish values of
the source type if 16p-1>bn where b is FLT_RADIX and n is
the number of base-b digits in the significand of the
source type. A smaller p might suffice depending on the
implementation's scheme for determining the digit to the
left of the decimal-point character.

7.19.6.1 Library 7.19.6.1

314 Committee Draft -- August 3, 1998 WG14/N843

So, it looks like C99-compliant libc implementations will have this,
but I'd hesitate to rely on it for pg_dump purposes; it would certainly
not be very portable for awhile yet.

Peter's idea of a SET variable to control float display format might
not be a bad idea, but what if anything should pg_dump do with it?
Maybe just crank the precision up a couple digits from the current
defaults?

regards, tom lane

#4Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Tom Lane (#3)
RE: floating point representation

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

Peter Eisentraut <peter_e@gmx.net> writes:

The GNU C library now offers a %a (and %A) format that prints floating
point numbers in a semi-internal form that is meant to be portable. (I
image this was done because of C99, but I'm speculating.) It might be
useful to offer this to preserve accurate data across dumps.

[snip]

So, it looks like C99-compliant libc implementations will have this,
but I'd hesitate to rely on it for pg_dump purposes; it would certainly
not be very portable for awhile yet.

Agreed.

Peter's idea of a SET variable to control float display format might
not be a bad idea, but what if anything should pg_dump do with it?
Maybe just crank the precision up a couple digits from the current
defaults?

Currently the precision of float display format is FLT_DIG(DBL_DIG).
It's not sufficent to distinguish float values. As Peter already suggested,
the quickest solution would be to change XXX_DIG constants to variables
and provide a routine to SET the variables. Strictly speaking the precision
needed to distigush float values seems OS-dependent. It seems preferable
to have a symbol to specify the precision.

Regards,
Hiroshi Inoue

#5Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Hiroshi Inoue (#4)
Re: floating point representation

I wrote:

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

Peter Eisentraut <peter_e@gmx.net> writes:

[snip]

Peter's idea of a SET variable to control float display format might
not be a bad idea, but what if anything should pg_dump do with it?
Maybe just crank the precision up a couple digits from the current
defaults?

Currently the precision of float display format is FLT_DIG(DBL_DIG).
It's not sufficent to distinguish float values. As Peter already suggested,
the quickest solution would be to change XXX_DIG constants to variables
and provide a routine to SET the variables. Strictly speaking the precision
needed to distigush float values seems OS-dependent. It seems preferable
to have a symbol to specify the precision.

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

Or must we postpone to fix it ?

Regards,
Hiroshi Inoue

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#5)
Re: floating point representation

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

Or must we postpone to fix it ?

This seems a small enough change that I do not fear fixing it at this
late date. However, I do not like the idea of making the SET variables
be just number of digits precision. As long as we're going to have SET
variables, let's go for the full flexibility offered by sprintf: define
the SET variables as the sprintf format strings to use. The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using). This way, someone could select the C99
%a format if he knew that his libc supported it. Or he could force a
particular format like %7.3f if that's what he needed in a specific
application.

regards, tom lane

#7Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Hiroshi Inoue (#4)
Re: floating point representation

Tom Lane wrote:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

Or must we postpone to fix it ?

This seems a small enough change that I do not fear fixing it at this
late date. However, I do not like the idea of making the SET variables
be just number of digits precision. As long as we're going to have SET
variables, let's go for the full flexibility offered by sprintf: define
the SET variables as the sprintf format strings to use.

Agreed.

The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using).

Wouldn't changing current '%.6g','%.15g'(on many platforms)
cause the regression test failure ?

This way, someone could select the C99
%a format if he knew that his libc supported it. Or he could force a
particular format like %7.3f if that's what he needed in a specific
application.

Regards,
Hiroshi Inoue

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hiroshi Inoue (#7)
Re: floating point representation

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Tom Lane wrote:

The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using).

Wouldn't changing current '%.6g','%.15g'(on many platforms)
cause the regression test failure ?

I didn't check my numbers. If the current behavior is '%.6g','%.15g'
then we should stay with that as the default.

Hmm, on looking at the code, this might mean we need some configure
pushups to extract FLT_DIG and DBL_DIG and put those into the default
strings. Do we support any platforms where these are not 6 & 15?

regards, tom lane

#9Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#6)
Re: floating point representation

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

Or must we postpone to fix it ?

This seems a small enough change that I do not fear fixing it at this
late date. However, I do not like the idea of making the SET variables
be just number of digits precision. As long as we're going to have SET
variables, let's go for the full flexibility offered by sprintf: define
the SET variables as the sprintf format strings to use. The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using). This way, someone could select the C99
%a format if he knew that his libc supported it. Or he could force a
particular format like %7.3f if that's what he needed in a specific
application.

Added to TODO:

* Add SET FLOAT4_PRECISION and SET FLOAT8_PRECISION using printf args

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#10Peter Eisentraut
peter_e@gmx.net
In reply to: Hiroshi Inoue (#5)
Re: floating point representation

Hiroshi Inoue writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

I'd prefer names that go with the SQL type names:

REAL_FORMAT
DOUBLE_PRECISION_FORMAT

Seems a bit tacky, but a lot of work has been put in to make these names
more prominent.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#11Peter Eisentraut
peter_e@gmx.net
In reply to: Hiroshi Inoue (#5)
Re: floating point representation

Hiroshi Inoue writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

Or must we postpone to fix it ?

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

And what about the geometry types that are based on floats?

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#9)
Re: floating point representation

Bruce Momjian <pgman@candle.pha.pa.us> writes:

Added to TODO:
* Add SET FLOAT4_PRECISION and SET FLOAT8_PRECISION using printf args

foo_PRECISION is not the right name if these variables will contain
printf format strings. Perhaps foo_FORMAT? Anyone have a better idea?

After further thought I think that we ought to standardize on %.6g and
%.15g even if the local <float.h> offers slightly different values of
FLT_DIG and DBL_DIG. IEEE or near-IEEE float math is so close to
universal that I don't think it's worth worrying about the possibility
that different precisions would be more appropriate for some platforms.
Furthermore, having cross-platform consistency of display format seems
more useful than not.

Something else we should perhaps think about, though we are very late
in beta: once these variables exist, we could have the geometry regress
test set them to suppress a couple of digits, and eliminate most if not
all of the need for platform-specific geometry results. Doing this
would be a no-brainer at any other time in the development cycle, but
right now I am worried about whether we'd be able to reconfirm regress
results on all the currently-supported platforms before release.
Comments?

regards, tom lane

#13Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#11)
Re: floating point representation

Peter Eisentraut <peter_e@gmx.net> writes:

Or must we postpone to fix it ?

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

Hmm... that wouldn't matter, except for this %a format. Maybe we'd
better not try to make this happen in the waning days of the 7.1 cycle.

And what about the geometry types that are based on floats?

They should track the float8 format, certainly.

regards, tom lane

#14Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Peter Eisentraut (#10)
Re: floating point representation

Hiroshi Inoue writes:

The 7.1-release seems near.
May I provide the followings ?
SET FLOAT4_PRECISION TO ..
SET FLOAT8_PRECISION TO ..

I'd prefer names that go with the SQL type names:

REAL_FORMAT
DOUBLE_PRECISION_FORMAT

Seems a bit tacky, but a lot of work has been put in to make these names
more prominent.

TODO updated:

* Add SET REAL_FORMAT and SET DOUBLE_PRECISION_FORMAT using printf args

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#11)
Re: floating point representation

Peter Eisentraut <peter_e@gmx.net> writes:

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).

However, it's becoming quite clear to me that this feature needs more
thought than first appeared. Accordingly, I now vote that we not try
to fit it into 7.1, but do it in a more considered fashion for 7.2.

regards, tom lane

#16Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Peter Eisentraut (#11)
Re: floating point representation

Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).

From the first I don't want to change the current default
output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.

However, it's becoming quite clear to me that this feature needs more
thought than first appeared. Accordingly, I now vote that we not try
to fit it into 7.1, but do it in a more considered fashion for 7.2.

The simplest way to fix it quickly would be to not provide
XXXX_IN_FORMAT and restrict XXXX_FORMAT to "%.*g" at present.

Regards,
Hiroshi Inoue

#17Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Hiroshi Inoue (#16)
Re: floating point representationu

Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).

From the first I don't want to change the current default

output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.

But we run regress with the proper setting, right? How does giving
people the ability to change the defaults affect the regression tests?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#18Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#17)
Re: floating point representationu

Bruce Momjian wrote:

Tom Lane wrote:

Peter Eisentraut <peter_e@gmx.net> writes:

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4. The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).

From the first I don't want to change the current default

output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.

But we run regress with the proper setting, right?> How does giving
people the ability to change the defaults affect the regression tests?

Hmm I'm afraid I'm misunderstanding your point.
If the default float4(8) output format would be the
same as current output format then we would have no
problem with the current regress test. But there
could be a choise to change default output format
to have a large enough presision to distinguish
float4(8).

Regards,
Hiroshi Inoue

#19Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Hiroshi Inoue (#18)
Re: floating point representationu

From the first I don't want to change the current default

output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.

But we run regress with the proper setting, right?> How does giving
people the ability to change the defaults affect the regression tests?

Hmm I'm afraid I'm misunderstanding your point.
If the default float4(8) output format would be the
same as current output format then we would have no
problem with the current regress test. But there
could be a choise to change default output format
to have a large enough presision to distinguish
float4(8).

But are they going to change the default to run the regression tests?
How do they change it? in ~/.psqlrc?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#20Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Hiroshi Inoue (#4)
Re: floating point representation

Hmm, on looking at the code, this might mean we need some configure
pushups to extract FLT_DIG and DBL_DIG and put those into the default
strings. Do we support any platforms where these are not 6 & 15?

In principle, yes. VAX does not use IEEE math (by default anyway) and
has less range and more precision. Most machines nowadays use the IEEE
definitions, but having at least one counterexample will help keep us
honest ;)

- Thomas

#21Hiroshi Inoue
Inoue@tpf.co.jp
In reply to: Bruce Momjian (#19)
Re: floating point representationu

Bruce Momjian wrote:

From the first I don't want to change the current default

output format
"%." #FLT_DIG "g" (REAL)
"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.

But we run regress with the proper setting, right?> How does giving
people the ability to change the defaults affect the regression tests?

Hmm I'm afraid I'm misunderstanding your point.
If the default float4(8) output format would be the
same as current output format then we would have no
problem with the current regress test. But there
could be a choise to change default output format
to have a large enough presision to distinguish
float4(8).

But are they going to change the default to run the regression tests?
How do they change it? in ~/.psqlrc?

Probably there's a misunderstanding between you and I
but unfortunaltely I don't understand what it is in my
poor English.
Anyway in my plan(current format as default) there would
be no problem with regress test at least for 7.1.

Regards,
Hiroshi Inoue

#22Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Hiroshi Inoue (#21)
Re: floating point representationu

same as current output format then we would have no
problem with the current regress test. But there
could be a choise to change default output format
to have a large enough presision to distinguish
float4(8).

But are they going to change the default to run the regression tests?
How do they change it? in ~/.psqlrc?

Probably there's a misunderstanding between you and I
but unfortunaltely I don't understand what it is in my
poor English.
Anyway in my plan(current format as default) there would
be no problem with regress test at least for 7.1.

Oh, I see. I can't see any way we can make this change in 7.1. It has
to be done in 7.2. You are right, changing it at this late date would
be a regression disaster.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#19)
Re: floating point representationu

Bruce Momjian <pgman@candle.pha.pa.us> writes:

But are they going to change the default to run the regression tests?

You're barking up the wrong tree, Bruce. Hiroshi specifically said
that he does *not* want to change the default behavior.

regards, tom lane

#24Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Bruce Momjian (#9)
Re: floating point representation

...right now I am worried about whether we'd be able to reconfirm regress
results on all the currently-supported platforms before release.

This would be an excellent topic for a full development cycle ;)

- Thomas

#25Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#23)
Re: floating point representationu

Bruce Momjian <pgman@candle.pha.pa.us> writes:

But are they going to change the default to run the regression tests?

You're barking up the wrong tree, Bruce. Hiroshi specifically said
that he does *not* want to change the default behavior.

OK, I am confused. Can someone straighten me out?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#26Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Thomas Lockhart (#24)
Re: floating point representation

...right now I am worried about whether we'd be able to reconfirm regress
results on all the currently-supported platforms before release.

This would be an excellent topic for a full development cycle ;)

Oh, I see. Never mind. I was lost.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#27Pete Forman
pete.forman@westerngeco.com
In reply to: Tom Lane (#8)
Re: floating point representation

Tom Lane writes:

Hiroshi Inoue <Inoue@tpf.co.jp> writes:

Tom Lane wrote:

The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using).

Wouldn't changing current '%.6g','%.15g'(on many platforms)
cause the regression test failure ?

I didn't check my numbers. If the current behavior is '%.6g','%.15g'
then we should stay with that as the default.

Hmm, on looking at the code, this might mean we need some configure
pushups to extract FLT_DIG and DBL_DIG and put those into the default
strings. Do we support any platforms where these are not 6 & 15?

Please remind me what we are trying to do. 6 & 15 are values to
suppress trailing digits at the end of a number in a standard printf.
For example, 0.1 prints as 0.10000000000000001 at %.17g but as 0.1 at
%.16g. However those shorter formats are less precise. There are
several other doubles that will also print the same result. A round
trip of printf/scanf will not generally preserve the number.

Printing for display purposes may not be adequate for dumping with a
view to restoring. Are we talking about display or dump?

The ideal is to print just enough digits to be able to read the number
back. There should be no redundant digits at the end. Printf is
unable to do this by itself. The reason is that the correct number of
decimal digits for a %.*g is a function of the number being printed.

There are algorithms to do the right thing but they can be expensive.
I play with some in a program at the URI below. There is a minor typo
in the usage and a missing (optional) file. I'll correct those when
the site allows uploads again. The files' contents are currently
available at http://petef.8k.com/.
--
Pete Forman -./\.- Disclaimer: This post is originated
WesternGeco -./\.- by myself and does not represent
pete.forman@westerngeco.com -./\.- opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef -./\.- Hughes or their divisions.

#28Tom Lane
tgl@sss.pgh.pa.us
In reply to: Pete Forman (#27)
Re: floating point representation

Pete Forman <pete.forman@westerngeco.com> writes:

Please remind me what we are trying to do.

The real point is that we need to serve several different purposes
that aren't necessarily fully compatible.

The existing default of FLT_DIG or DBL_DIG digits seems like a good
general-purpose policy, but it doesn't meet all needs. For pg_dump,
we clearly would like to promise exact dump and restore. On the
other side, the geometry regress tests would like to suppress a few
of the noisier low-order digits. And we frequently see questions from
users about how they can display fewer digits than the system wants to
give them --- or, more generally, format the output in some special
form.

I think the idea of making a user-settable format string is a good one.
I'm just afraid of the idea of trying to shoehorn in a solution at the
last minute; if we do, we may find it's not quite right and then have
a backwards-compatibility problem with fixing it. Besides, we are in
"no new features" mode during beta. I think it should wait for 7.2.

regards, tom lane

#29Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#28)
Re: floating point representation

I think the idea of making a user-settable format string is a good one.
I'm just afraid of the idea of trying to shoehorn in a solution at the
last minute; if we do, we may find it's not quite right and then have
a backwards-compatibility problem with fixing it. Besides, we are in
"no new features" mode during beta. I think it should wait for 7.2.

Agreed. I have the items on the TODO list.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#29)
Re: floating point representation

Peter Eisentraut <peter_e@gmx.net> writes:

Tom Lane writes:

And we frequently see questions from users about how they can display
fewer digits than the system wants to give them --- or, more
generally, format the output in some special form.

to_char() should serve those people.

Only if they're willing to go through and change all their queries.
The geometry regress tests seem a good counterexample: one SET at the
top versus a lot of rewriting.

regards, tom lane

#31Peter Eisentraut
peter_e@gmx.net
In reply to: Tom Lane (#28)
Re: floating point representation

Tom Lane writes:

And we frequently see questions from users about how they can display
fewer digits than the system wants to give them --- or, more
generally, format the output in some special form.

to_char() should serve those people.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

#32Philip Warner
pjw@rhyme.com.au
In reply to: Tom Lane (#28)
Re: floating point representation

At 15:48 20/02/01 -0500, Tom Lane wrote:

The existing default of FLT_DIG or DBL_DIG digits seems like a good
general-purpose policy, but it doesn't meet all needs. For pg_dump,
we clearly would like to promise exact dump and restore. On the
other side, the geometry regress tests would like to suppress a few
of the noisier low-order digits. And we frequently see questions from
users about how they can display fewer digits than the system wants to
give them --- or, more generally, format the output in some special
form.

If I could add another 'nice-to-have' in here: the ability on a
per-attribute basis to specify the preferred output format. This could
apply to real, date, integer etc etc. Clearly not a 7.1 feature.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#33Philip Warner
pjw@rhyme.com.au
In reply to: Peter Eisentraut (#31)
Re: floating point representation

At 22:29 20/02/01 +0100, Peter Eisentraut wrote:

Tom Lane writes:

And we frequently see questions from users about how they can display
fewer digits than the system wants to give them --- or, more
generally, format the output in some special form.

to_char() should serve those people.

This is not a good solution if what you want (as a user) is consistency of
output no matter who retrieves the data; people should not have to wrap
every SELECT field in to_char to get the precision/format they want.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#34Robert Schrem
Robert.Schrem@WiredMinds.de
In reply to: Philip Warner (#33)
Re: floating point representation

I think a formating mode where only the relevant digits are written
to the output would be great as an alternative to the discussed
fixed formatting strings. In this context i think of 'relevant'
as in the following:

'Output as few characters as possible but ensure that scanf is
still able to rebuild the binary reprressentation of the floating
point number exactly.'

To make this happen we would need to compute a seperate formatting
string for each floating point value:

E.g. if the binary value is exactly '1.00000E00' then we just
write '1' to the output, because the rest is just 'ASCII noise'
and not neccessary for rebuilding the identical binary value for
the given floating point value.

The advantage would be, that we only generate as much ASCII data
as absolutly neccessary to rebuild the original data exactly.
At least this is what I would expect from pg_dump.

robert schrem

#35Philip Warner
pjw@rhyme.com.au
In reply to: Robert Schrem (#34)
Re: Re: floating point representation

At 10:19 21/02/01 +0100, Robert Schrem wrote:

The advantage would be, that we only generate as much ASCII data
as absolutly neccessary to rebuild the original data exactly.
At least this is what I would expect from pg_dump.

pg_dump is only one side of thre problem, but the simplest solution might
be to add an option to dump the hex mantissa, exponent & sign. This should
be low-cost and an exact representation of the machine version of the number.

The other issues, like what is sent to psql & via interfaces like odbc
(currently text) should be application/DBA based and setable on a
per-attribute basis. eg. some applications want 1.0000 because the data
came from a piece of hardware with a know error, and 1.0000 means 1.0000+/-
0.00005 etc. Maybe this is just an argument for a new 'number with error'
type...

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 0500 83 82 82 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

#36Robert Schrem
Robert.Schrem@WiredMinds.de
In reply to: Philip Warner (#35)
Re: Re: floating point representation

On Wed, 21 Feb 2001, you wrote:

At 10:19 21/02/01 +0100, Robert Schrem wrote:

The advantage would be, that we only generate as much ASCII data
as absolutly neccessary to rebuild the original data exactly.
At least this is what I would expect from pg_dump.

pg_dump is only one side of thre problem, but the simplest solution might
be to add an option to dump the hex mantissa, exponent & sign. This should
be low-cost and an exact representation of the machine version of the number.

The hex dumps should be done in a machine independant way - I think
that's what you ment when stating mantissa, exponent & sign seperatly,
right? I think this whould be a good solution...

The other issues, like what is sent to psql & via interfaces like odbc
(currently text) should be application/DBA based and setable on a
per-attribute basis.

You think of an additional tag in a CREATE TABLE instruction like

CREATE TABLE temperature (
id id,
messure_time timestamp default now() formatted as "hhmmmdd",
value float formatted as "%3.2f"
);

or maybe

CREATE TABLE temperature (
id id,
messure_time("hhmmmdd") timestamp default now(),
value("%3.2f") float
);

or is there something in SQL99 ?

eg. some applications want 1.0000 because the data
came from a piece of hardware with a know error, and 1.0000 means 1.0000+/-
0.00005 etc. Maybe this is just an argument for a new 'number with error'
type...

I think a float value in a database column has no known error
range and therefore we should not care about the 'physical' error
of a value in this context. Just think of a computed column in a
VIEW - how can we know for shure how percise such a result is if
we don't have any additional information about the messurement
errors of all operands (or constants) involved.

If you would introduce a new type - 'number with error' - this
would be totally different and a big contribution to the solution
of this. Then you can also handle errors like 1.0000+/-0.00002
percisely - witch you can't by only formatting 1.0000.

robert schrem

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Philip Warner (#35)
Re: Re: floating point representation

Philip Warner <pjw@rhyme.com.au> writes:

The other issues, like what is sent to psql & via interfaces like odbc
(currently text) should be application/DBA based and setable on a
per-attribute basis. eg. some applications want 1.0000 because the data
came from a piece of hardware with a know error, and 1.0000 means 1.0000+/-
0.00005 etc. Maybe this is just an argument for a new 'number with error'
type...

FWIW, there is a number-with-error type in contrib/seg ...

regards, tom lane

#38Peter Eisentraut
peter_e@gmx.net
In reply to: Philip Warner (#33)
Re: floating point representation

Philip Warner writes:

At 22:29 20/02/01 +0100, Peter Eisentraut wrote:

Tom Lane writes:

And we frequently see questions from users about how they can display
fewer digits than the system wants to give them --- or, more
generally, format the output in some special form.

to_char() should serve those people.

This is not a good solution if what you want (as a user) is consistency of
output no matter who retrieves the data; people should not have to wrap
every SELECT field in to_char to get the precision/format they want.

Views should serve those people.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/