Zero-padding and zero-masking fixes for to_char(float)

Started by Bruce Momjianabout 11 years ago23 messageshackers

bruce@momjian.us

about 11 years ago

In September, while researching the to_char() buffer overflow bugs fixed
in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an
inconsistency in how to_char() does zero-padding for float4/8 values.
Now that 9.4.1 is released and I am home for a while, I am ready to
address this.

For example, to_char(int4) properly pads with trailing zeros, e.g.

SELECT to_char(int4 '1999999999', '9999999999999999D' || repeat('9', 1000));
------
1999999999.000000000000000000000000000000...

Numeric does the same thing:

SELECT to_char(numeric '99999999999', '9999999999999999D' || repeat('9', 1000));
------
99999999999.00000000000000000000000000000...

However, float4/8 do not supply the requested zero padding:

SELECT to_char(float4 '99999999999', '9999999999999999D' || repeat('9', 1000));
------
99999997952

and

SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000));
------
99999999999.0000

float4/8 are padding to the internal precision, while int4/numeric are
padding based on the requested precision. This is inconsistent.

The first attached patch fixes this, and also zeros the "junk" digits
which exceed the precision of the underlying type:

SELECT to_char(float4 '99999999999', '9999999999999999D' || repeat('9', 1000));
------
99999900000.00000000000000000000000000000...

SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000));
------
99999999999.0000000000000000000000000000....

This "junk" digit zeroing matches the Oracle behavior:

SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
------
1.1234567891234568000000000000000000000

Our output with the patch would be:

SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
------
1.1234567891234500000000000000000000000

which is pretty close.

The second patch adds regression tests for these.

I would like to apply this for 9.5 while I remember what I was doing,
but I guess now that I have written this email, I will be able to keep
it for 9.6 if people prefer.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Bruce Momjian (#1)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:

In September, while researching the to_char() buffer overflow bugs fixed
in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an
inconsistency in how to_char() does zero-padding for float4/8 values.
Now that 9.4.1 is released and I am home for a while, I am ready to
address this.

...

float4/8 are padding to the internal precision, while int4/numeric are
padding based on the requested precision. This is inconsistent.

The first attached patch fixes this, and also zeros the "junk" digits
which exceed the precision of the underlying type:

Patch applied.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David Rowley

dgrowleyml@gmail.com

about 11 years ago

In reply to: Bruce Momjian (#2)

Re: Zero-padding and zero-masking fixes for to_char(float)

On 22 March 2015 at 14:46, Bruce Momjian <bruce@momjian.us> wrote:

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:

In September, while researching the to_char() buffer overflow bugs fixed
in 9.4.1 (commit 0150ab567bcf5e5913e2b62a1678f84cc272441f), I found an
inconsistency in how to_char() does zero-padding for float4/8 values.
Now that 9.4.1 is released and I am home for a while, I am ready to
address this.

...

float4/8 are padding to the internal precision, while int4/numeric are
padding based on the requested precision. This is inconsistent.

The first attached patch fixes this, and also zeros the "junk" digits
which exceed the precision of the underlying type:

Patch applied.

This seems to have broken jacana. Looks like MSVC by default has a 3 digit
exponent.

Going by this:
https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems
that it can quite easily be set back to 2.

I've attached a patch which seems to fix the issue.

Regards

David Rowley

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: David Rowley (#3)

Re: Zero-padding and zero-masking fixes for to_char(float)

David Rowley <dgrowleyml@gmail.com> writes:

This seems to have broken jacana. Looks like MSVC by default has a 3 digit
exponent.

jacana was broken before this patch; but some other Windows critters
are now unhappy as well.

Going by this:
https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems
that it can quite easily be set back to 2.

I've attached a patch which seems to fix the issue.

That seems likely to have side-effects far beyond what's appropriate.
We have gone out of our way to accommodate 3-digit exponents in other
tests. What I want to know is why this patch created a 3-digit output
where there was none before.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Tom Lane (#4)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Sun, Mar 22, 2015 at 01:42:56AM -0400, Tom Lane wrote:

David Rowley <dgrowleyml@gmail.com> writes:

This seems to have broken jacana. Looks like MSVC by default has a 3 digit
exponent.

jacana was broken before this patch; but some other Windows critters
are now unhappy as well.

Going by this:
https://msdn.microsoft.com/en-us/library/0fatw238(v=vs.80).aspx it seems
that it can quite easily be set back to 2.

I've attached a patch which seems to fix the issue.

That seems likely to have side-effects far beyond what's appropriate.
We have gone out of our way to accommodate 3-digit exponents in other
tests. What I want to know is why this patch created a 3-digit output
where there was none before.

I was wondering the same thing too, but when I grep'ed the regression
output files looking for exponents, I found float4-exp-three-digits.out
and int8-exp-three-digits.out. I think this means we have had this
issue before, and we are going to have to either create a
numeric-exp-three-digits.out alternate output file, move the test to one
of the existing exp-three-digits files, or remove the tests. Sorry, I
didn't expect any problems with this patch as it was so small and
localized.

What has me more concerned is the Solaris 10 failure. This query:

SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000));

expects this:

99999999999.00000000000...

but on Solaris 10 gets this:

.000000000000000000

Yes, the nines are gone, and only this query is failing. Oddly, this
query did not fail, though the only difference is fewer decimal digits:

SELECT to_char(float8 '99999999999', '9999999999999999D99999999');

This smells like a libc bug, e.g. OmniOS 5.11 passed the test.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Noah Misch

noah@leadboat.com

about 11 years ago

In reply to: Bruce Momjian (#5)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Sun, Mar 22, 2015 at 11:22:26AM -0400, Bruce Momjian wrote:

What has me more concerned is the Solaris 10 failure. This query:

SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000));

expects this:

99999999999.00000000000...

but on Solaris 10 gets this:

.000000000000000000

Yes, the nines are gone, and only this query is failing. Oddly, this
query did not fail, though the only difference is fewer decimal digits:

SELECT to_char(float8 '99999999999', '9999999999999999D99999999');

This smells like a libc bug, e.g. OmniOS 5.11 passed the test.

Use of the "f" conversion specifier with precision greater than 512 is not
portable; I get a similar diff on AIX 7. Until this patch, PostgreSQL would
not use arbitrarily-large precisions on that conversion specifier. (Who would
guess, but the "e" conversion specifier is not affected.)

I recommend adding a "configure" test to use our snprintf.c replacements if
sprintf("%.*f", 65536, 99999999999.0) gives unexpected output.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Noah Misch

noah@leadboat.com

about 11 years ago

In reply to: Bruce Momjian (#1)

Re: Zero-padding and zero-masking fixes for to_char(float)

When you posted this, I made a note to review it.

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:

This "junk" digit zeroing matches the Oracle behavior:

SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
------
1.1234567891234568000000000000000000000

Our output with the patch would be:

SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
------
1.1234567891234500000000000000000000000

which is pretty close.

PostgreSQL 9.4 returns "1.12345678912346". Your patch truncates digits past
DBL_DIG, whereas both Oracle and PostgreSQL 9.4 round to the nearest DBL_DIG
digits. PostgreSQL must continue to round.

These outputs show Oracle treating 17 digits as significant while PostgreSQL
treats 15 digits as significant. Should we match Oracle in this respect while
we're breaking compatibility anyway? I tend to think yes.

*************** int4_to_char(PG_FUNCTION_ARGS)
*** 5214,5221 ****
/* we can do it easily because float8 won't lose any precision */
float8 val = (float8) value;

! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
! snprintf(orgnum, MAXDOUBLEWIDTH + 1, "%+.*e", Num.post, val);
/*
* Swap a leading positive sign for a space.
--- 5207,5213 ----
/* we can do it easily because float8 won't lose any precision */
float8		val = (float8) value;
! orgnum = psprintf("%+.*e", Num.post, val);

Neither the submission notes nor the commit messages mentioned it, but this
improves behavior for to_char(int4, text). Specifically, it helps EEEE
formats with more than about 500 decimal digits:

SELECT length(to_char(1, '9D' || repeat('9', 800) || 'EEEE'));

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Noah Misch (#6)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote:

On Sun, Mar 22, 2015 at 11:22:26AM -0400, Bruce Momjian wrote:

What has me more concerned is the Solaris 10 failure. This query:

SELECT to_char(float8 '99999999999', '9999999999999999D' || repeat('9', 1000));

expects this:

99999999999.00000000000...

but on Solaris 10 gets this:

.000000000000000000

Yes, the nines are gone, and only this query is failing. Oddly, this
query did not fail, though the only difference is fewer decimal digits:

SELECT to_char(float8 '99999999999', '9999999999999999D99999999');

This smells like a libc bug, e.g. OmniOS 5.11 passed the test.

Use of the "f" conversion specifier with precision greater than 512 is not
portable; I get a similar diff on AIX 7. Until this patch, PostgreSQL would
not use arbitrarily-large precisions on that conversion specifier. (Who would
guess, but the "e" conversion specifier is not affected.)

Yeah.

I recommend adding a "configure" test to use our snprintf.c replacements if
sprintf("%.*f", 65536, 99999999999.0) gives unexpected output.

Do we really want to go to our /port snprintf just to handle 512+
digits?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Noah Misch (#7)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:

When you posted this, I made a note to review it.

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:

This "junk" digit zeroing matches the Oracle behavior:

SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
------
1.1234567891234568000000000000000000000

Our output with the patch would be:

SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
------
1.1234567891234500000000000000000000000

which is pretty close.

PostgreSQL 9.4 returns "1.12345678912346". Your patch truncates digits past
DBL_DIG, whereas both Oracle and PostgreSQL 9.4 round to the nearest DBL_DIG
digits. PostgreSQL must continue to round.

Ah, that rounding is a big problem. I can't just round because if the
digit to be rounded up is '9', I have to set that to zero and increment
the previous digit, and that could cascade and shift the entire string
one over. I think I have to go back to the original code that does the
weird computations to figure out how many digits are on the left of the
decimal point, then set the format string after. I was hoping my patch
could clean that all up, but we now need snprintf to do that rounding
for us. :-(

These outputs show Oracle treating 17 digits as significant while PostgreSQL
treats 15 digits as significant. Should we match Oracle in this respect while
we're breaking compatibility anyway? I tend to think yes.

Uh, I am hesistant to adjust our precision to match Oracle as I don't
know what they are using internally.

*************** int4_to_char(PG_FUNCTION_ARGS)
*** 5214,5221 ****
/* we can do it easily because float8 won't lose any precision */
float8 val = (float8) value;

! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
! snprintf(orgnum, MAXDOUBLEWIDTH + 1, "%+.*e", Num.post, val);
/*
* Swap a leading positive sign for a space.
--- 5207,5213 ----
/* we can do it easily because float8 won't lose any precision */
float8		val = (float8) value;
! orgnum = psprintf("%+.*e", Num.post, val);
Neither the submission notes nor the commit messages mentioned it, but this
improves behavior for to_char(int4, text). Specifically, it helps EEEE
formats with more than about 500 decimal digits:

SELECT length(to_char(1, '9D' || repeat('9', 800) || 'EEEE'));

Wow, that is surprising.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Bruce Momjian (#9)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:

*************** int4_to_char(PG_FUNCTION_ARGS)
*** 5214,5221 ****
/* we can do it easily because float8 won't lose any precision */
float8 val = (float8) value;

! orgnum = (char *) palloc(MAXDOUBLEWIDTH + 1);
! snprintf(orgnum, MAXDOUBLEWIDTH + 1, "%+.*e", Num.post, val);
/*
* Swap a leading positive sign for a space.
--- 5207,5213 ----
/* we can do it easily because float8 won't lose any precision */
float8		val = (float8) value;
! orgnum = psprintf("%+.*e", Num.post, val);
Neither the submission notes nor the commit messages mentioned it, but this
improves behavior for to_char(int4, text). Specifically, it helps EEEE
formats with more than about 500 decimal digits:

SELECT length(to_char(1, '9D' || repeat('9', 800) || 'EEEE'));
Wow, that is surprising.

Ah, yes. There was a problem where were clipping off int4 EEEE output
to MAXDOUBLEWIDTH, which made no sense, so there was the fix for that as
well.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Tom Lane

tgl@sss.pgh.pa.us

about 11 years ago

In reply to: Bruce Momjian (#8)

Re: Zero-padding and zero-masking fixes for to_char(float)

Bruce Momjian <bruce@momjian.us> writes:

On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote:

I recommend adding a "configure" test to use our snprintf.c replacements if
sprintf("%.*f", 65536, 99999999999.0) gives unexpected output.

Do we really want to go to our /port snprintf just to handle 512+
digits?

I'd rather not go that direction (that is, to using a configure test).
It assumes that all copies of libc on a particular platform behave the
same, which seems like a bad bet to me. I think we'd be better off to
avoid asking libc to do anything that might not work everywhere.

On the other hand, this line of thought might end up having you
reimplement in formatting.c the same logic I put into snprintf.c
recently, which seems a bit silly.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Noah Misch

noah@leadboat.com

about 11 years ago

In reply to: Tom Lane (#11)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Mon, Mar 23, 2015 at 12:36:25AM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote:

I recommend adding a "configure" test to use our snprintf.c replacements if
sprintf("%.*f", 65536, 99999999999.0) gives unexpected output.

Do we really want to go to our /port snprintf just to handle 512+
digits?

I'd rather not go that direction (that is, to using a configure test).
It assumes that all copies of libc on a particular platform behave the
same, which seems like a bad bet to me. I think we'd be better off to
avoid asking libc to do anything that might not work everywhere.

On the other hand, this line of thought might end up having you
reimplement in formatting.c the same logic I put into snprintf.c
recently, which seems a bit silly.

We already assume that a positive PGAC_FUNC_SNPRINTF_LONG_LONG_INT_MODIFIER,
PGAC_FUNC_SNPRINTF_ARG_CONTROL and PGAC_FUNC_SNPRINTF_SIZE_T_SUPPORT will hold
forever. While one can construct counterexamples, I don't recall much push to
change that. This %1000f problem evades casual testing and will fade from
hacker memory. We're liable to introduce similar bugs if everyone hacking
high level code like src/backend/utils/adt needs to remember it.

A potential compromise is to verify the configure-detected behavior during
startup_hacks(). The DBA will see clearly when the runtime libraries exhibit
a bug absent at build time.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Tom Lane (#11)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Mon, Mar 23, 2015 at 12:36:25AM -0400, Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

On Sun, Mar 22, 2015 at 12:46:08PM -0400, Noah Misch wrote:

I recommend adding a "configure" test to use our snprintf.c replacements if
sprintf("%.*f", 65536, 99999999999.0) gives unexpected output.

Do we really want to go to our /port snprintf just to handle 512+
digits?

I'd rather not go that direction (that is, to using a configure test).
It assumes that all copies of libc on a particular platform behave the
same, which seems like a bad bet to me. I think we'd be better off to
avoid asking libc to do anything that might not work everywhere.

On the other hand, this line of thought might end up having you
reimplement in formatting.c the same logic I put into snprintf.c
recently, which seems a bit silly.

If we can't trust libc for 512+ precision (and I don't think a startup
check is warranted), I think we should either use our internal snprintf
for to_char() in all cases, or in cases where the precision is 512+. Of
course, this hinges on the assumption that only to_char() cares about
512+ digits --- if not, a startup check seems a requirement.

However, even if we have a working snprintf, the bigger problem is
having to do rounding, e.g. this is the worst case:

SELECT to_char(float4 '5.5555555555555555555555555555', '9D' || repeat('9', 1000) || 'EEEE');

What we have to do here is to round to the specified precision, i.e. we
can't just pad with zeros, which was my original approach. I am afraid
that means we have to keep the odd coding where we call snprintf with
zero decimal digits, get its length, then figure out the decimal
precision, then call snprintf again to format the string with rounding.
Then, we need to separate the mantissa from the exponent and add the
desired number of zeros and copy it into a new longer string.

So, instead of the code being cleaner, it will be even more messy, and
will duplicate some of what we already do in our port/snprintf.c.
Unfortunately, there is no way to call snprintf and tell it we want a
specific number of decimal digits _and_ pad the rest with zeros. I
think trying to do the rounding on the output string will never work
well, e.g. 9.9999 rounded to one decimal digit is 10.0.

Yuck.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Noah Misch

noah@leadboat.com

about 11 years ago

In reply to: Bruce Momjian (#9)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:

On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:

This "junk" digit zeroing matches the Oracle behavior:

SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
------
1.1234567891234568000000000000000000000

Our output with the patch would be:

SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
------
1.1234567891234500000000000000000000000

These outputs show Oracle treating 17 digits as significant while PostgreSQL
treats 15 digits as significant. Should we match Oracle in this respect while
we're breaking compatibility anyway? I tend to think yes.

Uh, I am hesistant to adjust our precision to match Oracle as I don't
know what they are using internally.

http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
float8 and 9 digits for float4.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Noah Misch (#14)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:

On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:

On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:

This "junk" digit zeroing matches the Oracle behavior:

SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
------
1.1234567891234568000000000000000000000

Our output with the patch would be:

SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
------
1.1234567891234500000000000000000000000

These outputs show Oracle treating 17 digits as significant while PostgreSQL
treats 15 digits as significant. Should we match Oracle in this respect while
we're breaking compatibility anyway? I tend to think yes.

Uh, I am hesistant to adjust our precision to match Oracle as I don't
know what they are using internally.

http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
float8 and 9 digits for float4.

OK, I am fine in using those values if you can find them as compiler
defines, but I don't see how we can grab those values from a user test
on Oracle.

There are some "invisible" float digits that don't appear in %f but can
be shown if desired --- I think we used to do that in the regression
tests, but found they added too much platform-specific randomness. Do
we want to go in that direction?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Jeff Anton

antonpgsql@hesiod.org

about 11 years ago

In reply to: Noah Misch (#14)

Re: Zero-padding and zero-masking fixes for to_char(float)

The issue of significant (decimal) digits to and from floating point
representation is a complex one.

What 'significant' means may depend upon the intent.

There are (at least) two different tests which may need to be used.

* How many digits can be stored and then accurately returned?
or
* How many decimal digits are needed to recreate a floating point value?
Or in longer form, if you have a floating point value, you may want to
print it in decimal form and then later scan that to recreate the exact
bit pattern from the original floating point value. How many decimal
digits do you need?

The first question produces a smaller number of digits then the second one!

The idea of zero padding is, IMO, a bad idea all together. It makes
people feel better, but it adds inaccuracy. I've lost this
interpretation so many times now that I only mention it for the real
number geeks out there.

Postgresql seems to be using the first interpretation and reporting
fewer digits. I've noticed this with pg_dump. That a dump and restore
of floating point values does not produce the same floating point
values. To me, that is inexcusable. Using the -Fc format, real values
are preserved. I have a large database of security prices. I want
accuracy above all.

I do not have time right now to produce the needed evidence for all
these cases of floating point values. If there is interest I can
produce this in a day or so.

Jeff Anton

BTW: This is my first posting to this list. I should introduce myself.
I'm Jeff Anton. I was the first Postgres project lead programmer
working for Michael Stonebraker at U.C. Berkeley a very long time ago.
The first version was never released. I've since worked for several db
companies.

On 03/24/15 06:47, Noah Misch wrote:

On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:

On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:

This "junk" digit zeroing matches the Oracle behavior:

SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
------
1.1234567891234568000000000000000000000

Our output with the patch would be:

SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
------
1.1234567891234500000000000000000000000

These outputs show Oracle treating 17 digits as significant while PostgreSQL
treats 15 digits as significant. Should we match Oracle in this respect while
we're breaking compatibility anyway? I tend to think yes.

Uh, I am hesistant to adjust our precision to match Oracle as I don't
know what they are using internally.

http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
float8 and 9 digits for float4.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Andrew Gierth

andrew@tao11.riddles.org.uk

about 11 years ago

In reply to: Jeff Anton (#16)

Re: Zero-padding and zero-masking fixes for to_char(float)

"Jeff" == Jeff Anton <antonpgsql@hesiod.org> writes:

Jeff> Postgresql seems to be using the first interpretation and
Jeff> reporting fewer digits. I've noticed this with pg_dump. That a
Jeff> dump and restore of floating point values does not produce the
Jeff> same floating point values. To me, that is inexcusable. Using
Jeff> the -Fc format, real values are preserved. I have a large
Jeff> database of security prices. I want accuracy above all.

-Fc doesn't do anything different with floats than text dumps do.

pg_dump (in all modes) uses the extra_float_digits setting to try and
get a value that restores exactly. There have been issues with this in
the past (the limit of extra_float_digits had to be raised from 2 to 3,
iirc), but if it's happening now then that should probably be considered
a bug.

--
Andrew (irc:RhodiumToad)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Bruce Momjian (#15)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote:

On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:

On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:

On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:

This "junk" digit zeroing matches the Oracle behavior:

SELECT to_char(1.123456789123456789123456789d, '9.9999999999999999999999999999999999999') as x from dual;
------
1.1234567891234568000000000000000000000

Our output with the patch would be:

SELECT to_char(float8 '1.123456789123456789123456789', '9.9999999999999999999999999999999999999');
------
1.1234567891234500000000000000000000000

These outputs show Oracle treating 17 digits as significant while PostgreSQL
treats 15 digits as significant. Should we match Oracle in this respect while
we're breaking compatibility anyway? I tend to think yes.

Uh, I am hesistant to adjust our precision to match Oracle as I don't
know what they are using internally.

http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant digits for
float8 and 9 digits for float4.

OK, I am fine in using those values if you can find them as compiler
defines, but I don't see how we can grab those values from a user test
on Oracle.

There are some "invisible" float digits that don't appear in %f but can
be shown if desired --- I think we used to do that in the regression
tests, but found they added too much platform-specific randomness. Do
we want to go in that direction?

How about if we have to_char() honor our extra_float_digits GUC, so
users who want those digits can get them?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Baker, Keith [OCDUS Non-J&J]

KBaker9@its.jnj.com

about 11 years ago

In reply to: Bruce Momjian (#18)

Re: Zero-padding and zero-masking fixes for to_char(float)

I like the idea of using extra_float_digits (which I always set this to 3 to ensure I don't lose precision).
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#1251
Theorem 15
"When a binary IEEE single precision number is converted to the closest eight digit decimal number, it is not always possible to uniquely recover the binary number from the decimal one. However, if nine decimal digits are used, then converting the decimal number to the closest binary number will recover the original floating-point number."

Keith Baker

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
owner@postgresql.org] On Behalf Of Bruce Momjian
Sent: Tuesday, March 24, 2015 3:08 PM
To: Noah Misch
Cc: PostgreSQL-development
Subject: Re: [HACKERS] Zero-padding and zero-masking fixes for
to_char(float)

On Tue, Mar 24, 2015 at 10:05:12AM -0400, Bruce Momjian wrote:

On Tue, Mar 24, 2015 at 09:47:56AM -0400, Noah Misch wrote:

On Sun, Mar 22, 2015 at 10:53:12PM -0400, Bruce Momjian wrote:

On Sun, Mar 22, 2015 at 04:41:19PM -0400, Noah Misch wrote:

On Wed, Mar 18, 2015 at 05:52:44PM -0400, Bruce Momjian wrote:

This "junk" digit zeroing matches the Oracle behavior:

SELECT to_char(1.123456789123456789123456789d,

'9.9999999999999999999999999999999999999') as x from dual;

------
1.1234567891234568000000000000000000000

Our output with the patch would be:

SELECT to_char(float8 '1.123456789123456789123456789',

'9.9999999999999999999999999999999999999');

------
1.1234567891234500000000000000000000000

These outputs show Oracle treating 17 digits as significant
while PostgreSQL treats 15 digits as significant. Should we
match Oracle in this respect while we're breaking compatibility

anyway? I tend to think yes.

Uh, I am hesistant to adjust our precision to match Oracle as I
don't know what they are using internally.

http://sqlfiddle.com/#!4/8b4cf/5 strongly implies 17 significant
digits for
float8 and 9 digits for float4.

OK, I am fine in using those values if you can find them as compiler
defines, but I don't see how we can grab those values from a user test
on Oracle.

There are some "invisible" float digits that don't appear in %f but
can be shown if desired --- I think we used to do that in the
regression tests, but found they added too much platform-specific
randomness. Do we want to go in that direction?

How about if we have to_char() honor our extra_float_digits GUC, so users
who want those digits can get them?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Noah Misch

noah@leadboat.com

about 11 years ago

In reply to: Jeff Anton (#16)

Re: Zero-padding and zero-masking fixes for to_char(float)

On Tue, Mar 24, 2015 at 09:26:10AM -0700, Jeff Anton wrote:

The issue of significant (decimal) digits to and from floating point
representation is a complex one.

What 'significant' means may depend upon the intent.

True. I meant simply that Oracle TO_CHAR emits no more than 17 nonzero digits
for a BINARY_DOUBLE and no more than 9 nonzero digits for a BINARY_FLOAT.

BTW: This is my first posting to this list. I should introduce myself.
I'm Jeff Anton. I was the first Postgres project lead programmer working
for Michael Stonebraker at U.C. Berkeley a very long time ago.
The first version was never released. I've since worked for several db
companies.

Welcome back.

Thanks,
nm

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Noah Misch

noah@leadboat.com

about 11 years ago

In reply to: Bruce Momjian (#18)

#22

Bruce Momjian

bruce@momjian.us

about 11 years ago

In reply to: Noah Misch (#14)

#23

Bruce Momjian