Strange results when casting string to double

Started by Carsten Kleinabout 4 years ago24 messagesgeneral
Jump to latest
#1Carsten Klein
c.klein@datagis.com

Hi there,

I'm using several (now unsupported) PostgreSQL 9.3.24 servers on
different (ancient) Ubuntu 14.04 LTS machines. On only one of those
servers, I get strange/wrong results when converting a string into a
double value:

SELECT 1.56::double precision;

--> 1.55999999999999 (wrong!)

Although I do not find any differences in configuration, on all other
servers the result looks like this (correct?):

SELECT 1.56::double precision;

--> 1.56 (correct!)

AFAIK, this conversion is done by internal function float8in, which,
when called directly, yields the same results:

SELECT float8in('1.56');

--> 1.55999999999999 (wrong!) on one server, and
--> 1.56 (correct!) on all other servers.

Option extra_float_digits is zero (0) while doing all these tests. Also,
the problem seems to occur while converting text to double precision and
not when displaying the obtained double precision value. Why? The binary
representation of the double precision value is also different.

I've created a small to_bit function in Python to get the double
precision value's binary representation:

CREATE OR REPLACE FUNCTION to_bit(value double precision)
RETURNS bit AS
$BODY$
if 'fn.to_bit_d64' in SD:
return SD['fn.to_bit_d64'](value)

import struct
def to_bit_d64(value):
return ''.join('{:0>8b}'.format(c) for c in struct.pack('!d',
value))

SD['fn.to_bit_d64'] = to_bit_d64
return SD['fn.to_bit_d64'](value)
$BODY$
LANGUAGE plpython3u IMMUTABLE STRICT
COST 100;

The fraction (mantissa) of both values is different by 1:

value fraction
1.55999999999999 1000111101011100001010001111010111000010100011110101
1.56 1000111101011100001010001111010111000010100011110110

The fraction of the probably wrong value is one less than the fraction
of the correct value.

Formatting both values with 20 digits right of the decimal separator
(like printf("%.20f" ...) yields:

1.55999999999999983124 (wrong!)
1.56000000000000005329 (correct!)

Since even calling function float8in directly returns a probably wrong
result on one of the servers makes me believe, that there's no custom
cast in place being responsible for the wrong results.

Function float8in basically relies on C library function

double strtod(const char *str, char **endptr)

which I tested with a tiny C programm (calling strtod only and printing
the result with printf("%.20f", val);). The result is

1.56000000000000005329 (correct!)

on every server. So, seems like the C library function works as expected
on all servers.

Although I'm not a C expert, I don't find anything suspicious that
function float8in does with the value returned from strtod.

In version 9.3.24, file /src/backend/utils/adt/float.c looks a bit
different from the file in master branch. However, basically both
versions do much the same things. The old 9.3.24 version does some more
special error checks (#ifdef HAVE_BUGGY_IRIX_STRTOD, #ifdef
HAVE_BUGGY_SOLARIS_STRTOD and CHECKFLOATVAL), but these either throw
errors or set the converted value to return to a special value (if
indicated).

Has anyone an idea of what's going on here? I know, this version is far
from still being supported, however, there's not much real changes in
file float.c between these versions (in other words, this may happen
with recent versions as well?). The database instances on all servers
are configured quite the same (more or less). All run with the same
extensions installed; none is using any preloaded libraries (which may
replace C library function strtod?).

--
Carsten Klein
c(dot)klein(@)datagis(dot)com

#2Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Carsten Klein (#1)
Re: Strange results when casting string to double

On 2/16/22 05:27, Carsten Klein wrote:

Hi there,

I'm using several (now unsupported) PostgreSQL 9.3.24 servers on
different (ancient) Ubuntu 14.04 LTS machines. On only one of those
servers, I get strange/wrong results when converting a string into a
double value:

Has anyone an idea of what's going on here? I know, this version is far
from still being supported, however, there's not much real changes in
file float.c between these versions (in other words, this may happen
with recent versions as well?). The database instances on all servers
are configured quite the same (more or less).

What is the more or less for the problem server?

Are the hardware(CPU) architectures the same for all the servers?

If not how does the problem server differ?

Was Postgres installed from the same source/same way on all the server?

All run with the same
extensions installed; none is using any preloaded libraries (which may
replace C library function strtod?).

--
Adrian Klaver
adrian.klaver@aklaver.com

#3Carsten Klein
c.klein@datagis.com
In reply to: Adrian Klaver (#2)
Re: Strange results when casting string to double

On Wed, Feb 16, 2022 at 05:46 PM Adrian Klaver wrote

On 2/16/22 05:27, Carsten Klein wrote:

Hi there,

I'm using several (now unsupported) PostgreSQL 9.3.24 servers on
different (ancient) Ubuntu 14.04 LTS machines. On only one of those
servers, I get strange/wrong results when converting a string into a
double value:

Has anyone an idea of what's going on here? I know, this version is
far from still being supported, however, there's not much real changes
in file float.c between these versions (in other words, this may
happen with recent versions as well?). The database instances on all
servers are configured quite the same (more or less).

What is the more or less for the problem server?

What? Didn't get that...

Are the hardware(CPU) architectures the same for all the servers?

The problem server is a virtual machine (VMware). I've tested this on
two other servers and do receive probably correct results. One of these
is also a virtual machine (same VMware, running on different hardware),
the other one is a physical box with an AMD Athlon(tm) II X4 640 64-bit
processor.

Customer site:

Production System: VMware (ESX Host A) works as expected

Testing System: VMware (ESX Host B) DOES NOT WORK as expected

Our site:

Development System: AMD Athlon(tm) II works as expected

However, since the strtod function works correctly from my tiny C
program on all these machines, I don't believe that this problem has to
do with hardware or architecture. I guess, PostgreSQL dynamically links
to these C library functions, right?

If not how does the problem server differ?

I certainly have no idea. Actually, I don't see any more relevant
differences.

Was Postgres installed from the same source/same way on all the server?

PostgreSQL was installed from Ubuntu's official repositories (14.04 LTS)
on all servers.

--
Carsten Klein
c(dot)klein(@)datagis(dot)com

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: Carsten Klein (#3)
Re: Strange results when casting string to double

On Wed, Feb 16, 2022 at 11:38 AM Carsten Klein <c.klein@datagis.com> wrote:

On Wed, Feb 16, 2022 at 05:46 PM Adrian Klaver wrote

On 2/16/22 05:27, Carsten Klein wrote:

Hi there,

I'm using several (now unsupported) PostgreSQL 9.3.24 servers on
different (ancient) Ubuntu 14.04 LTS machines. On only one of those
servers, I get strange/wrong results when converting a string into a
double value:

Has anyone an idea of what's going on here? I know, this version is
far from still being supported, however, there's not much real changes
in file float.c between these versions (in other words, this may
happen with recent versions as well?). The database instances on all
servers are configured quite the same (more or less).

What is the more or less for the problem server?

What? Didn't get that...

You said they are more or less the same. Problems like these tend to hide
in the "less" portion of the inequality.

Testing System: VMware (ESX Host B) DOES NOT WORK as expected

Given that this isn't working as expected it doesn't make for a great
testing system. Install and initdb 14.2 on this machine and let's see what
PostgreSQL produces.

If not how does the problem server differ?

I certainly have no idea. Actually, I don't see any more relevant
differences.

What is the precise version of libc that is installed for one. Exact ESX
releases too.

This isn't really all that interesting a report for the project if it only
exists in one ancient system that cannot be experimented with. Maybe
it's a faulty register on that machine's CPU. There is more
double-checking and comparing that can be done here but it seems unlikely
to be productive. It is more plausible that the snowflake machine in
question just has issues and needs to be retired. Installing a newer
version of PostgreSQL on it before junking it is about the right amount of
experimental effort.

David J.

#5Carsten Klein
c.klein@datagis.com
In reply to: David G. Johnston (#4)
Re: Strange results when casting string to double

On Wed, Feb 16, 2022 at 08:11 PM David G. Johnston wrote:

You said they are more or less the same.  Problems like these tend to
hide in the "less" portion of the inequality.

On of the virtualized servers was created as a clone of the other one
(using VMware to clone the VM). So, basically, these are very equal. Of
course, they diverged over time.

Focusing on PostgreSQL, here are the differences of postgresql.conf,
comparing testing system and production system:

< work_mem = 8MB # min 64kB
---

work_mem = 4MB # min 64kB

417c417
< #log_statement = 'all' # none, ddl, mod, all
---

#log_statement = 'none' # none, ddl, mod, all

Both PostgreSQL server have the same roles and users, that same
extensions installed and no preloaded libraries.

Given that this isn't working as expected it doesn't make for a great
testing system.  Install and initdb 14.2 on this machine and let's see
what PostgreSQL produces.

The testing system runs since 2015. I don't know whether the problem was
present from the beginning. But I don't think so, as we also have
"correct" double values in that database. Now, since "binary equality"
of the double precision values is a new requirement, we started to
notice, that (at least not) newly added (UPDATEd) values, e. g. 1.56 are
not binary equal to already present values:

Table abc, column xyz: currently (before) 1.56

UPDATE abc SET xyz = 1.56;

Table abc, column xyz: after 1.559999999

We have a trigger, that monitors such updates and it reports a changed
value, which is not correct. The problem is, that the assignment

SET xyz = 1.56

actually means

SET xyz = 1.559999999

since getting a double value from the string 1.56 yields 1.559999999.

Yes, moving to the latest PostgreSQL version might fix that error.
However, this is a customer's testing system. Actually, it is intended
to be reinstalled with Ubuntu 22.04 LTS which brings PostgreSQL 14. But
prior to that, we need to complete a project on the testing system that
requires that "binary equality" of double values.

What is the precise version of libc that is installed for one.  Exact
ESX releases too.

Both VM servers run on

ESXi 6 (correct behavior)
ESXi 6.5 (misbehaving)

All machines use libc version 2.19 (libc-2.19.so).

This isn't really all that interesting a report for the project if it
only exists in one ancient system that cannot be experimented with.
Maybe it's a faulty register on that machine's CPU.  There is more
double-checking and comparing that can be done here but it seems
unlikely to be productive.  It is more plausible that the snowflake
machine in question just has issues and needs to be retired.  Installing
a newer version of PostgreSQL on it before junking it is about the right
amount of experimental effort.

I just wanted to ask whether someone knows something about this or has
ever heard about such a behavior. You say, the snowflake machine has
issues... I don't believe in hardware issues, since it runs in VMware
and likely on many different CPUs. Isn't it more than unlikely that such
a constantly occurring error is caused by one faulty CPU (among that
many CPUs an ESX server typically has)?

And, keep in mind that strtod function works as expected from a simply C
testing program. I guess that the parsed double's value gets modified
somewhere in PostgreSQL after strtod was called. However, I do not yet
see where and why. I was hoping that someone of you could help.

Carsten

#6Gavan Schneider
list.pg.gavan@pendari.org
In reply to: Carsten Klein (#5)
Re: Strange results when casting string to double

On 17 Feb 2022, at 8:02, Carsten Klein wrote:

We have a trigger, that monitors such updates and it reports a changed value, which is not correct. The problem is, that the assignment

SET xyz = 1.56

actually means

SET xyz = 1.559999999

since getting a double value from the string 1.56 yields 1.559999999.

<snip>

I just wanted to ask whether someone knows something about this or has ever heard about such a behavior. You say, the snowflake machine has issues... I don't believe in hardware issues, since it runs in VMware and likely on many different CPUs. Isn't it more than unlikely that such a constantly occurring error is caused by one faulty CPU (among that many CPUs an ESX server typically has)?

Harking back to my long distant formative years I was taught to never ever rely on equality tests when it came to floating point values. Nothing has changed in this regard.

If binary exact is part of the requirement then the answer is INTEGER or NUMERIC never FLOAT, REAL, DOUBLE or anything similar.

So, assuming the trigger function is the only source of this spurious grief one could apply the lesson taught in the 60s from the dawn of computing:
EQUALITY = absolute_value(op1 - op2) < epsilon — where op1 & op2 are DOUBLE, and epsilon is smaller than you care for

Given the intrinsic (standards compliant) uncertainty when converting from absolute (e.g., string representation) to floating point there will never be value resolving why there are differences.

I suggest using the comparison that is appropriate to the representation of those values or fix the design by using the proper representation.

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem — neat, plausible, and wrong.
— H. L. Mencken, 1920

#7Carsten Klein
c.klein@datagis.com
In reply to: Gavan Schneider (#6)
Re: Strange results when casting string to double

On Thu, Feb 17, 2022 at 00:07 Gavan Schneider wrote:

Harking back to my long distant formative years I was taught to never ever rely on equality tests when it came to floating point values. Nothing has changed in this regard.

If binary exact is part of the requirement then the answer is INTEGER or NUMERIC never FLOAT, REAL, DOUBLE or anything similar.

So, assuming the trigger function is the only source of this spurious grief one could apply the lesson taught in the 60s from the dawn of computing:
EQUALITY = absolute_value(op1 - op2) < epsilon — where op1 & op2 are DOUBLE, and epsilon is smaller than you care for

Given the intrinsic (standards compliant) uncertainty when converting from absolute (e.g., string representation) to floating point there will never be value resolving why there are differences.

I suggest using the comparison that is appropriate to the representation of those values or fix the design by using the proper representation.

Just some(!) background:

I know that there are other ways to compare _any_ floating point values.
However, doing that for a whole ROW in a fast manner is not trivial (e.
g. it tends to get quite slow). With the hstore extension and (so
called) binary equality I've found a very fast way which is also fully
generic. hstore uses text representation, so comparisons depend on how
values are converted to and from text.

But all that is not the point. Double precision conversion algorithms
are well defined and deterministic, so it should yield the same results
when called with the same arguments (every time and on every machine).
That is

1.56::double precision == 1.56000000000000005329070518201E0

1.56::double precision != 1.55999999999999983124610025698E0

With reduced precision (as in PG), that gives you:

1.56::double precision == 1.56

1.56::double precision != 1.55999999999999

However, one of my ProstgreSQL servers returns the latter (wrong) value.
You can test this with C library function 'strtod' or with any of the
online converters around:

https://www.binaryconvert.com/result_double.html

Click 'New Conversion' and enter 1.56 into the 'Decimal' field. Then
'Convert to binary' or hit enter.

So, the primary problem of that PostgreSQL server is, that it converts
text to double in a wrong way. Apart from any triggers, "binary
equality" and whatever else I'm doing in this project, this has dramatic
effects on the database, as it's messing up the values that I'm storing:

Imagine I do:

INSERT INTO foo (my_col) VALUES ('Hello World');

But the database has happily stored a different string:

SELECT my_col FROM foo;
my_col
-------------
Hello Worlc
(1 row)

Finding that string again may be done with "fuzzy search" or regular
expressions, but the much better approach is the database not to let
mess up the string while storing it.

Double precision values are limited in precision (in binary) and there
are numbers, that cannot be stored exactly (as it's true for many
decimal numbers, like 1/3 ~ 0.33333 as well). Nevertheless, with a given
maximum of precision, the same values should have the same (well
defined) binary value so that conversions between text and double should
not change the value at any time on any machine.

Carsten

#8Thomas Kellerer
shammat@gmx.net
In reply to: Carsten Klein (#1)
Re: Strange results when casting string to double

Carsten Klein schrieb am 16.02.2022 um 14:27:

I'm using several (now unsupported) PostgreSQL 9.3.24 servers on
different (ancient) Ubuntu 14.04 LTS machines. On only one of those
servers, I get strange/wrong results when converting a string into a
double value:

SELECT 1.56::double precision;

--> 1.55999999999999   (wrong!)

Although I do not find any differences in configuration, on all other servers the result looks like this (correct?):

SELECT 1.56::double precision;

--> 1.56               (correct!)

I wonder if extra_float_digits is different between those systems
Maybe initialized by differently configured SQL clients.

#9Carsten Klein
c.klein@datagis.com
In reply to: Thomas Kellerer (#8)
Re: Strange results when casting string to double

On Thu, Feb 17, 2022 at 09:41 AM Thomas Kellerer wrote:

Carsten Klein schrieb am 16.02.2022 um 14:27:

Ah, man versteht sich :)

I'm using several (now unsupported) PostgreSQL 9.3.24 servers on
different (ancient) Ubuntu 14.04 LTS machines. On only one of those
servers, I get strange/wrong results when converting a string into a
double value:

SELECT 1.56::double precision;

--> 1.55999999999999   (wrong!)

Although I do not find any differences in configuration, on all other servers the result looks like this (correct?):

SELECT 1.56::double precision;

--> 1.56               (correct!)

I wonder if extra_float_digits is different between those systems
Maybe initialized by differently configured SQL clients.

As I've seen in the sources in file /src/backend/utils/adt/float.c,
extra_float_digits is used when converting double precision values back
to text only. The binary (BIT) representation if the double value's
fraction (mantissa) tells me, that the actual double precision value is
already wrong.

Also, extra_float_digits is constantly zero in my tests. I've only been
using pgAdmin III for that and always ensured with

SELECT current_setting('extra_float_digits');

that it's zero.

Carsten

#10Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Carsten Klein (#5)
Re: Strange results when casting string to double

On 2022-02-16 22:02:02 +0100, Carsten Klein wrote:

On of the virtualized servers was created as a clone of the other one (using
VMware to clone the VM). So, basically, these are very equal. Of course,
they diverged over time.

Focusing on PostgreSQL, here are the differences of postgresql.conf,
comparing testing system and production system:

I don't think these explain the difference. I'd check whether the
postgresql binaries and all the the shared libraries are the same.

Or - since this only happens on the test system and not on the production
system - I'd just clone the production system again to create a new test
system and see if the problem happens there, too.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#11Carsten Klein
c.klein@datagis.com
In reply to: Peter J. Holzer (#10)
Re: Strange results when casting string to double

AOn Thu, Feb 17, 2022 at 10:27 AM Peter J. Holzer wrote

I don't think these explain the difference. I'd check whether the
postgresql binaries and all the the shared libraries are the same.

Or - since this only happens on the test system and not on the production
system - I'd just clone the production system again to create a new test
system and see if the problem happens there, too.

File postgresql and it's loaded libraries are identical. Same set of
libraries loaded on all servers; all loaded library have identical MD5 sums.

It's the customer's VMware, so I can't just make another clone. However,
I'm quite sure that it will work on a newly cloned testing system (since
on production system everything is OK).

Carsten

#12Peter Eisentraut
peter_e@gmx.net
In reply to: Carsten Klein (#1)
Re: Strange results when casting string to double

On 16.02.22 14:27, Carsten Klein wrote:

AFAIK, this conversion is done by internal function float8in, which,
when called directly, yields the same results:

SELECT float8in('1.56');

--> 1.55999999999999   (wrong!)   on one server, and
--> 1.56               (correct!) on all other servers.

float8in() really just calls the operating system's strtod() function.
I would test that one directly with a small C program.

It's also possible that different compiler options lead to different
optimizations.

#13Carsten Klein
c.klein@datagis.com
In reply to: Peter Eisentraut (#12)
Re: Strange results when casting string to double

On 18.02.2022 13:28, Peter Eisentraut wrote:

float8in() really just calls the operating system's strtod() function. I
would test that one directly with a small C program.

It's also possible that different compiler options lead to different
optimizations.

That's what I did. Here's my small C program: (nicht lachen *g*)

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {

/* default string to convert */
char buf[10] = "1.56\0";

/* choose and print string to convert */
char* sval = argc > 1 ? argv[1] : buf;
printf("string value: %s\n", sval);

/* convert and print */
char* ptr;
double dval = strtod(sval, &ptr);
printf("double value: %.20f\n", dval);

return 0;
}

It works correctly on all these servers. Here's its output:

string value: 1.56
double value: 1.56000000000000005329

I didn't test different compiler options. However, PostgreSQL was always
installed from official Ubuntu 14.04 repositories (getting the binaries,
not the source packages), so all binaries should have been compiled with
the same options.

Carsten

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Carsten Klein (#13)
Re: Strange results when casting string to double

Carsten Klein <c.klein@datagis.com> writes:

On 18.02.2022 13:28, Peter Eisentraut wrote:

float8in() really just calls the operating system's strtod() function. I
would test that one directly with a small C program.

That's what I did. Here's my small C program: (nicht lachen *g*)

Yeah, you said that upthread, which makes the whole thing pretty
baffling. One possible explanation is that your small program got linked
against a different version of libc than what the Postgres backend is
using ("ldd" would help you check that, but given the age of the Postgres
installation, this seems plausible). Beyond that it's hard to think
of any explanation other than hardware fault or corrupted executable.

regards, tom lane

#15Carsten Klein
c.klein@datagis.com
In reply to: Tom Lane (#14)
Re: Strange results when casting string to double

On 18.02.2022 16:32, Tom Lane wrote:

Yeah, you said that upthread, which makes the whole thing pretty
baffling. One possible explanation is that your small program got linked
against a different version of libc than what the Postgres backend is
using ("ldd" would help you check that, but given the age of the Postgres
installation, this seems plausible). Beyond that it's hard to think
of any explanation other than hardware fault or corrupted executable.

Tom,

both PostgreSQL and my C program are linked to the same libc.so.6. Same
path, same MD5 sum. Since libc is a Shared Object (so), both processes
should really run the identical code. Am I missing something? I've
written and compiled the small C program on the same old Ubuntu OS.

So, you're not aware of any ways this behavior could be achieved from
within PostgreSQL? Something like a custom cast (actually, there is
none) or something that could intercept string to double conversion?
That would be something to look at closer. The question is: how would
you implement such an evil database wide text to double conversion (just
to kid users) if you had to?

PostgreSQL is up for more than 480 days on that server. I'm thinking of
giving a restart of the database a try. However, there's a long running
import taking place, so this will not happen before mid or end of next week.

Regards, Carsten

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Carsten Klein (#15)
Re: Strange results when casting string to double

Carsten Klein <c.klein@datagis.com> writes:

So, you're not aware of any ways this behavior could be achieved from
within PostgreSQL? Something like a custom cast (actually, there is
none) or something that could intercept string to double conversion?

Well, you asserted that these installations are all alike ... but
sure, it's conceivable that somebody could've replaced the default
numeric -> float8 cast (i.e., numeric_float8()) with some other code
that does it a bit inaccurately. There's only a pg_cast entry
connecting that function to those types. Have you tried stepping
through things with a debugger, to see if numeric_float8 and float8in
are actually reached on the problematic system?

That would be something to look at closer. The question is: how would
you implement such an evil database wide text to double conversion (just
to kid users) if you had to?

At the SQL level, your example does not involve text -> double.
The literal 1.56 is of type numeric.

regards, tom lane

#17Peter J. Holzer
hjp-pgsql@hjp.at
In reply to: Carsten Klein (#15)
Re: Strange results when casting string to double

On 2022-02-18 18:07:35 +0100, Carsten Klein wrote:

both PostgreSQL and my C program are linked to the same libc.so.6. Same
path, same MD5 sum. Since libc is a Shared Object (so), both processes
should really run the identical code. Am I missing something? I've written
and compiled the small C program on the same old Ubuntu OS.

[...]

PostgreSQL is up for more than 480 days on that server. I'm thinking of
giving a restart of the database a try. However, there's a long running
import taking place, so this will not happen before mid or end of next week.

If there has been a glibc update (or a postgresql update) in those 480
days (Ubuntu 14.04 is out of its normal maintenance period but ESM is
still available) the running processes may well run different code than
a newly started program. So it could be a bug which has since been
fixed.

Another idea: It could be the case that something (maybe a bug in
postgres, maybe an extension, maybe even a random bit flip in memory)
changed the FP rounding mode within the postgres process, which would
affect all FP computations until the rounding mode is reset. That would
have to have happened in the master to affect the worker processes as
consistently as you are seeing. I don't know if it is even possible for
a non-standard rounding mode to persist for any length of time, but if
it is it would certainly account for weird rounding errors.

hp

--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | hjp@hjp.at | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"

#18Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter J. Holzer (#17)
Re: Strange results when casting string to double

"Peter J. Holzer" <hjp-pgsql@hjp.at> writes:

Another idea: It could be the case that something (maybe a bug in
postgres, maybe an extension, maybe even a random bit flip in memory)
changed the FP rounding mode within the postgres process, which would
affect all FP computations until the rounding mode is reset. That would
have to have happened in the master to affect the worker processes as
consistently as you are seeing. I don't know if it is even possible for
a non-standard rounding mode to persist for any length of time, but if
it is it would certainly account for weird rounding errors.

Hmm, that is a pretty interesting theory. On a RHEL8 box, I find
that fesetround(FE_DOWNWARD) causes strtod("1.56", NULL) to return

1.55999999999999983124

rather than the usual

1.56000000000000005329

which seems to square with Carsten's symptom.

Postgres itself contains no fesetround calls, but if you want
to believe a random bit flip changed that mode, maybe that'd
account for it. It'd certainly be interesting to find out
whether the problem persists after a postmaster restart.

[ wanders away wondering if the troublesome machine has ECC
memory ... ]

regards, tom lane

#19Tomas Pospisek
tpo2@sourcepole.ch
In reply to: Peter J. Holzer (#17)
Re: Strange results when casting string to double

On 18.02.22 22:42, Peter J. Holzer wrote:

If there has been a glibc update (or a postgresql update) in those 480
days (Ubuntu 14.04 is out of its normal maintenance period but ESM is
still available) the running processes may well run different code than
a newly started program. So it could be a bug which has since been
fixed.

That would be visible via `lsof`. `libc. The file `...libc...so` that
`postgres` is keeping open would have the text `DEL` (as in deleted) in
the `FD` column of `lsof`'s output.

As opposed to a newly started program which would have `REG` (regular
file) there.
*t

#20Merlin Moncure
mmoncure@gmail.com
In reply to: Tomas Pospisek (#19)
Re: Strange results when casting string to double

On Sat, Feb 19, 2022 at 8:35 AM Tomas Pospisek <tpo2@sourcepole.ch> wrote:

On 18.02.22 22:42, Peter J. Holzer wrote:

If there has been a glibc update (or a postgresql update) in those 480
days (Ubuntu 14.04 is out of its normal maintenance period but ESM is
still available) the running processes may well run different code than
a newly started program. So it could be a bug which has since been
fixed.

That would be visible via `lsof`. `libc. The file `...libc...so` that
`postgres` is keeping open would have the text `DEL` (as in deleted) in
the `FD` column of `lsof`'s output.

As opposed to a newly started program which would have `REG` (regular
file) there.

If this doesn't bear fruit, are there debugging symbols? Setting a
breakpoint might produce some insight.

merlin

#21Carsten Klein
c.klein@datagis.com
In reply to: Tomas Pospisek (#19)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Carsten Klein (#21)
#23Carsten Klein
c.klein@datagis.com
In reply to: Tom Lane (#22)
#24Tom Lane
tgl@sss.pgh.pa.us
In reply to: Carsten Klein (#23)