Inconsistency between TO_CHAR() and TO_NUMBER()

Started by Patryk Kordylewskialmost 13 years ago10 messagesbugs
Jump to latest

Hi,

i have the following test-case:

http://psql.privatepaste.com/b3b431851a

We want to convert a TO_CHAR() formated numeric back to numeric using
TO_NUMBER() with the same format string. This does not work with a
german locale.

By removing the FM prefix and using the exact amount of digits in the
format string for TO_NUMBER() it starts to work. Switching the locale
back to en_US does work too.

Is this a bug or am i missing something?

Thanks!

Best regards,
Patryk

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Patryk Kordylewski (#1)
Re: Inconsistency between TO_CHAR() and TO_NUMBER()

Patryk Kordylewski <pk@fooby.de> writes:

Hi,
i have the following test-case:

http://psql.privatepaste.com/b3b431851a

Just for the record: this is a completely unacceptable method of
submitting a bug report. After a month from now, when that paste has
expired, nobody looking at the PG archives will have any way to find out
what you were talking about. Even without the archival consideration,
you've added an extra step to fetch the test case for anyone reading
your mail.

Having said that, though, this does look wrong ...

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: Tom Lane (#2)
Re: Inconsistency between TO_CHAR() and TO_NUMBER()

Am 06.05.2013 16:26, schrieb Tom Lane:

Patryk Kordylewski <pk@fooby.de> writes:

Hi,
i have the following test-case:

http://psql.privatepaste.com/b3b431851a

Just for the record: this is a completely unacceptable method of
submitting a bug report. After a month from now, when that paste has
expired, nobody looking at the PG archives will have any way to find out
what you were talking about. Even without the archival consideration,
you've added an extra step to fetch the test case for anyone reading
your mail.

Having said that, though, this does look wrong ...

regards, tom lane

Hmpf, you're right. I'm sorry. Here is the test-case:

PostgreSQL 9.2.4 (also tested on 9.1 and 8.2)

### WORKS

SET lc_numeric TO 'en_US.UTF-8';
SET

SELECT
TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
'FM99G999G999G999G999G999G999D000');
to_char | to_number
-----------------+---------------
123,456,789.123 | 123456789.123
(1 row)

### DOES NOT WORK

SET lc_numeric TO 'de_DE.UTF-8';
SET

SELECT
TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
'FM99G999G999G999G999G999G999D000');
to_char | to_number
-----------------+-----------
123.456.789,123 | 123.456
(1 row)

### FORMAT STRING WITHOUT FM PREFIX AND EXACT NUMBER OF DIGITS WORKS

SELECT
TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
'999G999G999D000');
to_char | to_number
-----------------+---------------
123.456.789,123 | 123456789.123
(1 row)

Thank you.

Best regards,
Patryk

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Patryk Kordylewski (#3)
Re: Inconsistency between TO_CHAR() and TO_NUMBER()

Patryk Kordylewski <pk@fooby.de> writes:

SET lc_numeric TO 'de_DE.UTF-8';
SET

SELECT
TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
'FM99G999G999G999G999G999G999D000');
to_char | to_number
-----------------+-----------
123.456.789,123 | 123.456
(1 row)

I looked into this, and find that the reason it misbehaves is that
NUM_numpart_from_char() will treat a '.' as being a decimal point
*without any regard to locale considerations*. So even if we have
a locale-dependent format string and a locale that says '.' is a
thousands separator, it does the wrong thing.

It's a bit surprising nobody's complained of this before.

I propose the attached patch. I'm slightly worried though about whether
this might break any existing applications that are (incorrectly)
depending on a D format specifier being able to match '.' regardless of
locale. Perhaps we should only apply this to HEAD and not back-patch?

regards, tom lane

#5Thomas Kellerer
spam_eater@gmx.net
In reply to: Tom Lane (#4)
Re: Inconsistency between TO_CHAR() and TO_NUMBER()

Tom Lane wrote on 10.05.2013 17:49:

I looked into this, and find that the reason it misbehaves is that
NUM_numpart_from_char() will treat a '.' as being a decimal point
*without any regard to locale considerations*. So even if we have
a locale-dependent format string and a locale that says '.' is a
thousands separator, it does the wrong thing.

It's a bit surprising nobody's complained of this before.

I propose the attached patch. I'm slightly worried though about whether
this might break any existing applications that are (incorrectly)
depending on a D format specifier being able to match '.' regardless of
locale. Perhaps we should only apply this to HEAD and not back-patch?

The manual claims that 'D' is locale dependent (whereas '.' is not), so
_theoretically_ a back patch would make sense I guess.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

In reply to: Thomas Kellerer (#5)
Re: Inconsistency between TO_CHAR() and TO_NUMBER()

On 10-05-2013 13:09, Thomas Kellerer wrote:

Tom Lane wrote on 10.05.2013 17:49:

I looked into this, and find that the reason it misbehaves is that
NUM_numpart_from_char() will treat a '.' as being a decimal point
*without any regard to locale considerations*. So even if we have
a locale-dependent format string and a locale that says '.' is a
thousands separator, it does the wrong thing.

It's a bit surprising nobody's complained of this before.

I propose the attached patch. I'm slightly worried though about whether
this might break any existing applications that are (incorrectly)
depending on a D format specifier being able to match '.' regardless of
locale. Perhaps we should only apply this to HEAD and not back-patch?

+1 only in HEAD. That's because (a) it doesn't crash, (b) it doesn't
always produce the "wrong" answer (only in some specific situation) and
(c) it has been like that for years without a complain. For those
reasons, it is better to continue with this "wrong" behavior in back
branches than prevent important security updates to be applied (without
applying a patch to preserve the "wrong" answer). This argument is only
valid for legacy closed-source apps but seems to have more weight than
the bug scenario.

The manual claims that 'D' is locale dependent (whereas '.' is not), so
_theoretically_ a back patch would make sense I guess.

I would consider a documentation bug in back branches because fix it
means break apps.

--
Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Euler Taveira de Oliveira (#6)
Re: Inconsistency between TO_CHAR() and TO_NUMBER()

On 11.05.2013 01:17, Euler Taveira wrote:

On 10-05-2013 13:09, Thomas Kellerer wrote:

Tom Lane wrote on 10.05.2013 17:49:

I looked into this, and find that the reason it misbehaves is that
NUM_numpart_from_char() will treat a '.' as being a decimal point
*without any regard to locale considerations*. So even if we have
a locale-dependent format string and a locale that says '.' is a
thousands separator, it does the wrong thing.

It's a bit surprising nobody's complained of this before.

I propose the attached patch. I'm slightly worried though about whether
this might break any existing applications that are (incorrectly)
depending on a D format specifier being able to match '.' regardless of
locale. Perhaps we should only apply this to HEAD and not back-patch?

+1 only in HEAD. That's because (a) it doesn't crash, (b) it doesn't
always produce the "wrong" answer (only in some specific situation) and
(c) it has been like that for years without a complain. For those
reasons, it is better to continue with this "wrong" behavior in back
branches than prevent important security updates to be applied (without
applying a patch to preserve the "wrong" answer). This argument is only
valid for legacy closed-source apps but seems to have more weight than
the bug scenario.

+1 for HEAD-only. The Finnish language and locale uses comma (,) as the
decimal separator, and it's a real pain in the ass. And if something
goes wrong there, it can be *really* subtle. I once had to debug an
application where all prices were suddenly rounded down to the nearest
euro. And it only happened on some servers (those with locale set to
Finnish). It was not a PostgreSQL application, but it turned out to be a
bug in the JDBC driver of another DBMS.

Would it be possible to be lenient, and also accept . as the decimal
separator, when there is no ambiguity? Ie. when . is not the thousands
separator.

- Heikki

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#7)
Re: Inconsistency between TO_CHAR() and TO_NUMBER()

Heikki Linnakangas <hlinnakangas@vmware.com> writes:

Would it be possible to be lenient, and also accept . as the decimal
separator, when there is no ambiguity? Ie. when . is not the thousands
separator.

I originally coded it that way, but concluded that it was probably a
waste of code space. How many locales can you point to where neither
the decimal point nor thousands_sep is "."? It certainly wouldn't be
enough to noticeably reduce the potential pain from this change, so
I decided that it'd be better to keep the behavior straightforward
and as-documented.

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#9Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#8)
Re: Inconsistency between TO_CHAR() and TO_NUMBER()

On 13.05.2013 17:09, Tom Lane wrote:

Heikki Linnakangas<hlinnakangas@vmware.com> writes:

Would it be possible to be lenient, and also accept . as the decimal
separator, when there is no ambiguity? Ie. when . is not the thousands
separator.

I originally coded it that way, but concluded that it was probably a
waste of code space. How many locales can you point to where neither
the decimal point nor thousands_sep is "."?

On my laptop, there are eight locales that use "," as the decimal
separator and " " as the thousands separator.

$ grep -l "^thousands_sep.*U00A0" /usr/share/i18n/locales/* | xargs grep
-l "^decimal_point.*U002C"
/usr/share/i18n/locales/cs_CZ
/usr/share/i18n/locales/et_EE
/usr/share/i18n/locales/fi_FI
/usr/share/i18n/locales/lv_LV
/usr/share/i18n/locales/nb_NO
/usr/share/i18n/locales/ru_RU
/usr/share/i18n/locales/sk_SK
/usr/share/i18n/locales/uk_UA

Out of these, ru_RU actually uses "." as the LC_MONETARY decimal point,
even though it uses "," as the LC_NUMERIC decimal point. I think that
strengthens the argument for accepting both. I don't speak Russian, but
if you pass a monetary value to TO_NUMBER in ru_RU locale, using "." as
the decimal separator, you probably would expect it to work.

According to
http://en.wikipedia.org/wiki/Decimal_separator#Examples_of_use, many
countries accept either "1 234 567,89" or "1.234.567,89" style, but
looking at the locale files installed on my system, the latter style is
the one actually used (e.g Germany).

- Heikki

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#10Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#4)
Re: Inconsistency between TO_CHAR() and TO_NUMBER()

On Fri, May 10, 2013 at 11:49:18AM -0400, Tom Lane wrote:

Patryk Kordylewski <pk@fooby.de> writes:

SET lc_numeric TO 'de_DE.UTF-8';
SET

SELECT
TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
TO_NUMBER(TO_CHAR(123456789.123, 'FM99G999G999G999G999G999G999D000'),
'FM99G999G999G999G999G999G999D000');
to_char | to_number
-----------------+-----------
123.456.789,123 | 123.456
(1 row)

I looked into this, and find that the reason it misbehaves is that
NUM_numpart_from_char() will treat a '.' as being a decimal point
*without any regard to locale considerations*. So even if we have
a locale-dependent format string and a locale that says '.' is a
thousands separator, it does the wrong thing.

It's a bit surprising nobody's complained of this before.

I propose the attached patch. I'm slightly worried though about whether
this might break any existing applications that are (incorrectly)
depending on a D format specifier being able to match '.' regardless of
locale. Perhaps we should only apply this to HEAD and not back-patch?

I never trust to_char() changes for backpatching. ;-)

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs