numeric_to_number() function skipping some digits

Started by Jeevan Chalkeover 16 years ago15 messages
#1Jeevan Chalke
jeevan.chalke@enterprisedb.com

Hi,

With PG84, I have tried something like this which seem incorrect to me.

# SELECT '' AS to_number_2, to_number('-347,58', '99G999');
to_number_2 | to_number
-------------+-----------
| -3458
(1 row)

After browsing the code (numeric_to_number), I have found that number string
is processed according to the number format. Which means, when 'G' is found
in the format we have moved the number pointer to next character. This is
why the digit '7' is skipped. And while processing ',', as it is not a digit
it is ignored.

Is this expected behaviour? or a bug?

Thanks

--
Jeevan B Chalke
EnterpriseDB Software India Private Limited, Pune
Visit us at: www.enterprisedb.com
---
If better is possible, then good is not enough

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeevan Chalke (#1)
Re: numeric_to_number() function skipping some digits

Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes:

With PG84, I have tried something like this which seem incorrect to me.

# SELECT '' AS to_number_2, to_number('-347,58', '99G999');

Well, the input doesn't actually match the format, so I'm not totally
excited about this. You do get sane results from:

regression=# select to_number('-347,58', '999G99');
to_number
-----------
-34758
(1 row)

Should we have it throw an error if the input corresponding to a G
symbol doesn't match the expected group separator? I'm concerned that
that would break applications that work okay today.

regards, tom lane

#3Brendan Jurd
direvus@gmail.com
In reply to: Tom Lane (#2)
Re: numeric_to_number() function skipping some digits

2009/9/19 Tom Lane <tgl@sss.pgh.pa.us>:

Should we have it throw an error if the input corresponding to a G
symbol doesn't match the expected group separator?  I'm concerned that
that would break applications that work okay today.

It would be a substantial change to the behaviour, and to do it
properly we'd have to change to_date() to actually parse separator
characters as well.

That is, you can currently write to_date('2009/09/19', 'YYYY-MM-DD')
-- it doesn't matter what the separator characters actually look like,
since per the format pattern they cannot affect the date outcome.

This naturally leads to the question we always have to ask with these
functions: What Does Oracle Do?

But FWIW, a -1 from me for changing this.

Cheers,
BJ

#4Jeevan Chalke
jeevan.chalke@enterprisedb.com
In reply to: Brendan Jurd (#3)
Re: numeric_to_number() function skipping some digits

Hi,

On Sat, Sep 19, 2009 at 1:52 AM, Brendan Jurd <direvus@gmail.com> wrote:

2009/9/19 Tom Lane <tgl@sss.pgh.pa.us>:

Should we have it throw an error if the input corresponding to a G
symbol doesn't match the expected group separator? I'm concerned that
that would break applications that work okay today.

It would be a substantial change to the behaviour, and to do it
properly we'd have to change to_date() to actually parse separator
characters as well.

That is, you can currently write to_date('2009/09/19', 'YYYY-MM-DD')
-- it doesn't matter what the separator characters actually look like,
since per the format pattern they cannot affect the date outcome.

This naturally leads to the question we always have to ask with these
functions: What Does Oracle Do?

Oracle returns "19-SEP-09" irrespective of the format.
Here in PG, we have getting the proper date irrespective of the format as
Oracle. But in the case to to_number the returned value is wrong. For
example following query returns '340' on PG where as it returns '3450' on
Oracle.

select to_number('34,50','999,99') from dual;

But FWIW, a -1 from me for changing this.

Do you mean this is the expected behaviour then?

Cheers,
BJ

--
Jeevan B Chalke
EnterpriseDB Software India Private Limited, Pune
Visit us at: www.enterprisedb.com
---
If better is possible, then good is not enough

#5Brendan Jurd
direvus@gmail.com
In reply to: Jeevan Chalke (#4)
Re: numeric_to_number() function skipping some digits

2009/9/21 Jeevan Chalke <jeevan.chalke@enterprisedb.com>:

Oracle returns "19-SEP-09" irrespective of the format.
Here in PG, we have getting the proper date irrespective of the format as
Oracle. But in the case to to_number the returned value is wrong. For
example following query returns '340' on PG where as it returns '3450' on
Oracle.

select to_number('34,50','999,99') from dual;

Hi Jeevan,

Thanks for checking up on the Oracle behaviour. It appears to
silently disregard grouping characters in the format pattern, and also
disregard them wherever they appear in the input string (or else it
reads the string from right-to-left?).

It seems that, to match Oracle, we'd need to teach the code that 'G'
and ',' are no-ops for to_number(), and also that such characters
should be ignored in the input.

To be honest, though, I'm not sure it's worth pursuing. If you want
to feed in numbers that have decorative characters all through them,
it's far more predictable to just regex out the cruft and use ordinary
numeric parsing than to use to_number(), which is infamous for its
idiosyncrasies:

# SELECT regexp_replace('34,50', E'[\\d.]', '', 'g')::numeric;
3450

Cheers,
BJ

#6Brendan Jurd
direvus@gmail.com
In reply to: Brendan Jurd (#5)
Re: numeric_to_number() function skipping some digits

2009/9/21 Brendan Jurd <direvus@gmail.com>:

# SELECT regexp_replace('34,50', E'[\\d.]', '', 'g')::numeric;
3450

Sorry, that regex ought to have read E'[^\\d.]'.

#7Jeevan Chalke
jeevan.chalke@enterprisedb.com
In reply to: Brendan Jurd (#5)
Re: numeric_to_number() function skipping some digits

Hi,

On Mon, Sep 21, 2009 at 12:36 PM, Brendan Jurd <direvus@gmail.com> wrote:

2009/9/21 Jeevan Chalke <jeevan.chalke@enterprisedb.com>:

Oracle returns "19-SEP-09" irrespective of the format.
Here in PG, we have getting the proper date irrespective of the format as
Oracle. But in the case to to_number the returned value is wrong. For
example following query returns '340' on PG where as it returns '3450' on
Oracle.

select to_number('34,50','999,99') from dual;

Hi Jeevan,

Thanks for checking up on the Oracle behaviour. It appears to
silently disregard grouping characters in the format pattern, and also
disregard them wherever they appear in the input string (or else it
reads the string from right-to-left?).

It seems that Oracle reads formatting string from right-to-left. Here are
few results:
('number','format') ==> Oracle PG
--------------------------------------------
('34,50','999,99') ==> 3450 340
('34,50','99,99') ==> 3450 3450
('34,50','99,999') ==> Invalid Number 3450
('34,50','999,999') ==> Invalid Number 340

It seems that, to match Oracle, we'd need to teach the code that 'G'
and ',' are no-ops for to_number(), and also that such characters
should be ignored in the input.

That means we cannot simply ignore such characters from the input. Rather we
can process the string R-L. But yes this will definitely going to break the
current applications running today.

To be honest, though, I'm not sure it's worth pursuing. If you want
to feed in numbers that have decorative characters all through them,
it's far more predictable to just regex out the cruft and use ordinary
numeric parsing than to use to_number(), which is infamous for its
idiosyncrasies:

# SELECT regexp_replace('34,50', E'[\\d.]', '', 'g')::numeric;
3450

This (with E'[^\\d.]') ignores/replaces all the characters except digits
from the input which we certainly not wishing to do. Instead we can continue
with the current implementation. But IMHO, somewhere in the time-line we
need to fix this.

Cheers,
BJ

Thanks
--
Jeevan B Chalke
EnterpriseDB Software India Private Limited, Pune
Visit us at: www.enterprisedb.com
---
If better is possible, then good is not enough

In reply to: Jeevan Chalke (#7)
Re: numeric_to_number() function skipping some digits

Jeevan Chalke escreveu:

That means we cannot simply ignore such characters from the input.
Rather we can process the string R-L. But yes this will definitely going
to break the current applications running today.

IIRC we tight the to_char() and to_timestamp() input for 8.4. Why don't we do
it for to_number() too? The fact of breaking application is not a strong
argument against having compatibility with Oracle to_*() functions. People
that are building applications around these two databases and are using the
to_*() functions are expecting that the behavior being the same (i didn't
remember any complaints about that but ...).

So +1 for this TODO.

--
Euler Taveira de Oliveira
http://www.timbira.com/

#9daveg
daveg@sonic.net
In reply to: Jeevan Chalke (#7)
Re: numeric_to_number() function skipping some digits

On Tue, Sep 22, 2009 at 10:27:19AM +0530, Jeevan Chalke wrote:

It seems that Oracle reads formatting string from right-to-left. Here are
few results:
('number','format') ==> Oracle PG
--------------------------------------------
('34,50','999,99') ==> 3450 340
('34,50','99,99') ==> 3450 3450
('34,50','99,999') ==> Invalid Number 3450
('34,50','999,999') ==> Invalid Number 340

It seems worse to to give a wrong answer silently then to throw an error.
What we do now seems sort of MySqlish.

-dg

--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

#10Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: daveg (#9)
Re: numeric_to_number() function skipping some digits

daveg <daveg@sonic.net> wrote:

On Tue, Sep 22, 2009 at 10:27:19AM +0530, Jeevan Chalke wrote:

It seems that Oracle reads formatting string from right-to-left.
Here are few results:
('number','format') ==> Oracle PG
--------------------------------------------
('34,50','999,99') ==> 3450 340
('34,50','99,99') ==> 3450 3450
('34,50','99,999') ==> Invalid Number 3450
('34,50','999,999') ==> Invalid Number 340

It seems worse to to give a wrong answer silently then to throw an
error. What we do now seems sort of MySqlish.

I agree with David on that.

Further, it seems to me that functions which are there only for Oracle
compatibility (i.e., they are not part of the SQL standard and are not
particularly sensible, but are popular in Oracle) should behave like
the corresponding Oracle function. An argument could even be made for
"bug compatibility" for such functions, at least to some extent;
although we're not talking about that here -- the Oracle results seem
more sane than current PostgreSQL behavior..

-Kevin

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#10)
Re: numeric_to_number() function skipping some digits

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

daveg <daveg@sonic.net> wrote:

On Tue, Sep 22, 2009 at 10:27:19AM +0530, Jeevan Chalke wrote:

It seems that Oracle reads formatting string from right-to-left.

It seems worse to to give a wrong answer silently then to throw an
error. What we do now seems sort of MySqlish.

I agree with David on that.

So who is volunteering to do the presumably rather major rewrite
involved?

regards, tom lane

#12Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#11)
Re: numeric_to_number() function skipping some digits

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

daveg <daveg@sonic.net> wrote:

On Tue, Sep 22, 2009 at 10:27:19AM +0530, Jeevan Chalke wrote:

It seems that Oracle reads formatting string from right-to-left.

It seems worse to to give a wrong answer silently then to throw an
error. What we do now seems sort of MySqlish.

I agree with David on that.

So who is volunteering to do the presumably rather major rewrite
involved?

We don't typically identify an author before putting something on the
TODO list.

That said, this one seems like it would be within my skill set. I
seem to have trouble finding tasks that are for which we can get
consensus, so I just might pick this one up if we achieve such
consensus on it. The biggest problem, should I take it on, would be
that I don't currently have access to an Oracle database, so someone
would have to supply me with accurate specs for how it should behave,
or point to unambiguous documentation.

-Kevin

#13Joshua D. Drake
jd@commandprompt.com
In reply to: Kevin Grittner (#12)
Re: numeric_to_number() function skipping some digits

On Wed, 2009-09-30 at 19:08 -0500, Kevin Grittner wrote:

Tom Lane <tgl@sss.pgh.pa.us> wrote:

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

daveg <daveg@sonic.net> wrote:

On Tue, Sep 22, 2009 at 10:27:19AM +0530, Jeevan Chalke wrote:

It seems that Oracle reads formatting string from right-to-left.

It seems worse to to give a wrong answer silently then to throw an
error. What we do now seems sort of MySqlish.

I agree with David on that.

So who is volunteering to do the presumably rather major rewrite
involved?

We don't typically identify an author before putting something on the
TODO list.

That said, this one seems like it would be within my skill set. I
seem to have trouble finding tasks that are for which we can get
consensus, so I just might pick this one up if we achieve such
consensus on it. The biggest problem, should I take it on, would be
that I don't currently have access to an Oracle database, so someone
would have to supply me with accurate specs for how it should behave,
or point to unambiguous documentation.

Just download developer edition?

-Kevin

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander

#14Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Joshua D. Drake (#13)
Re: numeric_to_number() function skipping some digits

"Joshua D. Drake" <jd@commandprompt.com> wrote:

On Wed, 2009-09-30 at 19:08 -0500, Kevin Grittner wrote:

I don't currently have access to an Oracle database

Just download developer edition?

[quick google search]

Looks like that would do it. Thanks.

-Kevin

#15Bruce Momjian
bruce@momjian.us
In reply to: Jeevan Chalke (#7)
Re: numeric_to_number() function skipping some digits

Added to TODO:

|Fix to_number() handling for values not matching the format string

---------------------------------------------------------------------------

Jeevan Chalke wrote:

Hi,

On Mon, Sep 21, 2009 at 12:36 PM, Brendan Jurd <direvus@gmail.com> wrote:

2009/9/21 Jeevan Chalke <jeevan.chalke@enterprisedb.com>:

Oracle returns "19-SEP-09" irrespective of the format.
Here in PG, we have getting the proper date irrespective of the format as
Oracle. But in the case to to_number the returned value is wrong. For
example following query returns '340' on PG where as it returns '3450' on
Oracle.

select to_number('34,50','999,99') from dual;

Hi Jeevan,

Thanks for checking up on the Oracle behaviour. It appears to
silently disregard grouping characters in the format pattern, and also
disregard them wherever they appear in the input string (or else it
reads the string from right-to-left?).

It seems that Oracle reads formatting string from right-to-left. Here are
few results:
('number','format') ==> Oracle PG
--------------------------------------------
('34,50','999,99') ==> 3450 340
('34,50','99,99') ==> 3450 3450
('34,50','99,999') ==> Invalid Number 3450
('34,50','999,999') ==> Invalid Number 340

It seems that, to match Oracle, we'd need to teach the code that 'G'
and ',' are no-ops for to_number(), and also that such characters
should be ignored in the input.

That means we cannot simply ignore such characters from the input. Rather we
can process the string R-L. But yes this will definitely going to break the
current applications running today.

To be honest, though, I'm not sure it's worth pursuing. If you want
to feed in numbers that have decorative characters all through them,
it's far more predictable to just regex out the cruft and use ordinary
numeric parsing than to use to_number(), which is infamous for its
idiosyncrasies:

# SELECT regexp_replace('34,50', E'[\\d.]', '', 'g')::numeric;
3450

This (with E'[^\\d.]') ignores/replaces all the characters except digits
from the input which we certainly not wishing to do. Instead we can continue
with the current implementation. But IMHO, somewhere in the time-line we
need to fix this.

Cheers,
BJ

Thanks
--
Jeevan B Chalke
EnterpriseDB Software India Private Limited, Pune
Visit us at: www.enterprisedb.com
---
If better is possible, then good is not enough

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +