Rectifying wrong Date outputs

Started by Piyush Neweabout 15 years ago25 messageshackers
Jump to latest
#1Piyush Newe
piyush.newe@enterprisedb.com

Hi,

I was randomly testing some date related stuff on PG & observed that the
outputs were wrong.

e.g.
postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YY');
to_date
------------
3910-01-01 <--------- Look at this
(1 row)

postgres=# SELECT TO_DATE('01-jan-2010', 'DD-MON-YYYY');
to_date
------------
2010-01-01
(1 row)

User can provide the year in any format i.e. 1-digit to 4-digit, in this
case the internal representation of century should be logically correct.

Considering this situation, I am drawing the table below & logging my
observation here. This might help us to demonstrate where we are lacking in
PG.

*Data Format PostgreSQL EDBAS*
*TO_DATE('01-jan-1', 'DD-MON-Y') 2001-01-01 01-JAN-2001*
*TO_DATE('01-jan-1', 'DD-MON-YY') 2001-01-01 01-JAN-2001*
*TO_DATE('01-jan-1', 'DD-MON-YYY') 2001-01-01 01-JAN-2001*
*TO_DATE('01-jan-1', 'DD-MON-YYYY') 0001-01-01 01-JAN-0001*

In this case, all the cases seems correct. Also the YEAR part in the output
is seems logical.

*Data Format PostgreSQL EDBAS*
*TO_DATE('01-jan-10', 'DD-MON-Y') 2010-01-01 Error *
*TO_DATE('01-jan-10', 'DD-MON-YY') 2010-01-01 01-JAN-2010*
*TO_DATE('01-jan-10', 'DD-MON-YYY') 2010-01-01 01-JAN-2010*
*TO_DATE('01-jan-10', 'DD-MON-YYYY') 0010-01-01 01-JAN-0010*

In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
1st case the output is not correct since the Format ('Y') is lesser than the
actual input ('10'). But PG is ignoring this condition and throwing whatever
is input. The output year is might not be the year, what user is expecting.
Hence PG should throw an error.

*Data Format PostgreSQL EDBAS*
*TO_DATE('01-jan-067', 'DD-MON-Y') 2067-01-01 Error*
*TO_DATE('01-jan-111', 'DD-MON-YY') 2011-01-01 Error*
*TO_DATE('01-jan-678', 'DD-MON-YYY') 1678-01-01 01-JAN-2678*
*TO_DATE('01-jan-001', 'DD-MON-YYYY') 0001-01-01 01-JAN-0001*

In this case, only last case seems correct in PG. Rest other cases are might
not be logical, rather the output is vague. In PG, I haven't seen any
document which is saying something like this, if year is 111...999 then the
century would be 2000 and 001...099 then then century would be 1000.
However, the 1st and 2nd case should throw an error since the output format
('Y' & 'YY') are really not matching with the Input ('067' & '111'),
respectively.

*Data Format PostgreSQL EDBAS*
*TO_DATE('01-jan-2010', 'DD-MON-Y') 4010-01-01 Error*
*TO_DATE('01-jan-2010', 'DD-MON-YY') 3910-01-01 Error *
*TO_DATE('01-jan-2010', 'DD-MON-YYY') 3010-01-01 Error *
*TO_DATE('01-jan-2010', 'DD-MON-YYYY') 2010-01-01 01-JAN-2010*

In this case, PG is giving wrong output in first 3 cases. Those need to get
rectified. Again it should throw error in these cases, because Output Format
is not matching with Input Data. The exception here is the 2nd case, where
century is well-defined.

After observing the all above cases, the summary would be, if the output
format is lesser than the actual input value, then it should throw an error.
Considering this thumb rule, we can fix the wrong outputs in PG. I have made
necessary changes to the code & attaching the patch with this email.

In the patch, I have written one centralize function which will decide the
century depends upon the given Input format.

Thoughts ? Any other ideas on this ?

--
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

Attachments:

Fix_Century_V1_PG.patchtext/x-patch; charset=US-ASCII; name=Fix_Century_V1_PG.patchDownload+65-21
#2Robert Haas
robertmhaas@gmail.com
In reply to: Piyush Newe (#1)
Re: Rectifying wrong Date outputs

On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
<piyush.newe@enterprisedb.com> wrote:

Data Format                 PostgreSQL EDBAS
TO_DATE('01-jan-10',  'DD-MON-Y')         2010-01-01 Error
TO_DATE('01-jan-10',  'DD-MON-YY')         2010-01-01 01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-YYY') 2010-01-01 01-JAN-2010
TO_DATE('01-jan-10',  'DD-MON-YYYY') 0010-01-01 01-JAN-0010
In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
1st case the output is not correct since the Format ('Y') is lesser than the
actual input ('10'). But PG is ignoring this condition and throwing whatever
is input. The output year is might not be the year, what user is expecting.
Hence PG should throw an error.

I can't get worked up about this. If there's a consensus that
throwing an error here is better, fine, but on first blush the PG
behavior doesn't look unreasonable to me.

Data Format                 PostgreSQL EDBAS
TO_DATE('01-jan-2010', 'DD-MON-Y') 4010-01-01 Error
TO_DATE('01-jan-2010', 'DD-MON-YY') 3910-01-01 Error
TO_DATE('01-jan-2010', 'DD-MON-YYY') 3010-01-01 Error
TO_DATE('01-jan-2010', 'DD-MON-YYYY') 2010-01-01 01-JAN-2010

These cases look a lot stranger. I'm inclined to think that if the
number of digits specified exceeds the number of Ys, then we can
either (1) throw an error, as you suggest or (2) give the same answer
we would have given if the number of Ys were equal to the number of
digits given. In other words, if we're not going to error out here,
all of these should return 2010-01-01.

Data Format PostgreSQL EDBAS
TO_DATE('01-jan-067', 'DD-MON-Y') 2067-01-01 Error
TO_DATE('01-jan-111', 'DD-MON-YY') 2011-01-01 Error
TO_DATE('01-jan-678', 'DD-MON-YYY') 1678-01-01 01-JAN-2678
TO_DATE('01-jan-001', 'DD-MON-YYYY') 0001-01-01 01-JAN-0001

These are so strange that it's hard to reason about them; who uses
three-digit years? In the third case above, you should EDBAS
deciding that 678 means 2678 instead of 1678, but that seems quite
arbitrary. 1678 seems just as plausible. But the behavior in the
second case looks wrong (shouldn't the answer should be either 1111 or
2111?) and the first case looks inconsistent with the third one (why
does 067 mean 2067 rather than 1967 while 678 means 1678 rather than
2678?).

I'm inclined to think that we have a bug here in the case where the #
of digits given is greater than the # of Ys. See also this:

rhaas=# select to_date('01-jan-678', 'DD-MON-Y');
to_date
------------
2678-01-01
(1 row)

rhaas=# select to_date('01-jan-678', 'DD-MON-YY');
to_date
------------
2578-01-01
(1 row)

rhaas=# select to_date('01-jan-678', 'DD-MON-YYY');
to_date
------------
1678-01-01
(1 row)

It's a lot less clear to me that we have a bug in the other direction
(# of digits given is less than the # of Ys), but maybe....

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#2)
Re: Rectifying wrong Date outputs

Robert Haas wrote:

On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
<piyush.newe@enterprisedb.com> wrote:

Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
TO_DATE('01-jan-10', ?'DD-MON-YYYY') 0010-01-01 01-JAN-0010
In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
1st case the output is not correct since the Format ('Y') is lesser than the
actual input ('10'). But PG is ignoring this condition and throwing whatever
is input. The output year is might not be the year, what user is expecting.
Hence PG should throw an error.

I can't get worked up about this. If there's a consensus that
throwing an error here is better, fine, but on first blush the PG
behavior doesn't look unreasonable to me.

Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS

To clarify, the user is reporting EDB Advanced Server, though the
community PG has the same issues, or at least similar; with git HEAD:

test=> SELECT TO_DATE('01-jan-2010', 'DD-MON-YY');
to_date
------------
3910-01-01
(1 row)

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#4Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#3)
Re: Rectifying wrong Date outputs

On Wed, Mar 16, 2011 at 5:52 PM, Bruce Momjian <bruce@momjian.us> wrote:

Robert Haas wrote:

On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
<piyush.newe@enterprisedb.com> wrote:

Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
TO_DATE('01-jan-10', ?'DD-MON-YYYY') 0010-01-01 01-JAN-0010
In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
1st case the output is not correct since the Format ('Y') is lesser than the
actual input ('10'). But PG is ignoring this condition and throwing whatever
is input. The output year is might not be the year, what user is expecting.
Hence PG should throw an error.

I can't get worked up about this.  If there's a consensus that
throwing an error here is better, fine, but on first blush the PG
behavior doesn't look unreasonable to me.

Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS

To clarify, the user is reporting EDB Advanced Server, though the
community PG has the same issues, or at least similar;  with git HEAD:

       test=> SELECT TO_DATE('01-jan-2010',  'DD-MON-YY');
         to_date
       ------------
        3910-01-01
       (1 row)

Actually, I think he's comparing PostgreSQL to Advanced Server.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#5Bruce Momjian
bruce@momjian.us
In reply to: Robert Haas (#4)
Re: Rectifying wrong Date outputs

Robert Haas wrote:

On Wed, Mar 16, 2011 at 5:52 PM, Bruce Momjian <bruce@momjian.us> wrote:

Robert Haas wrote:

On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
<piyush.newe@enterprisedb.com> wrote:

Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
TO_DATE('01-jan-10', ?'DD-MON-YYYY') 0010-01-01 01-JAN-0010
In this case, it seems in last 3 cases PG is behaving correctly. Whereas in
1st case the output is not correct since the Format ('Y') is lesser than the
actual input ('10'). But PG is ignoring this condition and throwing whatever
is input. The output year is might not be the year, what user is expecting.
Hence PG should throw an error.

I can't get worked up about this. ?If there's a consensus that
throwing an error here is better, fine, but on first blush the PG
behavior doesn't look unreasonable to me.

Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS

To clarify, the user is reporting EDB Advanced Server, though the
community PG has the same issues, or at least similar; ?with git HEAD:

? ? ? ?test=> SELECT TO_DATE('01-jan-2010', ?'DD-MON-YY');
? ? ? ? ?to_date
? ? ? ?------------
? ? ? ? 3910-01-01
? ? ? ?(1 row)

Actually, I think he's comparing PostgreSQL to Advanced Server.

Oh, I understand now. I was confused that the headings didn't line up
with the values. I see now the first value is community PG and the
second is EDBAS.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

#6Piyush Newe
piyush.newe@enterprisedb.com
In reply to: Bruce Momjian (#5)
Re: Rectifying wrong Date outputs

Sorry for creating the confusion. The table drawn was PostgreSQL vs EDB
Advanced Server.
Thanks Burce for clarification.

For the 1-digit, 2-digit & 3-digit Year inputs, as I said, I didn't see any
document in PG which will explain what would be the century considered if it
is not given. If I missed out it somewhere please let me know.

I refer to following link which explains the behavior of Oracle.
http://forums.oracle.com/forums/thread.jspa?threadID=312239

Now, if
1. # of digits given is greater than the # of Ys

i.e.
postgres=# select to_date('01-jan-111', 'DD-MON-Y');
to_date
------------
2111-01-01
(1 row)

What we should do ? Either we should throw an error or we should give what
user has provided.
IMHO, we should follow what format is given by user. However, even if the
'format' gets wrong rather invalid, it is not throwing any error.

e.g.
postgres=# select to_date('01-jan-111', 'DD-MON-Y POSTGRES'); <<<<---- Look
at this
to_date
------------
2111-01-01
(1 row)

2. # of digits given is less than the # of Ys
Consider following case,

postgres=# select to_date('01-jan-6', 'DD-MON-Y'), to_date('01-jan-6',
'DD-MON-YYYY');
to_date | to_date
------------+------------
2006-01-01 | 0006-01-01

Why this behaviour not predictable ? I think we are always considering the
current century, if it is not provided. If I missed out any related
document, please share.

And yes,

postgres=# select to_date('01-jan-1761', 'DD-MON-Y');
to_date
------------
3761-01-01 <<<----- Look at this.
(1 row)

Definitely, their is a bug in this case.

Am I missing something ?

-Piyush

On Thu, Mar 17, 2011 at 3:30 AM, Bruce Momjian <bruce@momjian.us> wrote:

Robert Haas wrote:

On Wed, Mar 16, 2011 at 5:52 PM, Bruce Momjian <bruce@momjian.us> wrote:

Robert Haas wrote:

On Wed, Mar 16, 2011 at 8:21 AM, Piyush Newe
<piyush.newe@enterprisedb.com> wrote:

Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS
TO_DATE('01-jan-10', ?'DD-MON-Y') ?? ? ? ?2010-01-01 Error
TO_DATE('01-jan-10', ?'DD-MON-YY') ?? ? ? ?2010-01-01 01-JAN-2010
TO_DATE('01-jan-10', ?'DD-MON-YYY') 2010-01-01 01-JAN-2010
TO_DATE('01-jan-10', ?'DD-MON-YYYY') 0010-01-01 01-JAN-0010
In this case, it seems in last 3 cases PG is behaving correctly.

Whereas in

1st case the output is not correct since the Format ('Y') is lesser

than the

actual input ('10'). But PG is ignoring this condition and throwing

whatever

is input. The output year is might not be the year, what user is

expecting.

Hence PG should throw an error.

I can't get worked up about this. ?If there's a consensus that
throwing an error here is better, fine, but on first blush the PG
behavior doesn't look unreasonable to me.

Data Format ?? ? ? ? ? ? ? ?PostgreSQL EDBAS

To clarify, the user is reporting EDB Advanced Server, though the
community PG has the same issues, or at least similar; ?with git HEAD:

? ? ? ?test=> SELECT TO_DATE('01-jan-2010', ?'DD-MON-YY');
? ? ? ? ?to_date
? ? ? ?------------
? ? ? ? 3910-01-01
? ? ? ?(1 row)

Actually, I think he's comparing PostgreSQL to Advanced Server.

Oh, I understand now. I was confused that the headings didn't line up
with the values. I see now the first value is community PG and the
second is EDBAS.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

--
--
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Piyush Newe (#6)
Re: Rectifying wrong Date outputs

Excerpts from Piyush Newe's message of jue mar 17 02:30:06 -0300 2011:

Sorry for creating the confusion. The table drawn was PostgreSQL vs EDB
Advanced Server.
Thanks Burce for clarification.

For the 1-digit, 2-digit & 3-digit Year inputs, as I said, I didn't see any
document in PG which will explain what would be the century considered if it
is not given. If I missed out it somewhere please let me know.

Keep in mind that the datetime stuff was abandoned by the maintainer
some years ago with quite some rough edges. Some of it has been fixed,
but a lot of bugs remain. Looks like this is one of those places and it
seems appropriate to spend some time fixing it. Since it would involve
a behavior change, it should only go to 9.2, of course.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#8Robert Haas
robertmhaas@gmail.com
In reply to: Alvaro Herrera (#7)
Re: Rectifying wrong Date outputs

On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Excerpts from Piyush Newe's message of jue mar 17 02:30:06 -0300 2011:

Sorry for creating the confusion. The table drawn was PostgreSQL vs EDB
Advanced Server.
Thanks Burce for clarification.

For the 1-digit, 2-digit & 3-digit Year inputs, as I said, I didn't see any
document in PG which will explain what would be the century considered if it
is not given. If I missed out it somewhere please let me know.

Keep in mind that the datetime stuff was abandoned by the maintainer
some years ago with quite some rough edges.  Some of it has been fixed,
but a lot of bugs remain.  Looks like this is one of those places and it
seems appropriate to spend some time fixing it.  Since it would involve
a behavior change, it should only go to 9.2, of course.

I wouldn't object to fixing the problem with # of digits > # of Ys in
9.1, if the fix is simple and clear-cut. I think we are still
accepting patches to make minor tweaks, like the tab-completion patch
I committed yesterday. It also doesn't bother me tremendously if we
push it off, but I don't think that anyone's going to be too sad if
TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more
sensible than 3010-01-01.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#8)
Re: Rectifying wrong Date outputs

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Keep in mind that the datetime stuff was abandoned by the maintainer
some years ago with quite some rough edges. �Some of it has been fixed,
but a lot of bugs remain. �Looks like this is one of those places and it
seems appropriate to spend some time fixing it. �Since it would involve
a behavior change, it should only go to 9.2, of course.

I wouldn't object to fixing the problem with # of digits > # of Ys in
9.1, if the fix is simple and clear-cut. I think we are still
accepting patches to make minor tweaks, like the tab-completion patch
I committed yesterday. It also doesn't bother me tremendously if we
push it off, but I don't think that anyone's going to be too sad if
TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more
sensible than 3010-01-01.

Agreed, it's certainly not too late for bug fixes in 9.1. I agree
that this isn't something we would want to tweak in released branches,
but 9.1 isn't there yet.

Having said that, it's not entirely clear to me what sane behavior is
here. Personally I would expect that an n-Ys format spec would consume
at most n digits from the input. Otherwise how are you going to use
to_date to pick apart strings that don't have any separators? So
I think the problem is actually upstream of the behavior complained of
here. However, what we should first do is see what Oracle does in such
cases, because the main driving factor for these functions is Oracle
compatibility not what might seem sane in a vacuum.

regards, tom lane

#10Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Robert Haas (#8)
Re: Rectifying wrong Date outputs

Excerpts from Robert Haas's message of jue mar 17 11:09:56 -0300 2011:

On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Keep in mind that the datetime stuff was abandoned by the maintainer
some years ago with quite some rough edges.  Some of it has been fixed,
but a lot of bugs remain.  Looks like this is one of those places and it
seems appropriate to spend some time fixing it.  Since it would involve
a behavior change, it should only go to 9.2, of course.

I wouldn't object to fixing the problem with # of digits > # of Ys in
9.1, if the fix is simple and clear-cut. I think we are still
accepting patches to make minor tweaks, like the tab-completion patch
I committed yesterday. It also doesn't bother me tremendously if we
push it off, but I don't think that anyone's going to be too sad if
TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more
sensible than 3010-01-01.

If it can be delivered quickly and it is simple, sure. But anything
more involved should respect the release schedule.

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#11Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Tom Lane (#9)
Re: Rectifying wrong Date outputs

Tom Lane <tgl@sss.pgh.pa.us> wrote:

what we should first do is see what Oracle does in such cases,
because the main driving factor for these functions is Oracle
compatibility not what might seem sane in a vacuum.

+1

-Kevin

#12Piyush Newe
piyush.newe@enterprisedb.com
In reply to: Tom Lane (#9)
Re: Rectifying wrong Date outputs

On Thu, Mar 17, 2011 at 7:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Keep in mind that the datetime stuff was abandoned by the maintainer
some years ago with quite some rough edges. Some of it has been fixed,
but a lot of bugs remain. Looks like this is one of those places and it
seems appropriate to spend some time fixing it. Since it would involve
a behavior change, it should only go to 9.2, of course.

I wouldn't object to fixing the problem with # of digits > # of Ys in
9.1, if the fix is simple and clear-cut. I think we are still
accepting patches to make minor tweaks, like the tab-completion patch
I committed yesterday. It also doesn't bother me tremendously if we
push it off, but I don't think that anyone's going to be too sad if
TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more
sensible than 3010-01-01.

Agreed, it's certainly not too late for bug fixes in 9.1. I agree
that this isn't something we would want to tweak in released branches,
but 9.1 isn't there yet.

I feel the patch for the same would be easier and was attached in the
initial mail of this mail thread. For your ready reference, I am attaching
the same patch here again.

Having said that, it's not entirely clear to me what sane behavior is
here. Personally I would expect that an n-Ys format spec would consume
at most n digits from the input. Otherwise how are you going to use
to_date to pick apart strings that don't have any separators? So
I think the problem is actually upstream of the behavior complained of
here. However, what we should first do is see what Oracle does in such
cases, because the main driving factor for these functions is Oracle
compatibility not what might seem sane in a vacuum.

Following is the extended chart which is comparing the behavior of Oracle,
PG & EDBAS.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-1', 'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1', 'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1', 'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, all the cases are in sync except the 1st one. I didn't
understand why Oracle is interpreting year '1' as '2011'.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-10', 'DD-MON-Y') Error 01-JAN-2010 Error
TO_DATE('01-jan-10', 'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10', 'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10', 'DD-MON-YYYY') 01-JAN-0010 01-JAN-0010 01-JAN-0010

In this case, it seems in last 3 cases PostgreSQL is behaving correctly.
Oracle is throwing error in 1st case since the Format ('Y') is lesser than
the actual value ('10'). But PostgreSQL is ignoring this case and throwing
whatever is input. The output is might not be the same was user is
expecting.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-067', 'DD-MON-Y') Error 01-JAN-2067 Error
TO_DATE('01-jan-111', 'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
TO_DATE('01-jan-678', 'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
TO_DATE('01-jan-001', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, just last case was correct in PG. Rest other cases are not in
sync with Oracle, rather the output is vague.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-2010', 'DD-MON-Y') Error 01-JAN-4010 Error
TO_DATE('01-jan-2010', 'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error
TO_DATE('01-jan-2010', 'DD-MON-YYY') Error 01-JAN-3010 Error
TO_DATE('01-jan-2010', 'DD-MON-YYYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010

In this case, PG is giving wrong output in first 3 cases. Those need to get
rectified. Oracle is throwing error in 1st and 3rd case and the reason is,
the format is lesser than the actual value. It seems this rule is not
applicable for 2nd case in Oracle.

In all above mentioned cases, the observation is, If the # Ys are lesser
than the # of digits,, then it should throw an error. Only in case of 'YY',
its not correct, unless the year is later than 9999. In this way, we can fix
the wrong outputs in PG.

regards, tom lane

--
--
Piyush S Newe
Principal Engineer
EnterpriseDB
office: +91 20 3058 9500
www.enterprisedb.com

Website: www.enterprisedb.com
EnterpriseDB Blog: http://blogs.enterprisedb.com/
Follow us on Twitter: http://www.twitter.com/enterprisedb

This e-mail message (and any attachment) is intended for the use of the
individual or entity to whom it is addressed. This message contains
information from EnterpriseDB Corporation that may be privileged,
confidential, or exempt from disclosure under applicable law. If you are not
the intended recipient or authorized to receive this for the intended
recipient, any use, dissemination, distribution, retention, archiving, or
copying of this communication is strictly prohibited. If you have received
this e-mail in error, please notify the sender immediately by reply e-mail
and delete this message.

Attachments:

Fix_Century_V1_PG.patchtext/x-patch; charset=US-ASCII; name=Fix_Century_V1_PG.patchDownload+65-21
#13Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Piyush Newe (#12)
Re: Rectifying wrong Date outputs

On 21.03.2011 07:40, Piyush Newe wrote:

On Thu, Mar 17, 2011 at 7:56 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:

Robert Haas<robertmhaas@gmail.com> writes:

On Thu, Mar 17, 2011 at 9:46 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:

Keep in mind that the datetime stuff was abandoned by the maintainer
some years ago with quite some rough edges. Some of it has been fixed,
but a lot of bugs remain. Looks like this is one of those places and it
seems appropriate to spend some time fixing it. Since it would involve
a behavior change, it should only go to 9.2, of course.

I wouldn't object to fixing the problem with # of digits> # of Ys in
9.1, if the fix is simple and clear-cut. I think we are still
accepting patches to make minor tweaks, like the tab-completion patch
I committed yesterday. It also doesn't bother me tremendously if we
push it off, but I don't think that anyone's going to be too sad if
TO_DATE('01-jan-2010', 'DD-MON-YYY') starts returning something more
sensible than 3010-01-01.

Agreed, it's certainly not too late for bug fixes in 9.1. I agree
that this isn't something we would want to tweak in released branches,
but 9.1 isn't there yet.

I feel the patch for the same would be easier and was attached in the
initial mail of this mail thread. For your ready reference, I am attaching
the same patch here again.

Having said that, it's not entirely clear to me what sane behavior is
here. Personally I would expect that an n-Ys format spec would consume
at most n digits from the input. Otherwise how are you going to use
to_date to pick apart strings that don't have any separators?

Yeah, seems reasonable.

So
I think the problem is actually upstream of the behavior complained of
here. However, what we should first do is see what Oracle does in such
cases, because the main driving factor for these functions is Oracle
compatibility not what might seem sane in a vacuum.

Following is the extended chart which is comparing the behavior of Oracle,
PG& EDBAS.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-1', 'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1', 'DD-MON-YY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1', 'DD-MON-YYY') 01-JAN-2001 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-1', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, all the cases are in sync except the 1st one. I didn't
understand why Oracle is interpreting year '1' as '2011'.

It makes sense to me. Year "1", when dat format is "Y", means the year
closest to current date that ends with 1. Or maybe the year that ends
with 1 in the current decade. This is analoguous to how two-digit years
are interpreted (except that we've hardcoded that the "current date" to
compare against is year 2000 - an assumption that will start to bite us
some time before year 2100).

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-10', 'DD-MON-Y') Error 01-JAN-2010 Error
TO_DATE('01-jan-10', 'DD-MON-YY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10', 'DD-MON-YYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010
TO_DATE('01-jan-10', 'DD-MON-YYYY') 01-JAN-0010 01-JAN-0010 01-JAN-0010

In this case, it seems in last 3 cases PostgreSQL is behaving correctly.
Oracle is throwing error in 1st case since the Format ('Y') is lesser than
the actual value ('10'). But PostgreSQL is ignoring this case and throwing
whatever is input. The output is might not be the same was user is
expecting.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-067', 'DD-MON-Y') Error 01-JAN-2067 Error
TO_DATE('01-jan-111', 'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
TO_DATE('01-jan-678', 'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
TO_DATE('01-jan-001', 'DD-MON-YYYY') 01-JAN-0001 01-JAN-0001 01-JAN-0001

In this case, just last case was correct in PG. Rest other cases are not in
sync with Oracle, rather the output is vague.

*Data Format Oracle PostgreSQL EDBAS*

TO_DATE('01-jan-2010', 'DD-MON-Y') Error 01-JAN-4010 Error
TO_DATE('01-jan-2010', 'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error
TO_DATE('01-jan-2010', 'DD-MON-YYY') Error 01-JAN-3010 Error
TO_DATE('01-jan-2010', 'DD-MON-YYYY') 01-JAN-2010 01-JAN-2010 01-JAN-2010

In this case, PG is giving wrong output in first 3 cases. Those need to get
rectified. Oracle is throwing error in 1st and 3rd case and the reason is,
the format is lesser than the actual value. It seems this rule is not
applicable for 2nd case in Oracle.

Yeah, quite inconsistent :-(.

These results are not in favor of the idea that a format with n Ys
always consumess up to n digits from the input. With that rule,
to_date('01-jan-2010', 'DD-MON-YY') would return "01-JAN-2020", which
isn't what Oracle does and seems quite surprising too.

So ignoring the cases where Oracle throws an error but PostgreSQL
doesn't, there's four cases where the results differ:

*Data Format Oracle PostgreSQL EDBAS*
TO_DATE('01-jan-1', 'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-111', 'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
TO_DATE('01-jan-678', 'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
TO_DATE('01-jan-2010', 'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error

IMHO our current behavior in 2nd and 4th case is so bizarre that we
should change them to match Oracle. I think we should fix the 1st too,
the notion that a single-digit year means something between 2000-2009
seems pretty useless (granted, using a single digit for year is
brain-dead to begin with).

The 3rd one is debatable. The range for three-digit years is currently
1100-2099, which is enough range for many applications. But should we
change it for the sake of matching Oracle's behavior? Not that anyone
uses YYY in practice, but still.

BTW, whatever behavior we choose, this needs to be documented. I don't
see anything in the docs on how Y, YY or YYY are expanded.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#14Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#13)
Re: Rectifying wrong Date outputs

On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Having said that, it's not entirely clear to me what sane behavior is
here.  Personally I would expect that an n-Ys format spec would consume
at most n digits from the input.  Otherwise how are you going to use
to_date to pick apart strings that don't have any separators?

Yeah, seems reasonable.

On the flip side, what if you want to allow either a two digit year or
a four digit year? It doesn't seem unreasonable to allow YY to
emcompass what YYYY would have allowed, unless there's a separate
notion for 'either YY or YYYY'.

It makes sense to me. Year "1", when dat format is "Y", means the year
closest to current date that ends with 1. Or maybe the year that ends with 1
in the current decade. This is analoguous to how two-digit years are
interpreted (except that we've hardcoded that the "current date" to compare
against is year 2000 - an assumption that will start to bite us some time
before year 2100).

Agree with all of this.

So ignoring the cases where Oracle throws an error but PostgreSQL doesn't,
there's four cases where the results differ:

*Data Format Oracle PostgreSQL EDBAS*
TO_DATE('01-jan-1',  'DD-MON-Y') 01-JAN-2011 01-JAN-2001 01-JAN-2001
TO_DATE('01-jan-111',  'DD-MON-YY') 01-JAN-0111 01-JAN-2011 Error
TO_DATE('01-jan-678',  'DD-MON-YYY') 01-JAN-2678 01-JAN-1678 01-JAN-2678
TO_DATE('01-jan-2010',  'DD-MON-YY') 01-JAN-2010 01-JAN-3910 Error

IMHO our current behavior in 2nd and 4th case is so bizarre that we should
change them to match Oracle. I think we should fix the 1st too, the notion
that a single-digit year means something between 2000-2009 seems pretty
useless (granted, using a single digit for year is brain-dead to begin
with).

I agree, but do we understand what Oracle does categorically, rather
than just its output on this specific input?

The 3rd one is debatable. The range for three-digit years is currently
1100-2099, which is enough range for many applications. But should we change
it for the sake of matching Oracle's behavior? Not that anyone uses YYY in
practice, but still.

I'm OK with that, but again, exactly what rule is Oracle applying here?

BTW, whatever behavior we choose, this needs to be documented. I don't see
anything in the docs on how Y, YY or YYY are expanded.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#14)
Re: Rectifying wrong Date outputs

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Having said that, it's not entirely clear to me what sane behavior is
here. �Personally I would expect that an n-Ys format spec would consume
at most n digits from the input. �Otherwise how are you going to use
to_date to pick apart strings that don't have any separators?

Yeah, seems reasonable.

On the flip side, what if you want to allow either a two digit year or
a four digit year? It doesn't seem unreasonable to allow YY to
emcompass what YYYY would have allowed, unless there's a separate
notion for 'either YY or YYYY'.

What I was thinking was that YYYY would take either 2 or 4 digits.
Whatever you do here, the year will have to be delimited by a non-digit
for such cases to be parseable.

I'm OK with that, but again, exactly what rule is Oracle applying here?

Yeah. Hopefully they documented it, and we don't have to try to
reverse-engineer the intention from an undersized set of samples.

regards, tom lane

#16Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#15)
Re: Rectifying wrong Date outputs

On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Mar 21, 2011 at 6:24 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

Having said that, it's not entirely clear to me what sane behavior is
here.  Personally I would expect that an n-Ys format spec would consume
at most n digits from the input.  Otherwise how are you going to use
to_date to pick apart strings that don't have any separators?

Yeah, seems reasonable.

On the flip side, what if you want to allow either a two digit year or
a four digit year?  It doesn't seem unreasonable to allow YY to
emcompass what YYYY would have allowed, unless there's a separate
notion for 'either YY or YYYY'.

What I was thinking was that YYYY would take either 2 or 4 digits.
Whatever you do here, the year will have to be delimited by a non-digit
for such cases to be parseable.

I was assuming a slightly more general variant of that - namely, Y,
YY, or YYY would all accept that many digits, or more; and the result
of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY,
respectively, had been used.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#16)
Re: Rectifying wrong Date outputs

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What I was thinking was that YYYY would take either 2 or 4 digits.
Whatever you do here, the year will have to be delimited by a non-digit
for such cases to be parseable.

I was assuming a slightly more general variant of that - namely, Y,
YY, or YYY would all accept that many digits, or more; and the result
of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY,
respectively, had been used.

As far as I can see, that would completely destroy the use-case of
trying to parse a string where there's not non-digit delimiters and
so you have to take exactly the specified number of digits, not more.

Why not head in the other direction of allowing fewer digits than
suggested by the format, instead of more?

regards, tom lane

#18Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#17)
Re: Rectifying wrong Date outputs

On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Mar 21, 2011 at 9:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

What I was thinking was that YYYY would take either 2 or 4 digits.
Whatever you do here, the year will have to be delimited by a non-digit
for such cases to be parseable.

I was assuming a slightly more general variant of that - namely, Y,
YY, or YYY would all accept that many digits, or more; and the result
of Y with 2, 3, or 4 digits would be the same as if YY, YYY, or YYYY,
respectively, had been used.

As far as I can see, that would completely destroy the use-case of
trying to parse a string where there's not non-digit delimiters and
so you have to take exactly the specified number of digits, not more.

Yeah, I thought about that, but it seems that use case is already
hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
worry about it. If Piyush's table is to be believed, Oracle only
throws an error for the wrong number of digits if the format is Y or
YYY, and the actual number of digits is more. If the format is YY,
then it accepts 2, 3, or 4 digit years. And since YY is exponentially
more likely to be used than Y or YYY, that pretty much means you can't
do what you're talking about using this syntax anyway.

Why not head in the other direction of allowing fewer digits than
suggested by the format, instead of more?

Well, that seems a bit counterintuitive to me. I think it's much more
likely that someone wants to insist on a four-digit year (and not
allow just two digits) than that they want to insist on a two-digit
year (and not allow four digits). I also think that would be pretty
terrible for Oracle compatibility, since they're clearly interpreting
99 vs. YYYY as meaning either 0099, not 1999 or 2099. I don't think
we want to be randomly incompatible there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#18)
Re: Rectifying wrong Date outputs

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

As far as I can see, that would completely destroy the use-case of
trying to parse a string where there's not non-digit delimiters and
so you have to take exactly the specified number of digits, not more.

Yeah, I thought about that, but it seems that use case is already
hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
worry about it.

How so?

regression=# select to_date('20110321', 'YYYYMMDD');
to_date
------------
2011-03-21
(1 row)

regression=# select to_date('110321', 'YYMMDD');
to_date
------------
2011-03-21
(1 row)

If you break the latter case, I am sure the villagers will be on your
doorstep shortly.

regards, tom lane

#20Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#19)
Re: Rectifying wrong Date outputs

On Mon, Mar 21, 2011 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Mar 21, 2011 at 10:18 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

As far as I can see, that would completely destroy the use-case of
trying to parse a string where there's not non-digit delimiters and
so you have to take exactly the specified number of digits, not more.

Yeah, I thought about that, but it seems that use case is already
hopelessly broken in both PostgreSQL and Oracle, so I'm disinclined to
worry about it.

How so?

regression=# select to_date('20110321', 'YYYYMMDD');
 to_date
------------
 2011-03-21
(1 row)

regression=# select to_date('110321', 'YYMMDD');
 to_date
------------
 2011-03-21
(1 row)

If you break the latter case, I am sure the villagers will be on your
doorstep shortly.

Oh, dear. No wonder this code is so hard to get right.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21Piyush Newe
piyush.newe@enterprisedb.com
In reply to: Robert Haas (#20)
#22Piyush Newe
piyush.newe@enterprisedb.com
In reply to: Piyush Newe (#21)
#23Bruce Momjian
bruce@momjian.us
In reply to: Piyush Newe (#1)
#24Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#23)
#25Bruce Momjian
bruce@momjian.us
In reply to: Bruce Momjian (#24)