New to_timestamp implementation is pretty strict

Started by Heikki Linnakangasabout 17 years ago20 messages
#1Heikki Linnakangas
heikki.linnakangas@enterprisedb.com

I like strict in general, but this doesn't seem logical:

postgres=# SELECT to_timestamp('29-12-2005 01:2:03', 'DD-MM-YYYY
HH24:MI:SS'); -- works
to_timestamp
------------------------
2005-12-29 01:02:03+02
(1 row)

postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY
HH24:MI:SS'); -- doesn't work
ERROR: source string too short for "SS" formatting field
DETAIL: Field requires 2 characters, but only 1 remain.
HINT: If your source string is not fixed-width, try using the "FM"
modifier.

I think the end of string should be treated like a field separator,
colon in this example, and we should accept both of the above. Opinions?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#2David E. Wheeler
david@kineticode.com
In reply to: Heikki Linnakangas (#1)
Re: New to_timestamp implementation is pretty strict

On Dec 1, 2008, at 1:08 PM, Heikki Linnakangas wrote:

postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY
HH24:MI:SS'); -- doesn't work
ERROR: source string too short for "SS" formatting field
DETAIL: Field requires 2 characters, but only 1 remain.
HINT: If your source string is not fixed-width, try using the "FM"
modifier.

I think the end of string should be treated like a field separator,
colon in this example, and we should accept both of the above.
Opinions?

I'm generally in favor of being generous in the input one can accept,
but in this case it seems ambiguous to me. Is that supposed to be :30
or :03? There's no way to tell.

My $0.02.

Best,

David

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David E. Wheeler (#2)
Re: New to_timestamp implementation is pretty strict

"David E. Wheeler" <david@kineticode.com> writes:

On Dec 1, 2008, at 1:08 PM, Heikki Linnakangas wrote:

I think the end of string should be treated like a field separator,
colon in this example, and we should accept both of the above.
Opinions?

I'm generally in favor of being generous in the input one can accept,
but in this case it seems ambiguous to me. Is that supposed to be :30
or :03? There's no way to tell.

But notice that we are allowing a single digit for the hour and minute
fields. It's inconsistent that the last field works differently.
(And it is that it's the last field, not that it's SS --- try minutes
as the last field.)

regards, tom lane

#4Dave Page
dpage@pgadmin.org
In reply to: David E. Wheeler (#2)
Re: New to_timestamp implementation is pretty strict

On Mon, Dec 1, 2008 at 2:45 PM, David E. Wheeler <david@kineticode.com> wrote:

On Dec 1, 2008, at 1:08 PM, Heikki Linnakangas wrote:

postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY
HH24:MI:SS'); -- doesn't work
ERROR: source string too short for "SS" formatting field
DETAIL: Field requires 2 characters, but only 1 remain.
HINT: If your source string is not fixed-width, try using the "FM"
modifier.

I think the end of string should be treated like a field separator, colon
in this example, and we should accept both of the above. Opinions?

I'm generally in favor of being generous in the input one can accept, but in
this case it seems ambiguous to me. Is that supposed to be :30 or :03?
There's no way to tell.

How is it ambiguous? The leading zero is technically redundant. A
trailing on most certainly isn't.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

#5David E. Wheeler
david@kineticode.com
In reply to: Tom Lane (#3)
Re: New to_timestamp implementation is pretty strict

On Dec 1, 2008, at 3:52 PM, Tom Lane wrote:

I'm generally in favor of being generous in the input one can accept,
but in this case it seems ambiguous to me. Is that supposed to be :30
or :03? There's no way to tell.

But notice that we are allowing a single digit for the hour and minute
fields. It's inconsistent that the last field works differently.
(And it is that it's the last field, not that it's SS --- try minutes
as the last field.)

Oh, well yeah, it should be consistent. But I'm still not sure that :3
should be allowed. OTOH, who does that, anyway?

Best,

David

#6David E. Wheeler
david@kineticode.com
In reply to: Dave Page (#4)
Re: New to_timestamp implementation is pretty strict

On Dec 1, 2008, at 3:55 PM, Dave Page wrote:

I'm generally in favor of being generous in the input one can
accept, but in
this case it seems ambiguous to me. Is that supposed to be :30 or :
03?
There's no way to tell.

How is it ambiguous? The leading zero is technically redundant. A
trailing on most certainly isn't.

it depends on how you look at it, I suppose. If you look at ":xy" as
"x" being the 10s position and "y" being the 1s position, it makes no
sense. If you look at it as an integer, it does.

Best,

David

#7Alvaro Herrera
alvherre@commandprompt.com
In reply to: David E. Wheeler (#5)
Re: New to_timestamp implementation is pretty strict

David E. Wheeler wrote:

Oh, well yeah, it should be consistent. But I'm still not sure that :3
should be allowed. OTOH, who does that, anyway?

Anyone who prints times as %d:%d:%d. You can find those in the wild.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

#8Dave Page
dpage@pgadmin.org
In reply to: David E. Wheeler (#6)
Re: New to_timestamp implementation is pretty strict

On Mon, Dec 1, 2008 at 3:02 PM, David E. Wheeler <david@kineticode.com> wrote:

it depends on how you look at it, I suppose. If you look at ":xy" as "x"
being the 10s position and "y" being the 1s position, it makes no sense.

Suffice it to say, I don't look at it that way :-). I'd wager most
people wouldn't either, but I have no data to back that up of course.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

#9David E. Wheeler
david@kineticode.com
In reply to: Alvaro Herrera (#7)
Re: New to_timestamp implementation is pretty strict

On Dec 1, 2008, at 4:07 PM, Alvaro Herrera wrote:

David E. Wheeler wrote:

Oh, well yeah, it should be consistent. But I'm still not sure
that :3
should be allowed. OTOH, who does that, anyway?

Anyone who prints times as %d:%d:%d. You can find those in the wild.

I guess I should have expected that. Sheesh.

Best,

David

#10David E. Wheeler
david@kineticode.com
In reply to: Dave Page (#8)
Re: New to_timestamp implementation is pretty strict

On Dec 1, 2008, at 4:09 PM, Dave Page wrote:

On Mon, Dec 1, 2008 at 3:02 PM, David E. Wheeler
<david@kineticode.com> wrote:

it depends on how you look at it, I suppose. If you look at ":xy"
as "x"
being the 10s position and "y" being the 1s position, it makes no
sense.

Suffice it to say, I don't look at it that way :-). I'd wager most
people wouldn't either, but I have no data to back that up of course.

Yeah, I could see that. It makes no sense to me (":3" just looks
weird), but maybe I just think too much like a computer. ;-)

Best,

David

#11Andrew Dunstan
andrew@dunslane.net
In reply to: Dave Page (#8)
Re: New to_timestamp implementation is pretty strict

Dave Page wrote:

On Mon, Dec 1, 2008 at 3:02 PM, David E. Wheeler <david@kineticode.com> wrote:

it depends on how you look at it, I suppose. If you look at ":xy" as "x"
being the 10s position and "y" being the 1s position, it makes no sense.

Suffice it to say, I don't look at it that way :-). I'd wager most
people wouldn't either, but I have no data to back that up of course.

Isn't the point that ambiguity is undesirable, as is inconsistency? So
counts of people who see this one way or the other should be irrelevant.

Alvaro noted the use in the wild of formats like "%d:%d:%d" for times.
IMNSHO we should not cater to such bad code.

cheers

andrew

cheers

#12Greg Stark
stark@enterprisedb.com
In reply to: Andrew Dunstan (#11)
Re: New to_timestamp implementation is pretty strict

How would you parse an input format of just 'SS' ? is there something
ambiguous about '3' ? I don't see anything "bad" about using %d to
output an integer number of seconds.

--
greg

#13Robert Haas
robertmhaas@gmail.com
In reply to: Greg Stark (#12)
Re: New to_timestamp implementation is pretty strict

On Mon, Dec 1, 2008 at 10:33 AM, Greg Stark <stark@enterprisedb.com> wrote:

How would you parse an input format of just 'SS' ? is there something
ambiguous about '3' ? I don't see anything "bad" about using %d to
output an integer number of seconds.

+1.

It seems to me that it's pretty silly to say that we "know" that the 2
in "01:2:03" is intended to mean 02, but we are somehow confused about
whether the 3 in "01:02:3" is intended to mean 03 or 30. Sure, the
latter could be the result of a truncation, but if the user is
randomly truncating their strings, they're going to have problems with
a lot more than to_timestamp().

...Robert

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Greg Stark (#12)
Re: New to_timestamp implementation is pretty strict

Greg Stark wrote:

How would you parse an input format of just 'SS' ? is there something
ambiguous about '3' ? I don't see anything "bad" about using %d to
output an integer number of seconds.

The docs say that SS corresponds to "second (00-59)", so clearly it
should expect a two digit zero padded number.

What's so hard about using "%0.2d" ?

cheers

andrew

#15Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#13)
Re: New to_timestamp implementation is pretty strict

Another point here is that we have always accepted single digits in dates:

portal=> select '2008-11-1'::date;
date
------------
2008-11-01
(1 row)

portal=> select '2008-1-11'::date;
date
------------
2008-01-11
(1 row)

If we're going to handle dates and timestamps inconsistently, there
should be a good reason for it.

...Robert

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#15)
Re: New to_timestamp implementation is pretty strict

"Robert Haas" <robertmhaas@gmail.com> writes:

Another point here is that we have always accepted single digits in dates:

Yeah, but that's the general datetime input code, which has rather
different goals than to_timestamp().

After thinking about it I'm inclined to feel that SS and friends should
insist on exactly 2 digits. If you want to allow 1-or-2-digits then use
FMSS, just like the error message tells you. (However, I have a vague
feeling that Oracle doesn't insist on this, and in the end we ought to
follow Oracle's behavior. Can anyone check?)

In any case, it's certainly broken that the last field behaves
differently from not-last fields. I'm not all that set on whether we
insist on two digits or not, but I do think the inconsistency needs
to be fixed.

regards, tom lane

#17Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Robert Haas (#13)
Re: New to_timestamp implementation is pretty strict

Robert Haas wrote:

On Mon, Dec 1, 2008 at 10:33 AM, Greg Stark <stark@enterprisedb.com> wrote:

How would you parse an input format of just 'SS' ? is there something
ambiguous about '3' ? I don't see anything "bad" about using %d to
output an integer number of seconds.

+1.

It seems to me that it's pretty silly to say that we "know" that the 2
in "01:2:03" is intended to mean 02, but we are somehow confused about
whether the 3 in "01:02:3" is intended to mean 03 or 30.

Yep. It's a fair argument that we shouldn't accept either, but the
inconsistency is just wrong. I've committed a patch fixing the
inconsistency, by allowing "01:02:3".

Now whether we should forbid both, my opinion is that we shouldn't; that
would just unnecessarily brake old applications, and I don't think
there's much danger of ambiguity in what "01:2:03" means.

For better or worse, we also allow these more questionable inputs:

postgres=# SELECT to_timestamp('2008/-3/01', 'YYYY/MM/DD');
to_timestamp
------------------------
2007-09-01 00:00:00+03
(1 row)

postgres=# SELECT to_timestamp('2008--3-01', 'YYYY-MM-DD');
to_timestamp
------------------------
2007-09-01 00:00:00+03
(1 row)

postgres=# SELECT to_timestamp('2008-03', 'YYYY-MM-DD');
to_timestamp
------------------------
2008-03-01 00:00:00+02
(1 row)

postgres=# SELECT to_timestamp('2008-03-04-foobar', 'YYYY-MM-DD');
to_timestamp
------------------------
2008-03-04 00:00:00+02
(1 row)

The argument for rejecting these is stronger, IMHO, but given that we
allowed these in previous releases as well, I don't think we try to
forbid them either.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#18Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#17)
Re: New to_timestamp implementation is pretty strict

For better or worse, we also allow these more questionable inputs:

Wow. Those are all pretty atrocious.

Even so, it's not clear to me that there's a lot of merit to changing
the behavior. If to_timestamp() isn't rigorous enough, you can always
stick some additional error checking in front of it; it's easy to
write a regular expression that will only match EXACTLY YYYY-MM-DD if
that's what you want to do. If to_timestamp() is excessively
pedantic, it forces you into rewriting to_timestamp(), which is a lot
more work. I probably still wouldn't make it accept anything quite
as... creative... as these examples if starting over, but now that the
existing version is out there, I think breaking backward compatibility
isn't warranted.

...Robert

#19Brendan Jurd
direvus@gmail.com
In reply to: Heikki Linnakangas (#1)
Re: New to_timestamp implementation is pretty strict

On Mon, Dec 1, 2008 at 11:08 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

postgres=# SELECT to_timestamp('29-12-2005 01:02:3', 'DD-MM-YYYY
HH24:MI:SS'); -- doesn't work

...

I think the end of string should be treated like a field separator, colon in
this example, and we should accept both of the above. Opinions?

(With apologies for being late to the thread)

I can agree that it is inconsistent to treat separator characters,
like colons and hyphens, differently to the end of the string. If
we're allowing separators to imply variable-width nodes, then the end
of the string should do so as well, so I think Heikki did the right
thing here.

However, I actually think that, ultimately, Heikki's example above
*should* be rejected, and so should any other attempt to provide a
value of the wrong width, even if there are separators present, unless
the user has specified the 'FM' fill mode flag.

Heikki's example shows a too-short string for SS, but what about one
that is too long? Should we accept

# to_timestamp('29122005 0102333', 'DDMMYYYY HH24MISS')

As meaning three hundred and thirty-three seconds? I would argue we
shouldn't; it's most likely that the user made an error, so the right
thing to do is throw an exception and give them an opportunity to fix
it. Making guesses about the user's intention when the input is
heavily ambiguous isn't a fun game to be playing, for us or for the
user.

Given the contrary arguments (backwards- and Oracle- compatibility,
mostly) I decided that was too much for me to bite off in my patch.
That was the same reason I didn't fiddle with the treatment of
end-of-string; 8.3 didn't treat it as a separator either. The
different is that, although it always treated the final field as
fixed-width, prior to my patch it didn't actually throw an error when
fixed-width fields were too short.

Cheers,
BJ

#20Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#16)
Re: New to_timestamp implementation is pretty strict

Tom Lane wrote:

After thinking about it I'm inclined to feel that SS and friends should
insist on exactly 2 digits. If you want to allow 1-or-2-digits then use
FMSS, just like the error message tells you. (However, I have a vague
feeling that Oracle doesn't insist on this, and in the end we ought to
follow Oracle's behavior. Can anyone check?)

Oracle doesn't insist on it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com