Date-Time dangling unit fix
Hi all,
Attached is a patch to fix a parsing error for date-time types that
allow dangling units in the input. For example,
`date '1995-08-06 m y d'` was considered a valid date and the dangling
units were ignored.
Intervals also suffer from a similar issue, but the attached patch
doesn't fix that issue. For example,
`interval '1 day second month 6 hours days years ago'` is parsed as a
valid interval with -1 days and -6 hours. I'm hoping to fix that in a
later patch, but it will likely be more complicated than the other
date-time fixes.
- Joe Koshakow
Attachments:
v1-0001-Handle-dangling-units-in-date-time-input.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Handle-dangling-units-in-date-time-input.patchDownload+48-1
On Sun, Dec 11, 2022 at 10:29 AM Joseph Koshakow <koshy44@gmail.com> wrote:
Hi all,
Attached is a patch to fix a parsing error for date-time types that
allow dangling units in the input. For example,
`date '1995-08-06 m y d'` was considered a valid date and the dangling
units were ignored.Intervals also suffer from a similar issue, but the attached patch
doesn't fix that issue. For example,
`interval '1 day second month 6 hours days years ago'` is parsed as a
valid interval with -1 days and -6 hours. I'm hoping to fix that in a
later patch, but it will likely be more complicated than the other
date-time fixes.- Joe Koshakow
I think I sent that to the wrong email address.
Attachments:
v1-0001-Handle-dangling-units-in-date-time-input.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Handle-dangling-units-in-date-time-input.patchDownload+48-1
I just found another class of this bug that the submitted patch does
not fix. If the units are at the beginning of the string, then they are
also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
think the fix here is to check and make sure that ptype is 0 before
reassigning the value to a non-zero number. I'll send an updated patch
with this tonight.
On Mon, Dec 12, 2022 at 10:55 AM Joseph Koshakow <koshy44@gmail.com> wrote:
I just found another class of this bug that the submitted patch does
not fix. If the units are at the beginning of the string, then they are
also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
think the fix here is to check and make sure that ptype is 0 before
reassigning the value to a non-zero number. I'll send an updated patch
with this tonight.
Attached is the described patch.
- Joe Koshakow
Attachments:
v2-0001-Handle-dangling-units-in-date-time-input.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Handle-dangling-units-in-date-time-input.patchDownload+81-2
Joseph Koshakow <koshy44@gmail.com> writes:
On Mon, Dec 12, 2022 at 10:55 AM Joseph Koshakow <koshy44@gmail.com> wrote:
I just found another class of this bug that the submitted patch does
not fix. If the units are at the beginning of the string, then they are
also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
think the fix here is to check and make sure that ptype is 0 before
reassigning the value to a non-zero number. I'll send an updated patch
with this tonight.
Attached is the described patch.
I started to look at this, and soon noticed that while we have test cases
matching this sort of date input, there is no documentation for it. The
code claims it's an "ISO" (presumably ISO 8601) format, and maybe it is
because it looks a lot like the ISO 8601 format for intervals (durations).
But I don't have a copy of ISO 8601, and some googling fails to find any
indication that anybody else believes this is a valid datetime format.
Wikipedia for example documents a lot of variants of ISO 8601 [1]https://en.wikipedia.org/wiki/ISO_8601,
but nothing that looks like this.
I wonder if we should just rip this code out instead of fixing it.
I suspect its real-world usage is not different from zero. We'd
have to keep the "Jnnn" Julian-date case, though, so maybe there's
little to be saved.
If we do keep it, there's documentation work to be done. But the
first bit of doco I'd want to see is a pointer to a standard.
regards, tom lane
On Sat, Mar 4, 2023 at 4:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I started to look at this, and soon noticed that while we have test
cases
matching this sort of date input, there is no documentation for it.
The
code claims it's an "ISO" (presumably ISO 8601) format, and maybe it is
because it looks a lot like the ISO 8601 format for intervals
(durations).
But I don't have a copy of ISO 8601, and some googling fails to find
any
indication that anybody else believes this is a valid datetime format.
Wikipedia for example documents a lot of variants of ISO 8601 [1],
but nothing that looks like this.I wonder if we should just rip this code out instead of fixing it.
I suspect its real-world usage is not different from zero. We'd
have to keep the "Jnnn" Julian-date case, though, so maybe there's
little to be saved.If we do keep it, there's documentation work to be done. But the
first bit of doco I'd want to see is a pointer to a standard.
I also don't have a copy of ISO 8601 and wasn't able to find anything
about this variant on Google. I did find this comment in datetime.c
/*
* Was this an "ISO date" with embedded field labels? An
* example is "y2001m02d04" - thomas 2001-02-04
*/
which comes from this commit [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6f58115dddfa8ca63004c4784f57ef660422861d, which was authored by Thomas Lockhart
(presumably the same thomas from the comment). I've CC'ed Thomas in
case the email still exists and they happen to remember. The commit
message mentions ISO, but not the variant mentioned in the comment.
The mailing list thread can be found here [2]/messages/by-id/3BB433D5.3CB4164E@fourpalms.org, but it doesn't provide
much more information. I also found the following thread [3]/messages/by-id/3B970FF8.B9990807@fourpalms.org, which
happens to have you in it in case you remember it, which seemed to be
the motivation for commit [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6f58115dddfa8ca63004c4784f57ef660422861d. It only contains the following line
about ISO:
o support for "ISO variants" on input, including embedded "T" preceeding
the time fields
All that seems to imply the "y2001m02d04" ISO variant was never really
discussed in much detail and it's probably fine to remove it. Though,
it has been around for 22 years which makes it a bit scary to remove.
- Joe Koshakow
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6f58115dddfa8ca63004c4784f57ef660422861d
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6f58115dddfa8ca63004c4784f57ef660422861d
[2]: /messages/by-id/3BB433D5.3CB4164E@fourpalms.org
/messages/by-id/3BB433D5.3CB4164E@fourpalms.org
[3]: /messages/by-id/3B970FF8.B9990807@fourpalms.org
/messages/by-id/3B970FF8.B9990807@fourpalms.org
Hello,
05.03.2023 02:31, Joseph Koshakow wrote:
I also don't have a copy of ISO 8601 and wasn't able to find anything
about this variant on Google. I did find this comment in datetime.c/*
* Was this an "ISO date" with embedded field labels? An
* example is "y2001m02d04" - thomas 2001-02-04
*/which comes from this commit [1], which was authored by Thomas Lockhart
(presumably the same thomas from the comment).
I've also seen another interesting comment in datetime.c:
/*
* Was this an "ISO time" with embedded field labels? An
* example is "h04mm05s06" - thomas 2001-02-04
*/
In fact,
SELECT time 'h04mm05s06';
doesn't work for many years, but
SELECT time 'h04mm05s06.0';
still does.
I've just found that I mentioned it some time ago:
/messages/by-id/dff75442-2468-f74f-568c-6006e141062f@gmail.com
Best regards,
Alexander
Attached is a patch for removing the discussed format of date-times.
Attachments:
v2-0001-Remove-unknown-ISO-format-handle-dandling-units.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Remove-unknown-ISO-format-handle-dandling-units.patchDownload+37-215
[ I removed Lockhart, because he's taken no part in Postgres work for
more than twenty years; if that address even still works, you're
just bugging him ]
Alexander Lakhin <exclusion@gmail.com> writes:
In fact,
SELECT time 'h04mm05s06';
doesn't work for many years, but
SELECT time 'h04mm05s06.0';
still does.
I traced that down to this in DecodeTimeOnly:
if ((fmask & DTK_TIME_M) != DTK_TIME_M)
return DTERR_BAD_FORMAT;
where we have
#define DTK_ALL_SECS_M (DTK_M(SECOND) | DTK_M(MILLISECOND) | DTK_M(MICROSECOND))
#define DTK_TIME_M (DTK_M(HOUR) | DTK_M(MINUTE) | DTK_ALL_SECS_M)
So in other words, this test insists on seeing hour, minute, second,
*and* fractional-second fields. That seems obviously too picky.
It might not matter if we rip out this syntax, but I see other similar
tests so I suspect some of them will still be reachable.
Personally I'd say that hh:mm is a plenty complete enough time, and
whether you write seconds is optional, let alone fractional seconds.
We do accept this:
=> select '12:34'::time;
time
----------
12:34:00
(1 row)
so that must be going through a different code path, which I didn't
try to identify yet.
regards, tom lane
On Sun, Mar 5, 2023 at 12:54 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
We do accept this:
=> select '12:34'::time;
time
----------
12:34:00
(1 row)so that must be going through a different code path, which I didn't
try to identify yet.
That query will contain a single field of "12:34" with ftype DTK_TIME.
That will call into DecodeTime(), which calls into DecodeTimeCommon(),
where we have:
*tmask = DTK_TIME_M;
- Joe Koshakow
Also I removed some dead code from the previous patch.
- Joe Koshakow
Attachments:
v3-0001-Remove-unknown-ISO-format-handle-dandling-units.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Remove-unknown-ISO-format-handle-dandling-units.patchDownload+36-225
Joseph Koshakow <koshy44@gmail.com> writes:
Also I removed some dead code from the previous patch.
This needs a rebase over bcc704b52, so I did that and made a
couple of other adjustments.
I'm inclined to think that you removed too much from DecodeTimeOnly.
That does accept date specs (at least for timetz), and I see no very
good reason for it not to accept a Julian date spec. I also wonder
why you removed the UNITS: case there. Seems like we want these
functions to accept the same syntax as much as possible.
I think the code is still a bit schizophrenic about placement of
ptype specs. In the UNITS: case, we don't insist that a unit
apply to exactly the very next field; instead it applies to the next
one where it disambiguates. So for instead this is accepted:
regression=# select 'J PM 1234567 1:23'::timestamp;
timestamp
------------------------
1333-01-11 13:23:00 BC
That's a little weird, or maybe even a lot weird, but it's not
inherently nonsensical so I'm hesitant to stop accepting it.
However, if UNITS acts that way, then why is ISOTIME different?
So I'm inclined to remove ISOTIME's lookahead check
if (i >= nf - 1 ||
(ftype[i + 1] != DTK_NUMBER &&
ftype[i + 1] != DTK_TIME &&
ftype[i + 1] != DTK_DATE))
return DTERR_BAD_FORMAT;
and rely on the ptype-still-set error at the bottom of the loop
to complain about nonsensical cases.
Also, if we do keep the lookahead checks, the one in DecodeTimeOnly
could be simplified --- it's accepting some cases that actually
aren't supported there.
regards, tom lane
Attachments:
v4-0001-Remove-unknown-ISO-format-handle-dandling-units.patchtext/x-diff; charset=us-ascii; name*0=v4-0001-Remove-unknown-ISO-format-handle-dandling-units.pat; name*1=chDownload+47-230
10.03.2023 03:26, Tom Lane wrote:
Joseph Koshakow<koshy44@gmail.com> writes:
Also I removed some dead code from the previous patch.
That's a little weird, or maybe even a lot weird, but it's not
inherently nonsensical so I'm hesitant to stop accepting it.
However, if UNITS acts that way, then why is ISOTIME different?
So I'm inclined to remove ISOTIME's lookahead checkif (i >= nf - 1 ||
(ftype[i + 1] != DTK_NUMBER &&
ftype[i + 1] != DTK_TIME &&
ftype[i + 1] != DTK_DATE))
return DTERR_BAD_FORMAT;and rely on the ptype-still-set error at the bottom of the loop
to complain about nonsensical cases.
I also wonder how the units affect time zone parsing.
With the patch:
SELECT time with time zone '010203m+3';
ERROR: invalid input syntax for type time with time zone: "010203m+3"
But without the patch:
SELECT time with time zone '010203m+3';
01:02:03+03
Though with "non-unit" spec:
SELECT time with time zone '010203mmm+3';
01:02:03-03
(With or without the patch.)
It seems like "units" were just ignored in a time zone specification,
but now they are rejected.
At the same time, I see that the time zone specification allows for any
letters with the +/- sign following:
SELECT time with time zone '010203anyletters+3';
01:02:03-03
It's definitely a separate issue, I just want to note a new erroneous
condition.
Best regards,
Alexander
On 04.03.23 22:05, Tom Lane wrote:
Joseph Koshakow<koshy44@gmail.com> writes:
On Mon, Dec 12, 2022 at 10:55 AM Joseph Koshakow<koshy44@gmail.com> wrote:
I just found another class of this bug that the submitted patch does
not fix. If the units are at the beginning of the string, then they are
also ignored. For example, `date 'm d y2020m11d3'` is also valid. I
think the fix here is to check and make sure that ptype is 0 before
reassigning the value to a non-zero number. I'll send an updated patch
with this tonight.Attached is the described patch.
I started to look at this, and soon noticed that while we have test cases
matching this sort of date input, there is no documentation for it. The
code claims it's an "ISO" (presumably ISO 8601) format, and maybe it is
because it looks a lot like the ISO 8601 format for intervals (durations).
But I don't have a copy of ISO 8601, and some googling fails to find any
indication that anybody else believes this is a valid datetime format.
Wikipedia for example documents a lot of variants of ISO 8601 [1],
but nothing that looks like this.
There are additional formats in (the lesser known) ISO 8601-2, one of
which looks like this:
'1985Y4M12D', calendar year 1985, April 12th
But that is entirely incompatible with the above example, because it has
the units after the numbers.
Even more reason not to support the earlier example.
Alexander Lakhin <exclusion@gmail.com> writes:
I also wonder how the units affect time zone parsing.
With the patch:
SELECT time with time zone '010203m+3';
ERROR: invalid input syntax for type time with time zone: "010203m+3"
But without the patch:
SELECT time with time zone '010203m+3';
01:02:03+03
Yeah, I think this is the ptype-still-set-at-end-of-loop check.
I'm fine with throwing an error for that.
Though with "non-unit" spec:
SELECT time with time zone '010203mmm+3';
01:02:03-03
(With or without the patch.)
It seems like "units" were just ignored in a time zone specification,
but now they are rejected.
I think it's reading "mmm+3" as a POSIX timezone spec. From memory,
POSIX allows any sequence of 3 or more letters as a zone abbreviation.
It looks like we're being lax and not enforcing the "3 or more" part:
regression=# set time zone 'foobar+3';
SET
regression=# select timeofday();
timeofday
----------------------------------------
Fri Mar 10 12:08:24.484853 2023 FOOBAR
(1 row)
regression=# set time zone 'fo+3';
SET
regression=# select timeofday();
timeofday
------------------------------------
Fri Mar 10 12:08:38.207311 2023 FO
(1 row)
regards, tom lane