Proposed patch: make SQL interval-literal syntax work per spec
Over in that TPC-H thread, I was bemoaning once again the never-finished
support for SQL-spec interval literals. I decided to go look at exactly
how unfinished it was, and it turns out that it's actually pretty close.
Hence the attached proposed patch ;-)
The main gating factor is that coerce_type doesn't want to pass typmod
through to the datatype input function when converting a literal
constant. This is necessary for certain types like char and varchar
but loses badly for interval. I have a feeling that we changed that
behavior after Tom Lockhart left the project, which may mean that
interval wasn't quite as broken when he left it as it is today.
Anyway, the attached patch simply hardwires a special case for INTERVAL.
Given that this is reflective of a special case in the standard, and
that there's no very good reason for anyone else to design a datatype
that acts this way, I don't feel too bad about such a hack; but has
anyone got a better idea?
After that it's just a matter of getting DecodeInterval to do the
right things; and it turns out that about half the logic for SQL-spec
input syntax was there already. Almost the only thing I had to change
was the code to decide what a plain integer field at the right end of
the input means.
The patch includes regression test changes that illustrate what it does.
I am not sure about some of the corner cases --- anyone want to see if
their understanding of the spec for <interval string> is different?
There is still some unfinished business if anyone wants to make it
really exactly 100% spec compliant. In particular the spec seems to
allow a minus sign *outside* the string literal, and if I'm reading it
right, a precision spec in combination with field restrictions ought to
look like INTERVAL '...' DAY TO SECOND(3) not INTERVAL(3) '...' DAY TO
SECOND. However, for these you'll get a syntax error instead of
silently wrong answers if you try to use the other syntax, so it's not
quite as pernicious as the matters addressed here.
regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
The patch includes regression test changes that illustrate what it
does.
I am not sure about some of the corner cases --- anyone want to see
if
their understanding of the spec for <interval string> is different?
The patch seems to support extensions to the standard.
(1) In the spec, an interval value or literal must be either
year-month or day-time. (I guess they didn't want to try to deal with
the sticky issues of what it means to have an interval of, for
example, seven months and three days -- since an interval has no sense
of which seven months.)
(2) The interval qualifier is not optional in the spec.
(3) It seems to me that they were requiring that there be a
one-to-one match between the fields specified in the interval
qualifier and the fields present in the interval string.
(4) I'm not 100% sure on this one, but it seemed to me that they were
requiring year to be four digits and other components (except for
fractional seconds) to be two digits.
So long as they are documented, there's nothing wrong with extensions.
Nothing I saw suggests that legal interval literals would be
misinterpreted.
There is still some unfinished business if anyone wants to make it
really exactly 100% spec compliant. In particular the spec seems to
allow a minus sign *outside* the string literal,
I agree. They go so far as to point out how it should be interpreted
if the minus is present in both allowed locations (both inside and
outside the quotes):
5) The <sign> in a <signed numeric literal> or an <interval literal> is
a monadic arithmetic operator. The
monadic arithmetic operators + and * specify monadic plus and
monadic minus, respectively. If neither
monadic plus nor monadic minus are specified in a <signed numeric
literal> or an <interval literal>, or if
monadic plus is specified, then the literal is positive. If monadic
minus is specified in a <signed numeric
literal> or <interval literal>, then the literal is negative. If
<sign> is specified in both possible locations in
an <interval literal>, then the sign of the literal is determined by
normal mathematical interpretation of
multiple sign operators.
and if I'm reading it
right, a precision spec in combination with field restrictions ought
to
look like INTERVAL '...' DAY TO SECOND(3) not INTERVAL(3) '...' DAY
TO
SECOND.
Agreed.
However, for these you'll get a syntax error instead of
silently wrong answers if you try to use the other syntax, so it's
not
quite as pernicious as the matters addressed here.
Agreed.
-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
(4) I'm not 100% sure on this one, but it seemed to me that they
were
requiring year to be four digits and other components (except for
fractional seconds) to be two digits.
That can't be right. Maybe I saw that in datetime literal specs.
Apologies.
-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
I am not sure about some of the corner cases --- anyone want to see if
their understanding of the spec for <interval string> is different?
The patch seems to support extensions to the standard.
Right. All of these were extensions that already existed in PG.
(3) It seems to me that they were requiring that there be a
one-to-one match between the fields specified in the interval
qualifier and the fields present in the interval string.
Yeah. I couldn't actually find any such statement in SQL92, but
the SQL:2008 draft has this at 5.3 rule 30:
30)Let N be the number of <primary datetime field>s in the precision of
the <interval literal>, as specified by <interval qualifier>.
The <interval literal> being defined shall contain N datetime components.
and at rule 34:
34)Within the definition of an <interval literal> that contains a
<year-month literal>, the <interval qualifier> shall not specify DAY,
HOUR, MINUTE, or SECOND. Within the definition of an <interval
literal> that contains a <day-time literal>, the <interval qualifier>
shall not specify YEAR or MONTH.
This seems to be requiring that not only do you give the exact number of
components, but the formatting must match the expectation. So anything
we accept beyond that is gravy. I think that most of the "extension"
cases were already being accepted in some form, and I'd be hesitant
to take them out for fear of breaking existing applications.
There is still some unfinished business if anyone wants to make it
really exactly 100% spec compliant ...
I agree.
I committed the patch as presented, and I think I might go take a quick
look at the other two issues. What I suspect I'll find is that the
minus sign issue isn't fixable without turning INTERVAL into a fully
reserved word, which is probably a cure worse than the disease. However
it might be fairly easy to get the grammar to allow the precision in
either place. (We'd want to keep the old way working for backward
compatibility.)
regards, tom lane
(1) In the spec, an interval value or literal must be either
year-month or day-time. (I guess they didn't want to try to deal with
the sticky issues of what it means to have an interval of, for
example, seven months and three days -- since an interval has no sense
of which seven months.)
Note that, for usability reasons, Karel some time ago try-partitioned
our intervals: year-month|day-week|hour-min-sec.
--Josh
Tom Lane wrote:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
Tom Lane <tgl@sss.pgh.pa.us> wrote:
I am not sure about some of the corner cases --- anyone want to see if
their understanding of the spec for <interval string> is different?The patch seems to support extensions to the standard.
Right. All of these were extensions that already existed in PG.
Back a while ago (2003) there was some talk about replacing
some of the non-standard extensions with shorthand forms of
intervals with ISO 8601 intervals that have a similar but
not-the-same shorthand.
Interval ISO Postgres
8601 shorthand
-----------------------------------------------------
'1 year 1 minute' 'P1YT1M' '1Y1M'
'1 year 1 month' 'P1Y1M' N/A
http://archives.postgresql.org/pgsql-patches/2003-09/msg00119.php
When I proposed we support the ISO-8601 standard shorthand,
Tom recommended we rip out the old shorthand at the same time:
http://archives.postgresql.org/pgsql-patches/2003-09/msg00134.php
I've been maintaining a patch that supports these ISO-8601
intervals for a client. Back in 2003 I recall Peter said
he wanted to see SQL standard intervals first. There were
also discussions about selecting the output format. My old
patch made this depend on datestyle; but it seems Tom preferred
a separate GUC?
http://archives.postgresql.org/pgsql-patches/2003-12/msg00257.php
I see there's a TODO regarding ISO8601 intervals as well.
I have a version of this patch for 8.2; but would be happy
to bring it up-to-date if people want to re-consider it now.
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
Back a while ago (2003) there was some talk about replacing
some of the non-standard extensions with shorthand forms of
intervals with ISO 8601 intervals that have a similar but
not-the-same shorthand.
I think *replacement* would be a hard sell, as that would tick off all
the existing users ;-). Now it seems like being able to accept either
the 8601 syntax or the existing syntaxes on input wouldn't be tough
at all, if you insist on the P prefix to distinguish; so that end of
it should be easy enough. On the output side, seems like a GUC variable
is the standard precedent here. I'd still vote against overloading
DateStyle --- it does too much already --- but a separate variable for
interval style wouldn't bother me. In fact, given that we are now
somewhat SQL-compliant on interval input, a GUC that selected
PG traditional, SQL-standard, or ISO 8601 interval output format seems
like it could be a good idea.
regards, tom lane
Tom Lane wrote:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
Back a while ago (2003) there was some talk about replacing
some of the non-standard extensions with shorthand forms of
intervals with ISO 8601 intervals that have a similar but
not-the-same shorthand.I think *replacement* would be a hard sell, as that would tick off all
the existing users ;-). Now it seems like being able to accept either
I originally submitted a patch that supported both, and I think
you suggested replacing on the grounds that the old one was
never documented,
http://archives.postgresql.org/pgsql-patches/2003-09/msg00134.php
"If we're going to support the real ISO spec, I'd suggest ripping
out any not-quite-there variant.
http://archives.postgresql.org/pgsql-patches/2003-09/msg00121.php
"I doubt anyone is using it, because it's completely undocumented."
On the other hand, the company I was at was indeed originally
using it, so I prefer that it stay in as well. Perhaps if
there's a way to mark them as deprecated and post warnings in
the log file if they're used. I think they should be
removed eventually in a few releases, because they're quite
confusing as they stand:
Interval ISO Postgres
8601 shorthand
-----------------------------------------------------
'1 year 1 minute' 'P1YT1M' '1Y1M'
'1 year 1 month' 'P1Y1M' N/A
the 8601 syntax or the existing syntaxes on input wouldn't be tough
at all, if you insist on the P prefix to distinguish; so that end of
ISO 8601 seems to me to require the P, so I think we would.
it should be easy enough. On the output side, seems like a GUC variable
is the standard precedent here. I'd still vote against overloading
DateStyle --- it does too much already --- but a separate variable for
interval style wouldn't bother me. In fact, given that we are now
somewhat SQL-compliant on interval input, a GUC that selected
PG traditional, SQL-standard, or ISO 8601 interval output format seems
like it could be a good idea.
Great. I'm bringing my patch up-to-date with CVS now
and adding the separate GUC.
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
Tom Lane wrote:
I think *replacement* would be a hard sell, as that would tick off all
the existing users ;-). Now it seems like being able to accept either
I originally submitted a patch that supported both, and I think
you suggested replacing on the grounds that the old one was
never documented,
Yeah, but that was five years ago, and someone remedied the oversight
since then ...
The other problem is that the SQL spec clearly defines an interval
literal syntax, and it's not this ISO thing. So even without backward
compatibility issues, 8601-only doesn't seem like it would fly.
regards, tom lane
Tom Lane wrote:
The other problem is that the SQL spec clearly defines an interval
literal syntax, and it's not this ISO thing. So even without backward
compatibility issues, 8601-only doesn't seem like it would fly.
Oh. I wasn't proposing 8601-only. Just the one-character
shorthands like '1Y1M'::interval that postgresql interprets
as "1 year one minute". No standard specifies anything close
to that; and any similar standards ask to interpret that M as
months instead of minutes.
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
Oh. I wasn't proposing 8601-only. Just the one-character
shorthands like '1Y1M'::interval that postgresql interprets
as "1 year one minute". No standard specifies anything close
to that; and any similar standards ask to interpret that M as
months instead of minutes.
Hmmm. I would say that the problem with that is not that it's
nonstandard but that it's ambiguous. Our documentation about the
interval type says:
interval values can be written with the following syntax:
[@] quantity unit [quantity unit...] [direction]
Where: quantity is a number (possibly signed); unit is microsecond,
millisecond, second, minute, hour, day, week, month, year, decade,
century, millennium, or abbreviations or plurals of these units;
direction can be ago or empty. The at sign (@) is optional noise. The
amounts of different units are implicitly added up with appropriate
sign accounting. ago negates all the fields.
There isn't anything there that would suggest to a user that 'm' is
well-defined as a unit, much less that it specifically means "minute"
rather than one of the other options. What if we just tweak the code to
reject ambiguous abbreviations?
[ experiments a bit... ] Another interesting point is that "mo",
which is a perfectly unique abbreviation, is rejected. Seems like
the handling of abbreviations in this code could be improved.
regards, tom lane
Tom Lane wrote:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
'1Y1M'::interval ... minute ... month
Hmmm. I would say that the problem with that is not that it's
nonstandard but that it's ambiguous.
Ah yes.
Our documentation...says..."or abbreviations".
...What if we just tweak the code to
reject ambiguous abbreviations?
Good idea. I'll try that.
[ experiments a bit... ] Another interesting point is that "mo",
which is a perfectly unique abbreviation, is rejected. Seems like
the handling of abbreviations in this code could be improved.
It looks like rather than abbreviations being any shorter
form of a unit, it has an explicit list of abbreviations
it likes (deltatktbl) in the beginning of datetime.c that
forces "m" to "minute"? So losing the ambiguous ones
should be very easy.
Tom Lane wrote:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
... ISO 8601 intervals ...
On the output side, seems like a GUC variable
is the standard precedent here. I'd still vote against overloading
DateStyle --- it does too much already --- but a separate variable for
interval style wouldn't bother me. In fact, given that we are now
somewhat SQL-compliant on interval input, a GUC that selected
PG traditional, SQL-standard, or ISO 8601 interval output format seems
like it could be a good idea.
Is it OK that this seems to me it wouldn't be backward compatible
with the current interval_out that looks to me to be using
the DateStyle GUC?
I supposed it could be made backward compatible if the new
IntervalStyle GUC defaulted to a value of "guess_from_datestyle",
but I fear an option like that might add rather than remove
confusion.
Tom Lane wrote:
somewhat SQL-compliant on interval input, a GUC that selected
PG traditional, SQL-standard, or ISO 8601 interval output format seems
like it could be a good idea.
Trying to do the SQL-standard output now, and have a question
of what to do in the SQL-standard mode when trying to output
an interval that as both a YEAR and a DAY component.
AFAICT the SQL standard doesn't let you have both, so the
"SQL-standard" output actually won't be.
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
Trying to do the SQL-standard output now, and have a question
of what to do in the SQL-standard mode when trying to output
an interval that as both a YEAR and a DAY component.
AFAICT the SQL standard doesn't let you have both, so the
"SQL-standard" output actually won't be.
The reason it's not SQL-standard is the data value isn't.
So not a problem. Someone conforming to the spec limits on
what he puts in will see spec-compliant output. I think all
you need is 'yyy-mm dd hh:mm:ss' where you omit yyy-mm if
zeroes, omit dd if zero, omit hh:mm:ss if zeroes (but maybe
only if dd is also 0? otherwise your output is just dd which
is uncomfortably ambiguous).
regards, tom lane
Tom Lane wrote:
The reason it's not SQL-standard is the data value isn't.
So not a problem. Someone conforming to the spec limits on
what he puts in will see spec-compliant output. I think all
you need is 'yyy-mm dd hh:mm:ss' where you omit yyy-mm if
zeroes, omit dd if zero, omit hh:mm:ss if zeroes (but maybe
only if dd is also 0? otherwise your output is just dd which
is uncomfortably ambiguous).
Great. That's what I'll do.
Any convention or preference on the naming of the GUC?
I assume "intervalstyle" is reasonable?
Or thoughts regarding the current EncodeInterval() that's
already using the "datestyle" GUC?
pg82=# select interval '1';
interval
----------
00:00:01
(1 row)
pg82=# set datestyle='sql';
SET
pg82=# select interval '1';
interval
----------
@ 1 sec
(1 row)
Ron Mayer wrote:
Tom Lane wrote:
you need is 'yyy-mm dd hh:mm:ss' where you omit yyy-mm if
zeroes, omit dd if zero, omit hh:mm:ss if zeroes (but maybe
only if dd is also 0? otherwise your output is just dd which
is uncomfortably ambiguous).
Oh, and if both parts are 0, I guess we desire
the (more comfortable than the alternatives) '0'?
Tom Lane wrote:
I think all
you need is 'yyy-mm dd hh:mm:ss' where you omit yyy-mm if
zeroes, omit dd if zero, omit hh:mm:ss if zeroes (but maybe
only if dd is also 0? otherwise your output is just dd which
is uncomfortably ambiguous).
Cool. I think I have it pretty much working with a new
GUC "intervalstyle" that can take values of
"sql_standard" that I think will output SQL standard
interval literals when given a sql
standard interval.
"iso_8601" that will output ISO 8601 "Time Intervals" of
the "format with time-unit deignators", and
"backward_compatible" that will output the same thing
that postgres currently does that depends
on the value of the DateStyle GUC.
I'll add the documentation and regression tests and
can submit a patch early next week. Oh. One more
question is that under ecpg there seems to be a fair
amount of near-duplicated code (EncodeDateTime,
EncodeInterval) for turning dates and times and
intervals to strings.
Should those ECPG functions be made identical to
the ones in the backend?
Could those somehow share code with the backend for
some of their work?
Anyway - here's a quick test of the
SQL Standard and ISO interval output as it stands
right now...
regression=# drop table test_intervals;
DROP TABLE
regression=# create temporary table test_intervals (i interval);
CREATE TABLE
regression=# insert into test_intervals values
regression-# ('0 years'),
regression-# ('1 year 1 month'),
regression-# ('1 day 2 hours 3 minutes 4 seconds'),
regression-# ('1 year 1 minute');
INSERT 0 4
regression=#
regression=# insert into test_intervals values
regression-# ('1-1'),
regression-# ('1'),
regression-# (interval '1' year),
regression-# ('1:00:00'),
regression-# ('1 1:02:03');
INSERT 0 5
regression=#
regression=# insert into test_intervals values
regression-# ('P1Y1M'),
regression-# ('P1DT1H1M1S'),
regression-# ('PT1S');
INSERT 0 3
regression=#
regression=# set intervalstyle to sql_standard;
SET
regression=# select * from test_intervals;
i
-------------
0
1-1
1 2:3:4
1-0 0 0:1:0
1-1
0:0:1
1-0
1:0:0
1 1:2:3
1-1
1 1:1:1
0:0:1
(12 rows)
regression=#
regression=# set intervalstyle to iso_8601;
SET
regression=# select * from test_intervals;
i
------------
PT0S
P1Y1M
P1DT2H3M4S
P1YT1M
P1Y1M
PT1S
P1Y
PT1H
P1DT1H2M3S
P1Y1M
P1DT1H1M1S
PT1S
(12 rows)
regression=#
regression=# set intervalstyle to backward_compatible;
SET
regression=# set datestyle to sql;
SET
regression=# select * from test_intervals;
i
-------------------------------
@ 0
@ 1 year 1 mon
@ 1 day 2 hours 3 mins 4 secs
@ 1 year 1 min
@ 1 year 1 mon
@ 1 sec
@ 1 year
@ 1 hour
@ 1 day 1 hour 2 mins 3 secs
@ 1 year 1 mon
@ 1 day 1 hour 1 min 1 sec
@ 1 sec
(12 rows)
regression=# set datestyle to iso;
SET
regression=# select * from test_intervals;
i
-----------------
00:00:00
1 year 1 mon
1 day 02:03:04
1 year 00:01:00
1 year 1 mon
00:00:01
1 year
01:00:00
1 day 01:02:03
1 year 1 mon
1 day 01:01:01
00:00:01
(12 rows)
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
Cool. I think I have it pretty much working with a new
GUC "intervalstyle" that can take values of
"sql_standard" that I think will output SQL standard
interval literals when given a sql
standard interval.
"iso_8601" that will output ISO 8601 "Time Intervals" of
the "format with time-unit deignators", and
"backward_compatible" that will output the same thing
that postgres currently does that depends
on the value of the DateStyle GUC.
Actually, we have never considered that new releases need to preserve
the behavior of postgresql.conf settings. So the above seems
unnecessarily baroque. How about decoupling interval_out's behavior
from DateStyle altogether, and instead providing values of IntervalStyle
that match all the previous behaviors?
Should those ECPG functions be made identical to
the ones in the backend?
The ECPG situation is a mess :-(. That code was forked off from the
backend some time ago, and has not been well maintained at all. If you
are brave enough to tackle that mess, more power to you; but I strongly
suggest doing it as an independent patch.
Could those somehow share code with the backend for
some of their work?
The palloc and elog dependencies seem to be the hard part.
regards, tom lane
Tom Lane wrote:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
interval ... "sql_standard"..."iso_8601"...
"backward_compatible" ...depends... on ... DateStyle......How about decoupling interval_out's behavior
from DateStyle altogether, and instead providing values of IntervalStyle
that match all the previous behaviors?
Great. That seems much more sane.
Any desired names for the existing interval styles?
Currently we output intervals in these two styles:
'1 year 2 mons 3 days 04:05:06'
when the DateStyle is iso.
and
'@ 1 year 2 mons 3 days 4 hours 5 mins 6 secs'
when the DateStyle is sql or postgres, etc.
I'm not quite sure where those styles came from so
don't know what good names for them might be.
Should those ECPG functions be made identical ...
...
The palloc and elog dependencies seem to be the hard part.
Interesting. So EncodeDateTime and EncodeInterval, guessing 400 or
so lines, seem sharable since at first glance they either already do or
easily could make their callers deal with all allocation and logging.
Agreed that it's a independent patch that I'll try separately.