DecodeInterval fixes
Hi all,
This patch does three things in the DecodeInterval function:
1) Removes dead code for handling unit type RESERVE. There used to be
a unit called "invalid" that was of type RESERVE. At some point that
unit was removed and there were no more units of type RESERVE.
Therefore, the code for RESERVE unit handling is unreachable.
2) Restrict the unit "ago" to only appear at the end of the
interval. According to the docs [0]https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT, this is the only valid place to
put it, but we allowed it multiple times at any point in the input.
3) Error when the user has multiple consecutive units or a unit without
an accompanying value. I spent a lot of time trying to come up with
robust ways to detect this and ultimately settled on using the "type"
field. I'm not entirely happy with this approach, because it involves
having to look ahead to the next field in a couple of places. The other
approach I was considering was to introduce a boolean flag called
"unhandled_unit". After parsing a unit it would be set to true, after
applying the unit to a number it would be set to false. If it was true
right before parsing a unit, then we would error. Please let me know
if you have any suggestions here.
There is one more problem I noticed, but didn't fix. We allow multiple
"@" to be sprinkled anywhere in the input, even though the docs [0]https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
only allow it to appear at the beginning of the input. For example,
the following query works fine:
# SELECT INTERVAL '1 @ year @ @ @ 6 days @ 10 @ minutes';
interval
------------------------
1 year 6 days 00:10:00
(1 row)
Unfortunately, this can't be fixed in DecodeInterval, because all of
the "@" fields are filtered out before this method. Additionally, I
believe this means that the lines
if (type == IGNORE_DTF)
continue;
in DecodeInterval, that appears right after decoding the units, are
unreachable since
"@" is the only unit of type IGNORE_DTF. Since "@" is filtered out,
we'll never decode a unit of type IGNORE_DTF.
For reference, I previously merged a couple similar patches to this
one, but for other date-time types [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5b3c5953553bb9fb0b171abc6041e7c7e9ca5b4d, [2]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bcc704b52490492e6bd73c4444056b3e9644504d.
Thanks,
Joe Koshakow
[0]: https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5b3c5953553bb9fb0b171abc6041e7c7e9ca5b4d
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5b3c5953553bb9fb0b171abc6041e7c7e9ca5b4d
[2]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bcc704b52490492e6bd73c4444056b3e9644504d
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bcc704b52490492e6bd73c4444056b3e9644504d
Attachments:
v1-0001-Fix-interval-decode-handling-of-invalid-intervals.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Fix-interval-decode-handling-of-invalid-intervals.patchDownload
From 4c5641f15e5409ef5973a5f305352018f06da538 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow <koshy44@gmail.com>
Date: Sun, 9 Apr 2023 20:37:27 -0400
Subject: [PATCH] Fix interval decode handling of invalid intervals
This patch does three things in the DecodeInterval function:
1) Removes dead code for handling unit type RESERVE. There used to be
a unit called "invalid" that was of type RESERVE. At some point that
unit was removed and there were no more units of type RESERVE.
Therefore, the code for RESERVE unit handling is unreachable.
2) Restrict the unit "ago" to only appear at the end of the
interval. According to the docs [0], this is the only valid place to
put it, but we allowed it multiple times at any point in the input.
3) Error when the user has multiple consecutive units or a unit without
an accompanying value.
[0] https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
---
src/backend/utils/adt/datetime.c | 55 +++++++++++++++++++-------
src/test/regress/expected/interval.out | 18 +++++++++
src/test/regress/sql/interval.sql | 7 ++++
3 files changed, 65 insertions(+), 15 deletions(-)
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index be2e55bb29..17fc0d45ea 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -3335,7 +3335,8 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
if (force_negative &&
itm_in->tm_usec > 0)
itm_in->tm_usec = -itm_in->tm_usec;
- type = DTK_DAY;
+ if (i != 0 && ftype[i - 1] != DTK_STRING && ftype[i - 1] != DTK_SPECIAL)
+ type = DTK_DAY;
break;
case DTK_TZ:
@@ -3372,7 +3373,8 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
* specified. This handles the case of '1 +02:03' since we
* are reading right to left.
*/
- type = DTK_DAY;
+ if (i != 0 && ftype[i - 1] != DTK_STRING && ftype[i - 1] != DTK_SPECIAL)
+ type = DTK_DAY;
break;
}
@@ -3475,12 +3477,14 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
if (!AdjustMicroseconds(val, fval, 1, itm_in))
return DTERR_FIELD_OVERFLOW;
tmask = DTK_M(MICROSECOND);
+ type = IGNORE_DTF;
break;
case DTK_MILLISEC:
if (!AdjustMicroseconds(val, fval, 1000, itm_in))
return DTERR_FIELD_OVERFLOW;
tmask = DTK_M(MILLISECOND);
+ type = IGNORE_DTF;
break;
case DTK_SECOND:
@@ -3495,19 +3499,24 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
tmask = DTK_M(SECOND);
else
tmask = DTK_ALL_SECS_M;
+ type = IGNORE_DTF;
break;
case DTK_MINUTE:
if (!AdjustMicroseconds(val, fval, USECS_PER_MINUTE, itm_in))
return DTERR_FIELD_OVERFLOW;
tmask = DTK_M(MINUTE);
+ type = IGNORE_DTF;
break;
case DTK_HOUR:
if (!AdjustMicroseconds(val, fval, USECS_PER_HOUR, itm_in))
return DTERR_FIELD_OVERFLOW;
tmask = DTK_M(HOUR);
- type = DTK_DAY; /* set for next field */
+ if (i != 0 && ftype[i - 1] != DTK_STRING && ftype[i - 1] != DTK_SPECIAL)
+ type = DTK_DAY; /* set for next field */
+ else
+ type = IGNORE_DTF;
break;
case DTK_DAY:
@@ -3515,6 +3524,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
!AdjustFractMicroseconds(fval, USECS_PER_DAY, itm_in))
return DTERR_FIELD_OVERFLOW;
tmask = DTK_M(DAY);
+ type = IGNORE_DTF;
break;
case DTK_WEEK:
@@ -3522,6 +3532,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
!AdjustFractDays(fval, 7, itm_in))
return DTERR_FIELD_OVERFLOW;
tmask = DTK_M(WEEK);
+ type = IGNORE_DTF;
break;
case DTK_MONTH:
@@ -3529,6 +3540,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
!AdjustFractDays(fval, DAYS_PER_MONTH, itm_in))
return DTERR_FIELD_OVERFLOW;
tmask = DTK_M(MONTH);
+ type = IGNORE_DTF;
break;
case DTK_YEAR:
@@ -3536,6 +3548,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
!AdjustFractYears(fval, 1, itm_in))
return DTERR_FIELD_OVERFLOW;
tmask = DTK_M(YEAR);
+ type = IGNORE_DTF;
break;
case DTK_DECADE:
@@ -3543,6 +3556,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
!AdjustFractYears(fval, 10, itm_in))
return DTERR_FIELD_OVERFLOW;
tmask = DTK_M(DECADE);
+ type = IGNORE_DTF;
break;
case DTK_CENTURY:
@@ -3550,6 +3564,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
!AdjustFractYears(fval, 100, itm_in))
return DTERR_FIELD_OVERFLOW;
tmask = DTK_M(CENTURY);
+ type = IGNORE_DTF;
break;
case DTK_MILLENNIUM:
@@ -3557,6 +3572,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
!AdjustFractYears(fval, 1000, itm_in))
return DTERR_FIELD_OVERFLOW;
tmask = DTK_M(MILLENNIUM);
+ type = IGNORE_DTF;
break;
default:
@@ -3566,6 +3582,9 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
case DTK_STRING:
case DTK_SPECIAL:
+ /* reject consecutive unhandled units */
+ if (type != IGNORE_DTF)
+ return DTERR_BAD_FORMAT;
type = DecodeUnits(i, field[i], &uval);
if (type == IGNORE_DTF)
continue;
@@ -3578,13 +3597,15 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
break;
case AGO:
- is_before = true;
- type = uval;
- break;
- case RESERV:
- tmask = (DTK_DATE_M | DTK_TIME_M);
- *dtype = uval;
+ /*
+ * 'ago' is only allowed to appear at the end of the
+ * interval.
+ */
+ if (i != nf - 1)
+ return DTERR_BAD_FORMAT;
+ is_before = true;
+ type = IGNORE_DTF;
break;
default:
@@ -3605,6 +3626,10 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
if (fmask == 0)
return DTERR_BAD_FORMAT;
+ /* reject if unit appeared and was never handled */
+ if (type != IGNORE_DTF)
+ return DTERR_BAD_FORMAT;
+
/* finally, AGO negates everything */
if (is_before)
{
@@ -4482,17 +4507,17 @@ EncodeInterval(struct pg_itm *itm, int style, char *str)
case INTSTYLE_SQL_STANDARD:
{
bool has_negative = year < 0 || mon < 0 ||
- mday < 0 || hour < 0 ||
- min < 0 || sec < 0 || fsec < 0;
+ mday < 0 || hour < 0 ||
+ min < 0 || sec < 0 || fsec < 0;
bool has_positive = year > 0 || mon > 0 ||
- mday > 0 || hour > 0 ||
- min > 0 || sec > 0 || fsec > 0;
+ mday > 0 || hour > 0 ||
+ min > 0 || sec > 0 || fsec > 0;
bool has_year_month = year != 0 || mon != 0;
bool has_day_time = mday != 0 || hour != 0 ||
- min != 0 || sec != 0 || fsec != 0;
+ min != 0 || sec != 0 || fsec != 0;
bool has_day = mday != 0;
bool sql_standard_value = !(has_negative && has_positive) &&
- !(has_year_month && has_day_time);
+ !(has_year_month && has_day_time);
/*
* SQL Standard wants only 1 "<sign>" preceding the whole
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index 28b71d9681..7aba799351 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -1787,3 +1787,21 @@ SELECT extract(epoch from interval '1000000000 days');
86400000000000.000000
(1 row)
+-- test that ago can only appear once at the end of the interval.
+SELECT INTERVAL '42 days 2 seconds ago ago';
+ERROR: invalid input syntax for type interval: "42 days 2 seconds ago ago"
+LINE 1: SELECT INTERVAL '42 days 2 seconds ago ago';
+ ^
+SELECT INTERVAL '2 minutes ago 5 days';
+ERROR: invalid input syntax for type interval: "2 minutes ago 5 days"
+LINE 1: SELECT INTERVAL '2 minutes ago 5 days';
+ ^
+-- test that consecutive and dangling units are not allowed.
+SELECT INTERVAL 'hour 5 months';
+ERROR: invalid input syntax for type interval: "hour 5 months"
+LINE 1: SELECT INTERVAL 'hour 5 months';
+ ^
+SELECT INTERVAL '1 year months days 5 hours';
+ERROR: invalid input syntax for type interval: "1 year months days 5 hours"
+LINE 1: SELECT INTERVAL '1 year months days 5 hours';
+ ^
diff --git a/src/test/regress/sql/interval.sql b/src/test/regress/sql/interval.sql
index 56feda1a3d..b228be14fd 100644
--- a/src/test/regress/sql/interval.sql
+++ b/src/test/regress/sql/interval.sql
@@ -582,3 +582,10 @@ SELECT f1,
-- internal overflow test case
SELECT extract(epoch from interval '1000000000 days');
+
+-- test that ago can only appear once at the end of the interval.
+SELECT INTERVAL '42 days 2 seconds ago ago';
+SELECT INTERVAL '2 minutes ago 5 days';
+-- test that consecutive and dangling units are not allowed.
+SELECT INTERVAL 'hour 5 months';
+SELECT INTERVAL '1 year months days 5 hours';
--
2.34.1
Hi Joe, here's a partial review:
On Sun, Apr 9, 2023 at 5:44 PM Joseph Koshakow <koshy44@gmail.com> wrote:
1) Removes dead code for handling unit type RESERVE.
Looks good. From a quick skim it looks like the ECPG copy of this code
(ecpg/pgtypeslib/interval.c) might need to be updated as well?
2) Restrict the unit "ago" to only appear at the end of the
interval. According to the docs [0], this is the only valid place to
put it, but we allowed it multiple times at any point in the input.
Also looks reasonable to me. (Same note re: ECPG.)
3) Error when the user has multiple consecutive units or a unit without
an accompanying value. I spent a lot of time trying to come up with
robust ways to detect this and ultimately settled on using the "type"
field. I'm not entirely happy with this approach, because it involves
having to look ahead to the next field in a couple of places. The other
approach I was considering was to introduce a boolean flag called
"unhandled_unit". After parsing a unit it would be set to true, after
applying the unit to a number it would be set to false. If it was true
right before parsing a unit, then we would error. Please let me know
if you have any suggestions here.
I'm new to this code, but I agree that the use of `type` and the
lookahead are not particularly obvious/intuitive. At the very least,
they'd need some more explanation in the code. Your boolean flag idea
sounds reasonable, though.
There is one more problem I noticed, but didn't fix. We allow multiple
"@" to be sprinkled anywhere in the input, even though the docs [0]
only allow it to appear at the beginning of the input.
(No particular opinion on this.)
It looks like this patch needs a rebase for the CI, too, but there are
no conflicts.
Thanks!
--Jacob
Jacob Champion <jchampion@timescale.com> writes:
Hi Joe, here's a partial review:
On Sun, Apr 9, 2023 at 5:44 PM Joseph Koshakow <koshy44@gmail.com> wrote:1) Removes dead code for handling unit type RESERVE.
Looks good. From a quick skim it looks like the ECPG copy of this code
(ecpg/pgtypeslib/interval.c) might need to be updated as well?
The ECPG datetime datatype support code has been basically unmaintained
for years, and has diverged quite far at this point from the core code.
I wouldn't expect that a patch to the core code necessarily applies
easily to ECPG, nor would I expect somebody patching the core to bother
trying.
Perhaps modernizing/resyncing that ECPG code would be a worthwhile
undertaking, but it'd be a mighty large one, and I'm not sure about
the size of the return. In the meantime, benign neglect is the policy.
regards, tom lane
Jacob Champion <jchampion@timescale.com> writes:
Hi Joe, here's a partial review:
Thanks so much for the review!
I'm new to this code, but I agree that the use of `type` and the
lookahead are not particularly obvious/intuitive. At the very least,
they'd need some more explanation in the code. Your boolean flag idea
sounds reasonable, though.
I've updated the patch with the boolean flag idea. I think it's a
bit cleaner and more readable.
There is one more problem I noticed, but didn't fix. We allow multiple
"@" to be sprinkled anywhere in the input, even though the docs [0]
only allow it to appear at the beginning of the input.(No particular opinion on this.)
I looked into this a bit. The reason this works is because the date
time lexer filters out all punctuation. That's what allows us to parse
things like `SELECT date 'January 8, 1999';`. It's probably not worth
trying to be smarter about what punctuation we allow where, at least
for now. Maybe in the future we can exclude "@" from the punctuation
that get's filtered out.
It looks like this patch needs a rebase for the CI, too, but there are
no conflicts.
The attached patch is rebased against master.
Thanks,
Joe Koshakow
Attachments:
v2-0001-Fix-interval-decode-handling-of-invalid-intervals.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Fix-interval-decode-handling-of-invalid-intervals.patchDownload
From eee98dd65c3556528803b6ee2cab10e9ece8d871 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow <koshy44@gmail.com>
Date: Sun, 9 Apr 2023 20:37:27 -0400
Subject: [PATCH] Fix interval decode handling of invalid intervals
This patch does three things in the DecodeInterval function:
1) Removes dead code for handling unit type RESERVE. There used to be
a unit called "invalid" that was of type RESERVE. At some point that
unit was removed and there were no more units of type RESERVE.
Therefore, the code for RESERVE unit handling is unreachable.
2) Restrict the unit "ago" to only appear at the end of the
interval. According to the docs [0], this is the only valid place to
put it, but we allowed it multiple times at any point in the input.
3) Error when the user has multiple consecutive units or a unit without
an accompanying value.
[0] https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
---
src/backend/utils/adt/datetime.c | 25 +++++++++++++++++++------
src/test/regress/expected/interval.out | 18 ++++++++++++++++++
src/test/regress/sql/interval.sql | 7 +++++++
3 files changed, 44 insertions(+), 6 deletions(-)
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 5d8d583ddc..b930a67007 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -3278,6 +3278,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
{
bool force_negative = false;
bool is_before = false;
+ bool parsing_unit_val = false;
char *cp;
int fmask = 0,
tmask,
@@ -3336,6 +3337,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
itm_in->tm_usec > 0)
itm_in->tm_usec = -itm_in->tm_usec;
type = DTK_DAY;
+ parsing_unit_val = false;
break;
case DTK_TZ:
@@ -3373,6 +3375,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
* are reading right to left.
*/
type = DTK_DAY;
+ parsing_unit_val = false;
break;
}
@@ -3562,10 +3565,14 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
default:
return DTERR_BAD_FORMAT;
}
+ parsing_unit_val = false;
break;
case DTK_STRING:
case DTK_SPECIAL:
+ /* reject consecutive unhandled units */
+ if (parsing_unit_val)
+ return DTERR_BAD_FORMAT;
type = DecodeUnits(i, field[i], &uval);
if (type == IGNORE_DTF)
continue;
@@ -3575,16 +3582,18 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
{
case UNITS:
type = uval;
+ parsing_unit_val = true;
break;
case AGO:
- is_before = true;
- type = uval;
- break;
- case RESERV:
- tmask = (DTK_DATE_M | DTK_TIME_M);
- *dtype = uval;
+ /*
+ * 'ago' is only allowed to appear at the end of the
+ * interval.
+ */
+ if (i != nf - 1)
+ return DTERR_BAD_FORMAT;
+ is_before = true;
break;
default:
@@ -3605,6 +3614,10 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
if (fmask == 0)
return DTERR_BAD_FORMAT;
+ /* reject if unit appeared and was never handled */
+ if (parsing_unit_val)
+ return DTERR_BAD_FORMAT;
+
/* finally, AGO negates everything */
if (is_before)
{
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index 28b71d9681..7aba799351 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -1787,3 +1787,21 @@ SELECT extract(epoch from interval '1000000000 days');
86400000000000.000000
(1 row)
+-- test that ago can only appear once at the end of the interval.
+SELECT INTERVAL '42 days 2 seconds ago ago';
+ERROR: invalid input syntax for type interval: "42 days 2 seconds ago ago"
+LINE 1: SELECT INTERVAL '42 days 2 seconds ago ago';
+ ^
+SELECT INTERVAL '2 minutes ago 5 days';
+ERROR: invalid input syntax for type interval: "2 minutes ago 5 days"
+LINE 1: SELECT INTERVAL '2 minutes ago 5 days';
+ ^
+-- test that consecutive and dangling units are not allowed.
+SELECT INTERVAL 'hour 5 months';
+ERROR: invalid input syntax for type interval: "hour 5 months"
+LINE 1: SELECT INTERVAL 'hour 5 months';
+ ^
+SELECT INTERVAL '1 year months days 5 hours';
+ERROR: invalid input syntax for type interval: "1 year months days 5 hours"
+LINE 1: SELECT INTERVAL '1 year months days 5 hours';
+ ^
diff --git a/src/test/regress/sql/interval.sql b/src/test/regress/sql/interval.sql
index 56feda1a3d..b228be14fd 100644
--- a/src/test/regress/sql/interval.sql
+++ b/src/test/regress/sql/interval.sql
@@ -582,3 +582,10 @@ SELECT f1,
-- internal overflow test case
SELECT extract(epoch from interval '1000000000 days');
+
+-- test that ago can only appear once at the end of the interval.
+SELECT INTERVAL '42 days 2 seconds ago ago';
+SELECT INTERVAL '2 minutes ago 5 days';
+-- test that consecutive and dangling units are not allowed.
+SELECT INTERVAL 'hour 5 months';
+SELECT INTERVAL '1 year months days 5 hours';
--
2.34.1
On Fri, Jul 7, 2023 at 4:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The ECPG datetime datatype support code has been basically unmaintained
for years, and has diverged quite far at this point from the core code.
I was under the impression that anything in the postgresql.git
repository is considered core, and hence maintained as one unit, from
release to release. An example of this, to me, were all the contrib/*
modules.
I wouldn't expect that a patch to the core code necessarily applies
easily to ECPG, nor would I expect somebody patching the core to bother
trying.
The above statement makes me think that only the code inside
src/backend/ is considered core. Is that the right assumption?
Perhaps modernizing/resyncing that ECPG code would be a worthwhile
undertaking, but it'd be a mighty large one, and I'm not sure about
the size of the return. In the meantime, benign neglect is the policy.
Benign neglect doesn't sound nice from a user's/consumer's
perspective. Can it be labeled (i.e. declared as such in docs) as
deprecated.
Knowing that the tool you use has now been deprecated would be a
better message for someone still using it, even if it was left marked
deprecated indefinitely. Discovering benign neglect for the tool you
depend on, from secondary sources (like this list, forums, etc.), does
not evoke a lot of confidence.
Best regards,
Gurjeet
http://Gurje.et
Gurjeet Singh <gurjeet@singh.im> writes:
On Fri, Jul 7, 2023 at 4:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The ECPG datetime datatype support code has been basically unmaintained
for years, and has diverged quite far at this point from the core code.
I was under the impression that anything in the postgresql.git
repository is considered core, and hence maintained as one unit, from
release to release.
When I say that ecpglib is next door to unmaintained, I'm just stating
the facts on the ground; project policy is irrelevant. That situation
is not likely to change until somebody steps up to do (a lot of) work
on it, which is probably unlikely to happen unless we start getting
actual complaints from ECPG users. For the meantime, what's there now
seems to be satisfactory to whoever is using it ... which might be
nobody?
In any case, you don't have to look far to notice that some parts of
the tree are maintained far more actively than others. ecpglib is
just one of the more identifiable bits that's receiving little love.
The quality of the code under contrib/ is wildly variable, and even
the server code itself has backwaters. (For instance, the "bit" types,
which aren't even in the standard anymore; or the geometric types,
or "money".)
By and large, I don't see this unevenness of maintenance effort as
a problem. It's more like directing our limited resources into the
most useful places. Code that isn't getting worked on is either not
used at all by anybody, or it serves the needs of those who use it
well enough already. Since it's difficult to tell which of those
cases applies, removing code just because it's not been improved
lately is a hard choice to sell. But so is putting maintenance effort
into code that there might be little audience for. In the end we
solve this via the principle of "scratch your own itch": if somebody
is concerned enough about a particular piece of code to put their own
time into improving it, then great, it'll get improved.
Benign neglect doesn't sound nice from a user's/consumer's
perspective. Can it be labeled (i.e. declared as such in docs) as
deprecated.
Deprecation would imply that we're planning to remove it, which
we are not.
regards, tom lane
On Sat, Jul 8, 2023 at 1:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gurjeet Singh <gurjeet@singh.im> writes:
On Fri, Jul 7, 2023 at 4:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
The ECPG datetime datatype support code has been basically unmaintained
for years, and has diverged quite far at this point from the core code.I was under the impression that anything in the postgresql.git
repository is considered core, and hence maintained as one unit, from
release to release.When I say that ecpglib is next door to unmaintained, I'm just stating
the facts on the ground; project policy is irrelevant. That situation
is not likely to change until somebody steps up to do (a lot of) work
on it, which is probably unlikely to happen unless we start getting
actual complaints from ECPG users. For the meantime, what's there now
seems to be satisfactory to whoever is using it ... which might be
nobody?In any case, you don't have to look far to notice that some parts of
the tree are maintained far more actively than others. ecpglib is
just one of the more identifiable bits that's receiving little love.
The quality of the code under contrib/ is wildly variable, and even
the server code itself has backwaters. (For instance, the "bit" types,
which aren't even in the standard anymore; or the geometric types,
or "money".)
Thanks for sharing your view on different parts of the code. This give
a clear direction if someone would be interested in stepping up.
As part of my mentoring a GSoC 2023 participant, last night we were
looking at the TODO wiki page, for something for the mentee to pick up
next. I feel the staleness/deficiencies you mention above are not
captured in the TODO wiki page. It'd be nice if these were documented,
so that newcomers to the community can pick up work that they feel is
an easy lift for them.
By and large, I don't see this unevenness of maintenance effort as
a problem. It's more like directing our limited resources into the
most useful places. Code that isn't getting worked on is either not
used at all by anybody, or it serves the needs of those who use it
well enough already. Since it's difficult to tell which of those
cases applies, removing code just because it's not been improved
lately is a hard choice to sell. But so is putting maintenance effort
into code that there might be little audience for. In the end we
solve this via the principle of "scratch your own itch": if somebody
is concerned enough about a particular piece of code to put their own
time into improving it, then great, it'll get improved.Benign neglect doesn't sound nice from a user's/consumer's
perspective. Can it be labeled (i.e. declared as such in docs) as
deprecated.Deprecation would imply that we're planning to remove it, which
we are not.
Good to know. Sorry I took "benign neglect" to mean that there's no
willingness to improve it. This makes it clear that community wants to
improve and maintain ECPG, it's just a matter of someone volunteering,
and better use of the resources available.
Best regards,
Gurjeet
http://Gurje.et
On Sat, Jul 8, 2023 at 5:06 PM Gurjeet Singh <gurjeet@singh.im> wrote:
I feel the staleness/deficiencies you mention above are not
captured in the TODO wiki page. It'd be nice if these were documented,
so that newcomers to the community can pick up work that they feel is
an easy lift for them.
I think that's a good idea. I've definitely been confused by this in
previous patches I've submitted.
I've broken up this patch into three logical commits and attached them.
None of the actual code has changed.
Thanks,
Joe Koshakow
Attachments:
v3-0001-Remove-dead-code-in-DecodeInterval.patchtext/x-patch; charset=US-ASCII; name=v3-0001-Remove-dead-code-in-DecodeInterval.patchDownload
From b3fe06934b40489d1b4b157677f1292bc698c7da Mon Sep 17 00:00:00 2001
From: Joseph Koshakow <koshy44@gmail.com>
Date: Sun, 9 Jul 2023 13:12:16 -0400
Subject: [PATCH 1/3] Remove dead code in DecodeInterval
This commit removes dead code for handling unit type RESERVE. There
used to be a unit called "invalid" that was of type RESERVE. At some
point that unit was removed and there were no more units of type
RESERVE. Therefore, the code for RESERVE unit handling is unreachable.
---
src/backend/utils/adt/datetime.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 5d8d583ddc..2a5dddc43f 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -3582,11 +3582,6 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
type = uval;
break;
- case RESERV:
- tmask = (DTK_DATE_M | DTK_TIME_M);
- *dtype = uval;
- break;
-
default:
return DTERR_BAD_FORMAT;
}
--
2.34.1
v3-0002-Fix-Interval-ago-parsing.patchtext/x-patch; charset=US-ASCII; name=v3-0002-Fix-Interval-ago-parsing.patchDownload
From 6271c5fcca30de0982b4b6073b49c1cea6c7391b Mon Sep 17 00:00:00 2001
From: Joseph Koshakow <koshy44@gmail.com>
Date: Sun, 9 Jul 2023 13:17:08 -0400
Subject: [PATCH 2/3] Fix Interval 'ago' parsing
This commit Restrict the unit "ago" to only appear at the end of the
interval. According to the docs [0], this is the only valid place to
put it, but we allowed it multiple times at any point in the input.
[0] https://www.postgresql.org/docs/15/datatype-datetime.html#DATATYPE-INTERVAL-INPUT
---
src/backend/utils/adt/datetime.c | 6 ++++++
src/test/regress/expected/interval.out | 9 +++++++++
src/test/regress/sql/interval.sql | 4 ++++
3 files changed, 19 insertions(+)
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 2a5dddc43f..9d09381328 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -3578,6 +3578,12 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
break;
case AGO:
+ /*
+ * 'ago' is only allowed to appear at the end of the
+ * interval.
+ */
+ if (i != nf - 1)
+ return DTERR_BAD_FORMAT;
is_before = true;
type = uval;
break;
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index 28b71d9681..42062f947f 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -1787,3 +1787,12 @@ SELECT extract(epoch from interval '1000000000 days');
86400000000000.000000
(1 row)
+-- test that ago can only appear once at the end of the interval.
+SELECT INTERVAL '42 days 2 seconds ago ago';
+ERROR: invalid input syntax for type interval: "42 days 2 seconds ago ago"
+LINE 1: SELECT INTERVAL '42 days 2 seconds ago ago';
+ ^
+SELECT INTERVAL '2 minutes ago 5 days';
+ERROR: invalid input syntax for type interval: "2 minutes ago 5 days"
+LINE 1: SELECT INTERVAL '2 minutes ago 5 days';
+ ^
diff --git a/src/test/regress/sql/interval.sql b/src/test/regress/sql/interval.sql
index 56feda1a3d..8fd2e7f41e 100644
--- a/src/test/regress/sql/interval.sql
+++ b/src/test/regress/sql/interval.sql
@@ -582,3 +582,7 @@ SELECT f1,
-- internal overflow test case
SELECT extract(epoch from interval '1000000000 days');
+
+-- test that ago can only appear once at the end of the interval.
+SELECT INTERVAL '42 days 2 seconds ago ago';
+SELECT INTERVAL '2 minutes ago 5 days';
--
2.34.1
v3-0003-Fix-Interval-unit-parsing.patchtext/x-patch; charset=US-ASCII; name=v3-0003-Fix-Interval-unit-parsing.patchDownload
From 2ffb81e95031b43955fdba784356fc54659775e2 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow <koshy44@gmail.com>
Date: Sun, 9 Jul 2023 13:21:23 -0400
Subject: [PATCH 3/3] Fix Interval unit parsing
This commit will error when the user has multiple consecutive units or
a unit without an accompanying value.
---
src/backend/utils/adt/datetime.c | 12 ++++++++++++
src/test/regress/expected/interval.out | 9 +++++++++
src/test/regress/sql/interval.sql | 4 ++++
3 files changed, 25 insertions(+)
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 9d09381328..edf22f458e 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -3278,6 +3278,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
{
bool force_negative = false;
bool is_before = false;
+ bool parsing_unit_val = false;
char *cp;
int fmask = 0,
tmask,
@@ -3336,6 +3337,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
itm_in->tm_usec > 0)
itm_in->tm_usec = -itm_in->tm_usec;
type = DTK_DAY;
+ parsing_unit_val = false;
break;
case DTK_TZ:
@@ -3373,6 +3375,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
* are reading right to left.
*/
type = DTK_DAY;
+ parsing_unit_val = false;
break;
}
@@ -3562,10 +3565,14 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
default:
return DTERR_BAD_FORMAT;
}
+ parsing_unit_val = false;
break;
case DTK_STRING:
case DTK_SPECIAL:
+ /* reject consecutive unhandled units */
+ if (parsing_unit_val)
+ return DTERR_BAD_FORMAT;
type = DecodeUnits(i, field[i], &uval);
if (type == IGNORE_DTF)
continue;
@@ -3575,6 +3582,7 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
{
case UNITS:
type = uval;
+ parsing_unit_val = true;
break;
case AGO:
@@ -3606,6 +3614,10 @@ DecodeInterval(char **field, int *ftype, int nf, int range,
if (fmask == 0)
return DTERR_BAD_FORMAT;
+ /* reject if unit appeared and was never handled */
+ if (parsing_unit_val)
+ return DTERR_BAD_FORMAT;
+
/* finally, AGO negates everything */
if (is_before)
{
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index 42062f947f..7aba799351 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -1796,3 +1796,12 @@ SELECT INTERVAL '2 minutes ago 5 days';
ERROR: invalid input syntax for type interval: "2 minutes ago 5 days"
LINE 1: SELECT INTERVAL '2 minutes ago 5 days';
^
+-- test that consecutive and dangling units are not allowed.
+SELECT INTERVAL 'hour 5 months';
+ERROR: invalid input syntax for type interval: "hour 5 months"
+LINE 1: SELECT INTERVAL 'hour 5 months';
+ ^
+SELECT INTERVAL '1 year months days 5 hours';
+ERROR: invalid input syntax for type interval: "1 year months days 5 hours"
+LINE 1: SELECT INTERVAL '1 year months days 5 hours';
+ ^
diff --git a/src/test/regress/sql/interval.sql b/src/test/regress/sql/interval.sql
index 8fd2e7f41e..7e1cb92918 100644
--- a/src/test/regress/sql/interval.sql
+++ b/src/test/regress/sql/interval.sql
@@ -586,3 +586,7 @@ SELECT extract(epoch from interval '1000000000 days');
-- test that ago can only appear once at the end of the interval.
SELECT INTERVAL '42 days 2 seconds ago ago';
SELECT INTERVAL '2 minutes ago 5 days';
+
+-- test that consecutive and dangling units are not allowed.
+SELECT INTERVAL 'hour 5 months';
+SELECT INTERVAL '1 year months days 5 hours';
--
2.34.1
On Sat, 2023-07-08 at 13:18 -0400, Joseph Koshakow wrote:
Jacob Champion <jchampion@timescale.com> writes:
Hi Joe, here's a partial review:
Thanks so much for the review!
I'm new to this code, but I agree that the use of `type` and the
lookahead are not particularly obvious/intuitive. At the very
least,
they'd need some more explanation in the code. Your boolean flag
idea
sounds reasonable, though.I've updated the patch with the boolean flag idea. I think it's a
bit cleaner and more readable.There is one more problem I noticed, but didn't fix. We allow
multiple
"@" to be sprinkled anywhere in the input, even though the docs
[0]
only allow it to appear at the beginning of the input.(No particular opinion on this.)
I looked into this a bit. The reason this works is because the date
time lexer filters out all punctuation. That's what allows us to
parse
things like `SELECT date 'January 8, 1999';`. It's probably not worth
trying to be smarter about what punctuation we allow where, at least
for now. Maybe in the future we can exclude "@" from the punctuation
that get's filtered out.It looks like this patch needs a rebase for the CI, too, but there
are
no conflicts.The attached patch is rebased against master.
Thanks,
Joe Koshakow
Apologies, I'm posting a little behind the curve here. My initial
thoughts on the original patch mirrored Jacob's re 1 and 2 - that it looked
good, did we need to consider the modified ecpg copy (which has been
answered by Tom). I didn't have have any strong thought re 3) or the '@'.
The updated patch looks good to me. Seems a little clearer/cleaner.
Thanks,
Reid
On Sun, 2023-07-09 at 13:24 -0400, Joseph Koshakow wrote:
I've broken up this patch into three logical commits and attached
them.
None of the actual code has changed.Thanks,
Joe Koshakow
I made a another pass through the separated patches, it looks good to
me.
Thanks,
Reid
On Mon, Jul 10, 2023 at 10:19 AM Reid Thompson <jreidthompson@nc.rr.com> wrote:
I made a another pass through the separated patches, it looks good to
me.
LGTM too. (Thanks Tom for the clarification on ECPG.)
Marked RfC.
--Jacob
On Mon, Jul 10, 2023 at 10:42:31AM -0700, Jacob Champion wrote:
On Mon, Jul 10, 2023 at 10:19 AM Reid Thompson <jreidthompson@nc.rr.com> wrote:
I made a another pass through the separated patches, it looks good to
me.LGTM too. (Thanks Tom for the clarification on ECPG.)
+SELECT INTERVAL '42 days 2 seconds ago ago';
+SELECT INTERVAL '2 minutes ago 5 days';
[...]
+SELECT INTERVAL 'hour 5 months';
+SELECT INTERVAL '1 year months days 5 hours';
0002 and 0003 make this stuff fail, but isn't there a risk that this
breaks applications that relied on these accidental behaviors?
Assuming that this is OK makes me nervous.
--
Michael
On Mon, Aug 21, 2023 at 10:39 PM Michael Paquier <michael@paquier.xyz> wrote:
0002 and 0003 make this stuff fail, but isn't there a risk that this
breaks applications that relied on these accidental behaviors?
Assuming that this is OK makes me nervous.
I wouldn't argue for backpatching, for sure, but I guess I saw this as
falling into the same vein as 5b3c5953 and bcc704b52 which were
already committed.
--Jacob
On Tue, Aug 22, 2023 at 12:58 PM Jacob Champion <jchampion@timescale.com>
wrote:
On Mon, Aug 21, 2023 at 10:39 PM Michael Paquier <michael@paquier.xyz>
wrote:
0002 and 0003 make this stuff fail, but isn't there a risk that this
breaks applications that relied on these accidental behaviors?
Assuming that this is OK makes me nervous.I wouldn't argue for backpatching, for sure, but I guess I saw this as
falling into the same vein as 5b3c5953 and bcc704b52 which were
already committed.
I agree, I don't think we should try and backport this. As Jacob
highlighted, we've merged similar patches for other date time types.
If applications were relying on this behavior, the upgrade may be a
good time for them to re-evaluate their usage since it's outside the
documented spec and they may not be getting the units they're expecting
from intervals like '1 day month'.
Thanks,
Joe Koshakow
On Sun, Aug 27, 2023 at 04:14:00PM -0400, Joseph Koshakow wrote:
On Tue, Aug 22, 2023 at 12:58 PM Jacob Champion <jchampion@timescale.com>
wrote:I wouldn't argue for backpatching, for sure, but I guess I saw this as
falling into the same vein as 5b3c5953 and bcc704b52 which were
already committed.I agree, I don't think we should try and backport this. As Jacob
highlighted, we've merged similar patches for other date time types.
If applications were relying on this behavior, the upgrade may be a
good time for them to re-evaluate their usage since it's outside the
documented spec and they may not be getting the units they're expecting
from intervals like '1 day month'.
I felt like asking anyway. I have looked at the patch series and the
past compatibility changes in this area, and I kind of agree that this
seems like an improvement against confusing interval values. So, I
have applied 0001, 0002 and 0003 after more review.
0002 was a bit careless with the code indentation.
In 0003, I was wondering a bit if we'd better set parsing_unit_val to
false for AGO, but as we do a backward lookup and because after 0002
AGO can only be last, I've let the code as you have suggested, relying
on the initial value of this variable.
--
Michael