Standard-conforming datetime years parsing

Started by Alexander Korotkovabout 6 years ago2 messages

a.korotkov@postgrespro.ru

about 6 years ago

1 attachment(s)

Hi!

Thread [1] about support for .datetime() jsonpath method raises a
question about standard-conforming parising for Y, YY, YYY and RR
datetime template patterns.

According to standard YYY, YY and Y should get higher digits from
current year. Our current implementation gets higher digits so that
the result is closest to 2020.

We currently don't support RR. According to standard RR behavior is
implementation-defined and should select marching 4-digit year in the
interval [CY - 100; CY + 100], where CY is current year. So, our
current implementation of YY is more like RR according to standard.

The open question are:
1) Do we like to make our datetime parsing to depend on current
timestamp? I guess no. But how to parse one-digit year? If we
hardcode constant it would outdate in decade. Thankfully, no one in
the right mind wouldn't use Y pattern, but still.
2) How do we like to parse RR? Standard lives us a lot of freedom
here. Do we like to parse it as do we parse YY now? It looks
reasonable to select a closest matching year. Since PG 13 is going to
be released in 2020, our algorithm would be perfect fit at release
time.
3) Do we like to change behavior to_date()/to_timestamp()? Or just
jsonpath .datetime() and future CAST(... AS ... FORMAT ...) defined in
SQL 2016?

Attached patch solve the questions above as following. YYY, YY and Y
patterns get higher digits from 2020. So, results for Y would become
inconsistent since 2030. RR select matching year closest to 2020 as
YY does for now. It changes behavior for both
to_date()/to_timestamp() and jsonpath .datetime().

Any thoughts?

Links
1. /messages/by-id/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU+4XTw@mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-datetime-years-parsing.patchapplication/octet-stream; name=0001-datetime-years-parsing.patchDownload

commit 8f3fb0fb503618d08613b24de6a918995a1afa4b
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date:   Tue Aug 20 07:57:11 2019 +0300

    Introduce RRRR and RR, revise YYY, YY and Y datetime format patterns
    
    SQL Standard 2016 defines RR and RRRR format patterns and interprets YYY, YY
    and Y format patterns in the different way than current implementation of
    to_date()/to_timestamp() does.
    
    According to standard YYY, YY and Y should get higher digits from current year.
    Our current implementation gets higher digits so that the result is closest
    to 2020.  Now, we get just higher digits from 2020 instead.  We use 2020 instead
    of current year in order to keep our conversion immutable.
    
    According to standard RR behavior is implementation-defined and should select
    marching 4-digit year in the interval [CY - 100; CY + 100], where CY is current
    year.  We use logic, which was previously used in YY, select the matching year
    closest to 2020.
    
    The change is made in favor of upcoming jsonpath .datetime() method.  Standard
    applies to jsonpath .datetime() method and CAST (... FORMAT ...) SQL clause.
    But in order to keep our behavior consistent, we apply this change to
    to_date()/to_timestamp() too.
    
    Discussion: https://postgr.es/m/CAPpHfdsZgYEra_PeCLGNoXOWYx6iU-S3wF8aX0ObQUcZU%2B4XTw%40mail.gmail.com
    Author: Alexander Korotkov

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 28eb322f3fc..a8ed4af3a35 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6332,7 +6332,7 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
         <entry>year (4 or more digits) with comma</entry>
        </row>
        <row>
-        <entry><literal>YYYY</literal></entry>
+        <entry><literal>YYYY</literal>, <literal>RRRR</literal></entry>
         <entry>year (4 or more digits)</entry>
        </row>
        <row>
@@ -6363,6 +6363,10 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
         <entry><literal>I</literal></entry>
         <entry>last digit of ISO 8601 week-numbering year</entry>
        </row>
+       <row>
+        <entry><literal>RR</literal></entry>
+        <entry>last 2 digits of rounded year</entry>
+       </row>
        <row>
         <entry><literal>BC</literal>, <literal>bc</literal>,
         <literal>AD</literal> or <literal>ad</literal></entry>
@@ -6683,10 +6687,18 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        In <function>to_timestamp</function> and <function>to_date</function>,
-       if the year format specification is less than four digits, e.g.
-       <literal>YYY</literal>, and the supplied year is less than four digits,
-       the year will be adjusted to be nearest to the year 2020, e.g.
-       <literal>95</literal> becomes 1995.
+       when using <literal>YYY</literal>, <literal>YY</literal> or
+       <literal>Y</literal> fields, then higher year digits are taken from
+       2020, e.g. <literal>7</literal> becomes 2027.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       In <function>to_timestamp</function> and <function>to_date</function>,
+       when using <literal>RR</literal> field, the supplied year will be
+       adjusted to be nearest to the year 2020, e.g. <literal>95</literal>
+       becomes 1995.
       </para>
      </listitem>
 
@@ -6817,6 +6829,16 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+        <function>to_char(interval)</function> formats <literal>HH</literal> and
+        <literal>HH12</literal> as shown on a 12-hour clock, for example zero hours
+        and 36 hours both output as <literal>12</literal>, while <literal>HH24</literal>
+        outputs the full hour value, which can exceed 23 in
+        an <type>interval</type> value.
+      </para>
+     </listitem>
+
     </itemizedlist>
    </para>
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index f7175df8da6..f090bf9e71a 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -665,6 +665,8 @@ typedef enum
 	DCH_PM,
 	DCH_Q,
 	DCH_RM,
+	DCH_RRRR,
+	DCH_RR,
 	DCH_SSSSS,
 	DCH_SSSS,
 	DCH_SS,
@@ -718,6 +720,8 @@ typedef enum
 	DCH_pm,
 	DCH_q,
 	DCH_rm,
+	DCH_rrrr,
+	DCH_rr,
 	DCH_sssss,
 	DCH_ssss,
 	DCH_ss,
@@ -828,6 +832,8 @@ static const KeyWord DCH_keywords[] = {
 	{"PM", 2, DCH_PM, false, FROM_CHAR_DATE_NONE},
 	{"Q", 1, DCH_Q, true, FROM_CHAR_DATE_NONE}, /* Q */
 	{"RM", 2, DCH_RM, false, FROM_CHAR_DATE_GREGORIAN}, /* R */
+	{"RRRR", 4, DCH_RRRR, true, FROM_CHAR_DATE_GREGORIAN},
+	{"RR", 2, DCH_RR, true, FROM_CHAR_DATE_GREGORIAN},
 	{"SSSSS", 5, DCH_SSSS, true, FROM_CHAR_DATE_NONE},	/* S */
 	{"SSSS", 4, DCH_SSSS, true, FROM_CHAR_DATE_NONE},
 	{"SS", 2, DCH_SS, true, FROM_CHAR_DATE_NONE},
@@ -881,6 +887,8 @@ static const KeyWord DCH_keywords[] = {
 	{"pm", 2, DCH_pm, false, FROM_CHAR_DATE_NONE},
 	{"q", 1, DCH_Q, true, FROM_CHAR_DATE_NONE}, /* q */
 	{"rm", 2, DCH_rm, false, FROM_CHAR_DATE_GREGORIAN}, /* r */
+	{"rrrr", 4, DCH_RRRR, true, FROM_CHAR_DATE_GREGORIAN},
+	{"rr", 2, DCH_RR, true, FROM_CHAR_DATE_GREGORIAN},
 	{"sssss", 5, DCH_SSSS, true, FROM_CHAR_DATE_NONE},	/* s */
 	{"ssss", 4, DCH_SSSS, true, FROM_CHAR_DATE_NONE},
 	{"ss", 2, DCH_SS, true, FROM_CHAR_DATE_NONE},
@@ -1058,7 +1066,8 @@ static void dump_node(FormatNode *node, int max);
 
 static const char *get_th(char *num, int type);
 static char *str_numth(char *dest, char *num, int type);
-static int	adjust_partial_year_to_2020(int year);
+static int	adjust_partial_year_to_2020(int year, int ndigits);
+static int	adjust_partial_round_year_to_2020(int year);
 static int	strspace_len(char *str);
 static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode,
 							   bool *have_error);
@@ -2234,8 +2243,27 @@ is_next_separator(FormatNode *n)
 }
 
 
+/*
+ * Get lower 'ndigits' from 'year' while take other higher digits from
+ * 2020.
+ */
+static int
+adjust_partial_year_to_2020(int year, int ndigits)
+{
+	if (ndigits == 1)
+		return year + 2020;
+	else if (ndigits == 2 || ndigits == 3)
+		return year + 2000;
+	else
+		return year;
+}
+
+
+/*
+ * Get closest to 2020 4-digit year, which corresponds to given 2-digits year.
+ */
 static int
-adjust_partial_year_to_2020(int year)
+adjust_partial_round_year_to_2020(int year)
 {
 	/*
 	 * Adjust all dates toward 2020; this is effectively what happens when we
@@ -2247,12 +2275,6 @@ adjust_partial_year_to_2020(int year)
 	/* Force 70-99 into the 1900's */
 	else if (year < 100)
 		return year + 1900;
-	/* Force 100-519 into the 2000's */
-	else if (year < 520)
-		return year + 2000;
-	/* Force 520-999 into the 1000's */
-	else if (year < 1000)
-		return year + 1000;
 	else
 		return year;
 }
@@ -3065,6 +3087,7 @@ DCH_to_char(FormatNode *node, bool is_interval, TmToChar *in, char *out, Oid col
 					str_numth(s, s, S_TH_TYPE(n->suffix));
 				s += strlen(s);
 				break;
+			case DCH_RRRR:
 			case DCH_YYYY:
 			case DCH_IYYY:
 				sprintf(s, "%0*d",
@@ -3095,6 +3118,7 @@ DCH_to_char(FormatNode *node, bool is_interval, TmToChar *in, char *out, Oid col
 					str_numth(s, s, S_TH_TYPE(n->suffix));
 				s += strlen(s);
 				break;
+			case DCH_RR:
 			case DCH_YY:
 			case DCH_IY:
 				sprintf(s, "%0*d",
@@ -3536,6 +3560,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out, bool std,
 					SKIP_THth(s, n->suffix);
 				}
 				break;
+			case DCH_RRRR:
 			case DCH_YYYY:
 			case DCH_IYYY:
 				from_char_parse_int(&out->year, &s, n, have_error);
@@ -3547,17 +3572,22 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out, bool std,
 			case DCH_IYY:
 				len = from_char_parse_int(&out->year, &s, n, have_error);
 				CHECK_ERROR;
-				if (len < 4)
-					out->year = adjust_partial_year_to_2020(out->year);
+				out->year = adjust_partial_year_to_2020(out->year, len);
 				out->yysz = 3;
 				SKIP_THth(s, n->suffix);
 				break;
+			case DCH_RR:
+				from_char_parse_int(&out->year, &s, n, have_error);
+				CHECK_ERROR;
+				out->year = adjust_partial_round_year_to_2020(out->year);
+				out->yysz = 2;
+				SKIP_THth(s, n->suffix);
+				break;
 			case DCH_YY:
 			case DCH_IY:
 				len = from_char_parse_int(&out->year, &s, n, have_error);
 				CHECK_ERROR;
-				if (len < 4)
-					out->year = adjust_partial_year_to_2020(out->year);
+				out->year = adjust_partial_year_to_2020(out->year, len);
 				out->yysz = 2;
 				SKIP_THth(s, n->suffix);
 				break;
@@ -3565,8 +3595,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out, bool std,
 			case DCH_I:
 				len = from_char_parse_int(&out->year, &s, n, have_error);
 				CHECK_ERROR;
-				if (len < 4)
-					out->year = adjust_partial_year_to_2020(out->year);
+				out->year = adjust_partial_year_to_2020(out->year, len);
 				out->yysz = 1;
 				SKIP_THth(s, n->suffix);
 				break;
@@ -3742,10 +3771,12 @@ DCH_datetime_type(FormatNode *node, bool *have_error)
 			case DCH_Y_YYY:
 			case DCH_YYYY:
 			case DCH_IYYY:
+			case DCH_RRRR:
 			case DCH_YYY:
 			case DCH_IYY:
 			case DCH_YY:
 			case DCH_IY:
+			case DCH_RR:
 			case DCH_Y:
 			case DCH_I:
 			case DCH_RM:
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 6b53876e062..f68882cc173 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2597,7 +2597,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" RR MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2618,19 +2618,22 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'RRMonDD');
+ERROR:  invalid value "/Fe" for "Mon"
+DETAIL:  The given value did not match any of the allowed values for this field.
 SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
          to_timestamp         
 ------------------------------
- Sun Feb 16 00:00:00 1997 PST
+ Sat Feb 16 00:00:00 2097 PST
 (1 row)
 
-SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+SELECT to_timestamp('97/Feb/16', 'FXRR:Mon:DD');
          to_timestamp         
 ------------------------------
  Sun Feb 16 00:00:00 1997 PST
 (1 row)
 
-SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+SELECT to_timestamp('97/Feb/16', 'FXRR/Mon/DD');
          to_timestamp         
 ------------------------------
  Sun Feb 16 00:00:00 1997 PST
@@ -2642,6 +2645,12 @@ SELECT to_timestamp('19971116', 'YYYYMMDD');
  Sun Nov 16 00:00:00 1997 PST
 (1 row)
 
+SELECT to_timestamp('19971116', 'RRRRMMDD');
+         to_timestamp         
+------------------------------
+ Sun Nov 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
          to_timestamp          
 -------------------------------
@@ -2663,11 +2672,17 @@ SELECT to_timestamp('1997 BC 11 16', 'YYYY BC MM DD');
 SELECT to_timestamp('9-1116', 'Y-MMDD');
          to_timestamp         
 ------------------------------
- Mon Nov 16 00:00:00 2009 PST
+ Fri Nov 16 00:00:00 2029 PST
 (1 row)
 
 SELECT to_timestamp('95-1116', 'YY-MMDD');
          to_timestamp         
+------------------------------
+ Wed Nov 16 00:00:00 2095 PST
+(1 row)
+
+SELECT to_timestamp('95-1116', 'RR-MMDD');
+         to_timestamp         
 ------------------------------
  Thu Nov 16 00:00:00 1995 PST
 (1 row)
@@ -2675,7 +2690,7 @@ SELECT to_timestamp('95-1116', 'YY-MMDD');
 SELECT to_timestamp('995-1116', 'YYY-MMDD');
          to_timestamp         
 ------------------------------
- Thu Nov 16 00:00:00 1995 PST
+ Mon Nov 16 00:00:00 2995 PST
 (1 row)
 
 SELECT to_timestamp('2005426', 'YYYYWWD');
@@ -2711,7 +2726,7 @@ SELECT to_timestamp('05527', 'IYIWID');
 SELECT to_timestamp('5527', 'IIWID');
          to_timestamp         
 ------------------------------
- Sun Jan 01 00:00:00 2006 PST
+ Sun Dec 28 00:00:00 2025 PST
 (1 row)
 
 SELECT to_timestamp('2005364', 'IYYYIDDD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index f7a9da1e954..e35cb448508 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -344,7 +344,7 @@ SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" RR MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -352,14 +352,18 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'RRMonDD');
+
 SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
 
-SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+SELECT to_timestamp('97/Feb/16', 'FXRR:Mon:DD');
 
-SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+SELECT to_timestamp('97/Feb/16', 'FXRR/Mon/DD');
 
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
+SELECT to_timestamp('19971116', 'RRRRMMDD');
+
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
 
 SELECT to_timestamp('1997 AD 11 16', 'YYYY BC MM DD');
@@ -369,6 +373,8 @@ SELECT to_timestamp('9-1116', 'Y-MMDD');
 
 SELECT to_timestamp('95-1116', 'YY-MMDD');
 
+SELECT to_timestamp('95-1116', 'RR-MMDD');
+
 SELECT to_timestamp('995-1116', 'YYY-MMDD');
 
 SELECT to_timestamp('2005426', 'YYYYWWD');

Vik Fearing

vik.fearing@2ndquadrant.com

about 6 years ago

In reply to: Alexander Korotkov (#1)

Re: Standard-conforming datetime years parsing

On 05/11/2019 02:45, Alexander Korotkov wrote:

3) Do we like to change behavior to_date()/to_timestamp()? Or just
jsonpath .datetime() and future CAST(... AS ... FORMAT ...) defined in
SQL 2016?

I don't want to hijack this thread, but I would like the CAST feature to
call to_timestamp() and to_char(), even if they aren't 100% standard
compliant today.

I see a new column on pg_cast where users can define the function to do
the cast with format.

Vik