Allow to_date() and to_timestamp() to accept localized month names

Started by Mattiaover 9 years ago2 messages
#1Mattia
mattia@p2pforum.it
1 attachment(s)

Hi,
attached is a patch which adds support to localized month names in
to_date() and to_timestamp() functions.

The patch is fairly simple but I want to discuss the approach and
implementation:

Using the TM modifier as in to_char() was already discussed some years
ago: 10710.1202170898@sss.pgh.pa.us [1]/messages/by-id/10710.1202170898@sss.pgh.pa.us

I thought about reusing from_char_seq_search() but localized month
names use different capitalization according to the language grammar,
so I used pg_strncasecmp to do the match.

Regression tests with TM modifier are difficult since one should have
the locale used for the test installed on his system.

Usage example:
postgres=# set lc_time to 'fr_FR';
SET
postgres=# select to_date('22 janvier 2016', 'DD TMMonth YYYY');
to_date
------------
2016-01-22
(1 row)

[1]: /messages/by-id/10710.1202170898@sss.pgh.pa.us

Thanks
Mattia

Attachments:

localized_month_names_v1.patchapplication/x-download; name=localized_month_names_v1.patchDownload
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5148095..e298f92 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6106,8 +6106,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        </row>
        <row>
         <entry><literal>TM</literal> prefix</entry>
-        <entry>translation mode (print localized day and month names based on
-         <xref linkend="guc-lc-time">)</entry>
+        <entry>translation mode (print localized day/month names or read
+        localized month names based on <xref linkend="guc-lc-time">)</entry>
         <entry><literal>TMMonth</literal></entry>
        </row>
        <row>
@@ -6138,8 +6138,6 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <literal>TM</literal> does not include trailing blanks.
-       <function>to_timestamp</> and <function>to_date</> ignore
-       the <literal>TM</literal> modifier.
       </para>
      </listitem>
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index bbd97dc..ec692cf 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -2359,6 +2359,40 @@ from_char_seq_search(int *dest, char **src, const char *const * array, int type,
 	return len;
 }
 
+/*
+ * Perform a sequential case-insensitive search in 'array' of size 'size' for
+ * text matching the characters in 'src'.
+ *
+ * If a match is found, copy the array index in 'dest', advance 'src' to the
+ * end of the matched string, and return the number of characters consumed.
+ *
+ * If the string doesn't match, throw an error
+ */
+static int
+from_char_case_search(int *dest, char **src, char **array, int size,
+					  FormatNode *node)
+{
+	int			i;
+	size_t		len;
+
+	for (i = 0; i < size; ++i)
+	{
+		len = strlen(array[i]);
+		if (pg_strncasecmp(array[i], *src, len) == 0)
+		{
+			*dest = i;
+			*src += len;
+			return len;
+		}
+	}
+	ereport(ERROR,
+			(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+			 errmsg("invalid value \"%s\" for \"%s\"",
+					*src, node->key->name),
+			 errdetail("The given value did not match any of the allowed "
+					   "values for this field.")));
+}
+
 /* ----------
  * Process a TmToChar struct as denoted by a list of FormatNodes.
  * The formatted data is written to the string pointed to by 'out'.
@@ -2950,6 +2984,9 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				value;
 	bool		fx_mode = false;
 
+	/* cache localized days and months */
+	cache_locale_time();
+
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
 		if (n->type != NODE_TYPE_ACTION)
@@ -3064,15 +3101,23 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 			case DCH_MONTH:
 			case DCH_Month:
 			case DCH_month:
-				from_char_seq_search(&value, &s, months_full, ONE_UPPER,
-									 MAX_MONTH_LEN, n);
+				if (S_TM(n->suffix))
+					from_char_case_search(&value, &s, localized_full_months,
+										  12, n);
+				else
+					from_char_seq_search(&value, &s, months_full, ONE_UPPER,
+										 MAX_MONTH_LEN, n);
 				from_char_set_int(&out->mm, value + 1, n);
 				break;
 			case DCH_MON:
 			case DCH_Mon:
 			case DCH_mon:
-				from_char_seq_search(&value, &s, months, ONE_UPPER,
-									 MAX_MON_LEN, n);
+				if (S_TM(n->suffix))
+					from_char_case_search(&value, &s, localized_abbrev_months,
+										  12, n);
+				else
+					from_char_seq_search(&value, &s, months, ONE_UPPER,
+										 MAX_MON_LEN, n);
 				from_char_set_int(&out->mm, value + 1, n);
 				break;
 			case DCH_MM:
#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Mattia (#1)
Re: Allow to_date() and to_timestamp() to accept localized month names

Mattia <mattia@p2pforum.it> writes:

attached is a patch which adds support to localized month names in
to_date() and to_timestamp() functions.

Seems like a fine goal.

I thought about reusing from_char_seq_search() but localized month
names use different capitalization according to the language grammar,
so I used pg_strncasecmp to do the match.

pg_str(n)casecmp is really only meant to handle comparisons of ASCII
strings; it will definitely not succeed in case-folding multibyte
characters. That's not a big problem for to_date's existing usages
but I'm afraid it will be for non-English month names. I think you'll
need another solution there. You might have to resort to what citext
does, namely apply the full lower() transformation, at least whenever
the data string actually contains MB characters.

Regression tests with TM modifier are difficult since one should have
the locale used for the test installed on his system.

I suspect you'll have to give up on putting much about this into the
standard regression tests. We've used not-run-by-default test scripts
in some similar cases (eg collate.linux.utf8.sql), but personally I think
those are 99% a waste of time, precisely because they never actually
get run by anyone but the author.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers