Bug in to_timestamp().

Started by amul sulover 9 years ago136 messages

sul_amul@yahoo.co.in

over 9 years ago

Hi,

It's look like bug in to_timestamp() function when format string has more whitespaces compare to input string, see below:

Ex.1: Two white spaces before HH24 whereas one before input time string

postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS');
to_timestamp
------------------------
2016-06-13 05:43:36-07 <— incorrect time
(1 row)

Ex.2: One whitespace before YYYY format string

postgres=# SELECT TO_TIMESTAMP('2016/06/13 15:43:36', ' YYYY/MM/DD HH24:MI:SS');
to_timestamp
------------------------------
0016-06-13 15:43:36-07:52:58 <— incorrect year
(1 row)

If there are one or more consecutive whitespace in the format, we should skip those as long as we could get an actual field.
Thoughts?
Thanks & Regards,
Amul Sul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Import Notes

Reference msg id not found: 1873520224.1784572.1465833145330.JavaMail.yahoo.ref@mail.yahoo.com

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: amul sul (#1)

Re: Bug in to_timestamp().

amul sul <sul_amul@yahoo.co.in> writes:

It's look like bug in to_timestamp() function when format string has more whitespaces compare to input string, see below:

No, I think this is a case of "input doesn't match the format string".

As a rule of thumb, using to_timestamp() for input that could be parsed
just fine by the standard timestamp input function is not a particularly
good idea. to_timestamp() is meant to deal with input that is in a
well-defined format that happens to not be parsable by timestamp_in.
This example doesn't meet either of those preconditions.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Tom Lane (#2)

Re: Bug in to_timestamp().

On Mon, Jun 13, 2016 at 12:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

amul sul <sul_amul@yahoo.co.in> writes:

It's look like bug in to_timestamp() function when format string has more whitespaces compare to input string, see below:

No, I think this is a case of "input doesn't match the format string".

As a rule of thumb, using to_timestamp() for input that could be parsed
just fine by the standard timestamp input function is not a particularly
good idea. to_timestamp() is meant to deal with input that is in a
well-defined format that happens to not be parsable by timestamp_in.
This example doesn't meet either of those preconditions.

I think a space in the format string should skip a whitespace
character in the input string, but not a non-whitespace character.
It's my understanding that these functions exist in no small part for
compatibility with Oracle, and Oracle declines to skip the digit '1'
on the basis of an extra space in the format string, which IMHO is the
behavior any reasonable user would expect.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

amul sul

sul_amul@yahoo.co.in

over 9 years ago

In reply to: Robert Haas (#3)

1 attachment(s)

Re: Bug in to_timestamp().

On Monday, 13 June 2016 9:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think a space in the format string should skip a whitespace
character in the input string, but not a non-whitespace character.

+1, enough the benefit is we are giving correct output.

PFA patch proposing this fix.

Regards,
Amul Sul.

Attachments:

0001-RM37358-space-in-the-format-string-should-skip-a-whi.patchapplication/octet-streamDownload

From e79001a912c9b8bfbcd5eef24fca61f4ac08becb Mon Sep 17 00:00:00 2001
From: Amul Sul <sul_amul@yahoo.co.in>
Date: Wed, 15 Jun 2016 14:42:43 +0530
Subject: [PATCH] RM37358: space in the format string should skip a whitespace
 in input

---
 src/backend/utils/adt/formatting.c     | 20 ++++++++++++++++----
 src/test/regress/expected/horology.out | 12 ++++++------
 src/test/regress/sql/horology.sql      |  4 ++--
 3 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index ae93fe0..ccacc09 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -2955,11 +2955,23 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 		if (n->type != NODE_TYPE_ACTION)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * Ignore spaces when not in FX (fixed width) mode and space in
+			 * the format string.
 			 */
-			s++;
+			if (!fx_mode && isspace((unsigned char) n->character))
+			{
+				while (*s != '\0' && isspace((unsigned char) *s))
+					s++;
+			}
+			else
+			{
+				/*
+				 * Separator, so consume one character from input string.
+				 * Notice we don't insist that the consumed character match
+				 * the format's character.
+				 */
+				s++;
+			}
 			continue;
 		}
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 1fe02be..98a9681 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2776,7 +2776,7 @@ SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 (1 row)
 
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2789,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2912,7 +2912,7 @@ SELECT to_timestamp('  20050302', 'YYYYMMDD');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2942,7 +2942,7 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -2960,13 +2960,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index c81437b..71f0acf 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -395,12 +395,12 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
-- 
2.6.2

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Robert Haas (#3)

Re: Bug in to_timestamp().

On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jun 13, 2016 at 12:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

amul sul <sul_amul@yahoo.co.in> writes:

It's look like bug in to_timestamp() function when format string has more whitespaces compare to input string, see below:

No, I think this is a case of "input doesn't match the format string".

As a rule of thumb, using to_timestamp() for input that could be parsed
just fine by the standard timestamp input function is not a particularly
good idea. to_timestamp() is meant to deal with input that is in a
well-defined format that happens to not be parsable by timestamp_in.
This example doesn't meet either of those preconditions.

I think a space in the format string should skip a whitespace
character in the input string, but not a non-whitespace character.
It's my understanding that these functions exist in no small part for
compatibility with Oracle, and Oracle declines to skip the digit '1'
on the basis of an extra space in the format string, which IMHO is the
behavior any reasonable user would expect.

So Amul and I are of one opinion and Tom is of another. Anyone else
have an opinion?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Robert Haas (#5)

Re: Bug in to_timestamp().

On Mon, Jun 20, 2016 at 8:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas <robertmhaas@gmail.com>
wrote:

On Mon, Jun 13, 2016 at 12:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

amul sul <sul_amul@yahoo.co.in> writes:

It's look like bug in to_timestamp() function when format string has

more whitespaces compare to input string, see below:

No, I think this is a case of "input doesn't match the format string".

As a rule of thumb, using to_timestamp() for input that could be parsed
just fine by the standard timestamp input function is not a particularly
good idea. to_timestamp() is meant to deal with input that is in a
well-defined format that happens to not be parsable by timestamp_in.
This example doesn't meet either of those preconditions.

I think a space in the format string should skip a whitespace
character in the input string, but not a non-whitespace character.
It's my understanding that these functions exist in no small part for
compatibility with Oracle, and Oracle declines to skip the digit '1'
on the basis of an extra space in the format string, which IMHO is the
behavior any reasonable user would expect.

So Amul and I are of one opinion and Tom is of another. Anyone else
have an opinion?

At least Tom's position has the benefit of being consistent with current
behavior. The current implementation doesn't actually care what literal
value you specify - any non-special character consumes a single token from
the input, period.

SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD--HH24:MI:SS');
SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD-HH24:MI:SS');

Both the above exhibit the same behavior as if you used a space instead of
the hyphen as the group separator.

The documentation should be updated to make this particular dynamic more
clear.

I don't see changing the general behavior of these "date formatting"
functions a worthwhile endeavor. Adding a less-liberal "parse_timestamp"
function I could get behind.

IOW, the user seems to be happy with the fact that the "/" in the date
matches his "-" but them complains that the space matches the number "1".
You don't get to have it both ways.

[re-reads the third usage note]

Or maybe you do. We already define space as a being a greedy operator
(when not used in conjunction with FX). A greedy space-space sequence
makes little sense on its face and if we are going to be helpful here we
should treat it as a single greedy space matcher.

Note that "returns an error because to_timestamp expects one space only" is
wrong - it errors because only a single space is captured and then the
attempt to parse ' JUN' using "MON" fails. The following query doesn't
fail though it exhibits the same space discrepancy (it just gives the same
"wrong" result).

SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'FXYYYY/MM/DD HH24:MI:SS');

Given that we already partially special-case the space expression I'd be
inclined to consider Robert's and Amul's position on the matter. I think
I'd redefine our treatment of space to be "zero or more" instead of "one or
more" and require that it only match a literal space in the input.

Having considered that, I'm not convinced its worth a compatibility break.
I'd much rather deprecate these <to_*> versions and write
slightly-less-liberal versions named <parse_*>.

In any case I'd called the present wording a bug. Instead:

A single space consumes a single token of input and then, unless the FX
modifier is present, consumes zero or more subsequent literal spaces.
Thus, using two spaces in a row without the FX modifier, while allowed, is
unlikely to give you a satisfactory result. The first space will consume
all available consecutive spaces so that the second space will be
guaranteed to consume a non-space token from the input.

David J.

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Robert Haas (#5)

Re: Bug in to_timestamp().

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think a space in the format string should skip a whitespace
character in the input string, but not a non-whitespace character.
It's my understanding that these functions exist in no small part for
compatibility with Oracle, and Oracle declines to skip the digit '1'
on the basis of an extra space in the format string, which IMHO is the
behavior any reasonable user would expect.

So Amul and I are of one opinion and Tom is of another. Anyone else
have an opinion?

I don't necessarily have an opinion yet. I would like to see more than
just an unsupported assertion about what Oracle's behavior is. Also,
how should FM mode affect this?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Albe Laurenz

laurenz.albe@wien.gv.at

over 9 years ago

In reply to: Tom Lane (#7)

Re: Bug in to_timestamp().

Tom Lane wrote:

I don't necessarily have an opinion yet. I would like to see more than
just an unsupported assertion about what Oracle's behavior is. Also,
how should FM mode affect this?

I can supply what Oracle 12.1 does:

SQL> SELECT to_timestamp('2016-06-13 15:43:36', ' YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;

TS
--------------------------------
2016-06-13 15:43:36.000000000 AD

SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;

TS
--------------------------------
2016-06-13 15:43:36.000000000 AD

SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;

TS
--------------------------------
2016-06-13 15:43:36.000000000 AD

(to_timestamp_tz behaves the same way.)

So Oracle seems to make no difference between one or more spaces.

Yours,
Laurenz Albe

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alex Ignatov

a.ignatov@postgrespro.ru

over 9 years ago

In reply to: Tom Lane (#7)

Re: Bug in to_timestamp().

On 20.06.2016 16:36, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Mon, Jun 13, 2016 at 12:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

I think a space in the format string should skip a whitespace
character in the input string, but not a non-whitespace character.
It's my understanding that these functions exist in no small part for
compatibility with Oracle, and Oracle declines to skip the digit '1'
on the basis of an extra space in the format string, which IMHO is the
behavior any reasonable user would expect.

So Amul and I are of one opinion and Tom is of another. Anyone else
have an opinion?

I don't necessarily have an opinion yet. I would like to see more than
just an unsupported assertion about what Oracle's behavior is. Also,
how should FM mode affect this?

regards, tom lane

Oracle:
SQL> SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS')
from dual;
SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD HH24:MI:SS') from dual
*
ERROR at line 1:
ORA-01843: not a valid month

PG:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD
HH24:MI:SS');
to_timestamp
------------------------
2016-01-06 14:40:39+03
(1 row)

I know about:
"These functions interpret input liberally, with minimal error checking.
While they produce valid output, the conversion can yield unexpected
results" from docs but by providing illegal input parameters we have no
any exceptions or errors about that.
I think that to_timestamp() need to has more format checking than it has
now.

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Alex Ignatov

a.ignatov@postgrespro.ru

over 9 years ago

In reply to: amul sul (#1)

Re: Bug in to_timestamp().

On 13.06.2016 18:52, amul sul wrote:

Hi,

It's look like bug in to_timestamp() function when format string has more whitespaces compare to input string, see below:

Ex.1: Two white spaces before HH24 whereas one before input time string

postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS');
to_timestamp
------------------------
2016-06-13 05:43:36-07 <— incorrect time
(1 row)

Ex.2: One whitespace before YYYY format string

postgres=# SELECT TO_TIMESTAMP('2016/06/13 15:43:36', ' YYYY/MM/DD HH24:MI:SS');
to_timestamp
------------------------------
0016-06-13 15:43:36-07:52:58 <— incorrect year
(1 row)

If there are one or more consecutive whitespace in the format, we should skip those as long as we could get an actual field.
Thoughts?
Thanks & Regards,
Amul Sul

From docs about to_timestamp() (
https://www.postgresql.org/docs/9.5/static/functions-formatting.html)
"These functions interpret input liberally, with minimal error checking.
While they produce valid output, the conversion can yield unexpected
results. For example, input to these functions is not restricted by
normal ranges, thus to_date('20096040','YYYYMMDD') returns 2014-01-17
rather than causing an error. Casting does not have this behavior."

And it wont stop on some simple whitespace. By using to_timestamp you
can get any output results by providing illegal input parameters values:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD
HH24:MI:SS');
to_timestamp
------------------------
2016-01-06 14:40:39+03
(1 row)

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

amul sul

sul_amul@yahoo.co.in

over 9 years ago

In reply to: Alex Ignatov (#10)

Re: Bug in to_timestamp().

On Monday, 20 June 2016 8:53 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:

On 13.06.2016 18:52, amul sul wrote:

And it wont stop on some simple whitespace. By using to_timestamp you
can get any output results by providing illegal input parameters values:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD
HH24:MI:SS');
to_timestamp
------------------------
2016-01-06 14:40:39+03

(1 row)

We do consume extra space from input string, but not if it is in format string, see below:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS');
to_timestamp
------------------------
2016-06-13 15:43:36-07
(1 row)

We should have same treatment for format string too.

Thoughts? Comments?

Regards,
Amul Sul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Bruce Momjian

bruce@momjian.us

over 9 years ago

In reply to: amul sul (#11)

Re: Bug in to_timestamp().

On Thu, Jun 23, 2016 at 07:41:26AM +0000, amul sul wrote:

On Monday, 20 June 2016 8:53 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:

On 13.06.2016 18:52, amul sul wrote:

And it wont stop on some simple whitespace. By using to_timestamp you
can get any output results by providing illegal input parameters values:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD
HH24:MI:SS');
to_timestamp
------------------------
2016-01-06 14:40:39+03

(1 row)

We do consume extra space from input string, but not if it is in format string, see below:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS');
to_timestamp
------------------------
2016-06-13 15:43:36-07
(1 row)

We should have same treatment for format string too.

Thoughts? Comments?

Well, the user specifies the format string, while the input string comes
from the data, so I don't see having them behave the same as necessary.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Alex Ignatov

a.ignatov@postgrespro.ru

over 9 years ago

In reply to: Bruce Momjian (#12)

Re: Bug in to_timestamp().

On 23.06.2016 16:30, Bruce Momjian wrote:

On Thu, Jun 23, 2016 at 07:41:26AM +0000, amul sul wrote:

On Monday, 20 June 2016 8:53 PM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:

On 13.06.2016 18:52, amul sul wrote:

And it wont stop on some simple whitespace. By using to_timestamp you
can get any output results by providing illegal input parameters values:
postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD
HH24:MI:SS');
to_timestamp
------------------------
2016-01-06 14:40:39+03

(1 row)

We do consume extra space from input string, but not if it is in format string, see below:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS');
to_timestamp
------------------------
2016-06-13 15:43:36-07
(1 row)

We should have same treatment for format string too.

Thoughts? Comments?

Well, the user specifies the format string, while the input string comes
from the data, so I don't see having them behave the same as necessary.

To be honest they not just behave differently. to_timestamp is just
incorrectly handles input data and nothing else.There is no excuse for
such behavior:

postgres=# SELECT TO_TIMESTAMP('20:-16-06:13: 15_43:!36', 'YYYY/MM/DD
HH24:MI:SS');
to_timestamp
------------------------------
0018-08-05 13:15:43+02:30:17
(1 row)

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Alex Ignatov (#13)

Re: Bug in to_timestamp().

On Thu, Jun 23, 2016 at 12:16 PM, Alex Ignatov <a.ignatov@postgrespro.ru>
wrote:

On 23.06.2016 16:30, Bruce Momjian wrote:

On Thu, Jun 23, 2016 at 07:41:26AM +0000, amul sul wrote:

On Monday, 20 June 2016 8:53 PM, Alex Ignatov <a.ignatov@postgrespro.ru>
wrote:

On 13.06.2016 18:52, amul sul wrote:

And it wont stop on some simple whitespace. By using to_timestamp you
can get any output results by providing illegal input parameters values:
postgres=# SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYYMMDD
HH24:MI:SS');
to_timestamp
------------------------
2016-01-06 14:40:39+03

(1 row)

We do consume extra space from input string, but not if it is in format
string, see below:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD
HH24:MI:SS');
to_timestamp
------------------------
2016-06-13 15:43:36-07
(1 row)

We should have same treatment for format string too.

Thoughts? Comments?

Well, the user specifies the format string, while the input string comes
from the data, so I don't see having them behave the same as necessary.

To be honest they not just behave differently. to_timestamp is just
incorrectly handles input data and nothing else.There is no excuse for
such behavior:

postgres=# SELECT TO_TIMESTAMP('20:-16-06:13: 15_43:!36', 'YYYY/MM/DD
HH24:MI:SS');
to_timestamp
------------------------------
0018-08-05 13:15:43+02:30:17
(1 row)

T
o be honest I don't see how this is relevant to quoted content. And
you've already made this point quite clearly - repeating it isn't
constructive. This behavior has existed for a long time and I don't see
that changing it is a worthwhile endeavor. I believe a new function is
required that has saner behavior. Otherwise given good input and a
well-formed parse string the function does exactly what it is designed to
do. Avoid giving it garbage and you will be fine. Maybe wrap the call to
the in a function that also checks for the expected layout and RAISE
EXCEPTION if it doesn't match.

David J.

#15

Alex Ignatov

a.ignatov@postgrespro.ru

over 9 years ago

In reply to: David G. Johnston (#14)

Re: Bug in to_timestamp().

On 23.06.2016 19:37, David G. Johnston wrote:

On Thu, Jun 23, 2016 at 12:16 PM, Alex Ignatov
<a.ignatov@postgrespro.ru <mailto:a.ignatov@postgrespro.ru>>wrote:

On 23.06.2016 16:30, Bruce Momjian wrote:

On Thu, Jun 23, 2016 at 07:41:26AM +0000, amul sul wrote:

On Monday, 20 June 2016 8:53 PM, Alex Ignatov
<a.ignatov@postgrespro.ru
<mailto:a.ignatov@postgrespro.ru>> wrote:

On 13.06.2016 18:52, amul sul wrote:

And it wont stop on some simple whitespace. By using
to_timestamp you
can get any output results by providing illegal input
parameters values:
postgres=# SELECT TO_TIMESTAMP('2016-06-13 99
<tel:2016-06-13%2099>:99:99', 'YYYYMMDD
HH24:MI:SS');
to_timestamp
------------------------
2016-01-06 14:40:39+03

(1 row)

We do consume extra space from input string, but not if it
is in format string, see below:

postgres=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36',
'YYYY/MM/DD HH24:MI:SS');
to_timestamp
------------------------
2016-06-13 15:43:36-07
(1 row)

We should have same treatment for format string too.

Thoughts? Comments?

Well, the user specifies the format string, while the input
string comes
from the data, so I don't see having them behave the same as
necessary.

To be honest they not just behave differently. to_timestamp is
just incorrectly handles input data and nothing else.There is no
excuse for such behavior:

postgres=# SELECT TO_TIMESTAMP('20:-16-06:13: 15_43:!36',
'YYYY/MM/DD HH24:MI:SS');
to_timestamp
------------------------------
0018-08-05 13:15:43+02:30:17
(1 row)

T
o be honest I don't see how this is relevant to quoted content. And
you've already made this point quite clearly - repeating it isn't
constructive. This behavior has existed for a long time and I don't
see that changing it is a worthwhile endeavor. I believe a new
function is required that has saner behavior. Otherwise given good
input and a well-formed parse string the function does exactly what it
is designed to do. Avoid giving it garbage and you will be fine.
Maybe wrap the call to the in a function that also checks for the
expected layout and RAISE EXCEPTION if it doesn't match.

David J.

Arguing just like that one can say that we don't even need exception
like "division by zero". Just use well-formed numbers in denominator...
Input data sometimes can be generated automagically. Without exception
throwing debugging stored function containing to_timestamp can be painful.

#16

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Alex Ignatov (#15)

Re: Bug in to_timestamp().

On Thursday, June 23, 2016, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:

Arguing just like that one can say that we don't even need exception like
"division by zero". Just use well-formed numbers in denominator...
Input data sometimes can be generated automagically. Without exception
throwing debugging stored function containing to_timestamp can be painful.

to_timestamp with its present behavior is, IMO, a poorly designed function
that would never be accepted today. Concrete proposals for either fixing
or deprecating (or both) are welcome. Fixing it should not
cause unnecessary errors to be raised.

My main point is that I'm inclined to deprecate it.

My second point is if you are going to use this badly designed function you
need to protect yourself.

My understanding is that is not going to change for 9.6.

David J.

#17

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: David G. Johnston (#14)

Re: Bug in to_timestamp().

On Thu, Jun 23, 2016 at 12:37 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote

To be honest I don't see how this is relevant to quoted content. And you've
already made this point quite clearly - repeating it isn't constructive.
This behavior has existed for a long time and I don't see that changing it
is a worthwhile endeavor. I believe a new function is required that has
saner behavior. Otherwise given good input and a well-formed parse string
the function does exactly what it is designed to do. Avoid giving it
garbage and you will be fine. Maybe wrap the call to the in a function that
also checks for the expected layout and RAISE EXCEPTION if it doesn't match.

Well, I think other people are allowed to disagree about whether
changing it is a worthwhile endeavor. I think there's an excellent
argument that the current behavior is stupid and broken and probably
nobody is intentionally relying on it.

Obviously, if somebody is relying on the existing behavior and we
change it and it breaks things, then that's bad, and a good argument
for worrying about backward-compatibility - e.g. by adding a new
function. But who would actually like the behavior that an extra
space in the format string causes non-whitespace characters to get
skipped? Next you'll be arguing that we can't fix a server crash
that's triggered by a certain query because somebody might be relying
on that query to restart the server...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#18

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: David G. Johnston (#16)

Re: Bug in to_timestamp().

On Thu, Jun 23, 2016 at 1:12 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

to_timestamp with its present behavior is, IMO, a poorly designed function
that would never be accepted today. Concrete proposals for either fixing or
deprecating (or both) are welcome. Fixing it should not cause unnecessary
errors to be raised.

Sheesh. Who put you in charge of this? You basically seem like you
are trying to shut up anyone who supports this change, and I don't
think that's right. Alex's opinion is just as valid as yours -
neither more nor less - and he has every right to express it without
being told by you that his emails are "not constructive".

My main point is that I'm inclined to deprecate it.

I can almost guarantee that would make a lot of users very unhappy.
This function is widely used.

My second point is if you are going to use this badly designed function you
need to protect yourself.

I agree that anyone using this function should test their format
strings carefully.

My understanding is that is not going to change for 9.6.

That's exactly what is under discussion here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Alvaro Herrera

alvherre@2ndquadrant.com

over 9 years ago

In reply to: David G. Johnston (#16)

Re: Bug in to_timestamp().

David G. Johnston wrote:

On Thursday, June 23, 2016, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:

Arguing just like that one can say that we don't even need exception like
"division by zero". Just use well-formed numbers in denominator...
Input data sometimes can be generated automagically. Without exception
throwing debugging stored function containing to_timestamp can be painful.

to_timestamp with its present behavior is, IMO, a poorly designed function
that would never be accepted today.

I'm not sure about that.

to_timestamp was added to improve compatibility with Oracle, by commit
b866d2e2d794. I suppose the spec should follow what's documented here,

http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions193.htm
http://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements004.htm#i34924

and that wherever we deviate from that, is a bug that should be fixed.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Robert Haas (#18)

Re: Bug in to_timestamp().

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 23, 2016 at 1:12 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

My understanding is that is not going to change for 9.6.

That's exactly what is under discussion here.

I would definitely agree with David on that point. Making to_timestamp
noticeably better on this score seems like a nontrivial project, and
post-beta is not the time for that sort of thing, even if we had full
consensus on what to do. I'd suggest somebody work on a patch and put
it up for review in the next cycle.

Now, if you were to narrowly define the problem as "whether to skip
non-spaces for a space in the format", maybe that could be fixed
post-beta, but I think that's a wrongheaded approach. to_timestamp's
issues with input that doesn't match the format are far wider than that.
IMO we should try to resolve the whole problem with one coherent change,
not make incremental incompatible changes at the margins.

At the very least I'd want to see a thought-through proposal that
addresses all three of these interrelated points:

* what should a space in the format match
* what should a non-space, non-format-code character in the format match
* how should we handle fields that are not exactly the width suggested
by the format

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Tom Lane (#20)

Re: Bug in to_timestamp().

On Thu, Jun 23, 2016 at 1:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 23, 2016 at 1:12 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

My understanding is that is not going to change for 9.6.

That's exactly what is under discussion here.

I would definitely agree with David on that point. Making to_timestamp
noticeably better on this score seems like a nontrivial project, and
post-beta is not the time for that sort of thing, even if we had full
consensus on what to do. I'd suggest somebody work on a patch and put
it up for review in the next cycle.

Now, if you were to narrowly define the problem as "whether to skip
non-spaces for a space in the format", maybe that could be fixed
post-beta, but I think that's a wrongheaded approach. to_timestamp's
issues with input that doesn't match the format are far wider than that.
IMO we should try to resolve the whole problem with one coherent change,
not make incremental incompatible changes at the margins.

At the very least I'd want to see a thought-through proposal that
addresses all three of these interrelated points:

* what should a space in the format match
* what should a non-space, non-format-code character in the format match
* how should we handle fields that are not exactly the width suggested
by the format

I'm not averse to some further study of those issues, and I think the
first two are closely related. The third one strikes me as a somewhat
separate consideration that doesn't need to be addressed by the same
patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Robert Haas (#18)

Re: Bug in to_timestamp().

On Thursday, June 23, 2016, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 23, 2016 at 1:12 PM, David G. Johnston
<david.g.johnston@gmail.com <javascript:;>> wrote:

to_timestamp with its present behavior is, IMO, a poorly designed

function

that would never be accepted today. Concrete proposals for either

fixing or

deprecating (or both) are welcome. Fixing it should not cause

unnecessary

errors to be raised.

Sheesh. Who put you in charge of this? You basically seem like you
are trying to shut up anyone who supports this change, and I don't
think that's right.

I'm all for a change in this area - though I'm not impacted enough to
actually work on a design proposal. And I'm not sure how asking for ideas
constitutes trying to shut people up. Especially since if no one does
float a proposal we'll simply have this discussion next year when someone
new discovers how badly behaved this function is.

My main point is that I'm inclined to deprecate it.

I can almost guarantee that would make a lot of users very unhappy.
This function is widely used.

Tell people not to use. We'd leave it in, unchanged, on backward
compatibility grounds. This seems like acceptable behavior for the
project. Am I mistaken?

My second point is if you are going to use this badly designed function

you

need to protect yourself.

I agree that anyone using this function should test their format
strings carefully.

My understanding is that is not going to change for 9.6.

That's exactly what is under discussion here.

Ok. I'm having trouble seeing this justified as a bug fix - the docs
clearly state our behavior is intentional. Improved behavior, yes, but
that's a feature and we're in beta2. Please be specific if you believe
I've misinterpreted project policies on this matter.

David J.

#23

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: David G. Johnston (#22)

Re: Bug in to_timestamp().

On Thu, Jun 23, 2016 at 1:46 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

Sheesh. Who put you in charge of this? You basically seem like you
are trying to shut up anyone who supports this change, and I don't
think that's right.

I'm all for a change in this area - though I'm not impacted enough to
actually work on a design proposal.

You weren't for a change in this area a few emails ago; back then, you
said "I don't see that changing it is a worthwhile endeavor".

And I'm not sure how asking for ideas
constitutes trying to shut people up. Especially since if no one does float
a proposal we'll simply have this discussion next year when someone new
discovers how badly behaved this function is.

In my opinion, telling people that their emails are not constructive
and that no change is possible for 9.6 is pretty much the same thing
as trying to shut people up. And that's what you did.

My main point is that I'm inclined to deprecate it.

I can almost guarantee that would make a lot of users very unhappy.
This function is widely used.

Tell people not to use. We'd leave it in, unchanged, on backward
compatibility grounds. This seems like acceptable behavior for the project.
Am I mistaken?

Yes. We're not going to deprecate widely-used functionality. We
might, however, fix it, contrary to your representations upthread.

My second point is if you are going to use this badly designed function
you
need to protect yourself.

I agree that anyone using this function should test their format
strings carefully.

My understanding is that is not going to change for 9.6.

That's exactly what is under discussion here.

Ok. I'm having trouble seeing this justified as a bug fix - the docs
clearly state our behavior is intentional. Improved behavior, yes, but
that's a feature and we're in beta2. Please be specific if you believe I've
misinterpreted project policies on this matter.

I would not vote to back-patch a change in this area because I think
that could break SQL code that works today. But I think the current
behavior is pretty well indefensible. It's neither intuitive nor
compatible with Oracle. And there is plenty of existing precedent for
making this sort of change during the beta period. We regard it as a
bug fix which is too dangerous to back-patch; therefore, it is neither
subject to feature freeze nor does it go into existing stable
releases. Whether to treat this particular issue in that particular
way is something that needs to be hammered out in discussion. Tom
raises the valid point that we need to make sure that we've thoroughly
studied this issue and fixed all of the related cases before
committing anything -- we don't want to change the behavior in
9.6beta, release 9.6, and then have to change it again for 9.7. But
there is certainly no project policy which says we can't change this
now for 9.6 if we decide that (1) the current behavior is wrong and
(2) we are quite sure we know how to fix it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Robert Haas (#21)

Re: Bug in to_timestamp().

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 23, 2016 at 1:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

At the very least I'd want to see a thought-through proposal that
addresses all three of these interrelated points:

* what should a space in the format match
* what should a non-space, non-format-code character in the format match
* how should we handle fields that are not exactly the width suggested
by the format

I'm not averse to some further study of those issues, and I think the
first two are closely related. The third one strikes me as a somewhat
separate consideration that doesn't need to be addressed by the same
patch.

If you think those issues are not interrelated, you have not thought
about it carefully enough.

As an example, what we can do to handle not-expected-width fields is
very different if the format is "DDMMYY" versus if it is "DD-MM-YY".
In the first case we have little choice but to believe that each
field is exactly two digits wide. In the second case, depending on
how we decide to define matching of "-", we might be able to allow
the field widths to vary so that they're effectively "whatever is
between the dashes". But that would require insisting that "-"
match a "-", or at least a non-alphanumeric, which is not how it
behaves today.

I don't want to twiddle these behaviors in 9.6 and then again next year.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25

Jim Nasby

Jim.Nasby@BlueTreble.com

over 9 years ago

In reply to: Alvaro Herrera (#19)

Re: Bug in to_timestamp().

On 6/23/16 12:29 PM, Alvaro Herrera wrote:

David G. Johnston wrote:

On Thursday, June 23, 2016, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:

Arguing just like that one can say that we don't even need exception like
"division by zero". Just use well-formed numbers in denominator...
Input data sometimes can be generated automagically. Without exception
throwing debugging stored function containing to_timestamp can be painful.

to_timestamp with its present behavior is, IMO, a poorly designed function
that would never be accepted today.

I'm not sure about that.

to_timestamp was added to improve compatibility with Oracle, by commit
b866d2e2d794. I suppose the spec should follow what's documented here,

http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions193.htm
http://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements004.htm#i34924

and that wherever we deviate from that, is a bug that should be fixed.

I'm also in favor of a parsing function that actually follows the format
specifier, but not enough to write a patch.

One thing that I think could happen for 9.6 is documenting how you could
get the desired results with one of the regex functions. Not the nicest
way to handle this problem, but it is workable and having a regex
example available for people to start with would be very helpful.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26

David G. Johnston

david.g.johnston@gmail.com

over 9 years ago

In reply to: Robert Haas (#23)

Re: Bug in to_timestamp().

On Thu, Jun 23, 2016 at 2:00 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 23, 2016 at 1:46 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

Sheesh. Who put you in charge of this? You basically seem like you
are trying to shut up anyone who supports this change, and I don't
think that's right.

I'm all for a change in this area - though I'm not impacted enough to
actually work on a design proposal.

You weren't for a change in this area a few emails ago; back then, you
said "I don't see that changing it is a worthwhile endeavor".

I still don't see changing to_timestamp as being the right approach...but
that's just my opinion. I do think that this area "text to timestamp
conversions" could stand to be improved. I may not have been clear on that
distinction. But changing the behavior of a function that has been around
since 2000 seems to be just asking for trouble.

And I'm not sure how asking for ideas
constitutes trying to shut people up. Especially since if no one does

float

a proposal we'll simply have this discussion next year when someone new
discovers how badly behaved this function is.

In my opinion, telling people that their emails are not constructive
and that no change is possible for 9.6 is pretty much the same thing
as trying to shut people up. And that's what you did.

I guess I should be a bit more careful to make sure that two separate
responses on different aspects of a topic cannot be so easily construed as
"this thread is pointless". To be clear I did and still do believe that
getting a change in this area into 10.0 is worthwhile; and that our policy
(and present circumstances) appears to preclude this changing in 9.6. But
as noted below this is just an observation.

My main point is that I'm inclined to deprecate it.

I can almost guarantee that would make a lot of users very unhappy.
This function is widely used.

Tell people not to use. We'd leave it in, unchanged, on backward
compatibility grounds. This seems like acceptable behavior for the

project.

Am I mistaken?

Yes. We're not going to deprecate widely-used functionality. We
might, however, fix it, contrary to your representations upthread.

At this point I feel you are cherry-picking my words to fit your present
feelings. I stand by everything I've written upthread regarding
deprecation and fixing. I'm personally in favor of the former. I'll add
that you are a committer, I am not. The one thing going for change is if
we indeed exactly match the reference behavior and then document it as
being a compatibility function. I hadn't fully pondered this goal and how
its plays into changing 16 year old behavior. Obviously a new name for the
function doesn't cut it in this scenario.

My second point is if you are going to use this badly designed

function

you
need to protect yourself.

I agree that anyone using this function should test their format
strings carefully.

My understanding is that is not going to change for 9.6.

That's exactly what is under discussion here.

Ok. I'm having trouble seeing this justified as a bug fix - the docs
clearly state our behavior is intentional. Improved behavior, yes, but
that's a feature and we're in beta2. Please be specific if you believe

I've

misinterpreted project policies on this matter.

I would not vote to back-patch a change in this area because I think
that could break SQL code that works today. But I think the current
behavior is pretty well indefensible. It's neither intuitive nor
compatible with Oracle. And there is plenty of existing precedent for
making this sort of change during the beta period. We regard it as a
bug fix which is too dangerous to back-patch; therefore, it is neither
subject to feature freeze nor does it go into existing stable
releases. Whether to treat this particular issue in that particular
way is something that needs to be hammered out in discussion. Tom
raises the valid point that we need to make sure that we've thoroughly
studied this issue and fixed all of the related cases before
committing anything -- we don't want to change the behavior in
9.6beta, release 9.6, and then have to change it again for 9.7. But
there is certainly no project policy which says we can't change this
now for 9.6 if we decide that (1) the current behavior is wrong and
(2) we are quite sure we know how to fix it.

Thank You.

I still conclude that given the general lack of complaints, the fact we are
at beta2, the age of the feature and nature of the "bug", and the
non-existence of a working patch even for HEAD, that 9.6 is not going to
see this behavior changed and you will be loath to back-patch a fix once
9.6 becomes stable. I'll admit the possibility does exist but am not all
that keen on couching every statement of this form I make. If you find my
interpretation to be overly conservative then voice yours. I've been doing
this a while without any complaint so I'm apparently right considerably
more often than I am wrong.

You are welcome to say that you'd consider a patch for 9.6 if its received
by the end of beta2 - or that you are working on one yourself and hope to
have it possibly included in time for beta3. I'll go buy and then eat a
hat if this happens and we have backward incompatibility note for
to_timestamp in the 9.6 docs.

David J.

#27

Alex Ignatov

a.ignatov@postgrespro.ru

over 9 years ago

In reply to: Tom Lane (#20)

Re: Bug in to_timestamp().

On 23.06.2016 20:40, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 23, 2016 at 1:12 PM, David G. Johnston
<david.g.johnston@gmail.com> wrote:

My understanding is that is not going to change for 9.6.

That's exactly what is under discussion here.

I would definitely agree with David on that point. Making to_timestamp
noticeably better on this score seems like a nontrivial project, and
post-beta is not the time for that sort of thing, even if we had full
consensus on what to do. I'd suggest somebody work on a patch and put
it up for review in the next cycle.

Now, if you were to narrowly define the problem as "whether to skip
non-spaces for a space in the format", maybe that could be fixed
post-beta, but I think that's a wrongheaded approach. to_timestamp's
issues with input that doesn't match the format are far wider than that.
IMO we should try to resolve the whole problem with one coherent change,
not make incremental incompatible changes at the margins.

At the very least I'd want to see a thought-through proposal that
addresses all three of these interrelated points:

* what should a space in the format match
* what should a non-space, non-format-code character in the format match
* how should we handle fields that are not exactly the width suggested
by the format

regards, tom lane

Totally agree that we need more discussion about error handling in this
function!

Also this behavior is observed in to_date() and to_number() function:

postgres=# SELECT
TO_DATE('2!0!1!6----!0!6-/-/-/-/-/-/-1!/-/-/-/-/-/-/-3!', 'YYYY-MM-DD');
to_date
------------
0002-01-01
(1 row)

postgres=# postgres=# select to_number('1$#@!!,2,%,%4,5,@%5@4..8-',
'999G999D9S');
to_number
-----------
12
(1 row)

On the our side we have some discussions about to write a patch that
will change this incorrect behavior. So stay tuned.

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28

Alex Ignatov

a.ignatov@postgrespro.ru

over 9 years ago

In reply to: Albe Laurenz (#8)

Re: Bug in to_timestamp().

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 20.06.2016 17:09, Albe Laurenz wrote:

Tom Lane wrote:

I don't necessarily have an opinion yet. I would like to see more than
just an unsupported assertion about what Oracle's behavior is. Also,
how should FM mode affect this?

I can supply what Oracle 12.1 does:

SQL> SELECT to_timestamp('2016-06-13 15:43:36', ' YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;

TS
--------------------------------
2016-06-13 15:43:36.000000000 AD

SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;

TS
--------------------------------
2016-06-13 15:43:36.000000000 AD

SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS') AS ts FROM dual;

TS
--------------------------------
2016-06-13 15:43:36.000000000 AD

(to_timestamp_tz behaves the same way.)

So Oracle seems to make no difference between one or more spaces.

Yours,
Laurenz Albe

Guys, do we need to change this behavior or may be you can tell me that
is normal because this and this:

postgres=# SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD
HH24:MI:SS');
to_timestamp
------------------------
2016-03-01 15:43:36+03
(1 row)

but on the other side we have :

postgres=# select '2016-02-30 15:43:36'::timestamp;
ERROR: date/time field value out of range: "2016-02-30 15:43:36"
LINE 1: select '2016-02-30 15:43:36'::timestamp;

Another bug in to_timestamp/date()?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29

Steve Crawford

scrawford@pinpointresearch.com

over 9 years ago

In reply to: Alex Ignatov (#28)

Re: Bug in to_timestamp().

My observation has been that the PostgreSQL development group aims for
correctness and the elimination of surprising results. This was part of the
reason to eliminate a number of automatic casts to dates in earlier
versions.

To me, 2016-02-30 is an invalid date that should generate an error.
Automatically and silently changing it to be 2016-03-01 strikes me as a
behavior I'd expect from a certain other open-source database, not
PostgreSQL.

Cheers,
Steve

On Fri, Jun 24, 2016 at 8:52 AM, Alex Ignatov <a.ignatov@postgrespro.ru>
wrote:

Show quoted text

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On 20.06.2016 17:09, Albe Laurenz wrote:

Tom Lane wrote:

I don't necessarily have an opinion yet. I would like to see more than
just an unsupported assertion about what Oracle's behavior is. Also,
how should FM mode affect this?

I can supply what Oracle 12.1 does:

SQL> SELECT to_timestamp('2016-06-13 15:43:36', ' YYYY/MM/DD HH24:MI:SS')
AS ts FROM dual;

TS
--------------------------------
2016-06-13 15:43:36.000000000 AD

SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS')
AS ts FROM dual;

TS
--------------------------------
2016-06-13 15:43:36.000000000 AD

SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD
HH24:MI:SS') AS ts FROM dual;

TS
--------------------------------
2016-06-13 15:43:36.000000000 AD

(to_timestamp_tz behaves the same way.)

So Oracle seems to make no difference between one or more spaces.

Yours,
Laurenz Albe

Guys, do we need to change this behavior or may be you can tell me that

is normal because this and this:

postgres=# SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD
HH24:MI:SS');
to_timestamp
------------------------
2016-03-01 15:43:36+03
(1 row)

but on the other side we have :

postgres=# select '2016-02-30 15:43:36'::timestamp;
ERROR: date/time field value out of range: "2016-02-30 15:43:36"
LINE 1: select '2016-02-30 15:43:36'::timestamp;

Another bug in to_timestamp/date()?

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30

Joshua D. Drake

jd@commandprompt.com

over 9 years ago

In reply to: Steve Crawford (#29)

Re: Bug in to_timestamp().

On 06/24/2016 09:26 AM, Steve Crawford wrote:

My observation has been that the PostgreSQL development group aims for
correctness and the elimination of surprising results. This was part of
the reason to eliminate a number of automatic casts to dates in earlier
versions.

To me, 2016-02-30 is an invalid date that should generate an error.
Automatically and silently changing it to be 2016-03-01 strikes me as a
behavior I'd expect from a certain other open-source database, not
PostgreSQL.

I don't think anybody could argue with that in good faith.

Sincerely,

--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#31

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Steve Crawford (#29)

Re: Bug in to_timestamp().

On Fri, Jun 24, 2016 at 12:26 PM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:

My observation has been that the PostgreSQL development group aims for
correctness and the elimination of surprising results. This was part of the
reason to eliminate a number of automatic casts to dates in earlier
versions.

To me, 2016-02-30 is an invalid date that should generate an error.
Automatically and silently changing it to be 2016-03-01 strikes me as a
behavior I'd expect from a certain other open-source database, not
PostgreSQL.

I don't particularly disagree with that, but on the other hand, as
mentioned earlier, to_timestamp() is here for Oracle compatibility,
and if it doesn't do what Oracle's function does, then (1) it's not
useful for people migrating from Oracle and (2) we're making up the
behavior out of whole cloth. I think things that we invent ourselves
should reject stuff like this, but in a compatibility function we
might want to, say, have compatibility.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Robert Haas (#31)

Re: Bug in to_timestamp().

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jun 24, 2016 at 12:26 PM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:

To me, 2016-02-30 is an invalid date that should generate an error.

I don't particularly disagree with that, but on the other hand, as
mentioned earlier, to_timestamp() is here for Oracle compatibility,
and if it doesn't do what Oracle's function does, then (1) it's not
useful for people migrating from Oracle and (2) we're making up the
behavior out of whole cloth. I think things that we invent ourselves
should reject stuff like this, but in a compatibility function we
might want to, say, have compatibility.

Agreed, mostly, but ... how far are we prepared to go on that? The one
thing I know about that is different from Oracle and is not something that
most people would consider clearly wrong is the behavior of the FM prefix.
We think it's a prefix that modifies only the next format code; they think
it's a toggle. If we make that act like Oracle, we will silently break an
awful lot of applications, and there will be *no way* to write code that
is correct under both interpretations. (And no, I do not want to hear
"let's fix it with a GUC".) So I'm afraid we're between a rock and a hard
place on that one --- but if we let that stand, the argument that Oracle's
to_timestamp should be treated as right by definition loses a lot of air.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#33

Gavin Flower

GavinFlower@archidevsys.co.nz

over 9 years ago

In reply to: Robert Haas (#31)

Re: Bug in to_timestamp().

On 25/06/16 08:33, Robert Haas wrote:

On Fri, Jun 24, 2016 at 12:26 PM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:

My observation has been that the PostgreSQL development group aims for
correctness and the elimination of surprising results. This was part of the
reason to eliminate a number of automatic casts to dates in earlier
versions.

To me, 2016-02-30 is an invalid date that should generate an error.
Automatically and silently changing it to be 2016-03-01 strikes me as a
behavior I'd expect from a certain other open-source database, not
PostgreSQL.

I don't particularly disagree with that, but on the other hand, as
mentioned earlier, to_timestamp() is here for Oracle compatibility,
and if it doesn't do what Oracle's function does, then (1) it's not
useful for people migrating from Oracle and (2) we're making up the
behavior out of whole cloth. I think things that we invent ourselves
should reject stuff like this, but in a compatibility function we
might want to, say, have compatibility.

How about a to_timestamp_strict(), in addition?

Its very existence will help people (who bother to read the
documentation) to look more closely at the differences between the
definitions, and allow them to choose the most appropriate for their use
case.

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34

Joshua D. Drake

jd@commandprompt.com

over 9 years ago

In reply to: Tom Lane (#32)

Re: Bug in to_timestamp().

On 06/24/2016 02:16 PM, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jun 24, 2016 at 12:26 PM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:

To me, 2016-02-30 is an invalid date that should generate an error.

I don't particularly disagree with that, but on the other hand, as
mentioned earlier, to_timestamp() is here for Oracle compatibility,
and if it doesn't do what Oracle's function does, then (1) it's not
useful for people migrating from Oracle and (2) we're making up the
behavior out of whole cloth. I think things that we invent ourselves
should reject stuff like this, but in a compatibility function we
might want to, say, have compatibility.

Agreed, mostly, but ... how far are we prepared to go on that?

We don't at all. Our goal has never been Oracle compatibility. Yes, we
have "made allowances" but we aren't in a position that requires that
anymore.

Let's just do it right.

Sincerely,

/me speaking as someone who handles many, many migrations, none of which
have ever said, "do we have Oracle compatibility available".

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35

Steve Crawford

scrawford@pinpointresearch.com

over 9 years ago

In reply to: Joshua D. Drake (#34)

Re: Bug in to_timestamp().

On Fri, Jun 24, 2016 at 3:43 PM, Joshua D. Drake <jd@commandprompt.com>
wrote:

On 06/24/2016 02:16 PM, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jun 24, 2016 at 12:26 PM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:

To me, 2016-02-30 is an invalid date that should generate an error.

I don't particularly disagree with that, but on the other hand, as

mentioned earlier, to_timestamp() is here for Oracle compatibility,
and if it doesn't do what Oracle's function does, then (1) it's not
useful for people migrating from Oracle and (2) we're making up the
behavior out of whole cloth. I think things that we invent ourselves
should reject stuff like this, but in a compatibility function we
might want to, say, have compatibility.

Agreed, mostly, but ... how far are we prepared to go on that?

We don't at all. Our goal has never been Oracle compatibility. Yes, we
have "made allowances" but we aren't in a position that requires that
anymore.

Let's just do it right.

Sincerely,

JD

/me speaking as someone who handles many, many migrations, none of which
have ever said, "do we have Oracle compatibility available".

Tongue (partlyish) in cheek:

Developer: I need a database to support my project. Based on my research
this PostgreSQL thing is awesome so we will use it.

PostgreSQL: Welcome to our community!

Developer: I need to convert a string to a timestamp. This to_timestamp()
function I tried does not operate as I expect based on the documentation.

PostgreSQL: Ah, yes, grasshopper. You are young and do not understand the
Things That Must Not Be Documented . In time you will grow a gray ponytail
and/or white beard and learn the history and ways of every database that
came before. Only then will you come to understand how The Functions
*truly* behave.

Developer: Are you #@%!$ kidding me?

I will allow that there may be selected cases where a good argument could
be made for intentionally overly permissive behavior in the pursuit of
compatibility. But in those cases the documentation should specify clearly
and in detail the deviant behavior and reason for its existence.

As one who selected PostgreSQL from the start, I am more interested in the
functions working correctly.

Cheers,
Steve

#36

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Tom Lane (#32)

Re: Bug in to_timestamp().

On Fri, Jun 24, 2016 at 5:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Fri, Jun 24, 2016 at 12:26 PM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:

To me, 2016-02-30 is an invalid date that should generate an error.

I don't particularly disagree with that, but on the other hand, as
mentioned earlier, to_timestamp() is here for Oracle compatibility,
and if it doesn't do what Oracle's function does, then (1) it's not
useful for people migrating from Oracle and (2) we're making up the
behavior out of whole cloth. I think things that we invent ourselves
should reject stuff like this, but in a compatibility function we
might want to, say, have compatibility.

Agreed, mostly, but ... how far are we prepared to go on that? The one
thing I know about that is different from Oracle and is not something that
most people would consider clearly wrong is the behavior of the FM prefix.
We think it's a prefix that modifies only the next format code; they think
it's a toggle. If we make that act like Oracle, we will silently break an
awful lot of applications, and there will be *no way* to write code that
is correct under both interpretations. (And no, I do not want to hear
"let's fix it with a GUC".) So I'm afraid we're between a rock and a hard
place on that one --- but if we let that stand, the argument that Oracle's
to_timestamp should be treated as right by definition loses a lot of air.

Well, I think that you're making several logical jumps that I
personally would decline to make. First, I don't think every issue
with these functions needs to be handled in the same way as every
other. Just because we're willing or unwilling to break compatibility
in one area doesn't mean we have to make the same decision in every
case. We're allowed to take into effect the likely impact of making a
given change in deciding whether it's worth it. Second, if in one or
more areas we decide that a hard backward compatibility break would be
too painful, then I think it's a good idea to ask ourselves how we
could ease the migration pain for people. And I'd phrase that as an
open-ended question rather than "should we add a GUC?".

For example, one idea here is to create a to_timstamp_old() function
that retains the existing behavior of to_timestamp() without any
change, and then add a new to_timestamp() function and fix every
Oracle incompatibility we can find as thoroughly as we can do in one
release cycle. So we fix this whitespace stuff, we fix the FM
modifier, and anything else that comes up, we fix it all. Then, if
people run into trouble with the new behavior when they upgrade, we
tell them that they can either fix their application or, if they want
the old behavior, they can use to_timestamp_old(). We can also
document the differences between to_timestamp() and to_timestamp_old()
so that people can easily figure out whether those differences are
significant to them.

Another idea is to add an optional third argument to to_timestamp()
that can be used to set compatibility behaviors.

I'm not altogether convinced that it's worth the effort to provide
lots of backward-compatibility here. Presumably, only a small
percentage of people use to_timestamp(), and only a percentage of
those are going to rely on the details we're talking about changing.
So it might be that if we just up and change this, a few people will
be grumpy and then they'll update their code and that will be it. On
the other hand, maybe it'll be a real pain in the butt for lots of
people and we'll lose users. I don't know how to judge how
significant these changes will be to users, and I think that the level
of impact does matter in deciding what to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37

Bruce Momjian

bruce@momjian.us

over 9 years ago

In reply to: Robert Haas (#23)

Re: Bug in to_timestamp().

On Thu, Jun 23, 2016 at 02:00:49PM -0400, Robert Haas wrote:

Ok. I'm having trouble seeing this justified as a bug fix - the docs
clearly state our behavior is intentional. Improved behavior, yes, but
that's a feature and we're in beta2. Please be specific if you believe I've
misinterpreted project policies on this matter.

I would not vote to back-patch a change in this area because I think
that could break SQL code that works today. But I think the current
behavior is pretty well indefensible. It's neither intuitive nor
compatible with Oracle. And there is plenty of existing precedent for
making this sort of change during the beta period. We regard it as a
bug fix which is too dangerous to back-patch; therefore, it is neither
subject to feature freeze nor does it go into existing stable
releases. Whether to treat this particular issue in that particular
way is something that needs to be hammered out in discussion. Tom
raises the valid point that we need to make sure that we've thoroughly
studied this issue and fixed all of the related cases before
committing anything -- we don't want to change the behavior in
9.6beta, release 9.6, and then have to change it again for 9.7. But
there is certainly no project policy which says we can't change this
now for 9.6 if we decide that (1) the current behavior is wrong and
(2) we are quite sure we know how to fix it.

If you are not going to backpatch this for compatibility reasons, I
don't think changing it during beta makes sense either because you open
the chance of breaking applications that have already been tested on
earlier 9.6 betas. This would only make sense if the to_timestamp bugs
were new in 9.6.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38

Artur Zakirov

a.zakirov@postgrespro.ru

over 9 years ago

In reply to: Tom Lane (#24)

Re: Bug in to_timestamp().

On 23.06.2016 21:02, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 23, 2016 at 1:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

At the very least I'd want to see a thought-through proposal that
addresses all three of these interrelated points:

* what should a space in the format match
* what should a non-space, non-format-code character in the format match
* how should we handle fields that are not exactly the width suggested
by the format

I'm not averse to some further study of those issues, and I think the
first two are closely related. The third one strikes me as a somewhat
separate consideration that doesn't need to be addressed by the same
patch.

If you think those issues are not interrelated, you have not thought
about it carefully enough.

As an example, what we can do to handle not-expected-width fields is
very different if the format is "DDMMYY" versus if it is "DD-MM-YY".
In the first case we have little choice but to believe that each
field is exactly two digits wide. In the second case, depending on
how we decide to define matching of "-", we might be able to allow
the field widths to vary so that they're effectively "whatever is
between the dashes". But that would require insisting that "-"
match a "-", or at least a non-alphanumeric, which is not how it
behaves today.

I don't want to twiddle these behaviors in 9.6 and then again next year.

regards, tom lane

Hi,

I want to start work on this patch.

As a conclusion:
- need a decision about three questions:

* what should a space in the format match
* what should a non-space, non-format-code character in the format match
* how should we handle fields that are not exactly the width suggested
by the format

- nobody wants solve this issue in 9.6.

And I have question: what about wrong input in date argument? For
example, from Alex's message:

postgres=# SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD
HH24:MI:SS');
to_timestamp
------------------------
2016-03-01 15:43:36+03
(1 row)

Here '2016-02-30' is wrong date. I didn't see any conclusion about this
case in the thread.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#39

Pavel Stehule

pavel.stehule@gmail.com

over 9 years ago

In reply to: Artur Zakirov (#38)

Re: Bug in to_timestamp().

2016-07-14 11:05 GMT+02:00 Artur Zakirov <a.zakirov@postgrespro.ru>:

On 23.06.2016 21:02, Tom Lane wrote:

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Jun 23, 2016 at 1:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

At the very least I'd want to see a thought-through proposal that
addresses all three of these interrelated points:

* what should a space in the format match
* what should a non-space, non-format-code character in the format match
* how should we handle fields that are not exactly the width suggested
by the format

I'm not averse to some further study of those issues, and I think the

first two are closely related. The third one strikes me as a somewhat
separate consideration that doesn't need to be addressed by the same
patch.

If you think those issues are not interrelated, you have not thought
about it carefully enough.

As an example, what we can do to handle not-expected-width fields is
very different if the format is "DDMMYY" versus if it is "DD-MM-YY".
In the first case we have little choice but to believe that each
field is exactly two digits wide. In the second case, depending on
how we decide to define matching of "-", we might be able to allow
the field widths to vary so that they're effectively "whatever is
between the dashes". But that would require insisting that "-"
match a "-", or at least a non-alphanumeric, which is not how it
behaves today.

I don't want to twiddle these behaviors in 9.6 and then again next year.

regards, tom lane

Hi,

I want to start work on this patch.

As a conclusion:
- need a decision about three questions:

* what should a space in the format match
* what should a non-space, non-format-code character in the format match
* how should we handle fields that are not exactly the width suggested
by the format

- nobody wants solve this issue in 9.6.

And I have question: what about wrong input in date argument? For example,
from Alex's message:

postgres=# SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD

HH24:MI:SS');
to_timestamp
------------------------
2016-03-01 15:43:36+03
(1 row)

Here '2016-02-30' is wrong date. I didn't see any conclusion about this
case in the thread.

last point was discussed in thread related to to_date_valid function.

Regards

Pavel

Show quoted text

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#40

Artur Zakirov

a.zakirov@postgrespro.ru

over 9 years ago

In reply to: Pavel Stehule (#39)

1 attachment(s)

Re: Bug in to_timestamp().

Hello,

On 14.07.2016 12:16, Pavel Stehule wrote:

last point was discussed in thread related to to_date_valid function.

Regards

Pavel

Thank you.

Here is my patch. It is a proof of concept.

Date/Time Formatting
--------------------

There are changes in date/time formatting rules:

- now to_timestamp() and to_date() skip spaces in the input string and
in the formatting string unless FX option is used, as Amul Sul wrote on
first message of this thread. But Ex.2 gives an error now with this
patch (should we fix this too?).

- in the code space characters and separator characters have different
types of FormatNode. Separator characters are characters ',', '-', '.',
'/' and ':'. This is done to have different rules of formatting to space
and separator characters.
If FX option isn't used then PostgreSQL do not insist that separator in
the formatting string should match separator in the formatting string.
But count of separators should be equal with or without FX option.

- now PostgreSQL check is there a closing quote. Otherwise the error is
raised.

Still PostgreSQL do not insist that text character in the formatting
string should match text character in the input string. It is not
obvious if this should be fixed. Because we may have different character
case or character with accent mark or without accent mark.
But I suppose that it is not right just check text character count. For
example, there is unicode version of space character U+00A0.

Code changes
------------

- new defines:

#define NODE_TYPE_SEPARATOR 4
#define NODE_TYPE_SPACE 5

- now DCH_cache_getnew() is called after parse_format(). Because now
parse_format() can raise an error and in the next attempt
DCH_cache_search() could return broken cache entry.

This patch do not handle all noticed issues in this thread, since still
there is not consensus about them. So this patch in a proof of concept
status and it can be changed.

Of course this patch can be completely wrong. But it tries to introduce
more formal rules for formatting.

I will be grateful for notes and remarks.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

to-timestamp-format-checking.patchtext/x-patch; name=to-timestamp-format-checking.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7830334..5656245 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6083,9 +6083,10 @@ SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
-       skip multiple blank spaces in the input string unless the
-       <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       skip multiple blank spaces in the input string and in the formatting
+       string unless the <literal>FX</literal> option is used. For example,
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal>
+       and <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
@@ -6121,6 +6122,19 @@ SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        skips two input characters.
       </para>
      </listitem>
+     
+     <listitem>
+      <para>
+       In <function>to_date</> and <function>to_timestamp</> separator
+       characters <literal>','</literal>, <literal>'-'</literal>,
+       <literal>'.'</literal>, <literal>'/'</literal> and <literal>':'</literal>
+       outside of double-quoted strings skip the number of input characters
+       contained in the string unless the <literal>FX</literal> option is used.
+       If <literal>FX</literal> option is specified then consumed separator
+       character in the formatting string must match the separator character
+       in the input string.
+      </para>
+     </listitem>
 
      <listitem>
       <para>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index bbd97dc..ef39df9 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -169,6 +169,8 @@ struct FormatNode
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -947,6 +949,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(char *str, const KeySuffix *suf, int type);
+static bool is_char_separator(char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -963,7 +966,6 @@ static void dump_node(FormatNode *node, int max);
 static const char *get_th(char *num, int type);
 static char *str_numth(char *dest, char *num, int type);
 static int	adjust_partial_year_to_2020(int year);
-static int	strspace_len(char *str);
 static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
 static void from_char_set_int(int *dest, const int value, const FormatNode *node);
 static int	from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node);
@@ -1036,6 +1038,16 @@ suff_search(char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_char_separator(char *str)
+{
+	return (*str == ',' ||
+			*str == '-' ||
+			*str == '.' ||
+			*str == '/' ||
+			*str == ':');
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1237,9 +1249,10 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 {
 	const KeySuffix *s;
 	FormatNode *n;
+	bool		in_text = false,
+				in_backslash = false;
 	int			node_set = 0,
-				suffix,
-				last = 0;
+				suffix;
 
 #ifdef DEBUG_TO_FROM_CHAR
 	elog(DEBUG_elog_output, "to_char/number(): run parser");
@@ -1251,6 +1264,49 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 	{
 		suffix = 0;
 
+		/* Previous character was a backslash */
+		if (in_backslash)
+		{
+			/* After backslash should go non-space character */
+			if (isspace(*str))
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("invalid escape sequence")));
+			in_backslash = false;
+
+			n->type = NODE_TYPE_CHAR;
+			n->character = *str;
+			n->key = NULL;
+			n->suffix = 0;
+			n++;
+			str++;
+			continue;
+		}
+		/* Previous character was a quote */
+		else if (in_text)
+		{
+			if (*str == '"')
+			{
+				str++;
+				in_text = false;
+			}
+			else if (*str == '\\')
+			{
+				str++;
+				in_backslash = true;
+			}
+			else
+			{
+				n->type = NODE_TYPE_CHAR;
+				n->character = *str;
+				n->key = NULL;
+				n->suffix = 0;
+				n++;
+				str++;
+			}
+			continue;
+		}
+
 		/*
 		 * Prefix
 		 */
@@ -1290,48 +1346,30 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 		}
 		else if (*str)
 		{
-			/*
-			 * Special characters '\' and '"'
-			 */
-			if (*str == '"' && last != '\\')
+			if (*str == '"')
 			{
-				int			x = 0;
-
-				while (*(++str))
-				{
-					if (*str == '"' && x != '\\')
-					{
-						str++;
-						break;
-					}
-					else if (*str == '\\' && x != '\\')
-					{
-						x = '\\';
-						continue;
-					}
-					n->type = NODE_TYPE_CHAR;
-					n->character = *str;
-					n->key = NULL;
-					n->suffix = 0;
-					++n;
-					x = *str;
-				}
+				in_text = true;
 				node_set = 0;
-				suffix = 0;
-				last = 0;
+				str++;
 			}
-			else if (*str && *str == '\\' && last != '\\' && *(str + 1) == '"')
+			else if (*str == '\\')
 			{
-				last = *str;
+				in_backslash = true;
+				node_set = 0;
 				str++;
 			}
-			else if (*str)
+			else
 			{
-				n->type = NODE_TYPE_CHAR;
+				if (ver == DCH_TYPE && is_char_separator(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace(*str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				n->character = *str;
 				n->key = NULL;
 				node_set = 1;
-				last = 0;
 				str++;
 			}
 		}
@@ -1348,6 +1386,17 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 		}
 	}
 
+	if (in_backslash)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("invalid escape sequence")));
+
+	/* If we didn't meet closing quotes */
+	if (in_text)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("unexpected end of format string, expected '\"' character")));
+
 	n->type = NODE_TYPE_END;
 	n->suffix = 0;
 	return;
@@ -2081,20 +2130,6 @@ adjust_partial_year_to_2020(int year)
 		return year;
 }
 
-
-static int
-strspace_len(char *str)
-{
-	int			len = 0;
-
-	while (*str && isspace((unsigned char) *str))
-	{
-		str++;
-		len++;
-	}
-	return len;
-}
-
 /*
  * Set the date mode of a from-char conversion.
  *
@@ -2164,11 +2199,6 @@ from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node)
 	char	   *init = *src;
 	int			used;
 
-	/*
-	 * Skip any whitespace before parsing the integer.
-	 */
-	*src += strspace_len(*src);
-
 	Assert(len <= DCH_MAX_ITEM_SIZ);
 	used = (int) strlcpy(copy, *src, len + 1);
 
@@ -2948,20 +2978,71 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	char	   *s;
 	int			len,
 				value;
-	bool		fx_mode = false;
+	bool		in_space = false,
+				fx_mode = false;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE)
+		{
+			/*
+			 * Previous node was a space node. In non FX (fixed format) mode we
+			 * skiped all spaces in the previous step.
+			 */
+			if (in_space && !fx_mode)
+				continue;
+
+			if (!isspace(*s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("expected space character in given string"),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for space field.")));
+			s++;
+			/* If not in FX (fixed format) mode skip other spaces */
+			if (!fx_mode)
+				while (*s != '\0' && isspace(*s))
+					s++;
+
+			in_space = true;
+			continue;
+		}
+		else if (n->type == NODE_TYPE_SEPARATOR)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * In FX mode we don't insist that consumed separator match the
+			 * format's separator character.
 			 */
+			if (!fx_mode && !is_char_separator(s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("expected separator character in input string"),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for separator field.")));
+			else if (fx_mode && n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected separator \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for separator field.")));
 			s++;
+			in_space = false;
 			continue;
 		}
+		else if (n->type == NODE_TYPE_CHAR)
+		{
+			/*
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
+			 */
+			s++;
+			in_space = false;
+			continue;
+		}
+		in_space = false;
 
 		/* Ignore spaces before fields when not in FX (fixed width) mode */
 		if (!fx_mode && n->key->id != DCH_FX)
@@ -3340,15 +3421,17 @@ datetime_to_char_body(TmToChar *tmtc, text *fmt, bool is_interval, Oid collid)
 
 		if ((ent = DCH_cache_search(fmt_str)) == NULL)
 		{
-			ent = DCH_cache_getnew(fmt_str);
+			FormatNode	buf[DCH_CACHE_SIZE + 1];
 
 			/*
 			 * Not in the cache, must run parser and save a new format-picture
 			 * to the cache.
 			 */
-			parse_format(ent->format, fmt_str, DCH_keywords,
+			parse_format(buf, fmt_str, DCH_keywords,
 						 DCH_suff, DCH_index, DCH_TYPE, NULL);
 
+			ent = DCH_cache_getnew(fmt_str);
+			memcpy(ent->format, buf, sizeof(buf));
 			(ent->format + fmt_len)->type = NODE_TYPE_END;		/* Paranoia? */
 
 #ifdef DEBUG_TO_FROM_CHAR
@@ -3604,15 +3687,17 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 			if ((ent = DCH_cache_search(fmt_str)) == NULL)
 			{
-				ent = DCH_cache_getnew(fmt_str);
+				FormatNode	buf[DCH_CACHE_SIZE + 1];
 
 				/*
 				 * Not in the cache, must run parser and save a new
 				 * format-picture to the cache.
 				 */
-				parse_format(ent->format, fmt_str, DCH_keywords,
+				parse_format(buf, fmt_str, DCH_keywords,
 							 DCH_suff, DCH_index, DCH_TYPE, NULL);
 
+				ent = DCH_cache_getnew(fmt_str);
+				memcpy(ent->format, buf, sizeof(buf));
 				(ent->format + fmt_len)->type = NODE_TYPE_END;	/* Paranoia? */
 #ifdef DEBUG_TO_FROM_CHAR
 				/* dump_node(ent->format, fmt_len); */
@@ -3981,32 +4066,41 @@ NUM_cache(int len, NUMDesc *Num, text *pars_str, bool *shouldFree)
 
 		if ((ent = NUM_cache_search(str)) == NULL)
 		{
-			ent = NUM_cache_getnew(str);
+			FormatNode	buf[NUM_CACHE_SIZE + 1];
+
+			zeroize_NUM(Num);
 
 			/*
 			 * Not in the cache, must run parser and save a new format-picture
 			 * to the cache.
 			 */
-			parse_format(ent->format, str, NUM_keywords,
-						 NULL, NUM_index, NUM_TYPE, &ent->Num);
+			parse_format(buf, str, NUM_keywords,
+						 NULL, NUM_index, NUM_TYPE, Num);
+
+			ent = NUM_cache_getnew(str);
+
+			memcpy(ent->format, buf, sizeof(buf));
+			memcpy(&ent->Num, Num, sizeof(NUMDesc));
 
 			(ent->format + len)->type = NODE_TYPE_END;	/* Paranoia? */
 		}
+		else
+		{
+			/*
+			 * Copy cache to used struct
+			 */
+			Num->flag = ent->Num.flag;
+			Num->lsign = ent->Num.lsign;
+			Num->pre = ent->Num.pre;
+			Num->post = ent->Num.post;
+			Num->pre_lsign_num = ent->Num.pre_lsign_num;
+			Num->need_locale = ent->Num.need_locale;
+			Num->multi = ent->Num.multi;
+			Num->zero_start = ent->Num.zero_start;
+			Num->zero_end = ent->Num.zero_end;
+		}
 
 		format = ent->format;
-
-		/*
-		 * Copy cache to used struct
-		 */
-		Num->flag = ent->Num.flag;
-		Num->lsign = ent->Num.lsign;
-		Num->pre = ent->Num.pre;
-		Num->post = ent->Num.post;
-		Num->pre_lsign_num = ent->Num.pre_lsign_num;
-		Num->need_locale = ent->Num.need_locale;
-		Num->multi = ent->Num.multi;
-		Num->zero_start = ent->Num.zero_start;
-		Num->zero_end = ent->Num.zero_end;
 	}
 
 #ifdef DEBUG_TO_FROM_CHAR
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 1fe02be..d6290f2 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2763,6 +2763,9 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD ---> HH:MI:SS');
+ERROR:  expected separator character in input string
+DETAIL:  The given value did not match any of the allowed values for separator field.
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
          to_timestamp         
 ------------------------------
@@ -2775,8 +2778,20 @@ SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2804,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2825,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected separator "/", expected ":"
+DETAIL:  The given value did not match any of the allowed values for separator field.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2912,7 +2942,7 @@ SELECT to_timestamp('  20050302', 'YYYYMMDD');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2942,7 +2972,7 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -2960,13 +2990,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index c81437b..c3da350 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -390,17 +390,23 @@ RESET DateStyle;
 
 SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
+SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD ---> HH:MI:SS');
+
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');

#41

amul sul

sul_amul@yahoo.co.in

over 9 years ago

In reply to: Artur Zakirov (#40)

Re: Bug in to_timestamp().

On Thursday, 11 August 2016 3:18 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:

Here is my patch. It is a proof of concept.>Date/Time Formatting>-------------------->There are changes in date/time formatting rules:-> now to_timestamp() and to_date() skip spaces in the input string and >in the formatting string unless FX option is used, as Amul Sul wrote on >first message of this thread. But Ex.2 gives an error now with this >patch (should we fix this too?).

Why not, currently we are skipping whitespace exists at the start of input string but not if in format string.
[Skipped… ]

Of course this patch can be completely wrong. But it tries to introduce >more formal rules for formatting.>I will be grateful for notes and remarks.

Following are few scenarios where we break existing behaviour:
SELECT TO_TIMESTAMP('2015-12-31 13:43:36', 'YYYY MM DD HH24 MI SS');SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');SELECT TO_TIMESTAMP('2011*03*18 23^38&15', 'YYYY-MM-DD HH24:MI:SS');SELECT TO_TIMESTAMP('2011*03!18 #%23^38$15', 'YYYY-MM-DD$$$HH24:MI:SS');
But current patch behaviour is not that much bad either at least we have errors, but I am not sure about community acceptance.
I would like to divert communities' attention on following case:SELECT TO_TIMESTAMP('2013--10-01', 'YYYY-MM-DD');
Where the hyphen (-) is not skipped. So ultimately -10 is interpreted using MM as negative 10. So the date goes back by that many months (and probably additional days because of -31), and so the final output becomes 2012-01-30. But the fix is not specific to hyphen case. Ideally the fix would have been to handle it in from_char_parse_int(). Here, -10 is converted to int using strtol. May be we could have done it using strtoul(). Is there any intention behind not considering signed integers versus unsigned ones ?
Another is, shouldn’t we have error in following cases? SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS'); SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD HH24:MI:SS');

Thanks & Regards,Amul Sul

#42

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: amul sul (#41)

Re: Bug in to_timestamp().

On Mon, Aug 15, 2016 at 10:56 AM, amul sul <sul_amul@yahoo.co.in> wrote:

On Thursday, 11 August 2016 3:18 PM, Artur Zakirov
<a.zakirov@postgrespro.ru> wrote:

Here is my patch. It is a proof of concept.
Date/Time Formatting
--------------------
There are changes in date/time formatting rules:

-> now to_timestamp() and to_date() skip spaces in the input string and

in the formatting string unless FX option is used, as Amul Sul wrote on
first message of this thread. But Ex.2 gives an error now with this
patch (should we fix this too?).

Why not, currently we are skipping whitespace exists at the start of input
string but not if in format string.

[Skipped… ]

Of course this patch can be completely wrong. But it tries to introduce
more formal rules for formatting.
I will be grateful for notes and remarks.

Following are few scenarios where we break existing behaviour:

SELECT TO_TIMESTAMP('2015-12-31 13:43:36', 'YYYY MM DD HH24 MI SS');
SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2011*03*18 23^38&15', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2011*03!18 #%23^38$15', 'YYYY-MM-DD$$$HH24:MI:SS');

But current patch behaviour is not that much bad either at least we have
errors, but I am not sure about community acceptance.

I would like to divert communities' attention on following case:
SELECT TO_TIMESTAMP('2013--10-01', 'YYYY-MM-DD');

Where the hyphen (-) is not skipped. So ultimately -10 is interpreted using
MM as negative 10. So the date goes back by that many months (and probably
additional days because of -31), and so the final output becomes 2012-01-30.
But the fix is not specific to hyphen case. Ideally the fix would have been
to handle it in from_char_parse_int(). Here, -10 is converted to int using
strtol. May be we could have done it using strtoul(). Is there any intention
behind not considering signed integers versus unsigned ones ?

Another is, shouldn’t we have error in following cases?
SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD HH24:MI:SS');

Well, what's the Oracle behavior in any of these cases? I don't think
we can decide to change any of this behavior without knowing that. If
a proposed behavior change is incompatible with our previous releases,
I think it'd better at least be more compatible with Oracle.
Otherwise, we're just changing from an established behavior that we
invented ourselves to a new behavior we invented ourselves, which is
only worthwhile if it's absolutely clear that the new behavior is way
better.

(Also, note that text formatted email is generally preferred to HTML
on this mailing list; the fact that your email is in a different font
than the rest of the thread makes it hard to read.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43

amul sul

sul_amul@yahoo.co.in

over 9 years ago

In reply to: Robert Haas (#42)

Re: Bug in to_timestamp().

Monday, 15 August 2016 9:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Aug 15, 2016 at 10:56 AM, amul sul <sul_amul@yahoo.co.in> wrote:

On Thursday, 11 August 2016 3:18 PM, Artur Zakirov
<a.zakirov@postgrespro.ru> wrote:

[Skipped..]

Well, what's the Oracle behavior in any of these cases? I don't think
we can decide to change any of this behavior without knowing that. If
a proposed behavior change is incompatible with our previous releases,
I think it'd better at least be more compatible with Oracle.
Otherwise, we're just changing from an established behavior that we
invented ourselves to a new behavior we invented ourselves, which is
only worthwhile if it's absolutely clear that the new behavior is way
better.

Following cases works as expected on Oracle (except 3rd one asking input value for &15).

SELECT TO_TIMESTAMP('2015-12-31 13:43:36', 'YYYY MM DD HH24 MI SS');
SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2011*03*18 23^38&15', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2011*03!18 #%23^38$15', 'YYYY-MM-DD$$$HH24:MI:SS');

And rest throwing error as shown below:

SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');

ERROR at line 1:
ORA-01850: hour must be between 0 and 23

SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD HH24:MI:SS');

ERROR at line 1:
ORA-01839: date not valid for month specified

(Also, note that text formatted email is generally preferred to HTML
on this mailing list; the fact that your email is in a different font
than the rest of the thread makes it hard to read.)

Understood. Will try to follow this, thanks.

Regards,
Amul Sul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#44

Artur Zakirov

a.zakirov@postgrespro.ru

over 9 years ago

In reply to: Robert Haas (#42)

Re: Bug in to_timestamp().

On 15.08.2016 19:28, Robert Haas wrote:

Well, what's the Oracle behavior in any of these cases? I don't think
we can decide to change any of this behavior without knowing that. If
a proposed behavior change is incompatible with our previous releases,
I think it'd better at least be more compatible with Oracle.
Otherwise, we're just changing from an established behavior that we
invented ourselves to a new behavior we invented ourselves, which is
only worthwhile if it's absolutely clear that the new behavior is way
better.

1 - Oracle's output for first queries is:

-> SELECT TO_TIMESTAMP('2015-12-31 13:43:36', 'YYYY MM DD HH24 MI SS')
FROM dual;

TO_TIMESTAMP('2015-12-3113:43:36','YYYYMMDDHH24MISS')
---------------------------------------------------------------------------
31-DEC-15 01.43.36.000000000 PM

-> SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS')
FROM dual;

TO_TIMESTAMP('2011$03!1823_38_15','YYYY-MM-DDHH24:MI:SS')
---------------------------------------------------------------------------
18-MAR-11 11.38.15.000000000 PM

-> SELECT TO_TIMESTAMP('2011*03!18 #%23^38$15',
'YYYY-MM-DD$$$HH24:MI:SS') FROM dual;

TO_TIMESTAMP('2011*03!18#%23^38$15','YYYY-MM-DD$$$HH24:MI:SS')
---------------------------------------------------------------------------
18-MAR-11 11.38.15.000000000 PM

PostgreSQL with the patch gives "ERROR: expected space character in
given string". I will fix this.

2 - Oracle's output for query with hyphen is:

-> SELECT TO_TIMESTAMP('2013--10-01', 'YYYY-MM-DD') FROM dual;
SELECT TO_TIMESTAMP('2013--10-01', 'YYYY-MM-DD') FROM dual
*
ERROR at line 1:
ORA-01843: not a valid month

Here PostgreSQL with the patch does not give an error. So I will fix
this too.

3 - The last two queries give an error. This patch do not handle such
queries intentionally, because there is the thread
/messages/by-id/57786490.9010201@wars-nicht.de .
That thread concerns to_date() function. But it should concerns
to_timestamp() also. So I suppose that should be a different patch for
this last case.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#45

Artur Zakirov

a.zakirov@postgrespro.ru

over 9 years ago

In reply to: Artur Zakirov (#44)

2 attachment(s)

Re: Bug in to_timestamp().

I attached new patch "0001-to-timestamp-format-checking-v2.patch". It
fixes behaviour for Amul's scenarious:

Following are few scenarios where we break existing behaviour:

SELECT TO_TIMESTAMP('2015-12-31 13:43:36', 'YYYY MM DD HH24 MI SS');
SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2011*03*18 23^38&15', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2011*03!18 #%23^38$15', 'YYYY-MM-DD$$$HH24:MI:SS');

But current patch behaviour is not that much bad either at least we have errors, but I am not sure about community acceptance.

I would like to divert communities' attention on following case:
SELECT TO_TIMESTAMP('2013--10-01', 'YYYY-MM-DD');

For queries above the patch gives an output without any error.

Another is, shouldn’t we have error in following cases?
SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD HH24:MI:SS');

I attached second patch "0002-to-timestamp-validation-v2.patch". With it
PostgreSQL perform additional checks for date and time. But as I wrote
there is another patch in the thread "to_date_valid()" wich differs from
this patch.

Sincerely,
--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v2.patchtext/x-patch; name=0001-to-timestamp-format-checking-v2.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7830334..559c55f 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6083,9 +6083,12 @@ SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
-       skip multiple blank spaces in the input string unless the
+       skip multiple blank spaces and printable non letter and non digit
+       characters in the input string and in the formatting string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal>,
+       <literal>to_timestamp('2000----JUN', 'YYYY MON')</literal>
+       and <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index bbd97dc..b14678d 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -169,6 +169,8 @@ struct FormatNode
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -947,6 +949,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(char *str, const KeySuffix *suf, int type);
+static bool is_char_separator(char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -963,7 +966,6 @@ static void dump_node(FormatNode *node, int max);
 static const char *get_th(char *num, int type);
 static char *str_numth(char *dest, char *num, int type);
 static int	adjust_partial_year_to_2020(int year);
-static int	strspace_len(char *str);
 static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
 static void from_char_set_int(int *dest, const int value, const FormatNode *node);
 static int	from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node);
@@ -1036,6 +1038,17 @@ suff_search(char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_char_separator(char *str)
+{
+	return ((pg_mblen(str) == 1) &&
+			/* printable character, but not letter and digit */
+			((*str >= '!' && *str <= '/') ||
+			 (*str >= ':' && *str <= '@') ||
+			 (*str >= '[' && *str <= '`') ||
+			 (*str >= '{' && *str <= '~')));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1237,9 +1250,10 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 {
 	const KeySuffix *s;
 	FormatNode *n;
+	bool		in_text = false,
+				in_backslash = false;
 	int			node_set = 0,
-				suffix,
-				last = 0;
+				suffix;
 
 #ifdef DEBUG_TO_FROM_CHAR
 	elog(DEBUG_elog_output, "to_char/number(): run parser");
@@ -1251,6 +1265,49 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 	{
 		suffix = 0;
 
+		/* Previous character was a backslash */
+		if (in_backslash)
+		{
+			/* After backslash should go non-space character */
+			if (isspace(*str))
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("invalid escape sequence")));
+			in_backslash = false;
+
+			n->type = NODE_TYPE_CHAR;
+			n->character = *str;
+			n->key = NULL;
+			n->suffix = 0;
+			n++;
+			str++;
+			continue;
+		}
+		/* Previous character was a quote */
+		else if (in_text)
+		{
+			if (*str == '"')
+			{
+				str++;
+				in_text = false;
+			}
+			else if (*str == '\\')
+			{
+				str++;
+				in_backslash = true;
+			}
+			else
+			{
+				n->type = NODE_TYPE_CHAR;
+				n->character = *str;
+				n->key = NULL;
+				n->suffix = 0;
+				n++;
+				str++;
+			}
+			continue;
+		}
+
 		/*
 		 * Prefix
 		 */
@@ -1290,48 +1347,30 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 		}
 		else if (*str)
 		{
-			/*
-			 * Special characters '\' and '"'
-			 */
-			if (*str == '"' && last != '\\')
+			if (*str == '"')
 			{
-				int			x = 0;
-
-				while (*(++str))
-				{
-					if (*str == '"' && x != '\\')
-					{
-						str++;
-						break;
-					}
-					else if (*str == '\\' && x != '\\')
-					{
-						x = '\\';
-						continue;
-					}
-					n->type = NODE_TYPE_CHAR;
-					n->character = *str;
-					n->key = NULL;
-					n->suffix = 0;
-					++n;
-					x = *str;
-				}
+				in_text = true;
 				node_set = 0;
-				suffix = 0;
-				last = 0;
+				str++;
 			}
-			else if (*str && *str == '\\' && last != '\\' && *(str + 1) == '"')
+			else if (*str == '\\')
 			{
-				last = *str;
+				in_backslash = true;
+				node_set = 0;
 				str++;
 			}
-			else if (*str)
+			else
 			{
-				n->type = NODE_TYPE_CHAR;
+				if (ver == DCH_TYPE && is_char_separator(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace(*str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				n->character = *str;
 				n->key = NULL;
 				node_set = 1;
-				last = 0;
 				str++;
 			}
 		}
@@ -1348,6 +1387,17 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 		}
 	}
 
+	if (in_backslash)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("invalid escape sequence")));
+
+	/* If we didn't meet closing quotes */
+	if (in_text)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("unexpected end of format string, expected '\"' character")));
+
 	n->type = NODE_TYPE_END;
 	n->suffix = 0;
 	return;
@@ -2081,20 +2131,6 @@ adjust_partial_year_to_2020(int year)
 		return year;
 }
 
-
-static int
-strspace_len(char *str)
-{
-	int			len = 0;
-
-	while (*str && isspace((unsigned char) *str))
-	{
-		str++;
-		len++;
-	}
-	return len;
-}
-
 /*
  * Set the date mode of a from-char conversion.
  *
@@ -2164,11 +2200,6 @@ from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node)
 	char	   *init = *src;
 	int			used;
 
-	/*
-	 * Skip any whitespace before parsing the integer.
-	 */
-	*src += strspace_len(*src);
-
 	Assert(len <= DCH_MAX_ITEM_SIZ);
 	used = (int) strlcpy(copy, *src, len + 1);
 
@@ -2952,23 +2983,62 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * In non FX (fixed format) mode we skip spaces and separator
+			 * characters.
+			 */
+			if (!fx_mode)
+			{
+				if (isspace(*s) || is_char_separator(s))
+					s++;
+				continue;
+			}
+
+			/* Checks for FX mode */
+			if (n->type == NODE_TYPE_SPACE && !isspace(*s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+			else if (n->type == NODE_TYPE_SEPARATOR && n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+
+			s++;
+			continue;
+		}
+		else if (n->type == NODE_TYPE_CHAR)
+		{
+			/*
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
 			 */
 			s++;
 			continue;
 		}
 
-		/* Ignore spaces before fields when not in FX (fixed width) mode */
+		/* Ignore spaces before fields when not in FX (fixed * width) mode */
 		if (!fx_mode && n->key->id != DCH_FX)
 		{
-			while (*s != '\0' && isspace((unsigned char) *s))
+			while (*s != '\0' && (isspace((unsigned char) *s)))
 				s++;
 		}
+		else if (isspace(*s) || is_char_separator(s))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+					 errmsg("unexpected character \"%c\"", *s),
+					 errdetail("The given value did not match any of the allowed "
+							   "values for this field.")));
 
 		from_char_set_mode(out, n->key->date_mode);
 
@@ -3340,15 +3410,17 @@ datetime_to_char_body(TmToChar *tmtc, text *fmt, bool is_interval, Oid collid)
 
 		if ((ent = DCH_cache_search(fmt_str)) == NULL)
 		{
-			ent = DCH_cache_getnew(fmt_str);
+			FormatNode	buf[DCH_CACHE_SIZE + 1];
 
 			/*
 			 * Not in the cache, must run parser and save a new format-picture
 			 * to the cache.
 			 */
-			parse_format(ent->format, fmt_str, DCH_keywords,
+			parse_format(buf, fmt_str, DCH_keywords,
 						 DCH_suff, DCH_index, DCH_TYPE, NULL);
 
+			ent = DCH_cache_getnew(fmt_str);
+			memcpy(ent->format, buf, sizeof(buf));
 			(ent->format + fmt_len)->type = NODE_TYPE_END;		/* Paranoia? */
 
 #ifdef DEBUG_TO_FROM_CHAR
@@ -3604,15 +3676,17 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 			if ((ent = DCH_cache_search(fmt_str)) == NULL)
 			{
-				ent = DCH_cache_getnew(fmt_str);
+				FormatNode	buf[DCH_CACHE_SIZE + 1];
 
 				/*
 				 * Not in the cache, must run parser and save a new
 				 * format-picture to the cache.
 				 */
-				parse_format(ent->format, fmt_str, DCH_keywords,
+				parse_format(buf, fmt_str, DCH_keywords,
 							 DCH_suff, DCH_index, DCH_TYPE, NULL);
 
+				ent = DCH_cache_getnew(fmt_str);
+				memcpy(ent->format, buf, sizeof(buf));
 				(ent->format + fmt_len)->type = NODE_TYPE_END;	/* Paranoia? */
 #ifdef DEBUG_TO_FROM_CHAR
 				/* dump_node(ent->format, fmt_len); */
@@ -3981,32 +4055,41 @@ NUM_cache(int len, NUMDesc *Num, text *pars_str, bool *shouldFree)
 
 		if ((ent = NUM_cache_search(str)) == NULL)
 		{
-			ent = NUM_cache_getnew(str);
+			FormatNode	buf[NUM_CACHE_SIZE + 1];
+
+			zeroize_NUM(Num);
 
 			/*
 			 * Not in the cache, must run parser and save a new format-picture
 			 * to the cache.
 			 */
-			parse_format(ent->format, str, NUM_keywords,
-						 NULL, NUM_index, NUM_TYPE, &ent->Num);
+			parse_format(buf, str, NUM_keywords,
+						 NULL, NUM_index, NUM_TYPE, Num);
+
+			ent = NUM_cache_getnew(str);
+
+			memcpy(ent->format, buf, sizeof(buf));
+			memcpy(&ent->Num, Num, sizeof(NUMDesc));
 
 			(ent->format + len)->type = NODE_TYPE_END;	/* Paranoia? */
 		}
+		else
+		{
+			/*
+			 * Copy cache to used struct
+			 */
+			Num->flag = ent->Num.flag;
+			Num->lsign = ent->Num.lsign;
+			Num->pre = ent->Num.pre;
+			Num->post = ent->Num.post;
+			Num->pre_lsign_num = ent->Num.pre_lsign_num;
+			Num->need_locale = ent->Num.need_locale;
+			Num->multi = ent->Num.multi;
+			Num->zero_start = ent->Num.zero_start;
+			Num->zero_end = ent->Num.zero_end;
+		}
 
 		format = ent->format;
-
-		/*
-		 * Copy cache to used struct
-		 */
-		Num->flag = ent->Num.flag;
-		Num->lsign = ent->Num.lsign;
-		Num->pre = ent->Num.pre;
-		Num->post = ent->Num.post;
-		Num->pre_lsign_num = ent->Num.pre_lsign_num;
-		Num->need_locale = ent->Num.need_locale;
-		Num->multi = ent->Num.multi;
-		Num->zero_start = ent->Num.zero_start;
-		Num->zero_end = ent->Num.zero_end;
 	}
 
 #ifdef DEBUG_TO_FROM_CHAR
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 1fe02be..0bf4287 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected ":"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2912,7 +2945,7 @@ SELECT to_timestamp('  20050302', 'YYYYMMDD');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2942,7 +2975,7 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -2960,13 +2993,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index c81437b..514c124 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');

0002-to-timestamp-validation-v2.patchtext/x-patch; name=0002-to-timestamp-validation-v2.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 559c55f..e86dd0c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6100,13 +6100,9 @@ SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
        exist to handle input formats that cannot be converted by
-       simple casting.  These functions interpret input liberally,
-       with minimal error checking.  While they produce valid output,
-       the conversion can yield unexpected results.  For example,
-       input to these functions is not restricted by normal ranges,
-       thus <literal>to_date('20096040','YYYYMMDD')</literal> returns
-       <literal>2014-01-17</literal> rather than causing an error.
-       Casting does not have this behavior.
+       simple casting.  These functions have error checking as well as casting.
+       For example, <literal>to_date('20096040','YYYYMMDD')</literal> returns
+       an error.
       </para>
      </listitem>
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index b14678d..98d9847 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -3635,7 +3635,10 @@ do_to_timestamp(text *date_txt, text *fmt,
 {
 	FormatNode *format;
 	TmFromChar	tmfc;
-	int			fmt_len;
+	int			fmt_len,
+				fmask = 0,		/* Bit mask for ValidateDate() */
+				dterr;
+	char	   *date_str = NULL;
 
 	ZERO_tmfc(&tmfc);
 	ZERO_tm(tm);
@@ -3646,7 +3649,6 @@ do_to_timestamp(text *date_txt, text *fmt,
 	if (fmt_len)
 	{
 		char	   *fmt_str;
-		char	   *date_str;
 		bool		incache;
 
 		fmt_str = text_to_cstring(fmt);
@@ -3704,7 +3706,6 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 		DCH_from_char(format, date_str, &tmfc);
 
-		pfree(date_str);
 		pfree(fmt_str);
 		if (!incache)
 			pfree(format);
@@ -3780,6 +3781,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 			if (tmfc.bc && tm->tm_year > 0)
 				tm->tm_year = -(tm->tm_year - 1);
 		}
+		fmask |= DTK_M(YEAR);
 	}
 	else if (tmfc.cc)			/* use first year of century */
 	{
@@ -3791,10 +3793,14 @@ do_to_timestamp(text *date_txt, text *fmt,
 		else
 			/* +1 because year == 599 is 600 BC */
 			tm->tm_year = tmfc.cc * 100 + 1;
+		fmask |= DTK_M(YEAR);
 	}
 
 	if (tmfc.j)
+	{
 		j2date(tmfc.j, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+		fmask |= DTK_DATE_M;
+	}
 
 	if (tmfc.ww)
 	{
@@ -3808,6 +3814,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 				isoweekdate2date(tmfc.ww, tmfc.d, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
 			else
 				isoweek2date(tmfc.ww, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			fmask |= DTK_DATE_M;
 		}
 		else
 			tmfc.ddd = (tmfc.ww - 1) * 7 + 1;
@@ -3818,11 +3825,17 @@ do_to_timestamp(text *date_txt, text *fmt,
 	if (tmfc.d)
 		tm->tm_wday = tmfc.d - 1;		/* convert to native numbering */
 	if (tmfc.dd)
+	{
 		tm->tm_mday = tmfc.dd;
+		fmask |= DTK_M(DAY);
+	}
 	if (tmfc.ddd)
 		tm->tm_yday = tmfc.ddd;
 	if (tmfc.mm)
+	{
 		tm->tm_mon = tmfc.mm;
+		fmask |= DTK_M(MONTH);
+	}
 
 	if (tmfc.ddd && (tm->tm_mon <= 1 || tm->tm_mday <= 1))
 	{
@@ -3845,6 +3858,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 			j0 = isoweek2j(tm->tm_year, 1) - 1;
 
 			j2date(j0 + tmfc.ddd, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			fmask |= DTK_DATE_M;
 		}
 		else
 		{
@@ -3867,9 +3881,36 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 			if (tm->tm_mday <= 1)
 				tm->tm_mday = tmfc.ddd - y[i - 1];
+
+			fmask |= DTK_M(MONTH) | DTK_M(DAY);
 		}
 	}
 
+	/* Validate date with bit mask received above */
+	if (fmask != 0 && date_str)
+	{
+		dterr = ValidateDate(fmask, false, false, false, tm);
+		if (dterr != 0)
+			DateTimeParseError(dterr, date_str, "");
+	}
+
+	/* Do checks for time part */
+#ifdef HAVE_INT64_TIMESTAMP
+	if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > MINS_PER_HOUR - 1 ||
+		tm->tm_sec < 0 || tm->tm_sec > SECS_PER_MINUTE ||
+		*fsec < INT64CONST(0) ||
+		*fsec > USECS_PER_SEC)
+		DateTimeParseError(DTERR_FIELD_OVERFLOW, date_str, "");
+#else
+	if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > MINS_PER_HOUR - 1 ||
+		tm->tm_sec < 0 || tm->tm_sec > SECS_PER_MINUTE ||
+		*fsec < 0 || *fsec > 1)
+		DateTimeParseError(DTERR_FIELD_OVERFLOW, date_str, "");
+#endif
+
+	if (date_str)
+		pfree(date_str);
+
 #ifdef HAVE_INT64_TIMESTAMP
 	if (tmfc.ms)
 		*fsec += tmfc.ms * 1000;
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 0bf4287..8ee6569 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -3015,7 +3015,7 @@ SELECT to_date('2011   12 18', 'YYYY  MM DD');
 (1 row)
 
 --
--- Check errors for some incorrect usages of to_timestamp()
+-- Check errors for some incorrect usages of to_timestamp() and to_date()
 --
 -- Mixture of date conventions (ISO week and Gregorian):
 SELECT to_timestamp('2005527', 'YYYYIWID');
@@ -3043,6 +3043,12 @@ DETAIL:  Value must be an integer.
 SELECT to_timestamp('10000000000', 'FMYYYY');
 ERROR:  value for "YYYY" in source string is out of range
 DETAIL:  Value must be in the range -2147483648 to 2147483647.
+-- Time overflow:
+SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');
+ERROR:  date/time field value out of range: "2016-06-13 99:99:99"
+-- Date overflow:
+SELECT to_date('2016-02-30', 'YYYY-MM-DD');
+ERROR:  date/time field value out of range: "2016-02-30"
 --
 -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572)
 --
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index 514c124..327dbfd 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -473,7 +473,7 @@ SELECT to_date('2011  12 18', 'YYYY  MM DD');
 SELECT to_date('2011   12 18', 'YYYY  MM DD');
 
 --
--- Check errors for some incorrect usages of to_timestamp()
+-- Check errors for some incorrect usages of to_timestamp() and to_date()
 --
 
 -- Mixture of date conventions (ISO week and Gregorian):
@@ -494,6 +494,12 @@ SELECT to_timestamp('199711xy', 'YYYYMMDD');
 -- Input that doesn't fit in an int:
 SELECT to_timestamp('10000000000', 'FMYYYY');
 
+-- Time overflow:
+SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');
+
+-- Date overflow:
+SELECT to_date('2016-02-30', 'YYYY-MM-DD');
+
 --
 -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572)
 --

#46

amul sul

sul_amul@yahoo.co.in

over 9 years ago

In reply to: Artur Zakirov (#45)

Re: Bug in to_timestamp().

On Wednesday, August 17, 2016 5:15 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:

I attached new patch "0001-to-timestamp-format-checking-v2.patch". It
fixes behaviour for Amul's scenarious:

Great.

Following are few scenarios where we break existing behaviour:

SELECT TO_TIMESTAMP('2015-12-31 13:43:36', 'YYYY MM DD HH24 MI SS');
SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2011*03*18 23^38&15', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2011*03!18 #%23^38$15', 'YYYY-MM-DD$$$HH24:MI:SS');

But current patch behaviour is not that much bad either at least we have errors, but I am not sure about community acceptance.

I would like to divert communities' attention on following case:
SELECT TO_TIMESTAMP('2013--10-01', 'YYYY-MM-DD');

For queries above the patch gives an output without any error.

Another is, shouldn’t we have error in following cases?
SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');
SELECT TO_TIMESTAMP('2016-02-30 15:43:36', 'YYYY-MM-DD HH24:MI:SS');

I attached second patch "0002-to-timestamp-validation-v2.patch". With it
PostgreSQL perform additional checks for date and time. But as I wrote
there is another patch in the thread "to_date_valid()" wich differs from
this patch.

Hmm. I haven't really looked into the code, but with applying both patches it looks precisely imitate Oracle's behaviour. Thanks.

Regards,
Amul Sul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#47

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: amul sul (#46)

Re: Bug in to_timestamp().

On Wed, Aug 17, 2016 at 10:35 AM, amul sul <sul_amul@yahoo.co.in> wrote:

Hmm. I haven't really looked into the code, but with applying both patches it looks precisely imitate Oracle's behaviour. Thanks.

This is good to hear, but for us to consider applying something like
this, somebody would need to look into the code pretty deeply.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#48

amul sul

sul_amul@yahoo.co.in

over 9 years ago

In reply to: Robert Haas (#47)

Re: Bug in to_timestamp().

On Friday, August 19, 2016 12:42 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 17, 2016 at 10:35 AM, amul sul <sul_amul@yahoo.co.in> wrote:

Hmm. I haven't really looked into the code, but with applying both patches it looks precisely imitate Oracle's behaviour. Thanks.

This is good to hear, but for us to consider applying something like
this, somebody would need to look into the code pretty deeply.

Sure Robert, Im planning to start initial review until next week at the earliest. Thanks

Regards,
Amul Sul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#49

amul sul

sul_amul@yahoo.co.in

over 9 years ago

In reply to: Artur Zakirov (#45)

Re: Bug in to_timestamp().

Hi Artur Zakirov,

Please see following review comment for "0001-to-timestamp-format-checking-v2.patch" & share your thought:

#1.

15 + <literal>to_timestamp('2000----JUN', 'YYYY MON')</literal>

Documented as working case, but unfortunatly it does not :

postgres=# SELECT to_timestamp('2000----JUN', 'YYYY MON');
ERROR: invalid value "---" for "MON"
DETAIL: The given value did not match any of the allowed values for this field.

#2.

102 + /* Previous character was a quote */
103 + else if (in_text)
104 + {
105 + if (*str == '"')
106 + {
107 + str++;
108 + in_text = false;
109 + }
110 + else if (*str == '\\')
111 + {
112 + str++;
113 + in_backslash = true;
114 + }
115 + else
116 + {
117 + n->type = NODE_TYPE_CHAR;
118 + n->character = *str;
119 + n->key = NULL;
120 + n->suffix = 0;
121 + n++;
122 + str++;
123 + }
124 + continue;
125 + }
126 +

NODE_TYPE_CHAR assumption in else block seems incorrect. What if we have space after double quote? It will again have incorrect output without any error, see below:

postgres=# SELECT to_timestamp('Year: 1976, Month: May, Day: 16',
postgres(# '" Year:" YYYY, "Month:" FMMonth, "Day:" DD');
to_timestamp
------------------------------
0006-05-16 00:00:00-07:52:58
(1 row)

I guess, we might need NODE_TYPE_SEPARATOR and NODE_TYPE_SPACE check as well?

#3.

296 - /* Ignore spaces before fields when not in FX (fixed width) mode */
297 + /* Ignore spaces before fields when not in FX (fixed * width) mode */
298 if (!fx_mode && n->key->id != DCH_FX)
299 {
300 - while (*s != '\0' && isspace((unsigned char) *s))
301 + while (*s != '\0' && (isspace((unsigned char) *s)))
302 s++;

Unnecessary hunk?
We should not touch any code unless it necessary to implement proposed feature, otherwise it will add unnecessary diff to the patch and eventually extra burden to review the code. Similarly hunk in the patch @ line # 313 - 410, nothing to do with to_timestamp behaviour improvement, IIUC.

If you think this changes need to be in, please submit separate cleanup-patch.

I attached second patch "0002-to-timestamp-validation-v2.patch". With it
PostgreSQL perform additional checks for date and time. But as I wrote
there is another patch in the thread "to_date_valid()" wich differs from
this patch.

@community : I am not sure what to do with this patch, should we keep it as separate enhancement?

Regards,
Amul Sul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#50

Artur Zakirov

a.zakirov@postgrespro.ru

over 9 years ago

In reply to: Robert Haas (#47)

2 attachment(s)

Re: Bug in to_timestamp().

Sorry. I did not get last two mails from Amul. Don't know why. So I
reply to another mail.

Documented as working case, but unfortunatly it does not :

postgres=# SELECT to_timestamp('2000----JUN', 'YYYY MON');
ERROR: invalid value "---" for "MON"
DETAIL: The given value did not match any of the allowed values for this field.

Indeed! Fixed it. Now this query executes without error. Added this case
to tests.

NODE_TYPE_CHAR assumption in else block seems incorrect. What if we have space after double quote? It will again have incorrect output without any error, see below:

postgres=# SELECT to_timestamp('Year: 1976, Month: May, Day: 16',
postgres(# '" Year:" YYYY, "Month:" FMMonth, "Day:" DD');
to_timestamp
------------------------------
0006-05-16 00:00:00-07:52:58
(1 row)

I guess, we might need NODE_TYPE_SEPARATOR and NODE_TYPE_SPACE check as well?

Agree. Fixed and added to tests.

Unnecessary hunk?
We should not touch any code unless it necessary to implement proposed feature, otherwise it will add unnecessary diff to the patch and eventually extra burden to review the code. Similarly hunk in the patch @ line # 313 - 410, nothing to do with to_timestamp behaviour improvement, IIUC.

If you think this changes need to be in, please submit separate cleanup-patch.

Fixed it. It was a typo.
About lines # 313 - 410. It is necessary to avoid bugs. I wrote aboud it
in
/messages/by-id/b2a39359-3282-b402-f4a3-057aae500ee7@postgrespro.ru
:

- now DCH_cache_getnew() is called after parse_format(). Because now
parse_format() can raise an error and in the next attempt
DCH_cache_search() could return broken cache entry.

For example, you can test incorrect inputs for to_timestamp(). Try to
execute such query several times.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v3.patchtext/x-patch; name=0001-to-timestamp-format-checking-v3.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 6355300..0fe50e1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6146,9 +6146,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
-       skip multiple blank spaces in the input string unless the
+       skip multiple blank spaces and printable non letter and non digit
+       characters in the input string and in the formatting string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal>,
+       <literal>to_timestamp('2000----JUN', 'YYYY MON')</literal>
+       and <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index bbd97dc..a3dbcaf 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -169,6 +169,8 @@ struct FormatNode
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -947,6 +949,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(char *str, const KeySuffix *suf, int type);
+static bool is_char_separator(char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -963,7 +966,6 @@ static void dump_node(FormatNode *node, int max);
 static const char *get_th(char *num, int type);
 static char *str_numth(char *dest, char *num, int type);
 static int	adjust_partial_year_to_2020(int year);
-static int	strspace_len(char *str);
 static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
 static void from_char_set_int(int *dest, const int value, const FormatNode *node);
 static int	from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node);
@@ -1036,6 +1038,17 @@ suff_search(char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_char_separator(char *str)
+{
+	return ((pg_mblen(str) == 1) &&
+			/* printable character, but not letter and digit */
+			((*str >= '!' && *str <= '/') ||
+			 (*str >= ':' && *str <= '@') ||
+			 (*str >= '[' && *str <= '`') ||
+			 (*str >= '{' && *str <= '~')));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1237,9 +1250,10 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 {
 	const KeySuffix *s;
 	FormatNode *n;
+	bool		in_text = false,
+				in_backslash = false;
 	int			node_set = 0,
-				suffix,
-				last = 0;
+				suffix;
 
 #ifdef DEBUG_TO_FROM_CHAR
 	elog(DEBUG_elog_output, "to_char/number(): run parser");
@@ -1251,6 +1265,55 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 	{
 		suffix = 0;
 
+		/* Previous character was a backslash */
+		if (in_backslash)
+		{
+			/* After backslash should go non-space character */
+			if (isspace(*str))
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("invalid escape sequence")));
+			in_backslash = false;
+
+			n->type = NODE_TYPE_CHAR;
+			n->character = *str;
+			n->key = NULL;
+			n->suffix = 0;
+			n++;
+			str++;
+			continue;
+		}
+		/* Previous character was a quote */
+		else if (in_text)
+		{
+			if (*str == '"')
+			{
+				str++;
+				in_text = false;
+			}
+			else if (*str == '\\')
+			{
+				str++;
+				in_backslash = true;
+			}
+			else
+			{
+				if (ver == DCH_TYPE && is_char_separator(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace(*str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
+				n->character = *str;
+				n->key = NULL;
+				n->suffix = 0;
+				n++;
+				str++;
+			}
+			continue;
+		}
+
 		/*
 		 * Prefix
 		 */
@@ -1290,48 +1353,30 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 		}
 		else if (*str)
 		{
-			/*
-			 * Special characters '\' and '"'
-			 */
-			if (*str == '"' && last != '\\')
+			if (*str == '"')
 			{
-				int			x = 0;
-
-				while (*(++str))
-				{
-					if (*str == '"' && x != '\\')
-					{
-						str++;
-						break;
-					}
-					else if (*str == '\\' && x != '\\')
-					{
-						x = '\\';
-						continue;
-					}
-					n->type = NODE_TYPE_CHAR;
-					n->character = *str;
-					n->key = NULL;
-					n->suffix = 0;
-					++n;
-					x = *str;
-				}
+				in_text = true;
 				node_set = 0;
-				suffix = 0;
-				last = 0;
+				str++;
 			}
-			else if (*str && *str == '\\' && last != '\\' && *(str + 1) == '"')
+			else if (*str == '\\')
 			{
-				last = *str;
+				in_backslash = true;
+				node_set = 0;
 				str++;
 			}
-			else if (*str)
+			else
 			{
-				n->type = NODE_TYPE_CHAR;
+				if (ver == DCH_TYPE && is_char_separator(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace(*str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				n->character = *str;
 				n->key = NULL;
 				node_set = 1;
-				last = 0;
 				str++;
 			}
 		}
@@ -1348,6 +1393,17 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 		}
 	}
 
+	if (in_backslash)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("invalid escape sequence")));
+
+	/* If we didn't meet closing quotes */
+	if (in_text)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("unexpected end of format string, expected '\"' character")));
+
 	n->type = NODE_TYPE_END;
 	n->suffix = 0;
 	return;
@@ -2081,20 +2137,6 @@ adjust_partial_year_to_2020(int year)
 		return year;
 }
 
-
-static int
-strspace_len(char *str)
-{
-	int			len = 0;
-
-	while (*str && isspace((unsigned char) *str))
-	{
-		str++;
-		len++;
-	}
-	return len;
-}
-
 /*
  * Set the date mode of a from-char conversion.
  *
@@ -2164,11 +2206,6 @@ from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node)
 	char	   *init = *src;
 	int			used;
 
-	/*
-	 * Skip any whitespace before parsing the integer.
-	 */
-	*src += strspace_len(*src);
-
 	Assert(len <= DCH_MAX_ITEM_SIZ);
 	used = (int) strlcpy(copy, *src, len + 1);
 
@@ -2947,19 +2984,56 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	FormatNode *n;
 	char	   *s;
 	int			len,
-				value;
+				value,
+				prev_type;
 	bool		fx_mode = false;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * In non FX (fixed format) mode we skip spaces and separator
+			 * characters.
 			 */
+			if (!fx_mode)
+			{
+				if (isspace(*s) || is_char_separator(s))
+					s++;
+				prev_type = n->type;
+				continue;
+			}
+
+			/* Checks for FX mode */
+			if (n->type == NODE_TYPE_SPACE && !isspace(*s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+			else if (n->type == NODE_TYPE_SEPARATOR && n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+
 			s++;
+			prev_type = n->type;
+			continue;
+		}
+		else if (n->type == NODE_TYPE_CHAR)
+		{
+			/*
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
+			 */
+			s++;
+			prev_type = n->type;
 			continue;
 		}
 
@@ -2968,7 +3042,20 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 		{
 			while (*s != '\0' && isspace((unsigned char) *s))
 				s++;
+			/*
+			 * Ignore separator characters if previous node was space or 
+			 * separator and current is not.
+			 */
+			if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
+				while (*s != '\0' && is_char_separator(s))
+					s++;
 		}
+		else if (isspace(*s) || is_char_separator(s))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+					 errmsg("unexpected character \"%c\"", *s),
+					 errdetail("The given value did not match any of the allowed "
+							   "values for this field.")));
 
 		from_char_set_mode(out, n->key->date_mode);
 
@@ -3207,6 +3294,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+		prev_type = n->type;
 	}
 }
 
@@ -3340,15 +3428,17 @@ datetime_to_char_body(TmToChar *tmtc, text *fmt, bool is_interval, Oid collid)
 
 		if ((ent = DCH_cache_search(fmt_str)) == NULL)
 		{
-			ent = DCH_cache_getnew(fmt_str);
+			FormatNode	buf[DCH_CACHE_SIZE + 1];
 
 			/*
 			 * Not in the cache, must run parser and save a new format-picture
 			 * to the cache.
 			 */
-			parse_format(ent->format, fmt_str, DCH_keywords,
+			parse_format(buf, fmt_str, DCH_keywords,
 						 DCH_suff, DCH_index, DCH_TYPE, NULL);
 
+			ent = DCH_cache_getnew(fmt_str);
+			memcpy(ent->format, buf, sizeof(buf));
 			(ent->format + fmt_len)->type = NODE_TYPE_END;		/* Paranoia? */
 
 #ifdef DEBUG_TO_FROM_CHAR
@@ -3604,15 +3694,17 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 			if ((ent = DCH_cache_search(fmt_str)) == NULL)
 			{
-				ent = DCH_cache_getnew(fmt_str);
+				FormatNode	buf[DCH_CACHE_SIZE + 1];
 
 				/*
 				 * Not in the cache, must run parser and save a new
 				 * format-picture to the cache.
 				 */
-				parse_format(ent->format, fmt_str, DCH_keywords,
+				parse_format(buf, fmt_str, DCH_keywords,
 							 DCH_suff, DCH_index, DCH_TYPE, NULL);
 
+				ent = DCH_cache_getnew(fmt_str);
+				memcpy(ent->format, buf, sizeof(buf));
 				(ent->format + fmt_len)->type = NODE_TYPE_END;	/* Paranoia? */
 #ifdef DEBUG_TO_FROM_CHAR
 				/* dump_node(ent->format, fmt_len); */
@@ -3981,32 +4073,41 @@ NUM_cache(int len, NUMDesc *Num, text *pars_str, bool *shouldFree)
 
 		if ((ent = NUM_cache_search(str)) == NULL)
 		{
-			ent = NUM_cache_getnew(str);
+			FormatNode	buf[NUM_CACHE_SIZE + 1];
+
+			zeroize_NUM(Num);
 
 			/*
 			 * Not in the cache, must run parser and save a new format-picture
 			 * to the cache.
 			 */
-			parse_format(ent->format, str, NUM_keywords,
-						 NULL, NUM_index, NUM_TYPE, &ent->Num);
+			parse_format(buf, str, NUM_keywords,
+						 NULL, NUM_index, NUM_TYPE, Num);
+
+			ent = NUM_cache_getnew(str);
+
+			memcpy(ent->format, buf, sizeof(buf));
+			memcpy(&ent->Num, Num, sizeof(NUMDesc));
 
 			(ent->format + len)->type = NODE_TYPE_END;	/* Paranoia? */
 		}
+		else
+		{
+			/*
+			 * Copy cache to used struct
+			 */
+			Num->flag = ent->Num.flag;
+			Num->lsign = ent->Num.lsign;
+			Num->pre = ent->Num.pre;
+			Num->post = ent->Num.post;
+			Num->pre_lsign_num = ent->Num.pre_lsign_num;
+			Num->need_locale = ent->Num.need_locale;
+			Num->multi = ent->Num.multi;
+			Num->zero_start = ent->Num.zero_start;
+			Num->zero_end = ent->Num.zero_end;
+		}
 
 		format = ent->format;
-
-		/*
-		 * Copy cache to used struct
-		 */
-		Num->flag = ent->Num.flag;
-		Num->lsign = ent->Num.lsign;
-		Num->pre = ent->Num.pre;
-		Num->post = ent->Num.post;
-		Num->pre_lsign_num = ent->Num.pre_lsign_num;
-		Num->need_locale = ent->Num.need_locale;
-		Num->multi = ent->Num.multi;
-		Num->zero_start = ent->Num.zero_start;
-		Num->zero_end = ent->Num.zero_end;
 	}
 
 #ifdef DEBUG_TO_FROM_CHAR
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 1fe02be..46d4c7b 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,12 +2807,18 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
 (1 row)
 
+SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '"    Year:" YYYY, "Month:" FMMonth, "Day:"   DD');
+         to_timestamp         
+------------------------------
+ Sun May 16 00:00:00 1976 PDT
+(1 row)
+
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
          to_timestamp         
 ------------------------------
@@ -2810,6 +2834,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected ":"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2912,7 +2951,7 @@ SELECT to_timestamp('  20050302', 'YYYYMMDD');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2942,7 +2981,13 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -2960,13 +3005,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index c81437b..966cde1 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,23 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
+
+SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '"    Year:" YYYY, "Month:" FMMonth, "Day:"   DD');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +416,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -452,6 +466,8 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

0002-to-timestamp-validation-v2.patchtext/x-patch; name=0002-to-timestamp-validation-v2.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 559c55f..e86dd0c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6100,13 +6100,9 @@ SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
        exist to handle input formats that cannot be converted by
-       simple casting.  These functions interpret input liberally,
-       with minimal error checking.  While they produce valid output,
-       the conversion can yield unexpected results.  For example,
-       input to these functions is not restricted by normal ranges,
-       thus <literal>to_date('20096040','YYYYMMDD')</literal> returns
-       <literal>2014-01-17</literal> rather than causing an error.
-       Casting does not have this behavior.
+       simple casting.  These functions have error checking as well as casting.
+       For example, <literal>to_date('20096040','YYYYMMDD')</literal> returns
+       an error.
       </para>
      </listitem>
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index b14678d..98d9847 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -3635,7 +3635,10 @@ do_to_timestamp(text *date_txt, text *fmt,
 {
 	FormatNode *format;
 	TmFromChar	tmfc;
-	int			fmt_len;
+	int			fmt_len,
+				fmask = 0,		/* Bit mask for ValidateDate() */
+				dterr;
+	char	   *date_str = NULL;
 
 	ZERO_tmfc(&tmfc);
 	ZERO_tm(tm);
@@ -3646,7 +3649,6 @@ do_to_timestamp(text *date_txt, text *fmt,
 	if (fmt_len)
 	{
 		char	   *fmt_str;
-		char	   *date_str;
 		bool		incache;
 
 		fmt_str = text_to_cstring(fmt);
@@ -3704,7 +3706,6 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 		DCH_from_char(format, date_str, &tmfc);
 
-		pfree(date_str);
 		pfree(fmt_str);
 		if (!incache)
 			pfree(format);
@@ -3780,6 +3781,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 			if (tmfc.bc && tm->tm_year > 0)
 				tm->tm_year = -(tm->tm_year - 1);
 		}
+		fmask |= DTK_M(YEAR);
 	}
 	else if (tmfc.cc)			/* use first year of century */
 	{
@@ -3791,10 +3793,14 @@ do_to_timestamp(text *date_txt, text *fmt,
 		else
 			/* +1 because year == 599 is 600 BC */
 			tm->tm_year = tmfc.cc * 100 + 1;
+		fmask |= DTK_M(YEAR);
 	}
 
 	if (tmfc.j)
+	{
 		j2date(tmfc.j, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+		fmask |= DTK_DATE_M;
+	}
 
 	if (tmfc.ww)
 	{
@@ -3808,6 +3814,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 				isoweekdate2date(tmfc.ww, tmfc.d, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
 			else
 				isoweek2date(tmfc.ww, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			fmask |= DTK_DATE_M;
 		}
 		else
 			tmfc.ddd = (tmfc.ww - 1) * 7 + 1;
@@ -3818,11 +3825,17 @@ do_to_timestamp(text *date_txt, text *fmt,
 	if (tmfc.d)
 		tm->tm_wday = tmfc.d - 1;		/* convert to native numbering */
 	if (tmfc.dd)
+	{
 		tm->tm_mday = tmfc.dd;
+		fmask |= DTK_M(DAY);
+	}
 	if (tmfc.ddd)
 		tm->tm_yday = tmfc.ddd;
 	if (tmfc.mm)
+	{
 		tm->tm_mon = tmfc.mm;
+		fmask |= DTK_M(MONTH);
+	}
 
 	if (tmfc.ddd && (tm->tm_mon <= 1 || tm->tm_mday <= 1))
 	{
@@ -3845,6 +3858,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 			j0 = isoweek2j(tm->tm_year, 1) - 1;
 
 			j2date(j0 + tmfc.ddd, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			fmask |= DTK_DATE_M;
 		}
 		else
 		{
@@ -3867,9 +3881,36 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 			if (tm->tm_mday <= 1)
 				tm->tm_mday = tmfc.ddd - y[i - 1];
+
+			fmask |= DTK_M(MONTH) | DTK_M(DAY);
 		}
 	}
 
+	/* Validate date with bit mask received above */
+	if (fmask != 0 && date_str)
+	{
+		dterr = ValidateDate(fmask, false, false, false, tm);
+		if (dterr != 0)
+			DateTimeParseError(dterr, date_str, "");
+	}
+
+	/* Do checks for time part */
+#ifdef HAVE_INT64_TIMESTAMP
+	if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > MINS_PER_HOUR - 1 ||
+		tm->tm_sec < 0 || tm->tm_sec > SECS_PER_MINUTE ||
+		*fsec < INT64CONST(0) ||
+		*fsec > USECS_PER_SEC)
+		DateTimeParseError(DTERR_FIELD_OVERFLOW, date_str, "");
+#else
+	if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > MINS_PER_HOUR - 1 ||
+		tm->tm_sec < 0 || tm->tm_sec > SECS_PER_MINUTE ||
+		*fsec < 0 || *fsec > 1)
+		DateTimeParseError(DTERR_FIELD_OVERFLOW, date_str, "");
+#endif
+
+	if (date_str)
+		pfree(date_str);
+
 #ifdef HAVE_INT64_TIMESTAMP
 	if (tmfc.ms)
 		*fsec += tmfc.ms * 1000;
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 0bf4287..8ee6569 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -3015,7 +3015,7 @@ SELECT to_date('2011   12 18', 'YYYY  MM DD');
 (1 row)
 
 --
--- Check errors for some incorrect usages of to_timestamp()
+-- Check errors for some incorrect usages of to_timestamp() and to_date()
 --
 -- Mixture of date conventions (ISO week and Gregorian):
 SELECT to_timestamp('2005527', 'YYYYIWID');
@@ -3043,6 +3043,12 @@ DETAIL:  Value must be an integer.
 SELECT to_timestamp('10000000000', 'FMYYYY');
 ERROR:  value for "YYYY" in source string is out of range
 DETAIL:  Value must be in the range -2147483648 to 2147483647.
+-- Time overflow:
+SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');
+ERROR:  date/time field value out of range: "2016-06-13 99:99:99"
+-- Date overflow:
+SELECT to_date('2016-02-30', 'YYYY-MM-DD');
+ERROR:  date/time field value out of range: "2016-02-30"
 --
 -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572)
 --
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index 514c124..327dbfd 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -473,7 +473,7 @@ SELECT to_date('2011  12 18', 'YYYY  MM DD');
 SELECT to_date('2011   12 18', 'YYYY  MM DD');
 
 --
--- Check errors for some incorrect usages of to_timestamp()
+-- Check errors for some incorrect usages of to_timestamp() and to_date()
 --
 
 -- Mixture of date conventions (ISO week and Gregorian):
@@ -494,6 +494,12 @@ SELECT to_timestamp('199711xy', 'YYYYMMDD');
 -- Input that doesn't fit in an int:
 SELECT to_timestamp('10000000000', 'FMYYYY');
 
+-- Time overflow:
+SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');
+
+-- Date overflow:
+SELECT to_date('2016-02-30', 'YYYY-MM-DD');
+
 --
 -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572)
 --

#51

amul sul

sul_amul@yahoo.co.in

over 9 years ago

In reply to: Artur Zakirov (#50)

Re: Bug in to_timestamp().

Hi Artur Zakirov,

0001-to-timestamp-format-checking-v3.patch looks pretty reasonable to me, other than following concern:

#1.
Whitespace @ line # 317.

#2. Warning at compilation;

formatting.c: In function ‘do_to_timestamp’:
formatting.c:3049:37: warning: ‘prev_type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
^
formatting.c:2988:5: note: ‘prev_type’ was declared here
prev_type;
^

You can avoid this by assigning zero (or introduce NODE_TYPE_INVAL ) to prev_type at following line:

256 + prev_type;

Regards,
Amul Sul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#52

Artur Zakirov

a.zakirov@postgrespro.ru

over 9 years ago

In reply to: Artur Zakirov (#50)

2 attachment(s)

Re: Bug in to_timestamp().

Hi,

#1.
Whitespace @ line # 317.

Sorry, fixed.

#2. Warning at compilation;

formatting.c: In function ‘do_to_timestamp’:
formatting.c:3049:37: warning: ‘prev_type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
^
formatting.c:2988:5: note: ‘prev_type’ was declared here
prev_type;
^

You can avoid this by assigning zero (or introduce NODE_TYPE_INVAL ) to prev_type at following line:

256 + prev_type;

You are right. I assigned to prev_type NODE_TYPE_SPACE to be able to
execute such query:

SELECT to_timestamp('---2000----JUN', 'YYYY MON');

Will be it a proper behaviour?

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v4.patchtext/x-patch; name=0001-to-timestamp-format-checking-v4.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5c1c4f6..36d8b3e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6146,9 +6146,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
-       skip multiple blank spaces in the input string unless the
+       skip multiple blank spaces and printable non letter and non digit
+       characters in the input string and in the formatting string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal>,
+       <literal>to_timestamp('2000----JUN', 'YYYY MON')</literal>
+       and <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index bbd97dc..7430013 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -169,6 +169,8 @@ struct FormatNode
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -947,6 +949,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(char *str, const KeySuffix *suf, int type);
+static bool is_char_separator(char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -963,7 +966,6 @@ static void dump_node(FormatNode *node, int max);
 static const char *get_th(char *num, int type);
 static char *str_numth(char *dest, char *num, int type);
 static int	adjust_partial_year_to_2020(int year);
-static int	strspace_len(char *str);
 static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
 static void from_char_set_int(int *dest, const int value, const FormatNode *node);
 static int	from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node);
@@ -1036,6 +1038,17 @@ suff_search(char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_char_separator(char *str)
+{
+	return ((pg_mblen(str) == 1) &&
+			/* printable character, but not letter and digit */
+			((*str >= '!' && *str <= '/') ||
+			 (*str >= ':' && *str <= '@') ||
+			 (*str >= '[' && *str <= '`') ||
+			 (*str >= '{' && *str <= '~')));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1237,9 +1250,10 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 {
 	const KeySuffix *s;
 	FormatNode *n;
+	bool		in_text = false,
+				in_backslash = false;
 	int			node_set = 0,
-				suffix,
-				last = 0;
+				suffix;
 
 #ifdef DEBUG_TO_FROM_CHAR
 	elog(DEBUG_elog_output, "to_char/number(): run parser");
@@ -1251,6 +1265,55 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 	{
 		suffix = 0;
 
+		/* Previous character was a backslash */
+		if (in_backslash)
+		{
+			/* After backslash should go non-space character */
+			if (isspace(*str))
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("invalid escape sequence")));
+			in_backslash = false;
+
+			n->type = NODE_TYPE_CHAR;
+			n->character = *str;
+			n->key = NULL;
+			n->suffix = 0;
+			n++;
+			str++;
+			continue;
+		}
+		/* Previous character was a quote */
+		else if (in_text)
+		{
+			if (*str == '"')
+			{
+				str++;
+				in_text = false;
+			}
+			else if (*str == '\\')
+			{
+				str++;
+				in_backslash = true;
+			}
+			else
+			{
+				if (ver == DCH_TYPE && is_char_separator(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace(*str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
+				n->character = *str;
+				n->key = NULL;
+				n->suffix = 0;
+				n++;
+				str++;
+			}
+			continue;
+		}
+
 		/*
 		 * Prefix
 		 */
@@ -1290,48 +1353,30 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 		}
 		else if (*str)
 		{
-			/*
-			 * Special characters '\' and '"'
-			 */
-			if (*str == '"' && last != '\\')
+			if (*str == '"')
 			{
-				int			x = 0;
-
-				while (*(++str))
-				{
-					if (*str == '"' && x != '\\')
-					{
-						str++;
-						break;
-					}
-					else if (*str == '\\' && x != '\\')
-					{
-						x = '\\';
-						continue;
-					}
-					n->type = NODE_TYPE_CHAR;
-					n->character = *str;
-					n->key = NULL;
-					n->suffix = 0;
-					++n;
-					x = *str;
-				}
+				in_text = true;
 				node_set = 0;
-				suffix = 0;
-				last = 0;
+				str++;
 			}
-			else if (*str && *str == '\\' && last != '\\' && *(str + 1) == '"')
+			else if (*str == '\\')
 			{
-				last = *str;
+				in_backslash = true;
+				node_set = 0;
 				str++;
 			}
-			else if (*str)
+			else
 			{
-				n->type = NODE_TYPE_CHAR;
+				if (ver == DCH_TYPE && is_char_separator(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace(*str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				n->character = *str;
 				n->key = NULL;
 				node_set = 1;
-				last = 0;
 				str++;
 			}
 		}
@@ -1348,6 +1393,17 @@ parse_format(FormatNode *node, char *str, const KeyWord *kw,
 		}
 	}
 
+	if (in_backslash)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("invalid escape sequence")));
+
+	/* If we didn't meet closing quotes */
+	if (in_text)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("unexpected end of format string, expected '\"' character")));
+
 	n->type = NODE_TYPE_END;
 	n->suffix = 0;
 	return;
@@ -2081,20 +2137,6 @@ adjust_partial_year_to_2020(int year)
 		return year;
 }
 
-
-static int
-strspace_len(char *str)
-{
-	int			len = 0;
-
-	while (*str && isspace((unsigned char) *str))
-	{
-		str++;
-		len++;
-	}
-	return len;
-}
-
 /*
  * Set the date mode of a from-char conversion.
  *
@@ -2164,11 +2206,6 @@ from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node)
 	char	   *init = *src;
 	int			used;
 
-	/*
-	 * Skip any whitespace before parsing the integer.
-	 */
-	*src += strspace_len(*src);
-
 	Assert(len <= DCH_MAX_ITEM_SIZ);
 	used = (int) strlcpy(copy, *src, len + 1);
 
@@ -2947,19 +2984,56 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	FormatNode *n;
 	char	   *s;
 	int			len,
-				value;
+				value,
+				prev_type = NODE_TYPE_SPACE;
 	bool		fx_mode = false;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * In non FX (fixed format) mode we skip spaces and separator
+			 * characters.
 			 */
+			if (!fx_mode)
+			{
+				if (isspace(*s) || is_char_separator(s))
+					s++;
+				prev_type = n->type;
+				continue;
+			}
+
+			/* Checks for FX mode */
+			if (n->type == NODE_TYPE_SPACE && !isspace(*s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+			else if (n->type == NODE_TYPE_SEPARATOR && n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+
 			s++;
+			prev_type = n->type;
+			continue;
+		}
+		else if (n->type == NODE_TYPE_CHAR)
+		{
+			/*
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
+			 */
+			s++;
+			prev_type = n->type;
 			continue;
 		}
 
@@ -2968,7 +3042,20 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 		{
 			while (*s != '\0' && isspace((unsigned char) *s))
 				s++;
+			/*
+			 * Ignore separator characters if previous node was space or
+			 * separator and current is not.
+			 */
+			if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
+				while (*s != '\0' && is_char_separator(s))
+					s++;
 		}
+		else if (isspace(*s) || is_char_separator(s))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+					 errmsg("unexpected character \"%c\"", *s),
+					 errdetail("The given value did not match any of the allowed "
+							   "values for this field.")));
 
 		from_char_set_mode(out, n->key->date_mode);
 
@@ -3207,6 +3294,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+		prev_type = n->type;
 	}
 }
 
@@ -3340,15 +3428,17 @@ datetime_to_char_body(TmToChar *tmtc, text *fmt, bool is_interval, Oid collid)
 
 		if ((ent = DCH_cache_search(fmt_str)) == NULL)
 		{
-			ent = DCH_cache_getnew(fmt_str);
+			FormatNode	buf[DCH_CACHE_SIZE + 1];
 
 			/*
 			 * Not in the cache, must run parser and save a new format-picture
 			 * to the cache.
 			 */
-			parse_format(ent->format, fmt_str, DCH_keywords,
+			parse_format(buf, fmt_str, DCH_keywords,
 						 DCH_suff, DCH_index, DCH_TYPE, NULL);
 
+			ent = DCH_cache_getnew(fmt_str);
+			memcpy(ent->format, buf, sizeof(buf));
 			(ent->format + fmt_len)->type = NODE_TYPE_END;		/* Paranoia? */
 
 #ifdef DEBUG_TO_FROM_CHAR
@@ -3604,15 +3694,17 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 			if ((ent = DCH_cache_search(fmt_str)) == NULL)
 			{
-				ent = DCH_cache_getnew(fmt_str);
+				FormatNode	buf[DCH_CACHE_SIZE + 1];
 
 				/*
 				 * Not in the cache, must run parser and save a new
 				 * format-picture to the cache.
 				 */
-				parse_format(ent->format, fmt_str, DCH_keywords,
+				parse_format(buf, fmt_str, DCH_keywords,
 							 DCH_suff, DCH_index, DCH_TYPE, NULL);
 
+				ent = DCH_cache_getnew(fmt_str);
+				memcpy(ent->format, buf, sizeof(buf));
 				(ent->format + fmt_len)->type = NODE_TYPE_END;	/* Paranoia? */
 #ifdef DEBUG_TO_FROM_CHAR
 				/* dump_node(ent->format, fmt_len); */
@@ -3981,32 +4073,41 @@ NUM_cache(int len, NUMDesc *Num, text *pars_str, bool *shouldFree)
 
 		if ((ent = NUM_cache_search(str)) == NULL)
 		{
-			ent = NUM_cache_getnew(str);
+			FormatNode	buf[NUM_CACHE_SIZE + 1];
+
+			zeroize_NUM(Num);
 
 			/*
 			 * Not in the cache, must run parser and save a new format-picture
 			 * to the cache.
 			 */
-			parse_format(ent->format, str, NUM_keywords,
-						 NULL, NUM_index, NUM_TYPE, &ent->Num);
+			parse_format(buf, str, NUM_keywords,
+						 NULL, NUM_index, NUM_TYPE, Num);
+
+			ent = NUM_cache_getnew(str);
+
+			memcpy(ent->format, buf, sizeof(buf));
+			memcpy(&ent->Num, Num, sizeof(NUMDesc));
 
 			(ent->format + len)->type = NODE_TYPE_END;	/* Paranoia? */
 		}
+		else
+		{
+			/*
+			 * Copy cache to used struct
+			 */
+			Num->flag = ent->Num.flag;
+			Num->lsign = ent->Num.lsign;
+			Num->pre = ent->Num.pre;
+			Num->post = ent->Num.post;
+			Num->pre_lsign_num = ent->Num.pre_lsign_num;
+			Num->need_locale = ent->Num.need_locale;
+			Num->multi = ent->Num.multi;
+			Num->zero_start = ent->Num.zero_start;
+			Num->zero_end = ent->Num.zero_end;
+		}
 
 		format = ent->format;
-
-		/*
-		 * Copy cache to used struct
-		 */
-		Num->flag = ent->Num.flag;
-		Num->lsign = ent->Num.lsign;
-		Num->pre = ent->Num.pre;
-		Num->post = ent->Num.post;
-		Num->pre_lsign_num = ent->Num.pre_lsign_num;
-		Num->need_locale = ent->Num.need_locale;
-		Num->multi = ent->Num.multi;
-		Num->zero_start = ent->Num.zero_start;
-		Num->zero_end = ent->Num.zero_end;
 	}
 
 #ifdef DEBUG_TO_FROM_CHAR
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 1fe02be..46d4c7b 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,12 +2807,18 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
 (1 row)
 
+SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '"    Year:" YYYY, "Month:" FMMonth, "Day:"   DD');
+         to_timestamp         
+------------------------------
+ Sun May 16 00:00:00 1976 PDT
+(1 row)
+
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
          to_timestamp         
 ------------------------------
@@ -2810,6 +2834,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected ":"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2912,7 +2951,7 @@ SELECT to_timestamp('  20050302', 'YYYYMMDD');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2942,7 +2981,13 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -2960,13 +3005,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index c81437b..966cde1 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,23 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
+
+SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '"    Year:" YYYY, "Month:" FMMonth, "Day:"   DD');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +416,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -452,6 +466,8 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

0002-to-timestamp-validation-v2.patchtext/x-patch; name=0002-to-timestamp-validation-v2.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 559c55f..e86dd0c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6100,13 +6100,9 @@ SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
        exist to handle input formats that cannot be converted by
-       simple casting.  These functions interpret input liberally,
-       with minimal error checking.  While they produce valid output,
-       the conversion can yield unexpected results.  For example,
-       input to these functions is not restricted by normal ranges,
-       thus <literal>to_date('20096040','YYYYMMDD')</literal> returns
-       <literal>2014-01-17</literal> rather than causing an error.
-       Casting does not have this behavior.
+       simple casting.  These functions have error checking as well as casting.
+       For example, <literal>to_date('20096040','YYYYMMDD')</literal> returns
+       an error.
       </para>
      </listitem>
 
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index b14678d..98d9847 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -3635,7 +3635,10 @@ do_to_timestamp(text *date_txt, text *fmt,
 {
 	FormatNode *format;
 	TmFromChar	tmfc;
-	int			fmt_len;
+	int			fmt_len,
+				fmask = 0,		/* Bit mask for ValidateDate() */
+				dterr;
+	char	   *date_str = NULL;
 
 	ZERO_tmfc(&tmfc);
 	ZERO_tm(tm);
@@ -3646,7 +3649,6 @@ do_to_timestamp(text *date_txt, text *fmt,
 	if (fmt_len)
 	{
 		char	   *fmt_str;
-		char	   *date_str;
 		bool		incache;
 
 		fmt_str = text_to_cstring(fmt);
@@ -3704,7 +3706,6 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 		DCH_from_char(format, date_str, &tmfc);
 
-		pfree(date_str);
 		pfree(fmt_str);
 		if (!incache)
 			pfree(format);
@@ -3780,6 +3781,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 			if (tmfc.bc && tm->tm_year > 0)
 				tm->tm_year = -(tm->tm_year - 1);
 		}
+		fmask |= DTK_M(YEAR);
 	}
 	else if (tmfc.cc)			/* use first year of century */
 	{
@@ -3791,10 +3793,14 @@ do_to_timestamp(text *date_txt, text *fmt,
 		else
 			/* +1 because year == 599 is 600 BC */
 			tm->tm_year = tmfc.cc * 100 + 1;
+		fmask |= DTK_M(YEAR);
 	}
 
 	if (tmfc.j)
+	{
 		j2date(tmfc.j, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+		fmask |= DTK_DATE_M;
+	}
 
 	if (tmfc.ww)
 	{
@@ -3808,6 +3814,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 				isoweekdate2date(tmfc.ww, tmfc.d, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
 			else
 				isoweek2date(tmfc.ww, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			fmask |= DTK_DATE_M;
 		}
 		else
 			tmfc.ddd = (tmfc.ww - 1) * 7 + 1;
@@ -3818,11 +3825,17 @@ do_to_timestamp(text *date_txt, text *fmt,
 	if (tmfc.d)
 		tm->tm_wday = tmfc.d - 1;		/* convert to native numbering */
 	if (tmfc.dd)
+	{
 		tm->tm_mday = tmfc.dd;
+		fmask |= DTK_M(DAY);
+	}
 	if (tmfc.ddd)
 		tm->tm_yday = tmfc.ddd;
 	if (tmfc.mm)
+	{
 		tm->tm_mon = tmfc.mm;
+		fmask |= DTK_M(MONTH);
+	}
 
 	if (tmfc.ddd && (tm->tm_mon <= 1 || tm->tm_mday <= 1))
 	{
@@ -3845,6 +3858,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 			j0 = isoweek2j(tm->tm_year, 1) - 1;
 
 			j2date(j0 + tmfc.ddd, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			fmask |= DTK_DATE_M;
 		}
 		else
 		{
@@ -3867,9 +3881,36 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 			if (tm->tm_mday <= 1)
 				tm->tm_mday = tmfc.ddd - y[i - 1];
+
+			fmask |= DTK_M(MONTH) | DTK_M(DAY);
 		}
 	}
 
+	/* Validate date with bit mask received above */
+	if (fmask != 0 && date_str)
+	{
+		dterr = ValidateDate(fmask, false, false, false, tm);
+		if (dterr != 0)
+			DateTimeParseError(dterr, date_str, "");
+	}
+
+	/* Do checks for time part */
+#ifdef HAVE_INT64_TIMESTAMP
+	if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > MINS_PER_HOUR - 1 ||
+		tm->tm_sec < 0 || tm->tm_sec > SECS_PER_MINUTE ||
+		*fsec < INT64CONST(0) ||
+		*fsec > USECS_PER_SEC)
+		DateTimeParseError(DTERR_FIELD_OVERFLOW, date_str, "");
+#else
+	if (tm->tm_hour < 0 || tm->tm_min < 0 || tm->tm_min > MINS_PER_HOUR - 1 ||
+		tm->tm_sec < 0 || tm->tm_sec > SECS_PER_MINUTE ||
+		*fsec < 0 || *fsec > 1)
+		DateTimeParseError(DTERR_FIELD_OVERFLOW, date_str, "");
+#endif
+
+	if (date_str)
+		pfree(date_str);
+
 #ifdef HAVE_INT64_TIMESTAMP
 	if (tmfc.ms)
 		*fsec += tmfc.ms * 1000;
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 0bf4287..8ee6569 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -3015,7 +3015,7 @@ SELECT to_date('2011   12 18', 'YYYY  MM DD');
 (1 row)
 
 --
--- Check errors for some incorrect usages of to_timestamp()
+-- Check errors for some incorrect usages of to_timestamp() and to_date()
 --
 -- Mixture of date conventions (ISO week and Gregorian):
 SELECT to_timestamp('2005527', 'YYYYIWID');
@@ -3043,6 +3043,12 @@ DETAIL:  Value must be an integer.
 SELECT to_timestamp('10000000000', 'FMYYYY');
 ERROR:  value for "YYYY" in source string is out of range
 DETAIL:  Value must be in the range -2147483648 to 2147483647.
+-- Time overflow:
+SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');
+ERROR:  date/time field value out of range: "2016-06-13 99:99:99"
+-- Date overflow:
+SELECT to_date('2016-02-30', 'YYYY-MM-DD');
+ERROR:  date/time field value out of range: "2016-02-30"
 --
 -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572)
 --
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index 514c124..327dbfd 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -473,7 +473,7 @@ SELECT to_date('2011  12 18', 'YYYY  MM DD');
 SELECT to_date('2011   12 18', 'YYYY  MM DD');
 
 --
--- Check errors for some incorrect usages of to_timestamp()
+-- Check errors for some incorrect usages of to_timestamp() and to_date()
 --
 
 -- Mixture of date conventions (ISO week and Gregorian):
@@ -494,6 +494,12 @@ SELECT to_timestamp('199711xy', 'YYYYMMDD');
 -- Input that doesn't fit in an int:
 SELECT to_timestamp('10000000000', 'FMYYYY');
 
+-- Time overflow:
+SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS');
+
+-- Date overflow:
+SELECT to_date('2016-02-30', 'YYYY-MM-DD');
+
 --
 -- Check behavior with SQL-style fixed-GMT-offset time zone (cf bug #8572)
 --

#53

amul sul

sul_amul@yahoo.co.in

over 9 years ago

In reply to: Artur Zakirov (#52)

Re: Bug in to_timestamp().

On Thursday, August 25, 2016 1:56 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:

#2. Warning at compilation;

formatting.c: In function ‘do_to_timestamp’:
formatting.c:3049:37: warning: ‘prev_type’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
^
formatting.c:2988:5: note: ‘prev_type’ was declared here
prev_type;
^

You can avoid this by assigning zero (or introduce NODE_TYPE_INVAL ) to prev_type at following line:

256 + prev_type;

You are right. I assigned to prev_type NODE_TYPE_SPACE to be able to
execute such query:

SELECT to_timestamp('---2000----JUN', 'YYYY MON');

Will be it a proper behaviour?

Looks good to me, no one will complain if something working on PG but not on Oracle.

Thanks & Regards,
Amul Sul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#54

Artur Zakirov

a.zakirov@postgrespro.ru

over 9 years ago

In reply to: Artur Zakirov (#52)

Re: Bug in to_timestamp().

You are right. I assigned to prev_type NODE_TYPE_SPACE to be able to
execute such query:

SELECT to_timestamp('---2000----JUN', 'YYYY MON');

Will be it a proper behaviour?

Looks good to me, no one will complain if something working on PG but not on Oracle.

Thanks. I've created the entry in
https://commitfest.postgresql.org/10/713/ . You can add yourself as a
reviewer.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#55

amul sul

sulamul@gmail.com

over 9 years ago

In reply to: Artur Zakirov (#54)

Re: Bug in to_timestamp().

On Thu, Aug 25, 2016 at 3:41 PM, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:

You are right. I assigned to prev_type NODE_TYPE_SPACE to be able to
execute such query:

SELECT to_timestamp('---2000----JUN', 'YYYY MON');

Will be it a proper behaviour?

Looks good to me, no one will complain if something working on PG but not
on Oracle.

Thanks. I've created the entry in https://commitfest.postgresql.org/10/713/
. You can add yourself as a reviewer.

Done, added myself as reviewer & changed status to "Ready for
Committer". Thanks !

Regards,
Amul Sul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#56

Artur Zakirov

a.zakirov@postgrespro.ru

over 9 years ago

In reply to: amul sul (#55)

Re: Bug in to_timestamp().

On 25.08.2016 13:26, amul sul wrote:

Thanks. I've created the entry in https://commitfest.postgresql.org/10/713/
. You can add yourself as a reviewer.

Done, added myself as reviewer & changed status to "Ready for
Committer". Thanks !

Regards,
Amul Sul

It seems there is no need to rebase patches. But I will not be able to
fix patches for two weeks because of travel if someone will have a note
or remark.

So it would be nice if someone will be able to fix possible issues for
above reasone.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#57

amul sul

sulamul@gmail.com

over 9 years ago

In reply to: Artur Zakirov (#56)

Re: Bug in to_timestamp().

On Fri, Sep 16, 2016 at 10:01 PM, Artur Zakirov
<a.zakirov@postgrespro.ru> wrote:

On 25.08.2016 13:26, amul sul wrote:

Thanks. I've created the entry in
https://commitfest.postgresql.org/10/713/
. You can add yourself as a reviewer.

Done, added myself as reviewer & changed status to "Ready for
Committer". Thanks !

Regards,
Amul Sul

It seems there is no need to rebase patches. But I will not be able to fix
patches for two weeks because of travel if someone will have a note or
remark.

So it would be nice if someone will be able to fix possible issues for above
reasone.

Sure, Ill do that, if required.

Regards,
Amul

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#58

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Artur Zakirov (#40)

Re: Bug in to_timestamp().

Artur Zakirov <a.zakirov@postgrespro.ru> writes:

- now DCH_cache_getnew() is called after parse_format(). Because now
parse_format() can raise an error and in the next attempt
DCH_cache_search() could return broken cache entry.

I started looking at your 0001-to-timestamp-format-checking-v4.patch
and this point immediately jumped out at me. Currently the code relies
... without any documentation ... on no elog being thrown out of
parse_format(). That's at the very least trouble waiting to happen.
There's a hack to deal with errors from within the NUMDesc_prepare
subroutine, but it's a pretty ugly and underdocumented hack. And what
you had here was randomly different from that solution, too.

After a bit of thought it seemed to me that a much cleaner fix is to add
a "valid" flag to the cache entries, which we can leave clear until we
have finished parsing the new format string. That avoids adding extra
data copying as you suggested, removes the need for PG_TRY, and just
generally seems cleaner and more bulletproof.

I've pushed a patch that does it that way. The 0001 patch will need
to be rebased over that (might just require removal of some hunks,
not sure).

I also pushed 0002-to-timestamp-validation-v2.patch with some revisions
(it'd broken acceptance of BC dates, among other things, but I think
I fixed everything).

Since you told us earlier that you'd be on vacation through the end of
September, I'm assuming that nothing more will happen on this patch during
this commitfest, so I will mark the CF entry Returned With Feedback.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#59

amul sul

sulamul@gmail.com

over 9 years ago

In reply to: Tom Lane (#58)

1 attachment(s)

Re: Bug in to_timestamp().

On Thu, Sep 29, 2016 at 2:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Artur Zakirov <a.zakirov@postgrespro.ru> writes:

- now DCH_cache_getnew() is called after parse_format(). Because now
parse_format() can raise an error and in the next attempt
DCH_cache_search() could return broken cache entry.

I started looking at your 0001-to-timestamp-format-checking-v4.patch
and this point immediately jumped out at me. Currently the code relies
... without any documentation ... on no elog being thrown out of
parse_format(). That's at the very least trouble waiting to happen.
There's a hack to deal with errors from within the NUMDesc_prepare
subroutine, but it's a pretty ugly and underdocumented hack. And what
you had here was randomly different from that solution, too.

After a bit of thought it seemed to me that a much cleaner fix is to add
a "valid" flag to the cache entries, which we can leave clear until we
have finished parsing the new format string. That avoids adding extra
data copying as you suggested, removes the need for PG_TRY, and just
generally seems cleaner and more bulletproof.

I've pushed a patch that does it that way. The 0001 patch will need
to be rebased over that (might just require removal of some hunks,
not sure).

I also pushed 0002-to-timestamp-validation-v2.patch with some revisions
(it'd broken acceptance of BC dates, among other things, but I think
I fixed everything).

Since you told us earlier that you'd be on vacation through the end of
September, I'm assuming that nothing more will happen on this patch during
this commitfest, so I will mark the CF entry Returned With Feedback.

Behalf of Artur I've rebased patch, removed hunk dealing with broken
cache entries by copying it, which is no longer required after 83bed06
commit.

Commitfest status left untouched, I am not sure how to deal with
"Returned With Feedback" patch. Is there any chance that, this can be
considered again for committer review?

Thanks !

Regards,
Amul

Attachments:

0001-to-timestamp-format-checking-v5.patchapplication/octet-stream; name=0001-to-timestamp-format-checking-v5.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a588350..96ed2c3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6157,9 +6157,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
-       skip multiple blank spaces in the input string unless the
+       skip multiple blank spaces and printable non letter and non digit
+       characters in the input string and in the formatting string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal>,
+       <literal>to_timestamp('2000----JUN', 'YYYY MON')</literal>
+       and <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index d4eaa50..7d50543 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -169,6 +169,8 @@ struct FormatNode
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -951,6 +953,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_char_separator(char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -967,7 +970,6 @@ static void dump_node(FormatNode *node, int max);
 static const char *get_th(char *num, int type);
 static char *str_numth(char *dest, char *num, int type);
 static int	adjust_partial_year_to_2020(int year);
-static int	strspace_len(char *str);
 static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
 static void from_char_set_int(int *dest, const int value, const FormatNode *node);
 static int	from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node);
@@ -1040,6 +1042,17 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_char_separator(char *str)
+{
+	return ((pg_mblen(str) == 1) &&
+			/* printable character, but not letter and digit */
+			((*str >= '!' && *str <= '/') ||
+			 (*str >= ':' && *str <= '@') ||
+			 (*str >= '[' && *str <= '`') ||
+			 (*str >= '{' && *str <= '~')));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1225,9 +1238,10 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 {
 	const KeySuffix *s;
 	FormatNode *n;
+	bool		in_text = false,
+				in_backslash = false;
 	int			node_set = 0,
-				suffix,
-				last = 0;
+				suffix;
 
 #ifdef DEBUG_TO_FROM_CHAR
 	elog(DEBUG_elog_output, "to_char/number(): run parser");
@@ -1239,6 +1253,55 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 	{
 		suffix = 0;
 
+		/* Previous character was a backslash */
+		if (in_backslash)
+		{
+			/* After backslash should go non-space character */
+			if (isspace(*str))
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("invalid escape sequence")));
+			in_backslash = false;
+
+			n->type = NODE_TYPE_CHAR;
+			n->character = *str;
+			n->key = NULL;
+			n->suffix = 0;
+			n++;
+			str++;
+			continue;
+		}
+		/* Previous character was a quote */
+		else if (in_text)
+		{
+			if (*str == '"')
+			{
+				str++;
+				in_text = false;
+			}
+			else if (*str == '\\')
+			{
+				str++;
+				in_backslash = true;
+			}
+			else
+			{
+				if (ver == DCH_TYPE && is_char_separator(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace(*str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
+				n->character = *str;
+				n->key = NULL;
+				n->suffix = 0;
+				n++;
+				str++;
+			}
+			continue;
+		}
+
 		/*
 		 * Prefix
 		 */
@@ -1278,48 +1341,30 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 		}
 		else if (*str)
 		{
-			/*
-			 * Special characters '\' and '"'
-			 */
-			if (*str == '"' && last != '\\')
+			if (*str == '"')
 			{
-				int			x = 0;
-
-				while (*(++str))
-				{
-					if (*str == '"' && x != '\\')
-					{
-						str++;
-						break;
-					}
-					else if (*str == '\\' && x != '\\')
-					{
-						x = '\\';
-						continue;
-					}
-					n->type = NODE_TYPE_CHAR;
-					n->character = *str;
-					n->key = NULL;
-					n->suffix = 0;
-					++n;
-					x = *str;
-				}
+				in_text = true;
 				node_set = 0;
-				suffix = 0;
-				last = 0;
+				str++;
 			}
-			else if (*str && *str == '\\' && last != '\\' && *(str + 1) == '"')
+			else if (*str == '\\')
 			{
-				last = *str;
+				in_backslash = true;
+				node_set = 0;
 				str++;
 			}
-			else if (*str)
+			else
 			{
-				n->type = NODE_TYPE_CHAR;
+				if (ver == DCH_TYPE && is_char_separator(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace(*str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				n->character = *str;
 				n->key = NULL;
 				node_set = 1;
-				last = 0;
 				str++;
 			}
 		}
@@ -1336,6 +1381,17 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 		}
 	}
 
+	if (in_backslash)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("invalid escape sequence")));
+
+	/* If we didn't meet closing quotes */
+	if (in_text)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("unexpected end of format string, expected '\"' character")));
+
 	n->type = NODE_TYPE_END;
 	n->suffix = 0;
 }
@@ -2068,20 +2124,6 @@ adjust_partial_year_to_2020(int year)
 		return year;
 }
 
-
-static int
-strspace_len(char *str)
-{
-	int			len = 0;
-
-	while (*str && isspace((unsigned char) *str))
-	{
-		str++;
-		len++;
-	}
-	return len;
-}
-
 /*
  * Set the date mode of a from-char conversion.
  *
@@ -2151,11 +2193,6 @@ from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node)
 	char	   *init = *src;
 	int			used;
 
-	/*
-	 * Skip any whitespace before parsing the integer.
-	 */
-	*src += strspace_len(*src);
-
 	Assert(len <= DCH_MAX_ITEM_SIZ);
 	used = (int) strlcpy(copy, *src, len + 1);
 
@@ -2934,19 +2971,56 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	FormatNode *n;
 	char	   *s;
 	int			len,
-				value;
+				value,
+				prev_type = NODE_TYPE_SPACE;
 	bool		fx_mode = false;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * In non FX (fixed format) mode we skip spaces and separator
+			 * characters.
 			 */
+			if (!fx_mode)
+			{
+				if (isspace(*s) || is_char_separator(s))
+					s++;
+				prev_type = n->type;
+				continue;
+			}
+
+			/* Checks for FX mode */
+			if (n->type == NODE_TYPE_SPACE && !isspace(*s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+			else if (n->type == NODE_TYPE_SEPARATOR && n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+
 			s++;
+			prev_type = n->type;
+			continue;
+		}
+		else if (n->type == NODE_TYPE_CHAR)
+		{
+			/*
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
+			 */
+			s++;
+			prev_type = n->type;
 			continue;
 		}
 
@@ -2955,7 +3029,20 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 		{
 			while (*s != '\0' && isspace((unsigned char) *s))
 				s++;
+			/*
+			 * Ignore separator characters if previous node was space or
+			 * separator and current is not.
+			 */
+			if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
+				while (*s != '\0' && is_char_separator(s))
+					s++;
 		}
+		else if (isspace(*s) || is_char_separator(s))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+					 errmsg("unexpected character \"%c\"", *s),
+					 errdetail("The given value did not match any of the allowed "
+							   "values for this field.")));
 
 		from_char_set_mode(out, n->key->date_mode);
 
@@ -3194,6 +3281,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+		prev_type = n->type;
 	}
 }
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index f9d12e0..8c71c72 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,12 +2807,18 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
 (1 row)
 
+SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '"    Year:" YYYY, "Month:" FMMonth, "Day:"   DD');
+         to_timestamp         
+------------------------------
+ Sun May 16 00:00:00 1976 PDT
+(1 row)
+
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
          to_timestamp         
 ------------------------------
@@ -2810,6 +2834,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected ":"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2936,7 +2975,7 @@ SELECT to_timestamp('2011-12-18 11:38 PM', 'YYYY-MM-DD HH12:MI PM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2966,7 +3005,13 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -2984,13 +3029,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index a7bc9dc..6984f26 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,23 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
+
+SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '"    Year:" YYYY, "Month:" FMMonth, "Day:"   DD');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +416,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -458,6 +472,8 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#60

Artur Zakirov

a.zakirov@postgrespro.ru

over 9 years ago

In reply to: amul sul (#59)

1 attachment(s)

Re: Bug in to_timestamp().

2016-09-29 13:54 GMT+05:00 amul sul <sulamul@gmail.com>:

On Thu, Sep 29, 2016 at 2:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I started looking at your 0001-to-timestamp-format-checking-v4.patch
and this point immediately jumped out at me. Currently the code relies
... without any documentation ... on no elog being thrown out of
parse_format(). That's at the very least trouble waiting to happen.
There's a hack to deal with errors from within the NUMDesc_prepare
subroutine, but it's a pretty ugly and underdocumented hack. And what
you had here was randomly different from that solution, too.

After a bit of thought it seemed to me that a much cleaner fix is to add
a "valid" flag to the cache entries, which we can leave clear until we
have finished parsing the new format string. That avoids adding extra
data copying as you suggested, removes the need for PG_TRY, and just
generally seems cleaner and more bulletproof.

I've pushed a patch that does it that way. The 0001 patch will need
to be rebased over that (might just require removal of some hunks,
not sure).

I also pushed 0002-to-timestamp-validation-v2.patch with some revisions
(it'd broken acceptance of BC dates, among other things, but I think
I fixed everything).

Thank you for committing the 0002 part of the patch! I wanted to fix
cache functions too, but wasn't sure about that.

Since you told us earlier that you'd be on vacation through the end of
September, I'm assuming that nothing more will happen on this patch during
this commitfest, so I will mark the CF entry Returned With Feedback.

Behalf of Artur I've rebased patch, removed hunk dealing with broken
cache entries by copying it, which is no longer required after 83bed06
commit.

Commitfest status left untouched, I am not sure how to deal with
"Returned With Feedback" patch. Is there any chance that, this can be
considered again for committer review?

Thank you for fixing the patch!
Today I have access to the internet and able to fix and test the
patch. I've looked at your 0001-to-timestamp-format-checking-v5.patch.
It seems nice to me. I suppose it is necessary to fix
is_char_separator() declaration.

from:
static bool is_char_separator(char *str);

to:
static bool is_char_separator(const char *str);

Because now in parse_format() *str argument is const.
I attached new version of the patch, which fix is_char_separator()
declaration too.

Sorry for confusing!

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v6.patchapplication/octet-stream; name=0001-to-timestamp-format-checking-v6.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index a588350..96ed2c3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6157,9 +6157,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
-       skip multiple blank spaces in the input string unless the
+       skip multiple blank spaces and printable non letter and non digit
+       characters in the input string and in the formatting string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal>,
+       <literal>to_timestamp('2000----JUN', 'YYYY MON')</literal>
+       and <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index d4eaa50..7986228 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -169,6 +169,8 @@ struct FormatNode
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -951,6 +953,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_char_separator(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -967,7 +970,6 @@ static void dump_node(FormatNode *node, int max);
 static const char *get_th(char *num, int type);
 static char *str_numth(char *dest, char *num, int type);
 static int	adjust_partial_year_to_2020(int year);
-static int	strspace_len(char *str);
 static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
 static void from_char_set_int(int *dest, const int value, const FormatNode *node);
 static int	from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node);
@@ -1040,6 +1042,17 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_char_separator(const char *str)
+{
+	return ((pg_mblen(str) == 1) &&
+			/* printable character, but not letter and digit */
+			((*str >= '!' && *str <= '/') ||
+			 (*str >= ':' && *str <= '@') ||
+			 (*str >= '[' && *str <= '`') ||
+			 (*str >= '{' && *str <= '~')));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1225,9 +1238,10 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 {
 	const KeySuffix *s;
 	FormatNode *n;
+	bool		in_text = false,
+				in_backslash = false;
 	int			node_set = 0,
-				suffix,
-				last = 0;
+				suffix;
 
 #ifdef DEBUG_TO_FROM_CHAR
 	elog(DEBUG_elog_output, "to_char/number(): run parser");
@@ -1239,6 +1253,55 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 	{
 		suffix = 0;
 
+		/* Previous character was a backslash */
+		if (in_backslash)
+		{
+			/* After backslash should go non-space character */
+			if (isspace(*str))
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("invalid escape sequence")));
+			in_backslash = false;
+
+			n->type = NODE_TYPE_CHAR;
+			n->character = *str;
+			n->key = NULL;
+			n->suffix = 0;
+			n++;
+			str++;
+			continue;
+		}
+		/* Previous character was a quote */
+		else if (in_text)
+		{
+			if (*str == '"')
+			{
+				str++;
+				in_text = false;
+			}
+			else if (*str == '\\')
+			{
+				str++;
+				in_backslash = true;
+			}
+			else
+			{
+				if (ver == DCH_TYPE && is_char_separator(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace(*str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
+				n->character = *str;
+				n->key = NULL;
+				n->suffix = 0;
+				n++;
+				str++;
+			}
+			continue;
+		}
+
 		/*
 		 * Prefix
 		 */
@@ -1278,48 +1341,30 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 		}
 		else if (*str)
 		{
-			/*
-			 * Special characters '\' and '"'
-			 */
-			if (*str == '"' && last != '\\')
+			if (*str == '"')
 			{
-				int			x = 0;
-
-				while (*(++str))
-				{
-					if (*str == '"' && x != '\\')
-					{
-						str++;
-						break;
-					}
-					else if (*str == '\\' && x != '\\')
-					{
-						x = '\\';
-						continue;
-					}
-					n->type = NODE_TYPE_CHAR;
-					n->character = *str;
-					n->key = NULL;
-					n->suffix = 0;
-					++n;
-					x = *str;
-				}
+				in_text = true;
 				node_set = 0;
-				suffix = 0;
-				last = 0;
+				str++;
 			}
-			else if (*str && *str == '\\' && last != '\\' && *(str + 1) == '"')
+			else if (*str == '\\')
 			{
-				last = *str;
+				in_backslash = true;
+				node_set = 0;
 				str++;
 			}
-			else if (*str)
+			else
 			{
-				n->type = NODE_TYPE_CHAR;
+				if (ver == DCH_TYPE && is_char_separator(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace(*str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				n->character = *str;
 				n->key = NULL;
 				node_set = 1;
-				last = 0;
 				str++;
 			}
 		}
@@ -1336,6 +1381,17 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 		}
 	}
 
+	if (in_backslash)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("invalid escape sequence")));
+
+	/* If we didn't meet closing quotes */
+	if (in_text)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("unexpected end of format string, expected '\"' character")));
+
 	n->type = NODE_TYPE_END;
 	n->suffix = 0;
 }
@@ -2068,20 +2124,6 @@ adjust_partial_year_to_2020(int year)
 		return year;
 }
 
-
-static int
-strspace_len(char *str)
-{
-	int			len = 0;
-
-	while (*str && isspace((unsigned char) *str))
-	{
-		str++;
-		len++;
-	}
-	return len;
-}
-
 /*
  * Set the date mode of a from-char conversion.
  *
@@ -2151,11 +2193,6 @@ from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node)
 	char	   *init = *src;
 	int			used;
 
-	/*
-	 * Skip any whitespace before parsing the integer.
-	 */
-	*src += strspace_len(*src);
-
 	Assert(len <= DCH_MAX_ITEM_SIZ);
 	used = (int) strlcpy(copy, *src, len + 1);
 
@@ -2934,19 +2971,56 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	FormatNode *n;
 	char	   *s;
 	int			len,
-				value;
+				value,
+				prev_type = NODE_TYPE_SPACE;
 	bool		fx_mode = false;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * In non FX (fixed format) mode we skip spaces and separator
+			 * characters.
 			 */
+			if (!fx_mode)
+			{
+				if (isspace(*s) || is_char_separator(s))
+					s++;
+				prev_type = n->type;
+				continue;
+			}
+
+			/* Checks for FX mode */
+			if (n->type == NODE_TYPE_SPACE && !isspace(*s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+			else if (n->type == NODE_TYPE_SEPARATOR && n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+
 			s++;
+			prev_type = n->type;
+			continue;
+		}
+		else if (n->type == NODE_TYPE_CHAR)
+		{
+			/*
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
+			 */
+			s++;
+			prev_type = n->type;
 			continue;
 		}
 
@@ -2955,7 +3029,20 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 		{
 			while (*s != '\0' && isspace((unsigned char) *s))
 				s++;
+			/*
+			 * Ignore separator characters if previous node was space or
+			 * separator and current is not.
+			 */
+			if (prev_type == NODE_TYPE_SPACE || prev_type == NODE_TYPE_SEPARATOR)
+				while (*s != '\0' && is_char_separator(s))
+					s++;
 		}
+		else if (isspace(*s) || is_char_separator(s))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+					 errmsg("unexpected character \"%c\"", *s),
+					 errdetail("The given value did not match any of the allowed "
+							   "values for this field.")));
 
 		from_char_set_mode(out, n->key->date_mode);
 
@@ -3194,6 +3281,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+		prev_type = n->type;
 	}
 }
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index f9d12e0..8c71c72 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,12 +2807,18 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
 (1 row)
 
+SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '"    Year:" YYYY, "Month:" FMMonth, "Day:"   DD');
+         to_timestamp         
+------------------------------
+ Sun May 16 00:00:00 1976 PDT
+(1 row)
+
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
          to_timestamp         
 ------------------------------
@@ -2810,6 +2834,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected ":"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2936,7 +2975,7 @@ SELECT to_timestamp('2011-12-18 11:38 PM', 'YYYY-MM-DD HH12:MI PM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2966,7 +3005,13 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -2984,13 +3029,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index a7bc9dc..6984f26 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,23 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
+
+SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '"    Year:" YYYY, "Month:" FMMonth, "Day:"   DD');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +416,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -458,6 +472,8 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#61

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: amul sul (#59)

Re: Bug in to_timestamp().

On Thu, Sep 29, 2016 at 4:54 AM, amul sul <sulamul@gmail.com> wrote:

On Thu, Sep 29, 2016 at 2:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Artur Zakirov <a.zakirov@postgrespro.ru> writes:

- now DCH_cache_getnew() is called after parse_format(). Because now
parse_format() can raise an error and in the next attempt
DCH_cache_search() could return broken cache entry.

I started looking at your 0001-to-timestamp-format-checking-v4.patch
and this point immediately jumped out at me. Currently the code relies
... without any documentation ... on no elog being thrown out of
parse_format(). That's at the very least trouble waiting to happen.
There's a hack to deal with errors from within the NUMDesc_prepare
subroutine, but it's a pretty ugly and underdocumented hack. And what
you had here was randomly different from that solution, too.

After a bit of thought it seemed to me that a much cleaner fix is to add
a "valid" flag to the cache entries, which we can leave clear until we
have finished parsing the new format string. That avoids adding extra
data copying as you suggested, removes the need for PG_TRY, and just
generally seems cleaner and more bulletproof.

I've pushed a patch that does it that way. The 0001 patch will need
to be rebased over that (might just require removal of some hunks,
not sure).

I also pushed 0002-to-timestamp-validation-v2.patch with some revisions
(it'd broken acceptance of BC dates, among other things, but I think
I fixed everything).

Since you told us earlier that you'd be on vacation through the end of
September, I'm assuming that nothing more will happen on this patch during
this commitfest, so I will mark the CF entry Returned With Feedback.

Behalf of Artur I've rebased patch, removed hunk dealing with broken
cache entries by copying it, which is no longer required after 83bed06
commit.

Commitfest status left untouched, I am not sure how to deal with
"Returned With Feedback" patch. Is there any chance that, this can be
considered again for committer review?

You may be able to set the status back to "Ready for Committer", or
else you can move the entry to the next CommitFest and do it there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#62

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Robert Haas (#61)

Re: Bug in to_timestamp().

Robert Haas <robertmhaas@gmail.com> writes:

On Thu, Sep 29, 2016 at 4:54 AM, amul sul <sulamul@gmail.com> wrote:

Commitfest status left untouched, I am not sure how to deal with
"Returned With Feedback" patch. Is there any chance that, this can be
considered again for committer review?

You may be able to set the status back to "Ready for Committer", or
else you can move the entry to the next CommitFest and do it there.

I already switched it back to "Needs Review". AFAIK none of the statuses
are immovable.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#63

Amul Sul

sulamul@gmail.com

over 9 years ago

In reply to: Tom Lane (#62)

Re: Bug in to_timestamp().

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

Appreciate your support.
I've tested v6 patch again, looks good to me, code in v6 does not differ much from v4 patch. Ready for committer review. Thanks !

Regards,
Amul Sul

The new status of this patch is: Ready for Committer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#64

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Amul Sul (#63)

Re: Bug in to_timestamp().

On Fri, Sep 30, 2016 at 12:34 PM, Amul Sul <sulamul@gmail.com> wrote:

The new status of this patch is: Ready for Committer

And moved to next CF with same status.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#65

Tom Lane

tgl@sss.pgh.pa.us

about 9 years ago

In reply to: Artur Zakirov (#60)

Re: Bug in to_timestamp().

Artur Zakirov <a.zakirov@postgrespro.ru> writes:

I attached new version of the patch, which fix is_char_separator()
declaration too.

I did some experimenting using
http://rextester.com/l/oracle_online_compiler

It appears that Oracle will consider a single space in the pattern
to match zero or more spaces in the input, as all of these produce
the expected result:

SELECT to_timestamp('2000JUN', 'YYYY MON') FROM dual
SELECT to_timestamp('2000 JUN', 'YYYY MON') FROM dual
SELECT to_timestamp('2000 JUN', 'YYYY MON') FROM dual

Also, a space in the pattern will match a single separator character
in the input, but not multiple separators:

SELECT to_timestamp('2000-JUN', 'YYYY MON') FROM dual
-- ok
SELECT to_timestamp('2000--JUN', 'YYYY MON') FROM dual
ORA-01843: not a valid month

And you can have whitespace along with that single separator:

SELECT to_timestamp('2000 + JUN', 'YYYY MON') FROM dual
-- ok
SELECT to_timestamp('2000 + JUN', 'YYYY MON') FROM dual
-- ok
SELECT to_timestamp('2000 ++ JUN', 'YYYY MON') FROM dual
ORA-01843: not a valid month

You can have leading whitespace, but not leading separators:

SELECT to_timestamp(' 2000 JUN', 'YYYY MON') FROM dual
-- ok
SELECT to_timestamp('/2000 JUN', 'YYYY MON') FROM dual
ORA-01841: (full) year must be between -4713 and +9999, and not be 0

These all work:

SELECT to_timestamp('2000 JUN', 'YYYY/MON') FROM dual
SELECT to_timestamp('2000JUN', 'YYYY/MON') FROM dual
SELECT to_timestamp('2000/JUN', 'YYYY/MON') FROM dual
SELECT to_timestamp('2000-JUN', 'YYYY/MON') FROM dual

but not

SELECT to_timestamp('2000//JUN', 'YYYY/MON') FROM dual
ORA-01843: not a valid month
SELECT to_timestamp('2000--JUN', 'YYYY/MON') FROM dual
ORA-01843: not a valid month

which makes it look a lot like Oracle treats separator characters in the
pattern the same as spaces (but I haven't checked their documentation to
confirm that).

The proposed patch doesn't seem to me to be trying to follow
these Oracle behaviors, but I think there is very little reason for
changing any of this stuff unless we move it closer to Oracle.

Some other nitpicking:

* I think the is-separator function would be better coded like

static bool
is_separator_char(const char *str)
{
/* ASCII printable character, but not letter or digit */
return (*str > 0x20 && *str < 0x7F &&
!(*str >= 'A' && *str <= 'Z') &&
!(*str >= 'a' && *str <= 'z') &&
!(*str >= '0' && *str <= '9'));
}

The previous way is neither readable nor remarkably efficient, and it
knows much more about the ASCII character set than it needs to.

* Don't forget the cast to unsigned char when using isspace() or other
<ctype.h> functions.

* I do not see the reason for throwing an error here:

+		/* Previous character was a backslash */
+		if (in_backslash)
+		{
+			/* After backslash should go non-space character */
+			if (isspace(*str))
+				ereport(ERROR,
+						(errcode(ERRCODE_SYNTAX_ERROR),
+						 errmsg("invalid escape sequence")));
+			in_backslash = false;

Why shouldn't backslash-space be a valid quoting sequence?

I'll set this back to Waiting on Author.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#66

Artur Zakirov

a.zakirov@postgrespro.ru

about 9 years ago

In reply to: Tom Lane (#65)

1 attachment(s)

Re: Bug in to_timestamp().

Thank you for your comments,

2016-11-04 20:36 GMT+02:00 Tom Lane <tgl@sss.pgh.pa.us>:

Artur Zakirov <a.zakirov@postgrespro.ru> writes:

I attached new version of the patch, which fix is_char_separator()
declaration too.

I did some experimenting using
http://rextester.com/l/oracle_online_compiler

which makes it look a lot like Oracle treats separator characters in the
pattern the same as spaces (but I haven't checked their documentation to
confirm that).

The proposed patch doesn't seem to me to be trying to follow
these Oracle behaviors, but I think there is very little reason for
changing any of this stuff unless we move it closer to Oracle.

Previous versions of the patch doesn't try to follow all Oracle
behaviors. It tries to fix Amul Sul's behaviors. Because I was
confused by dealing with spaces and separators by Oracle
to_timestamp() and there is not information about it in the Oracle
documentation:
https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements004.htm#g195443

I've thought better about it now and fixed the patch. Now parser
removes spaces after and before fields and insists that count of
separators in the input string should match count of spaces or
separators in the formatting string (but in formatting string we can
have more separators than in input string).

Some other nitpicking:

* I think the is-separator function would be better coded like

static bool
is_separator_char(const char *str)
{
/* ASCII printable character, but not letter or digit */
return (*str > 0x20 && *str < 0x7F &&
!(*str >= 'A' && *str <= 'Z') &&
!(*str >= 'a' && *str <= 'z') &&
!(*str >= '0' && *str <= '9'));
}

The previous way is neither readable nor remarkably efficient, and it
knows much more about the ASCII character set than it needs to.

Fixed.

* Don't forget the cast to unsigned char when using isspace() or other
<ctype.h> functions.

Fixed.

* I do not see the reason for throwing an error here:

+               /* Previous character was a backslash */
+               if (in_backslash)
+               {
+                       /* After backslash should go non-space character */
+                       if (isspace(*str))
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                                errmsg("invalid escape sequence")));
+                       in_backslash = false;

Why shouldn't backslash-space be a valid quoting sequence?

Hm, truly. Fixed it.

I've attached the fixed patch.

--
Sincerely,
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v7.patchtext/x-patch; charset=US-ASCII; name=0001-to-timestamp-format-checking-v7.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2e64cc4..5a4e248 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6159,7 +6159,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        <function>to_timestamp</function> and <function>to_date</function>
        skip multiple blank spaces in the input string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
@@ -6169,6 +6170,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
 
      <listitem>
       <para>
+       <function>to_timestamp</function> and <function>to_date</function> don't
+       skip multiple printable non letter and non digit characters in the input
+       string, but skip them in the formatting string. For example,
+       <literal>to_timestamp('2000-JUN', 'YYYY/MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY//MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because count of the "/" character in the input string
+       doesn't match count of it in the formatting string.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
        Ordinary text is allowed in <function>to_char</function>
        templates and will be output literally.  You can put a substring
        in double quotes to force it to be interpreted as literal text
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index d4eaa50..d28ceec 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -169,6 +169,8 @@ struct FormatNode
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -951,6 +953,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -967,7 +970,6 @@ static void dump_node(FormatNode *node, int max);
 static const char *get_th(char *num, int type);
 static char *str_numth(char *dest, char *num, int type);
 static int	adjust_partial_year_to_2020(int year);
-static int	strspace_len(char *str);
 static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
 static void from_char_set_int(int *dest, const int value, const FormatNode *node);
 static int	from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node);
@@ -1040,6 +1042,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1225,9 +1237,10 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 {
 	const KeySuffix *s;
 	FormatNode *n;
+	bool		in_text = false,
+				in_backslash = false;
 	int			node_set = 0,
-				suffix,
-				last = 0;
+				suffix;
 
 #ifdef DEBUG_TO_FROM_CHAR
 	elog(DEBUG_elog_output, "to_char/number(): run parser");
@@ -1239,6 +1252,50 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 	{
 		suffix = 0;
 
+		/* Previous character was a backslash */
+		if (in_backslash)
+		{
+			in_backslash = false;
+
+			n->type = NODE_TYPE_CHAR;
+			n->character = *str;
+			n->key = NULL;
+			n->suffix = 0;
+			n++;
+			str++;
+			continue;
+		}
+		/* Previous character was a quote */
+		else if (in_text)
+		{
+			if (*str == '"')
+			{
+				str++;
+				in_text = false;
+			}
+			else if (*str == '\\')
+			{
+				str++;
+				in_backslash = true;
+			}
+			else
+			{
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
+				n->character = *str;
+				n->key = NULL;
+				n->suffix = 0;
+				n++;
+				str++;
+			}
+			continue;
+		}
+
 		/*
 		 * Prefix
 		 */
@@ -1278,48 +1335,30 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 		}
 		else if (*str)
 		{
-			/*
-			 * Special characters '\' and '"'
-			 */
-			if (*str == '"' && last != '\\')
+			if (*str == '"')
 			{
-				int			x = 0;
-
-				while (*(++str))
-				{
-					if (*str == '"' && x != '\\')
-					{
-						str++;
-						break;
-					}
-					else if (*str == '\\' && x != '\\')
-					{
-						x = '\\';
-						continue;
-					}
-					n->type = NODE_TYPE_CHAR;
-					n->character = *str;
-					n->key = NULL;
-					n->suffix = 0;
-					++n;
-					x = *str;
-				}
+				in_text = true;
 				node_set = 0;
-				suffix = 0;
-				last = 0;
+				str++;
 			}
-			else if (*str && *str == '\\' && last != '\\' && *(str + 1) == '"')
+			else if (*str == '\\')
 			{
-				last = *str;
+				in_backslash = true;
+				node_set = 0;
 				str++;
 			}
-			else if (*str)
+			else
 			{
-				n->type = NODE_TYPE_CHAR;
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				n->character = *str;
 				n->key = NULL;
 				node_set = 1;
-				last = 0;
 				str++;
 			}
 		}
@@ -1336,6 +1375,17 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 		}
 	}
 
+	if (in_backslash)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("invalid escape sequence")));
+
+	/* If we didn't meet closing quotes */
+	if (in_text)
+		ereport(ERROR,
+				(errcode(ERRCODE_SYNTAX_ERROR),
+				 errmsg("unexpected end of format string, expected '\"' character")));
+
 	n->type = NODE_TYPE_END;
 	n->suffix = 0;
 }
@@ -2068,20 +2118,6 @@ adjust_partial_year_to_2020(int year)
 		return year;
 }
 
-
-static int
-strspace_len(char *str)
-{
-	int			len = 0;
-
-	while (*str && isspace((unsigned char) *str))
-	{
-		str++;
-		len++;
-	}
-	return len;
-}
-
 /*
  * Set the date mode of a from-char conversion.
  *
@@ -2151,11 +2187,6 @@ from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node)
 	char	   *init = *src;
 	int			used;
 
-	/*
-	 * Skip any whitespace before parsing the integer.
-	 */
-	*src += strspace_len(*src);
-
 	Assert(len <= DCH_MAX_ITEM_SIZ);
 	used = (int) strlcpy(copy, *src, len + 1);
 
@@ -2934,19 +2965,62 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	FormatNode *n;
 	char	   *s;
 	int			len,
-				value;
+				value,
+				prev_type = NODE_TYPE_SPACE;
 	bool		fx_mode = false;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		/* Ignore all spaces after previous field */
+		if (prev_type == NODE_TYPE_ACTION)
+			while (*s != '\0' && isspace((unsigned char) *s))
+				s++;
+
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
+		{
+			/*
+			 * In non FX (fixed format) mode we skip spaces and separator
+			 * characters.
+			 */
+			if (!fx_mode)
+			{
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+					s++;
+
+				prev_type = n->type;
+				continue;
+			}
+
+			/* Checks for FX mode */
+			if (n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+			else if (n->type == NODE_TYPE_SEPARATOR && n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\", expected \"%c\"",
+								*s, n->character),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
+
+			s++;
+			prev_type = n->type;
+			continue;
+		}
+		else if (n->type == NODE_TYPE_CHAR)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
 			 */
 			s++;
+			prev_type = n->type;
 			continue;
 		}
 
@@ -2955,7 +3029,25 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 		{
 			while (*s != '\0' && isspace((unsigned char) *s))
 				s++;
+
+			/*
+			 * Lost separator character from the input string. Separator
+			 * character should match at least one space or separator character
+			 * from the format string.
+			 */
+			if (is_separator_char(s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\"", *s),
+						 errdetail("The given value did not match any of the allowed "
+								   "values for this field.")));
 		}
+		else if (isspace((unsigned char) *s) || is_separator_char(s))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+					 errmsg("unexpected character \"%c\"", *s),
+					 errdetail("The given value did not match any of the allowed "
+							   "values for this field.")));
 
 		from_char_set_mode(out, n->key->date_mode);
 
@@ -3194,6 +3286,7 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+		prev_type = n->type;
 	}
 }
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index f9d12e0..96773fa 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,12 +2807,18 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
 (1 row)
 
+SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '"    Year:" YYYY, "Month:" FMMonth, "Day:"   DD');
+         to_timestamp         
+------------------------------
+ Sun May 16 00:00:00 1976 PDT
+(1 row)
+
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
          to_timestamp         
 ------------------------------
@@ -2808,8 +2832,23 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 (1 row)
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
-ERROR:  invalid value "/Fe" for "Mon"
+ERROR:  unexpected character "/"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected ":"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2936,7 +2975,7 @@ SELECT to_timestamp('2011-12-18 11:38 PM', 'YYYY-MM-DD HH12:MI PM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2966,9 +3005,30 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+ERROR:  unexpected character "-"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('2000 +   JUN', 'YYYY MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 + + JUN', 'YYYY MON');
+ERROR:  unexpected character "+"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000--JUN', 'YYYY/MON');
+ERROR:  unexpected character "-"
+DETAIL:  The given value did not match any of the allowed values for this field.
 SELECT to_date('2011 12  18', 'YYYY MM DD');
   to_date   
 ------------
@@ -2984,13 +3044,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index a7bc9dc..2563980 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,23 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT TO_TIMESTAMP('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
+
+SELECT to_timestamp('Year: 1976, Month: May, Day: 16', '"    Year:" YYYY, "Month:" FMMonth, "Day:"   DD');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +416,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -458,6 +472,12 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+SELECT to_timestamp('2000 +   JUN', 'YYYY MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+SELECT to_timestamp('2000--JUN', 'YYYY/MON');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#67

Haribabu Kommi

kommi.haribabu@gmail.com

about 9 years ago

In reply to: Artur Zakirov (#66)

Re: Bug in to_timestamp().

On Mon, Nov 7, 2016 at 2:26 AM, Artur Zakirov <a.zakirov@postgrespro.ru>
wrote:

Hm, truly. Fixed it.

I've attached the fixed patch.

Moved to next CF with "needs review" status.

Regards,
Hari Babu
Fujitsu Australia

#68

Michael Paquier

michael.paquier@gmail.com

almost 9 years ago

In reply to: Haribabu Kommi (#67)

Re: Bug in to_timestamp().

On Mon, Dec 5, 2016 at 11:35 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:

Moved to next CF with "needs review" status.

Same, this time to CF 2017-03.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#69

Amul Sul

sulamul@gmail.com

almost 9 years ago

In reply to: Michael Paquier (#68)

Re: Bug in to_timestamp().

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

Review of v7 patch:
- Patch applies to the top of master HEAD cleanly & make check-world also fine.
- Tom Lane's review comments are fixed.

The new status of this patch is: Ready for Committer

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#70

Artur Zakirov

a.zakirov@postgrespro.ru

almost 9 years ago

In reply to: Amul Sul (#69)

Re: Bug in to_timestamp().

On 15.02.2017 15:26, Amul Sul wrote:

The following review has been posted through the commitfest application:
make installcheck-world: not tested
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

Review of v7 patch:
- Patch applies to the top of master HEAD cleanly & make check-world also fine.
- Tom Lane's review comments are fixed.

The new status of this patch is: Ready for Committer

Thank you for your review!

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#71

David Steele

david@pgmasters.net

almost 9 years ago

In reply to: Artur Zakirov (#70)

Re: Bug in to_timestamp().

On 2/27/17 4:19 AM, Artur Zakirov wrote:

On 15.02.2017 15:26, Amul Sul wrote:

The new status of this patch is: Ready for Committer

Thank you for your review!

This submission has been moved to CF 2017-07.

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#72

Tom Lane

tgl@sss.pgh.pa.us

about 8 years ago

In reply to: Artur Zakirov (#66)

Re: [HACKERS] Bug in to_timestamp().

Artur Zakirov <a.zakirov@postgrespro.ru> writes:

[ 0001-to-timestamp-format-checking-v7.patch ]

This patch needs a rebase over the formatting.c fixes that have gone
in over the last couple of days.

Looking at the rejects, I notice that in your changes to parse_format(),
you seem to be making it rely even more heavily on remembering state about
the previous input. I recommend against that --- it was broken before,
and it's a pretty fragile approach. Backslashes are not that hard to
deal with in-line.

The larger picture though is that I'm not real sure that this patch is
going in the direction we want. All of these cases work in both our
code and Oracle:

select to_timestamp('97/Feb/16', 'YY:Mon:DD')
select to_timestamp('97/Feb/16', 'YY Mon DD')
select to_timestamp('97 Feb 16', 'YY/Mon/DD')

(Well, Oracle thinks these mean 2097 where we think 1997, but the point is
you don't get an error.) I see from your regression test additions that
you want to make some of these throw an error, and I think that any such
proposal is dead in the water. Nobody is going to consider it an
improvement if it both breaks working PG apps and disagrees with Oracle.

regards, tom lane

#73

Arthur Zakirov

a.zakirov@postgrespro.ru

about 8 years ago

In reply to: Tom Lane (#72)

Re: [HACKERS] Bug in to_timestamp().

On Sat, Nov 18, 2017 at 05:48:26PM -0500, Tom Lane wrote:

This patch needs a rebase over the formatting.c fixes that have gone
in over the last couple of days.

Looking at the rejects, I notice that in your changes to parse_format(),
you seem to be making it rely even more heavily on remembering state about
the previous input. I recommend against that --- it was broken before,
and it's a pretty fragile approach. Backslashes are not that hard to
deal with in-line.

I can continue to work on a better approach. Though, the patch was made a long time
ago I'll refresh my memory. The main intent was to fix the following
behaviour:

=# SELECT TO_TIMESTAMP('2016-06-13 15:43:36', 'YYYY/MM/DD HH24:MI:SS');
to_timestamp
------------------------
2016-06-13 05:43:36-07 <— incorrect time

select to_timestamp('97/Feb/16', 'YY:Mon:DD')
select to_timestamp('97/Feb/16', 'YY Mon DD')
select to_timestamp('97 Feb 16', 'YY/Mon/DD')

(Well, Oracle thinks these mean 2097 where we think 1997, but the point is
you don't get an error.) I see from your regression test additions that
you want to make some of these throw an error, and I think that any such
proposal is dead in the water. Nobody is going to consider it an
improvement if it both breaks working PG apps and disagrees with Oracle.

regards, tom lane

If I understand correctly, these queries don't throw an error and the patch tried to follow Oracle's rules.
Only queries with FX field throw an error. For example:

=# SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
ERROR: unexpected character "/", expected ":"

From Oracle's documentation [1]:

FX - Requires exact matching between the character data and the format model.

I agree that compatibility breaking is not good and a fu
ure patch may only try to fix wrong output date and time as in Amul's first email.

1 - https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements004.htm#i34924

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#74

Tom Lane

tgl@sss.pgh.pa.us

about 8 years ago

In reply to: Arthur Zakirov (#73)

Re: [HACKERS] Bug in to_timestamp().

Arthur Zakirov <a.zakirov@postgrespro.ru> writes:

On Sat, Nov 18, 2017 at 05:48:26PM -0500, Tom Lane wrote:

... I see from your regression test additions that
you want to make some of these throw an error, and I think that any such
proposal is dead in the water. Nobody is going to consider it an
improvement if it both breaks working PG apps and disagrees with Oracle.

Only queries with FX field throw an error.

Ah, I see --- I'd missed that FX was relevant to this. Yeah, maybe
it'd be okay to tighten up in that case, since that's making it act more
like Oracle, which seems to be generally agreed to be a good thing.

I don't much like the error message that you've got:

+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected ":"
+DETAIL:  The given value did not match any of the allowed values for this field.

The DETAIL message seems to have been copied-and-pasted into a context
where it's not terribly accurate. I'd consider replacing it with
something along the lines of "HINT: In FX mode, punctuation in the input
string must exactly match the format string." This way the report will
contain a pretty clear statement of the new rule that was broken. (BTW,
it's a hint not a detail because it might be guessing wrong as to what
the real problem is.)

regards, tom lane

#75

Arthur Zakirov

a.zakirov@postgrespro.ru

about 8 years ago

In reply to: Tom Lane (#74)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

I've attached new version of the patch. It is a little bit simpler now than the previous one.
The patch doesn't handle backslashes now, since there was a commit which fixes quoted-substring handling recently.
Anyway I'm not sure that this handling was necessary.

I've checked queries from this thread. It seems that they give right timestamps and work like in Oracle (except different messages).

The patch contains documentation and regression test fixes.

On Sun, Nov 19, 2017 at 12:26:39PM -0500, Tom Lane wrote:

I don't much like the error message that you've got:
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected ":"
+DETAIL:  The given value did not match any of the allowed values for this field.
The DETAIL message seems to have been copied-and-pasted into a context
where it's not terribly accurate. I'd consider replacing it with
something along the lines of "HINT: In FX mode, punctuation in the input
string must exactly match the format string." This way the report will
contain a pretty clear statement of the new rule that was broken. (BTW,
it's a hint not a detail because it might be guessing wrong as to what
the real problem is.)

regards, tom lane

Messages have the following format now:

SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
ERROR: unexpected character "/", expected ":"
HINT: In FX mode, punctuation in the input string must exactly match the format string.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v8.patchtext/x-diff; charset=us-asciiDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 698daf69ea..1c8541d552 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6166,7 +6166,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        <function>to_timestamp</function> and <function>to_date</function>
        skip multiple blank spaces in the input string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
@@ -6174,6 +6175,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+       <function>to_timestamp</function> and <function>to_date</function> don't
+       skip multiple printable non letter and non digit characters in the input
+       string, but skip them in the formatting string. For example,
+       <literal>to_timestamp('2000-JUN', 'YYYY/MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY//MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because count of the "/" character in the input string
+       doesn't match count of it in the formatting string.
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        Ordinary text is allowed in <function>to_char</function>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index ec97de0ad2..81565aab96 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -171,6 +171,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -953,6 +955,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -969,7 +972,6 @@ static void dump_node(FormatNode *node, int max);
 static const char *get_th(char *num, int type);
 static char *str_numth(char *dest, char *num, int type);
 static int	adjust_partial_year_to_2020(int year);
-static int	strspace_len(char *str);
 static void from_char_set_mode(TmFromChar *tmfc, const FromCharDateMode mode);
 static void from_char_set_int(int *dest, const int value, const FormatNode *node);
 static int	from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node);
@@ -1042,6 +1044,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1317,7 +1329,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2115,19 +2134,6 @@ adjust_partial_year_to_2020(int year)
 }
 
 
-static int
-strspace_len(char *str)
-{
-	int			len = 0;
-
-	while (*str && isspace((unsigned char) *str))
-	{
-		str++;
-		len++;
-	}
-	return len;
-}
-
 /*
  * Set the date mode of a from-char conversion.
  *
@@ -2197,11 +2203,6 @@ from_char_parse_int_len(int *dest, char **src, const int len, FormatNode *node)
 	char	   *init = *src;
 	int			used;
 
-	/*
-	 * Skip any whitespace before parsing the integer.
-	 */
-	*src += strspace_len(*src);
-
 	Assert(len <= DCH_MAX_ITEM_SIZ);
 	used = (int) strlcpy(copy, *src, len + 1);
 
@@ -2975,17 +2976,60 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
+		{
+			/*
+			 * In non FX (fixed format) mode we don't insist that the consumed
+			 * character matches the format's character.
+			 */
+			if (!fx_mode)
+			{
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+					s++;
+
+				continue;
+			}
+
+			/*
+			 * In FX mode we insist that whitespace from the format string
+			 * matches whitespace from the input string.
+			 */
+			if (n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+			/*
+			 * In FX mode we insist that separator character from the format
+			 * string matches separator character from the input string.
+			 */
+			else if (n->type == NODE_TYPE_SEPARATOR && *n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+
+			s++;
+			continue;
+		}
+		else if (n->type == NODE_TYPE_CHAR)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
 			 */
 			s += pg_mblen(s);
 			continue;
 		}
 
+		/* n->type == NODE_TYPE_ACTION */
+
 		/* Ignore spaces before fields when not in FX (fixed width) mode */
 		if (!fx_mode && n->key->id != DCH_FX)
 		{
@@ -2993,6 +3037,31 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				s++;
 		}
 
+		if (!fx_mode)
+		{
+			/*
+			 * Lost separator character from the input string. Separator
+			 * character should match at least one space or separator character
+			 * from the format string.
+			 */
+			if (is_separator_char(s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%c\"", *s),
+						 errhint("Separator character from the input string "
+								 "must match at least one space or separator "
+								 "character from the format string.")));
+		}
+		/*
+		 * In FX mode we don't expect a whitespace or a character.
+		 */
+		else if (isspace((unsigned char) *s) || is_separator_char(s))
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+					 errmsg("unexpected character \"%c\"", *s),
+					 errhint("In FX mode, punctuation in the input string "
+							 "must exactly match the format string.")));
+
 		from_char_set_mode(out, n->key->date_mode);
 
 		switch (n->key->id)
@@ -3232,6 +3301,13 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+
+		/* Ignore all spaces after fields */
+		if (!fx_mode)
+		{
+			while (*s != '\0' && isspace((unsigned char) *s))
+				s++;
+		}
 	}
 }
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 7b3d058425..54a42ee42e 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2808,8 +2826,23 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 (1 row)
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
-ERROR:  invalid value "/Fe" for "Mon"
-DETAIL:  The given value did not match any of the allowed values for this field.
+ERROR:  unexpected character "/"
+HINT:  Separator character from the input string must match at least one space or separator character from the format string.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2936,7 +2969,7 @@ SELECT to_timestamp('2011-12-18 11:38 PM', 'YYYY-MM-DD HH12:MI PM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2966,9 +2999,33 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+ERROR:  unexpected character "-"
+HINT:  Separator character from the input string must match at least one space or separator character from the format string.
+SELECT to_timestamp('2000 +   JUN', 'YYYY MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 + + JUN', 'YYYY MON');
+ERROR:  unexpected character "+"
+HINT:  Separator character from the input string must match at least one space or separator character from the format string.
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000--JUN', 'YYYY/MON');
+ERROR:  unexpected character "-"
+HINT:  Separator character from the input string must match at least one space or separator character from the format string.
+SELECT to_timestamp('2000    JUN', 'FXYYYY MON');
+ERROR:  unexpected character " "
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
 SELECT to_date('2011 12  18', 'YYYY MM DD');
   to_date   
 ------------
@@ -2984,13 +3041,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index a7bc9dcfc4..1dbaf8ee46 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -458,6 +470,13 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000----JUN', 'YYYY MON');
+SELECT to_timestamp('2000 +   JUN', 'YYYY MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+SELECT to_timestamp('2000--JUN', 'YYYY/MON');
+SELECT to_timestamp('2000    JUN', 'FXYYYY MON');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#76

Michael Paquier

michael.paquier@gmail.com

about 8 years ago

In reply to: Arthur Zakirov (#75)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Nov 23, 2017 at 12:23 AM, Arthur Zakirov
<a.zakirov@postgrespro.ru> wrote:

Messages have the following format now:

SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
ERROR: unexpected character "/", expected ":"
HINT: In FX mode, punctuation in the input string must exactly match the format string.

Moved to next CF as this is fresh and got no reviews.
--
Michael

#77

Robert Haas

robertmhaas@gmail.com

about 8 years ago

In reply to: Michael Paquier (#76)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Nov 29, 2017 at 10:50 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Thu, Nov 23, 2017 at 12:23 AM, Arthur Zakirov
<a.zakirov@postgrespro.ru> wrote:

Messages have the following format now:

SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
ERROR: unexpected character "/", expected ":"
HINT: In FX mode, punctuation in the input string must exactly match the format string.

Moved to next CF as this is fresh and got no reviews.

So is it now the case that all of the regression test cases in this
patch produce the same results as they would on Oracle?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#78

Arthur Zakirov

a.zakirov@postgrespro.ru

about 8 years ago

In reply to: Robert Haas (#77)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Nov 30, 2017 at 10:39:56AM -0500, Robert Haas wrote:

So is it now the case that all of the regression test cases in this
patch produce the same results as they would on Oracle?

Yes, exactly. All new cases give the same result as in Oracle, except
text of error messages.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#79

Thomas Munro

thomas.munro@enterprisedb.com

about 8 years ago

In reply to: Arthur Zakirov (#75)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Nov 23, 2017 at 4:23 AM, Arthur Zakirov
<a.zakirov@postgrespro.ru> wrote:

I've attached new version of the patch. It is a little bit simpler now than the previous one.
The patch doesn't handle backslashes now, since there was a commit which fixes quoted-substring handling recently.
Anyway I'm not sure that this handling was necessary.

I've checked queries from this thread. It seems that they give right timestamps and work like in Oracle (except different messages).

The patch contains documentation and regression test fixes.

I'm guessing that commit 11b623dd0a2c385719ebbbdd42dd4ec395dcdc9d had
something to do with the following failure, when your patch is
applied:

horology ... FAILED

========= Contents of ./src/test/regress/regression.diffs
*** /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/expected/horology.out
2018-01-11 17:11:49.744128698 +0000
--- /home/travis/build/postgresql-cfbot/postgresql/src/test/regress/results/horology.out
2018-01-11 17:18:53.988815652 +0000
***************
*** 2972,2978 ****
  SELECT to_timestamp('2011-12-18 11:38 -05',    'YYYY-MM-DD HH12:MI TZH');
           to_timestamp
  ------------------------------
!  Sun Dec 18 08:38:00 2011 PST
  (1 row)

  SELECT to_timestamp('2011-12-18 11:38 +05:20', 'YYYY-MM-DD HH12:MI TZH:TZM');
--- 2972,2978 ----
  SELECT to_timestamp('2011-12-18 11:38 -05',    'YYYY-MM-DD HH12:MI TZH');
           to_timestamp
  ------------------------------
!  Sat Dec 17 22:38:00 2011 PST
  (1 row)

SELECT to_timestamp('2011-12-18 11:38 +05:20', 'YYYY-MM-DD HH12:MI TZH:TZM');
***************
*** 2984,2990 ****
SELECT to_timestamp('2011-12-18 11:38 -05:20', 'YYYY-MM-DD HH12:MI TZH:TZM');
to_timestamp
------------------------------
! Sun Dec 18 08:58:00 2011 PST
(1 row)

  SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
--- 2984,2990 ----
  SELECT to_timestamp('2011-12-18 11:38 -05:20', 'YYYY-MM-DD HH12:MI TZH:TZM');
           to_timestamp
  ------------------------------
!  Sat Dec 17 22:18:00 2011 PST
  (1 row)

SELECT to_timestamp('2011-12-18 11:38 20', 'YYYY-MM-DD HH12:MI TZM');
======================================================================

--
Thomas Munro
http://www.enterprisedb.com

#80

Arthur Zakirov

a.zakirov@postgrespro.ru

almost 8 years ago

In reply to: Thomas Munro (#79)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

Hello,

On Fri, Jan 12, 2018 at 02:48:40PM +1300, Thomas Munro wrote:

I'm guessing that commit 11b623dd0a2c385719ebbbdd42dd4ec395dcdc9d had
something to do with the following failure, when your patch is
applied:

horology ... FAILED

Thank you a lot for pointing on that.

It seems to me that it happens because the patch eats minus sign "-" before "05". And it is wrong to do that.

I attached new version of the patch. Now (expected output):

=# SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI TZH');
to_timestamp
------------------------
2011-12-18 20:38:00+04

But these queries may confuse:

=# SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI TZH');
to_timestamp
------------------------
2011-12-18 10:38:00+04

=# SELECT to_timestamp('2011-12-18 11:38 -05', 'YYYY-MM-DD HH12:MI -TZH');
to_timestamp
------------------------
2011-12-18 10:38:00+04

And these queries don't work anymore using new version of the patch:

=# SELECT to_timestamp('2000 + JUN', 'YYYY MON');
ERROR: invalid value "+ J" for "MON"
DETAIL: The given value did not match any of the allowed values for this field.

=# SELECT to_timestamp('2000 + JUN', 'YYYY MON');
ERROR: invalid value "+ " for "MON"
DETAIL: The given value did not match any of the allowed values for this field.

Other queries mentioned in the thread work as before.

Any thoughts? If someone has better approach, feel free to share it.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v9.patchtext/plain; charset=us-asciiDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 2428434030..545129334c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6174,7 +6174,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        <function>to_timestamp</function> and <function>to_date</function>
        skip multiple blank spaces in the input string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
@@ -6182,6 +6183,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+       <function>to_timestamp</function> and <function>to_date</function> don't
+       skip multiple printable non letter and non digit characters in the input
+       string, but skip them in the formatting string. For example,
+       <literal>to_timestamp('2000-JUN', 'YYYY/MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY//MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because count of the "/" character in the input string
+       doesn't match count of it in the formatting string.
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        Ordinary text is allowed in <function>to_char</function>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index b8bd4caa3e..dddfc4b1bf 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -171,6 +171,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -961,6 +963,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -1050,6 +1053,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1325,7 +1338,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2996,12 +3016,53 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
+		{
+			/*
+			 * In non FX (fixed format) mode we don't insist that the consumed
+			 * character matches the format's character.
+			 */
+			if (!fx_mode)
+			{
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+					s++;
+
+				continue;
+			}
+
+			/*
+			 * In FX mode we insist that whitespace from the format string
+			 * matches whitespace from the input string.
+			 */
+			if (n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+			/*
+			 * In FX mode we insist that separator character from the format
+			 * string matches separator character from the input string.
+			 */
+			else if (n->type == NODE_TYPE_SEPARATOR && *n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+
+			s++;
+			continue;
+		}
+		else if (n->type != NODE_TYPE_ACTION)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
 			 */
 			s += pg_mblen(s);
 			continue;
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e39198e6..3ebfa5111d 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2966,7 +2999,7 @@ SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2996,7 +3029,7 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -3014,13 +3047,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index ebb196a1cf..e3305fc836 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');

#81

Dmitry Dolgov

9erthalion6@gmail.com

almost 8 years ago

In reply to: Arthur Zakirov (#80)

Re: [HACKERS] Bug in to_timestamp().

On 12 January 2018 at 13:58, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:

I attached new version of the patch. Now (expected output):

Thanks for working on that! I haven't followed this thread before, and after a
quick review I have few side questions.

Why not write `is_separator_char` using `isprint`, `isalpha`, `isdigit` from
ctype.h? Something like:

return isprint(*str) && !isalpha(*str) && !isdigit(*str)

From what I see in the source code they do exactly the same and tests are
successfully passing with this change.

What do you think about providing two slightly different messages for these two
situations:

if (n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s))
ereport(ERROR,
(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
errmsg("unexpected character \"%.*s\", expected \"%s\"",
pg_mblen(s), s, n->character),
errhint("In FX mode, punctuation in the input string "
"must exactly match the format string.")));
/*
* In FX mode we insist that separator character from the format
* string matches separator character from the input string.
*/
else if (n->type == NODE_TYPE_SEPARATOR && *n->character != *s)
ereport(ERROR,
(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
errmsg("unexpected character \"%.*s\", expected \"%s\"",
pg_mblen(s), s, n->character),
errhint("In FX mode, punctuation in the input string "
"must exactly match the format string.")));

E.g. "unexpected space character" and "unexpected separator character". The
difference is quite subtle, but I think a bit of context would never hurt.

#82

Arthur Zakirov

a.zakirov@postgrespro.ru

almost 8 years ago

In reply to: Dmitry Dolgov (#81)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Jan 31, 2018 at 05:53:29PM +0100, Dmitry Dolgov wrote:

Thanks for working on that! I haven't followed this thread before, and after a
quick review I have few side questions.

Thank you for your comments!

Why not write `is_separator_char` using `isprint`, `isalpha`, `isdigit` from
ctype.h? Something like:

return isprint(*str) && !isalpha(*str) && !isdigit(*str)

From what I see in the source code they do exactly the same and tests are
successfully passing with this change.

Fixed. The patch uses those functions now. I made is_separator_char() as a
IS_SEPARATOR_CHAR() macro.

What do you think about providing two slightly different messages for these two
situations:

if (n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s))
ereport(ERROR,
(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
errmsg("unexpected character \"%.*s\", expected \"%s\"",
pg_mblen(s), s, n->character),
errhint("In FX mode, punctuation in the input string "
"must exactly match the format string.")));
/*
* In FX mode we insist that separator character from the format
* string matches separator character from the input string.
*/
else if (n->type == NODE_TYPE_SEPARATOR && *n->character != *s)
ereport(ERROR,
(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
errmsg("unexpected character \"%.*s\", expected \"%s\"",
pg_mblen(s), s, n->character),
errhint("In FX mode, punctuation in the input string "
"must exactly match the format string.")));

E.g. "unexpected space character" and "unexpected separator character". The
difference is quite subtle, but I think a bit of context would never hurt.

I fixed those messages, but in a different manner. I think that an
unexpected character is unknown and neither space nor separator. And
better to say that was expected space/separator character.

Attached fixed patch.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v10.patchtext/plain; charset=us-asciiDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 487c7ff750..053d153d35 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6174,7 +6174,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        <function>to_timestamp</function> and <function>to_date</function>
        skip multiple blank spaces in the input string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
@@ -6182,6 +6183,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+       <function>to_timestamp</function> and <function>to_date</function> don't
+       skip multiple printable non letter and non digit characters in the input
+       string, but skip them in the formatting string. For example,
+       <literal>to_timestamp('2000-JUN', 'YYYY/MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY//MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because count of the "/" character in the input string
+       doesn't match count of it in the formatting string.
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        Ordinary text is allowed in <function>to_char</function>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index b8bd4caa3e..35007bec5c 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -171,6 +171,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -542,6 +544,10 @@ static const KeySuffix DCH_suff[] = {
 	{NULL, 0, 0, 0}
 };
 
+#define IS_SEPARATOR_CHAR(s) ( \
+	isprint(*(unsigned char*)(s)) && \
+	!isalpha(*(unsigned char*)(s)) && \
+	!isdigit(*(unsigned char*)(s)))
 
 /* ----------
  * Format-pictures (KeyWord).
@@ -1325,7 +1331,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && IS_SEPARATOR_CHAR(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2996,12 +3009,53 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
+		{
+			/*
+			 * In non FX (fixed format) mode we don't insist that the consumed
+			 * character matches the format's character.
+			 */
+			if (!fx_mode)
+			{
+				if (isspace((unsigned char) *s) || IS_SEPARATOR_CHAR(s))
+					s++;
+
+				continue;
+			}
+
+			/*
+			 * In FX mode we insist that whitespace from the format string
+			 * matches whitespace from the input string.
+			 */
+			if (n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected space character \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+			/*
+			 * In FX mode we insist that separator character from the format
+			 * string matches separator character from the input string.
+			 */
+			else if (n->type == NODE_TYPE_SEPARATOR && *n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected separator character \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+
+			s++;
+			continue;
+		}
+		else if (n->type != NODE_TYPE_ACTION)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
 			 */
 			s += pg_mblen(s);
 			continue;
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e39198e6..6e50cfdbfa 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected separator character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2966,7 +2999,7 @@ SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2996,7 +3029,7 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -3014,13 +3047,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index ebb196a1cf..e3305fc836 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');

#83

Robert Haas

robertmhaas@gmail.com

almost 8 years ago

In reply to: Dmitry Dolgov (#81)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Jan 31, 2018 at 11:53 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

Why not write `is_separator_char` using `isprint`, `isalpha`, `isdigit` from
ctype.h? Something like:

return isprint(*str) && !isalpha(*str) && !isdigit(*str)

From what I see in the source code they do exactly the same and tests are
successfully passing with this change.

I'm not quite sure whether it's relevant here, but I think some of the
ctype.h functions have locale-dependent behavior. By implementing our
own test we make sure that we don't accidentally inherit any such
behavior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#84

Dmitry Dolgov

9erthalion6@gmail.com

almost 8 years ago

In reply to: Robert Haas (#83)

Re: [HACKERS] Bug in to_timestamp().

On 1 February 2018 at 11:24, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:

I fixed those messages, but in a different manner. I think that an
unexpected character is unknown and neither space nor separator. And
better to say that was expected space/separator character.

Sounds good, thanks.

Attached fixed patch.

For some reason I can't apply it clean to the latest master:

(Stripping trailing CRs from patch; use --binary to disable.)
patching file doc/src/sgml/func.sgml
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/utils/adt/formatting.c
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/test/regress/expected/horology.out
Hunk #1 FAILED at 2769.
Hunk #2 FAILED at 2789.
Hunk #3 succeeded at 2810 with fuzz 2.
Hunk #4 succeeded at 2981 with fuzz 2.
Hunk #5 succeeded at 3011 with fuzz 2.
Hunk #6 FAILED at 3029.
3 out of 6 hunks FAILED -- saving rejects to file
src/test/regress/expected/horology.out.rej
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/test/regress/sql/horology.sql

On 2 February 2018 at 16:40, Robert Haas <robertmhaas@gmail.com> wrote:

I'm not quite sure whether it's relevant here, but I think some of the
ctype.h functions have locale-dependent behavior. By implementing our
own test we make sure that we don't accidentally inherit any such
behavior.

Yes, you're right, `isalpha` is actually locale dependent function.

In some locales, there may be additional characters for which isalpha() is
true—letters which are neither uppercase nor lowercase.

So, if I understand the patch correctly, with `isalpha` the function
`is_separator_char` will return false for some locale-specific characters, and
without - those characters will be treated as separators. Is it desire
behavior?

#85

Arthur Zakirov

a.zakirov@postgrespro.ru

almost 8 years ago

In reply to: Dmitry Dolgov (#84)

Re: [HACKERS] Bug in to_timestamp().

On Fri, Feb 02, 2018 at 09:54:45PM +0100, Dmitry Dolgov wrote:

For some reason I can't apply it clean to the latest master:

(Stripping trailing CRs from patch; use --binary to disable.)
patching file doc/src/sgml/func.sgml
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/backend/utils/adt/formatting.c
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/test/regress/expected/horology.out
Hunk #1 FAILED at 2769.
Hunk #2 FAILED at 2789.
Hunk #3 succeeded at 2810 with fuzz 2.
Hunk #4 succeeded at 2981 with fuzz 2.
Hunk #5 succeeded at 3011 with fuzz 2.
Hunk #6 FAILED at 3029.
3 out of 6 hunks FAILED -- saving rejects to file
src/test/regress/expected/horology.out.rej
(Stripping trailing CRs from patch; use --binary to disable.)
patching file src/test/regress/sql/horology.sql

It is strange. I still can apply both v9 [1] and v10 [2] via 'git
apply'. And Patch Tester [3] says that it is applied. But maybe
it is because of my git (git version 2.16.1).

You can try also 'patch -p1':

$ patch -p1 < 0001-to-timestamp-format-checking-v10.patch

1 - /messages/by-id/20180112125848.GA32559@zakirov.localdomain
2 - /messages/by-id/20180201102449.GA29082@zakirov.localdomain
3 - http://commitfest.cputube.org/

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#86

Dmitry Dolgov

9erthalion6@gmail.com

almost 8 years ago

In reply to: Arthur Zakirov (#85)

Re: [HACKERS] Bug in to_timestamp().

On 6 February 2018 at 10:17, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
It is strange. I still can apply both v9 [1] and v10 [2] via 'git
apply'. And Patch Tester [3] says that it is applied. But maybe
it is because of my git (git version 2.16.1).

You can try also 'patch -p1':

Yes, looks like the problem is on my side, sorry.

#87

Dmitry Dolgov

9erthalion6@gmail.com

almost 8 years ago

In reply to: Dmitry Dolgov (#86)

Re: [HACKERS] Bug in to_timestamp().

On 7 February 2018 at 22:51, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On 6 February 2018 at 10:17, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
It is strange. I still can apply both v9 [1] and v10 [2] via 'git
apply'. And Patch Tester [3] says that it is applied. But maybe
it is because of my git (git version 2.16.1).

You can try also 'patch -p1':

Yes, looks like the problem is on my side, sorry.

I went through this thread, and want to summarize a bit:

From what I see this patch addresses most important concerns that were
mentioned in the thread, i.e. to make `to_timestamp` less confusing and be
close to Oracles behavior. The code itself looks clear and sufficient, with the
required documentation and green tests.

Looks like there are just two questions left so far:

* I've noticed what I think a difference between current that was introduced in
this patch and Oracle. In the following case we can have any number of spaces
after a separator `+` in the input string

SELECT to_timestamp('2000+JUN', 'YYYY/MON');
to_timestamp
------------------------
2000-06-01 00:00:00+02
(1 row)

SELECT to_timestamp('2000+ JUN', 'YYYY/MON');
to_timestamp
------------------------
2000-06-01 00:00:00+02
(1 row)

But no spaces before it (it actually depends on how many separators do we have
in the format string)

SELECT to_timestamp('2000 +JUN', 'YYYY/MON');
ERROR: 22007: invalid value "+JU" for "MON"
DETAIL: The given value did not match any of the allowed values
for this field.
LOCATION: from_char_seq_search, formatting.c:2410

SELECT to_timestamp('2000 +JUN', 'YYYY//MON');
to_timestamp
------------------------
2000-06-01 00:00:00+02
(1 row)

SELECT to_timestamp('2000 +JUN', 'YYYY//MON');
ERROR: 22007: invalid value "+JU" for "MON"
DETAIL: The given value did not match any of the allowed values
for this field.
LOCATION: from_char_seq_search, formatting.c:2410

Judging from this http://rextester.com/l/oracle_online_compiler in Oracle it's
possible to have any number of spaces before or after `+` independently from
the number of separators in an input string. Is it intended?

SELECT to_timestamp('2000 + JUN', 'YYYY/MON') FROM dual
01.06.2000 00:00:00

* About usage of locale dependent functions e.g. `isalpha`. Yes, looks like
it's better to have locale-agnostic implementation, but then I'm confused - all
functions except `isdigit`/`isxdigit` are locale-dependent, including
`isspace`, which is also in use.

#88

Arthur Zakirov

a.zakirov@postgrespro.ru

almost 8 years ago

In reply to: Dmitry Dolgov (#87)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

Hello,

On Fri, Feb 09, 2018 at 04:31:14PM +0100, Dmitry Dolgov wrote:

I went through this thread, and want to summarize a bit:

From what I see this patch addresses most important concerns that were
mentioned in the thread, i.e. to make `to_timestamp` less confusing and be
close to Oracles behavior. The code itself looks clear and sufficient, with the
required documentation and green tests.

Thank you for reviewing!

Looks like there are just two questions left so far:

* I've noticed what I think a difference between current that was introduced in
this patch and Oracle. In the following case we can have any number of spaces
after a separator `+` in the input string
...
But no spaces before it (it actually depends on how many separators do we have
in the format string)
...
Judging from this http://rextester.com/l/oracle_online_compiler in Oracle it's
possible to have any number of spaces before or after `+` independently from
the number of separators in an input string. Is it intended?

Yes, I somehow missed it. I changed the behaviour of separator
characters in format string. They are greedy now and eat whitespaces
before a separator character in an input string. I suppose that there
should be no such problems as in [1].

* About usage of locale dependent functions e.g. `isalpha`. Yes, looks like
it's better to have locale-agnostic implementation, but then I'm confused - all
functions except `isdigit`/`isxdigit` are locale-dependent, including
`isspace`, which is also in use.

I returned is_separator_char() function for now.

1 - /messages/by-id/20180112125848.GA32559@zakirov.localdomain

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v11.patchtext/plain; charset=us-asciiDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 640ff09a7b..a34781378e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6174,7 +6174,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        <function>to_timestamp</function> and <function>to_date</function>
        skip multiple blank spaces in the input string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
@@ -6182,6 +6183,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+       <function>to_timestamp</function> and <function>to_date</function> don't
+       skip multiple printable non letter and non digit characters in the input
+       string, but skip them in the formatting string. For example,
+       <literal>to_timestamp('2000-JUN', 'YYYY/MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY//MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because count of the "/" character in the input string
+       doesn't match count of it in the formatting string.
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        Ordinary text is allowed in <function>to_char</function>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index b8bd4caa3e..470cd4e187 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -171,6 +171,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -542,7 +544,6 @@ static const KeySuffix DCH_suff[] = {
 	{NULL, 0, 0, 0}
 };
 
-
 /* ----------
  * Format-pictures (KeyWord).
  *
@@ -961,6 +962,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -1050,6 +1052,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1325,7 +1337,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2996,17 +3015,80 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * In non FX (fixed format) mode we don't insist that the consumed
+			 * character matches the format's character.
+			 */
+			if (!fx_mode)
+			{
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+					s++;
+
+				continue;
+			}
+			/*
+			 * In FX mode we insist that whitespace from the format string
+			 * matches whitespace from the input string.
+			 */
+			else if (!isspace((unsigned char) *s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected space character \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+		}
+		else if (n->type == NODE_TYPE_SEPARATOR)
+		{
+			if (!fx_mode)
+			{
+				/*
+				 * Ignore spaces before separator character when not in FX
+				 * mode.
+				 */
+				while (*s != '\0' && isspace((unsigned char) *s))
+					s++;
+
+				/*
+				 * In non FX mode we don't insist that the consumed character
+				 * matches the format's character.
+				 */
+				if (is_separator_char(s))
+					s++;
+
+				continue;
+			}
+			/*
+			 * In FX mode we insist that separator character from the format
+			 * string matches separator character from the input string.
+			 */
+			else if (*n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected separator character \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+
+			s++;
+			continue;
+		}
+		else if (n->type != NODE_TYPE_ACTION)
+		{
+			/*
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
 			 */
 			s += pg_mblen(s);
 			continue;
 		}
 
+		/* n->type == NODE_TYPE_ACTION */
+
 		/* Ignore spaces before fields when not in FX (fixed width) mode */
 		if (!fx_mode && n->key->id != DCH_FX)
 		{
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e39198e6..fb595c7b17 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected separator character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2966,7 +2999,7 @@ SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2996,7 +3029,31 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 +JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -3014,13 +3071,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index ebb196a1cf..e9accdf30c 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -464,6 +476,11 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+SELECT to_timestamp('2000 +JUN', 'YYYY/MON');
+SELECT to_timestamp('2000 +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#89

Dmitry Dolgov

9erthalion6@gmail.com

almost 8 years ago

In reply to: Arthur Zakirov (#88)

Re: [HACKERS] Bug in to_timestamp().

On 12 February 2018 at 12:49, Arthur Zakirov <a.zakirov@postgrespro.ru>

wrote:

Yes, I somehow missed it. I changed the behaviour of separator
characters in format string. They are greedy now and eat whitespaces
before a separator character in an input string. I suppose that there
should be no such problems as in [1].

Thanks. I have a few minor complains about some noise:

* On line 52 there is a removed empty line, without any other changes around

* On line 177 there is a new commented out line of code. I assume it's not
an
explanation or something and we don't need it, am I right?

Also, I spotted one more difference between this patch and Oracle. In the
situation, when a format string doesn't have anything meaningful, with the
patch we've got:

SELECT to_timestamp('2000 + JUN', '/');
to_timestamp
---------------------------------
0001-01-01 00:00:00+00:53:28 BC
(1 row)

SELECT to_timestamp('2000 + JUN', ' ');
to_timestamp
---------------------------------
0001-01-01 00:00:00+00:53:28 BC
(1 row)

And Oracle complains about this:

SELECT to_timestamp('2000 + JUN', ' /') FROM dual
ORA-01830: date format picture ends before converting entire input string

SELECT to_timestamp('2000 + JUN', ' ') FROM dual
ORA-01830: date format picture ends before converting entire input string

It's sort of corner case, but anyway maybe you would be interested to handle
it.

About usage of locale dependent functions e.g. `isalpha`. Yes, looks like
it's better to have locale-agnostic implementation, but then I'm

confused - all

functions except `isdigit`/`isxdigit` are locale-dependent, including
`isspace`, which is also in use.

I returned is_separator_char() function for now.

Thanks. Answering my own question about `isspace`, I finally noticed, that
this
function was used before the patch, and there were no complains - so
probably
it's fine.

#90

Arthur Zakirov

a.zakirov@postgrespro.ru

almost 8 years ago

In reply to: Dmitry Dolgov (#89)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Feb 14, 2018 at 05:23:53PM +0100, Dmitry Dolgov wrote:

* On line 52 there is a removed empty line, without any other changes around

Agree, it is unnecessary change. There was the macro there before.

* On line 177 there is a new commented out line of code. I assume it's not
an
explanation or something and we don't need it, am I right?

It explains a node type we deal with. But maybe it is not very useful,
so I removed it.

And Oracle complains about this:

SELECT to_timestamp('2000 + JUN', ' /') FROM dual
ORA-01830: date format picture ends before converting entire input string

SELECT to_timestamp('2000 + JUN', ' ') FROM dual
ORA-01830: date format picture ends before converting entire input string

It's sort of corner case, but anyway maybe you would be interested to handle
it.

I think it could be fixed by another patch. But I'm not sure that it
will be accepted as well as this patch :).

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v12.patchtext/plain; charset=us-asciiDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1e535cf215..1cc8b078f7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6174,7 +6174,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        <function>to_timestamp</function> and <function>to_date</function>
        skip multiple blank spaces in the input string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
@@ -6182,6 +6183,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+       <function>to_timestamp</function> and <function>to_date</function> don't
+       skip multiple printable non letter and non digit characters in the input
+       string, but skip them in the formatting string. For example,
+       <literal>to_timestamp('2000-JUN', 'YYYY/MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY//MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because count of the "/" character in the input string
+       doesn't match count of it in the formatting string.
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        Ordinary text is allowed in <function>to_char</function>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index b8bd4caa3e..a23122d3de 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -171,6 +171,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -961,6 +963,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -1050,6 +1053,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1325,7 +1338,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2996,12 +3016,73 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE)
+		{
+			/*
+			 * In non FX (fixed format) mode we don't insist that the consumed
+			 * character matches the format's character.
+			 */
+			if (!fx_mode)
+			{
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+					s++;
+
+				continue;
+			}
+			/*
+			 * In FX mode we insist that whitespace from the format string
+			 * matches whitespace from the input string.
+			 */
+			else if (!isspace((unsigned char) *s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected space character \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+		}
+		else if (n->type == NODE_TYPE_SEPARATOR)
+		{
+			if (!fx_mode)
+			{
+				/*
+				 * Ignore spaces before separator character when not in FX
+				 * mode.
+				 */
+				while (*s != '\0' && isspace((unsigned char) *s))
+					s++;
+
+				/*
+				 * In non FX mode we don't insist that the consumed character
+				 * matches the format's character.
+				 */
+				if (is_separator_char(s))
+					s++;
+
+				continue;
+			}
+			/*
+			 * In FX mode we insist that separator character from the format
+			 * string matches separator character from the input string.
+			 */
+			else if (*n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected separator character \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+
+			s++;
+			continue;
+		}
+		else if (n->type != NODE_TYPE_ACTION)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
 			 */
 			s += pg_mblen(s);
 			continue;
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e39198e6..fb595c7b17 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected separator character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2966,7 +2999,7 @@ SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2996,7 +3029,31 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 +JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -3014,13 +3071,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index ebb196a1cf..e9accdf30c 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -464,6 +476,11 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+SELECT to_timestamp('2000 +JUN', 'YYYY/MON');
+SELECT to_timestamp('2000 +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#91

Dmitry Dolgov

9erthalion6@gmail.com

almost 8 years ago

In reply to: Arthur Zakirov (#90)

Re: [HACKERS] Bug in to_timestamp().

On 17 February 2018 at 10:02, Arthur Zakirov <a.zakirov@postgrespro.ru>

wrote:

I think it could be fixed by another patch. But I'm not sure that it
will be accepted as well as this patch :).

Why do you think there should be another patch for it?

#92

Dmitry Dolgov

9erthalion6@gmail.com

almost 8 years ago

In reply to: Dmitry Dolgov (#91)

Re: [HACKERS] Bug in to_timestamp().

On 18 February 2018 at 18:49, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On 17 February 2018 at 10:02, Arthur Zakirov <a.zakirov@postgrespro.ru>

wrote:

On Wed, Feb 14, 2018 at 05:23:53PM +0100, Dmitry Dolgov wrote:

SELECT to_timestamp('2000 + JUN', ' /') FROM dual
ORA-01830: date format picture ends before converting entire input

string

SELECT to_timestamp('2000 + JUN', ' ') FROM dual
ORA-01830: date format picture ends before converting entire input

string

It's sort of corner case, but anyway maybe you would be interested to

handle

it.

I think it could be fixed by another patch. But I'm not sure that it
will be accepted as well as this patch :).

Why do you think there should be another patch for it?

Just to make myself clear. I think it's fair enough to mark this patch as
ready
for committer - the implementation is neat and sufficient as for me, and it
provides a visible improvement for user experience. The question above is
probably the only thing that prevents me from proposing this.

#93

Arthur Zakirov

a.zakirov@postgrespro.ru

almost 8 years ago

In reply to: Dmitry Dolgov (#92)

Re: [HACKERS] Bug in to_timestamp().

On Fri, Mar 02, 2018 at 12:58:57AM +0100, Dmitry Dolgov wrote:

On 18 February 2018 at 18:49, Dmitry Dolgov <9erthalion6@gmail.com> wrote:

On 17 February 2018 at 10:02, Arthur Zakirov <a.zakirov@postgrespro.ru>

wrote:

...
I think it could be fixed by another patch. But I'm not sure that it
will be accepted as well as this patch :).

Why do you think there should be another patch for it?

Just to make myself clear. I think it's fair enough to mark this patch as
ready
for committer - the implementation is neat and sufficient as for me, and it
provides a visible improvement for user experience. The question above is
probably the only thing that prevents me from proposing this.

The fix you proposed isn't related with the initial report [1] by Amul
Sul, IMHO. The patch tries to fix that behaviour and additionally tries
to be more smart in handling and matching separator characters within
input and format strings. And unfortunately, it partly breaks backward
compatibility.

Your propose could break backward compatibility too, I think. And in
different manner than the patch I think. And that's why it needs another
patch I think and needs an additional decision about breaking backward
compatibility. All this is IMHO.

1 - /messages/by-id/1873520224.1784572.1465833145330.JavaMail.yahoo@mail.yahoo.com

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#94

Dmitry Dolgov

9erthalion6@gmail.com

almost 8 years ago

In reply to: Arthur Zakirov (#93)

Re: [HACKERS] Bug in to_timestamp().

On 2 March 2018 at 10:33, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:

The fix you proposed isn't related with the initial report [1] by Amul
Sul, IMHO. The patch tries to fix that behaviour and additionally tries
to be more smart in handling and matching separator characters within
input and format strings. And unfortunately, it partly breaks backward
compatibility.

Your propose could break backward compatibility too, I think. And in
different manner than the patch I think. And that's why it needs another
patch I think and needs an additional decision about breaking backward
compatibility. All this is IMHO.

1 - /messages/by-id/1873520224.1784572.1465833145330.JavaMail.yahoo@mail.yahoo.com

Well, I'm not sure that it's completely unrelated, but I see your point - it
makes sense to discuss this and move forward in small steps. So, if there are
no objections I'll mark this patch as ready.

#95

Arthur Zakirov

a.zakirov@postgrespro.ru

almost 8 years ago

In reply to: Dmitry Dolgov (#94)

Re: [HACKERS] Bug in to_timestamp().

On Fri, Mar 02, 2018 at 01:44:53PM +0100, Dmitry Dolgov wrote:

Well, I'm not sure that it's completely unrelated, but I see your point - it
makes sense to discuss this and move forward in small steps. So, if there are
no objections I'll mark this patch as ready.

Thank you!

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#96

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Arthur Zakirov (#95)

Re: [HACKERS] Bug in to_timestamp().

On Fri, Mar 2, 2018 at 3:52 PM Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:

On Fri, Mar 02, 2018 at 01:44:53PM +0100, Dmitry Dolgov wrote:

Well, I'm not sure that it's completely unrelated, but I see your point - it
makes sense to discuss this and move forward in small steps. So, if there are
no objections I'll mark this patch as ready.

Thank you!

I tool a look at this patch. It looks good for me. It applies
cleanly on last master, builds without warnings, passes tests.
Functionality seems to be well-covered by documentation and regression
tests. So, I'm going to commit it if no objections.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#97

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Alexander Korotkov (#96)

Re: [HACKERS] Bug in to_timestamp().

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:

I tool a look at this patch. It looks good for me. It applies
cleanly on last master, builds without warnings, passes tests.
Functionality seems to be well-covered by documentation and regression
tests. So, I'm going to commit it if no objections.

AFAIR, the argument was mainly over whether we agreed with the proposed
behavioral changes. It seems a bit premature to me to commit given that
there's not consensus on that.

regards, tom lane

#98

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Tom Lane (#97)

Re: [HACKERS] Bug in to_timestamp().

On Sat, Jul 7, 2018 at 12:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:

I tool a look at this patch. It looks good for me. It applies
cleanly on last master, builds without warnings, passes tests.
Functionality seems to be well-covered by documentation and regression
tests. So, I'm going to commit it if no objections.

AFAIR, the argument was mainly over whether we agreed with the proposed
behavioral changes. It seems a bit premature to me to commit given that
there's not consensus on that.

Ok. I've briefly looked to the thread, and I thought there is
consensus already. Given your concern, I'll investigate this thread
in detail and come up with summary.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#99

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#98)

Re: [HACKERS] Bug in to_timestamp().

On Sun, Jul 8, 2018 at 12:15 AM Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:

On Sat, Jul 7, 2018 at 12:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:

I tool a look at this patch. It looks good for me. It applies
cleanly on last master, builds without warnings, passes tests.
Functionality seems to be well-covered by documentation and regression
tests. So, I'm going to commit it if no objections.

AFAIR, the argument was mainly over whether we agreed with the proposed
behavioral changes. It seems a bit premature to me to commit given that
there's not consensus on that.

Ok. I've briefly looked to the thread, and I thought there is
consensus already. Given your concern, I'll investigate this thread
in detail and come up with summary.

So, I've studies this thread in more details. Let me share my short
summary.

This thread was started from complaint about confusing behavior of
to_timestamp() function in some cases. Basically confusing behavior is
related to two issues: interpretation of spaces and separators in both
input string and format string, tolerance to invalid dates and times
(i.e. 2009-06-40 becomes 2009-07-10). Fix for the second issue was already
committed by Tom Lane (d3cd36a1). So, the improvement under consideration
is interpretation of spaces and separators.

to_timestamp()/to_date() functions are not part of SQL standard. So,
unfortunately we can't use standard as a guideline and have to search for
other criteria. On the one hand to_timestamp()/to_date() is here for quite
long time and there might be existing applications relying on its
behavior. Changing the behavior of these functions might appear to be a
pain for users. On the other hand these functions were introduced
basically for Oracle compatibility. So it would be nice to keep their
behavior as close to Oracle as possible. On the third hand, if current
behavior of these functions is agreed to be confusing, and new behavior is
agreed to be less confusing and more close to Oracle, then we can accept
it. If even it would discourage small fraction of users, who are relying
on the behavior which we assume to be confusing, way larger fraction of
users would be encouraged by this change.

So, as I get from this thread, current patch brings these function very
close to Oracle behavior. The only divergence found yet is related to
handling of unmatched tails of input strings (Oracle throws an error while
we swallow that silently) [1]. But this issue can be be addressed by a
separate patch.

Patch seems to be in a pretty good shape. So, the question is whether
there is a consensus that we want a backward-incompatible behavior change,
which this patch does.

My personal opinion is +1 for committing this, because I see that this
patch takes a lot of community efforts on discussion, coding, review etc.
All these efforts give a substantial result in a patch, which makes
behavior more Oracle-compatible and (IMHO) more clear.

However, in this thread other opinions were expressed. In
particular, David G. Johnston expressed opinion that we shouldn't change
behavior of existing functions, alternatively we could introduce new
functions with new behavior. However, I see David doesn't participate this
thread for a quite long time.

Thus, in the current situation I can propose following. Let's establish
some period when people can express objections against committing this
patch (reasonable short period, say 1 week). If no objections were
expressed then commit. Otherwise, discuss the objections expressed.

1.
/messages/by-id/CA+q6zcUS3fQGLoeLcuJLtz-gRD6vHqpn1fBe0cnORdx93QtO4w@mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#100

David G. Johnston

david.g.johnston@gmail.com

over 7 years ago

In reply to: Alexander Korotkov (#99)

Re: [HACKERS] Bug in to_timestamp().

On Sat, Jul 21, 2018 at 2:34 PM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:

So, as I get from this thread, current patch brings these function very
close to Oracle behavior. The only divergence found yet is related to
handling of unmatched tails of input strings (Oracle throws an error while
we swallow that silently) [1]. But this issue can be be addressed by a
separate patch.

Patch seems to be in a pretty good shape. So, the question is whether
there is a consensus that we want a backward-incompatible behavior change,
which this patch does.

My personal opinion is +1 for committing this, because I see that this
patch takes a lot of community efforts on discussion, coding, review etc.
All these efforts give a substantial result in a patch, which makes
behavior more Oracle-compatible and (IMHO) more clear.

However, in this thread other opinions were expressed. In
particular, David G. Johnston expressed opinion that we shouldn't change
behavior of existing functions, alternatively we could introduce new
functions with new behavior. However, I see David doesn't participate this
thread for a quite long time.

Been lurking about...I'm still of the opinion that this capability should
exist in PostgreSQL as "our" function and not just as an Oracle
compatibility.

That said the thing I'm most curious to read is the release note entry that
would go along with this change that informs users what will be changing:
silently, visibly, or otherwise. Knowing how much we (don't) diverge from
our chosen "standard" after making this change is important but the change
in behavior from current is, IMO, more important in deciding whether this
particular change is worth making.

My re-read of the thread the other day left me with a feeling of
contentment that this was an acceptable change but I also get the feeling
like I'm missing the downside trade-off too...I was hoping your review
would help in that regard but as it did not speak to specific
incompatibilities it has not.

David J.

#101

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: David G. Johnston (#100)

Re: [HACKERS] Bug in to_timestamp().

On Sun, Jul 22, 2018 at 6:22 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Sat, Jul 21, 2018 at 2:34 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

So, as I get from this thread, current patch brings these function very close to Oracle behavior. The only divergence found yet is related to handling of unmatched tails of input strings (Oracle throws an error while we swallow that silently) [1]. But this issue can be be addressed by a separate patch.

Patch seems to be in a pretty good shape. So, the question is whether there is a consensus that we want a backward-incompatible behavior change, which this patch does.

My personal opinion is +1 for committing this, because I see that this patch takes a lot of community efforts on discussion, coding, review etc. All these efforts give a substantial result in a patch, which makes behavior more Oracle-compatible and (IMHO) more clear.

However, in this thread other opinions were expressed. In particular, David G. Johnston expressed opinion that we shouldn't change behavior of existing functions, alternatively we could introduce new functions with new behavior. However, I see David doesn't participate this thread for a quite long time.

Been lurking about...I'm still of the opinion that this capability should exist in PostgreSQL as "our" function and not just as an Oracle compatibility.

For sure, we're not going to copy Oracle behavior "bug to bug". But
when users find our behavior to be confusing, there is nothing wrong
to look for Oracle behavior and accept it if it looks good.

That said the thing I'm most curious to read is the release note entry that would go along with this change that informs users what will be changing: silently, visibly, or otherwise.

I'm sure that release note entry should directly inform users about
backward-incompatible changes in to_timestamp()/to_date() functions.
Users need to be advised to test their applications before porting
them to new major release of PostgreSQL.

Knowing how much we (don't) diverge from our chosen "standard" after making this change is important but the change in behavior from current is, IMO, more important in deciding whether this particular change is worth making.
My re-read of the thread the other day left me with a feeling of contentment that this was an acceptable change but I also get the feeling like I'm missing the downside trade-off too...I was hoping your review would help in that regard but as it did not speak to specific incompatibilities it has not.

So, if I understand you correctly, downside of trade-off are use-cases
when current behavior looks correct, while patched behavior looks
incorrect. Yes, while looking at this thread we can't find such
use-cases. Intuitively, they should be present. But I thought about
that and didn't find it yet...

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#102

Arthur Zakirov

a.zakirov@postgrespro.ru

over 7 years ago

In reply to: David G. Johnston (#100)

Re: [HACKERS] Bug in to_timestamp().

On Sun, Jul 22, 2018 at 08:22:23AM -0700, David G. Johnston wrote:

My re-read of the thread the other day left me with a feeling of
contentment that this was an acceptable change but I also get the feeling
like I'm missing the downside trade-off too...I was hoping your review
would help in that regard but as it did not speak to specific
incompatibilities it has not.

I like more behaviour of the function with the patch. It gives less
unexpected results. For example, the query mentioned above:

SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD HH24:MI:SS')

I looked for some tradeoffs of the patch. I think it could be parsing
strings like the following input strings:

SELECT TO_TIMESTAMP('2011年5月1日', 'yyyy-MM-DD');
SELECT TO_TIMESTAMP('2011y5m1d', 'yyyy-MM-DD');

HEAD extracts year, month and day from the string. But patched
to_timestamp() raises an error. Someone could rely on such behaviour.
The patch divides separator characters from letters and digits. And
'年' or 'y' are letters here. And so the format string doesn't match the
input string.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#103

Arthur Zakirov

a.zakirov@postgrespro.ru

over 7 years ago

In reply to: Arthur Zakirov (#102)

Re: [HACKERS] Bug in to_timestamp().

On Mon, Jul 23, 2018 at 04:30:43PM +0300, Arthur Zakirov wrote:

I looked for some tradeoffs of the patch. I think it could be parsing
strings like the following input strings:

SELECT TO_TIMESTAMP('2011年5月1日', 'yyyy-MM-DD');
SELECT TO_TIMESTAMP('2011y5m1d', 'yyyy-MM-DD');

HEAD extracts year, month and day from the string. But patched
to_timestamp() raises an error. Someone could rely on such behaviour.
The patch divides separator characters from letters and digits. And
'年' or 'y' are letters here. And so the format string doesn't match the
input string.

Sorry, I forgot to mention that the patch can handle this by using
different format string. You can execute:

=# SELECT TO_TIMESTAMP('2011年5月1日', 'yyyy年MM月DD日');
to_timestamp
------------------------
2011-05-01 00:00:00+04

=# SELECT TO_TIMESTAMP('2011y5m1d', 'yyyy"y"MM"m"DD"d"');
to_timestamp
------------------------
2011-05-01 00:00:00+04

or:

=# SELECT TO_TIMESTAMP('2011y5m1d', 'yyyytMMtDDt');
to_timestamp
------------------------
2011-05-01 00:00:00+04

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#104

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Arthur Zakirov (#103)

Re: [HACKERS] Bug in to_timestamp().

On Mon, Jul 23, 2018 at 5:12 PM Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:

On Mon, Jul 23, 2018 at 04:30:43PM +0300, Arthur Zakirov wrote:

I looked for some tradeoffs of the patch. I think it could be parsing
strings like the following input strings:

SELECT TO_TIMESTAMP('2011年5月1日', 'yyyy-MM-DD');
SELECT TO_TIMESTAMP('2011y5m1d', 'yyyy-MM-DD');

HEAD extracts year, month and day from the string. But patched
to_timestamp() raises an error. Someone could rely on such behaviour.
The patch divides separator characters from letters and digits. And
'年' or 'y' are letters here. And so the format string doesn't match the
input string.

Sorry, I forgot to mention that the patch can handle this by using
different format string. You can execute:

=# SELECT TO_TIMESTAMP('2011年5月1日', 'yyyy年MM月DD日');
to_timestamp
------------------------
2011-05-01 00:00:00+04

=# SELECT TO_TIMESTAMP('2011y5m1d', 'yyyy"y"MM"m"DD"d"');
to_timestamp
------------------------
2011-05-01 00:00:00+04

or:

=# SELECT TO_TIMESTAMP('2011y5m1d', 'yyyytMMtDDt');
to_timestamp
------------------------
2011-05-01 00:00:00+04

Thank you, Arthur. These examples shows downside of this patch, where
users may be faced with incompatibility. But it's good that this
situation can be handled by altering format string. I think these
examples should be added to the documentation and highlighted in
release notes.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#105

Arthur Zakirov

a.zakirov@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#104)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

Hello, Alexander,

On Mon, Jul 23, 2018 at 05:21:43PM +0300, Alexander Korotkov wrote:

Thank you, Arthur. These examples shows downside of this patch, where
users may be faced with incompatibility. But it's good that this
situation can be handled by altering format string. I think these
examples should be added to the documentation and highlighted in
release notes.

I updated the documentation. I added a tip text which explains
how to_timestamp() and to_date() handled ordinary text prior to
PostgreSQL 11 and how they should handle ordinary text now.

There is now changes in the code and tests.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v13.patchtext/plain; charset=us-asciiDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index edc9be92a6..db1da7bee7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6264,7 +6264,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        <function>to_timestamp</function> and <function>to_date</function>
        skip multiple blank spaces in the input string unless the
        <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
@@ -6272,6 +6273,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+       <function>to_timestamp</function> and <function>to_date</function> don't
+       skip multiple non letter and non digit characters in the input string,
+       but skip them in the formatting string. For example,
+       <literal>to_timestamp('2000-JUN', 'YYYY/MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY//MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because count of the "/" character in the input string
+       doesn't match count of it in the formatting string.
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        Ordinary text is allowed in <function>to_char</function>
@@ -6287,6 +6301,20 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        string; for example <literal>"XX"</literal> skips two input characters
        (whether or not they are <literal>XX</literal>).
       </para>
+      <tip>
+        <para>
+          Prior to <productname>PostgreSQL</productname> 11 it was possible to
+          skip input ordinary text using non letter and non digit characters.
+          For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyy-MM-DD')</literal> worked
+          before.  But currently to skip ordinary text in the input string it
+          is necessary to use only letter characters.  For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyytMMtDDt')</literal> or
+          <literal>to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"')</literal> will
+          skip <literal>y</literal>, <literal>m</literal> and
+          <literal>d</literal> now.
+        </para>
+      </tip>
      </listitem>
 
      <listitem>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 30696e3575..e7547091cd 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -165,6 +165,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -955,6 +957,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -1044,6 +1047,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1319,7 +1332,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2990,12 +3010,73 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		if (n->type == NODE_TYPE_SPACE)
+		{
+			/*
+			 * In non FX (fixed format) mode we don't insist that the consumed
+			 * character matches the format's character.
+			 */
+			if (!fx_mode)
+			{
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+					s++;
+
+				continue;
+			}
+			/*
+			 * In FX mode we insist that whitespace from the format string
+			 * matches whitespace from the input string.
+			 */
+			else if (!isspace((unsigned char) *s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected space character \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+		}
+		else if (n->type == NODE_TYPE_SEPARATOR)
+		{
+			if (!fx_mode)
+			{
+				/*
+				 * Ignore spaces before separator character when not in FX
+				 * mode.
+				 */
+				while (*s != '\0' && isspace((unsigned char) *s))
+					s++;
+
+				/*
+				 * In non FX mode we don't insist that the consumed character
+				 * matches the format's character.
+				 */
+				if (is_separator_char(s))
+					s++;
+
+				continue;
+			}
+			/*
+			 * In FX mode we insist that separator character from the format
+			 * string matches separator character from the input string.
+			 */
+			else if (*n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected separator character \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
+
+			s++;
+			continue;
+		}
+		else if (n->type != NODE_TYPE_ACTION)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
 			 */
 			s += pg_mblen(s);
 			continue;
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e39198e6..fb595c7b17 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected separator character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2966,7 +2999,7 @@ SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2996,7 +3029,31 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 +JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -3014,13 +3071,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index ebb196a1cf..e9accdf30c 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -464,6 +476,11 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+SELECT to_timestamp('2000 +JUN', 'YYYY/MON');
+SELECT to_timestamp('2000 +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#106

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Arthur Zakirov (#105)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Aug 1, 2018 at 3:17 PM Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:

On Mon, Jul 23, 2018 at 05:21:43PM +0300, Alexander Korotkov wrote:

Thank you, Arthur. These examples shows downside of this patch, where
users may be faced with incompatibility. But it's good that this
situation can be handled by altering format string. I think these
examples should be added to the documentation and highlighted in
release notes.

I updated the documentation. I added a tip text which explains
how to_timestamp() and to_date() handled ordinary text prior to
PostgreSQL 11 and how they should handle ordinary text now.

There is now changes in the code and tests.

Thank you, Arthur!

I made following observations on Oracle behavior.
1) Oracle also supports parsing TZH in to_timestamp_tz() function.
Previously, in order to handle TZH (which could be negative) we
decided to skip spaces before fields, but not after fields [1][2].
That leads to behavior, which is both incompatible with Oracle and
unlogical (at least in my opinion). But after exploration of
to_timestamp_tz() behavior I found that Oracle can distinguish minus
sign used as separator from minus sign in TZH field.

# select to_char(to_timestamp_tz('2018-01-01 -10', 'YYYY MM DD TZH'),
'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 +10:00

# select to_char(to_timestamp_tz('2018-01-01--10', 'YYYY MM DD TZH'),
'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 +10:00

# select to_char(to_timestamp_tz('2018-01-01 -10', 'YYYY MM DD
TZH'), 'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 -10:00

# select to_char(to_timestamp_tz('2018-01-01---10', 'YYYY MM DD
TZH'), 'YYYY-MM-DD HH24:MI:SS TZH:TZM') from dual
2018-01-01 00:00:00 -10:00

So, if number of spaces/separators between fields in input string is
more than in format string and list space/separator skipped is minus
sign, then it decides to glue that minus sign to TZH. I think this
behavior is nice to follow, because it allows us to skip spaces after
fields.

2) It appears that Oracle skips spaces not only before fields, but
also in the beginning of the input string.

SELECT to_timestamp(' -2018', ' YYYY') FROM dual
# 01.08.2018 00:00:00

In the example given it seems that first space is skipped, while minus
sign is matched to space.

3) When multiple separators are specified in the format string, Oracle
doesn't allow skipping arbitrary number of spaces before each
separator as we did.

# SELECT to_timestamp('2018- -01 02', 'YYYY--MM-DD') FROM dual
ORA-01843: not a valid month

4) Spaces and separators in format string seem to be handled in the
same way in non-FX mode. But strange things happen when you mix
spaces and separators there.

# SELECT to_timestamp('2018- -01 02', 'YYYY---MM-DD') FROM dual
02.01.2018 00:00:00

# SELECT to_timestamp('2018- -01 02', 'YYYY MM-DD') FROM dual
02.01.2018 00:00:00

# SELECT to_timestamp('2018- -01 02', 'YYYY- -MM-DD') FROM dual
ORA-01843: not a valid month

After some experiments I found that when you mix spaces and separators
between two fields, then Oracle takes into account only length of last
group of spaces/separators.

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM
dual2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM dual
2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 3)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 2)

1+2+3 are implemented in the attached patch, but not 4. I think that
it's only worth to follow Oracle when it does reasonable things.

I also plan to revise documentation and regression tests in the patch.
But I wanted to share my results so that everybody can give a feedback
if any.

1. /messages/by-id/CAEepm=1Y7z1VSisBKxa6x3E-jP-+=rWfw25q_PH2nGjqVcX-rw@mail.gmail.com
2. /messages/by-id/20180112125848.GA32559@zakirov.localdomain------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v14.patchapplication/octet-stream; name=0001-to-timestamp-format-checking-v14.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index edc9be92a6e..9281421f470 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6263,8 +6263,9 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
        skip multiple blank spaces in the input string unless the
-       <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       <literal>FX</literal> option is used.  For example,
+       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000&nbsp;JUN', 'YYYY&nbsp;&nbsp;&nbsp;&nbsp;MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
@@ -6272,6 +6273,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+       <function>to_timestamp</function> and <function>to_date</function> don't
+       skip multiple non letter and non digit characters in the input string,
+       but skip them in the formatting string. For example,
+       <literal>to_timestamp('2000-JUN', 'YYYY/MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY//MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because count of the "/" character in the input string
+       doesn't match count of it in the formatting string.
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        Ordinary text is allowed in <function>to_char</function>
@@ -6287,6 +6301,20 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        string; for example <literal>"XX"</literal> skips two input characters
        (whether or not they are <literal>XX</literal>).
       </para>
+      <tip>
+        <para>
+          Prior to <productname>PostgreSQL</productname> 11 it was possible to
+          skip input ordinary text using non letter and non digit characters.
+          For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyy-MM-DD')</literal> worked
+          before.  But currently to skip ordinary text in the input string it
+          is necessary to use only letter characters.  For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyytMMtDDt')</literal> or
+          <literal>to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"')</literal> will
+          skip <literal>y</literal>, <literal>m</literal> and
+          <literal>d</literal> now.
+        </para>
+      </tip>
      </listitem>
 
      <listitem>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 30696e3575d..2a7c049fc59 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -165,6 +165,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -955,6 +957,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -1044,6 +1047,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1319,7 +1332,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2987,25 +3007,97 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	int			len,
 				value;
 	bool		fx_mode = false;
+	/* number of extra skipped charaters (more than given in format string) */
+	int			extra_skip = 0;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		/*
+		 * Ignore spaces in the beginning of the string and before fields when
+		 * not in FX (fixed width) mode.
+		 */
+		if (!fx_mode && (n->type != NODE_TYPE_ACTION || n->key->id != DCH_FX) &&
+			(n->type == NODE_TYPE_ACTION || n == node))
+		{
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
+
+
+		if (n->type == NODE_TYPE_SPACE)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * In non FX (fixed format) mode we don't insist that the consumed
+			 * character matches the format's character.
 			 */
-			s += pg_mblen(s);
-			continue;
+			if (!fx_mode)
+			{
+				extra_skip--;
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+				{
+					s++;
+					extra_skip++;
+				}
+
+				continue;
+			}
+			/*
+			 * In FX mode we insist that whitespace from the format string
+			 * matches whitespace from the input string.
+			 */
+			else if (!isspace((unsigned char) *s))
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected space character \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
 		}
+		else if (n->type == NODE_TYPE_SEPARATOR)
+		{
+			/*
+			 * In non FX mode we don't insist that the consumed character
+			 * matches the format's character.
+			 */
+			if (!fx_mode)
+			{
+				extra_skip--;
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+				{
+					s++;
+					extra_skip++;
+				}
+
+				continue;
+			}
+			/*
+			 * In FX mode we insist that separator character from the format
+			 * string matches separator character from the input string.
+			 */
+			else if (*n->character != *s)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+						 errmsg("unexpected character \"%.*s\", expected separator character \"%s\"",
+								pg_mblen(s), s, n->character),
+						 errhint("In FX mode, punctuation in the input string "
+								 "must exactly match the format string.")));
 
-		/* Ignore spaces before fields when not in FX (fixed width) mode */
-		if (!fx_mode && n->key->id != DCH_FX)
+			s++;
+			continue;
+		}
+		else if (n->type != NODE_TYPE_ACTION)
 		{
-			while (*s != '\0' && isspace((unsigned char) *s))
-				s++;
+			/*
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
+			 */
+			s += pg_mblen(s);
+			continue;
 		}
 
 		from_char_set_mode(out, n->key->date_mode);
@@ -3086,10 +3178,24 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 								n->key->name)));
 				break;
 			case DCH_TZH:
-				out->tzsign = *s == '-' ? -1 : +1;
-
+				/*
+				 * Value of TZH might be negative.  And the issue is that we
+				 * might swallow minus sign as the separator.  So, if we have
+				 * skipped more charaters than specified in the format string,
+				 * then we consider prepending last skipped minus to TZH.
+				 */
 				if (*s == '+' || *s == '-' || *s == ' ')
+				{
+					out->tzsign = *s == '-' ? -1 : +1;
 					s++;
+				}
+				else
+				{
+					if (extra_skip > 0 && *(s - 1) == '-')
+						out->tzsign = -1;
+					else
+						out->tzsign = +1;
+				}
 
 				from_char_parse_int_len(&out->tzh, &s, 2, n);
 				break;
@@ -3261,6 +3367,17 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+
+		/* Ignore all spaces after fields */
+		if (!fx_mode)
+		{
+			extra_skip = 0;
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
 	}
 }
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e39198e68..fb595c7b17d 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected separator character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2966,7 +2999,7 @@ SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2996,7 +3029,31 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 +JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -3014,13 +3071,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index ebb196a1cfc..e9accdf30ca 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -464,6 +476,11 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+SELECT to_timestamp('2000 +JUN', 'YYYY/MON');
+SELECT to_timestamp('2000 +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#107

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#106)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Aug 2, 2018 at 6:17 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

After some experiments I found that when you mix spaces and separators
between two fields, then Oracle takes into account only length of last
group of spaces/separators.

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM
dual2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM dual
2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 3)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 2)

Ooops... I'm sorry, but I've posted wrong results here. Correct
version is here.

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM dual
ORA-01843: not a valid month
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 3)

So length of last group of spaces/separators in the pattern should be
greater or equal to length of spaces/separators in the input string.
Other previous groups are ignored in Oracle. And that seems
ridiculous for me.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#108

Arthur Zakirov

a.zakirov@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#106)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Aug 02, 2018 at 06:17:01PM +0300, Alexander Korotkov wrote:

So, if number of spaces/separators between fields in input string is
more than in format string and list space/separator skipped is minus
sign, then it decides to glue that minus sign to TZH. I think this
behavior is nice to follow, because it allows us to skip spaces after
fields.

Good approach! With this change the following queries work now:

=# SELECT to_timestamp('2000 + JUN', 'YYYY MON');
=# SELECT to_timestamp('2000 + JUN', 'YYYY MON');

I think it is worth to add them to regression tests.

4) Spaces and separators in format string seem to be handled in the
same way in non-FX mode. But strange things happen when you mix
spaces and separators there.
...
1+2+3 are implemented in the attached patch, but not 4. I think that
it's only worth to follow Oracle when it does reasonable things.

I agree with you. I think it isn't worth to copy this behaviour.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#109

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#107)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Aug 2, 2018 at 9:06 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Thu, Aug 2, 2018 at 6:17 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

After some experiments I found that when you mix spaces and separators
between two fields, then Oracle takes into account only length of last
group of spaces/separators.

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM
dual2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM dual
2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 3)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 2)

Ooops... I'm sorry, but I've posted wrong results here. Correct
version is here.

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM dual
ORA-01843: not a valid month
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 3)

So length of last group of spaces/separators in the pattern should be
greater or equal to length of spaces/separators in the input string.
Other previous groups are ignored in Oracle. And that seems
ridiculous for me.

BTW, I've also revised documentation and regression tests. Patch is attached.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v15.patchapplication/octet-stream; name=0001-to-timestamp-format-checking-v15.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index edc9be92a6e..6c5c722995a 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6262,9 +6262,10 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
-       skip multiple blank spaces in the input string unless the
-       <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       skip multiple blank spaces in the beginng of input string, before and
+       after patterns unless the <literal>FX</literal> option is used.  For example,
+       <literal>to_timestamp('&nbsp;2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000 - JUN', 'YYYY-MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
        because <function>to_timestamp</function> expects one space only.
        <literal>FX</literal> must be specified as the first item in
@@ -6272,6 +6273,39 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+       Space or separator (non letter and non digit character) in the formatting string of
+       <function>to_timestamp</function> and <function>to_date</function>
+       matches to nothing or one space or separator in the input string
+       unless the <literal>FX</literal> option is used.
+       <literal>to_timestamp('2000JUN', 'YYYY///MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because count of separators in the input string
+       doesn't match count of it in the formatting string.
+       When <literal>FX</literal> is specified then space matches only space
+       and separator matches only the same separator.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       <literal>TZH</literal> template pattern may match signed number.  When
+       <literal>FX</literal> option isn't used that may lead to an ambiguity in
+       interpreting of minus sign, which may be also interpreted as separator.
+       This ambiguity is resolving as following.  When number of spaces and
+       separators before minus sign in input string is greater or equal to the
+       number of spaces and separators before <literal>TZH</literal> in formatting
+       string, then this minus is interpreted as part of time-zone hours.
+       Otherwise minus sign is considered as separator between values.
+       <literal>to_timestamp('2000 -10', 'YYYY TZH')</literal> matches
+       <literal>-10</literal> to <literal>TZH</literal>, but
+       <literal>to_timestamp('2000 -10', 'YYYY&nbsp;&nbsp;TZH')</literal>
+       matches just <literal>10</literal> to <literal>TZH</literal>
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        Ordinary text is allowed in <function>to_char</function>
@@ -6287,6 +6321,20 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        string; for example <literal>"XX"</literal> skips two input characters
        (whether or not they are <literal>XX</literal>).
       </para>
+      <tip>
+        <para>
+          Prior to <productname>PostgreSQL</productname> 12 it was possible to
+          skip input ordinary text using non letter and non digit characters.
+          For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyy-MM-DD')</literal> worked
+          before.  But currently to skip ordinary text in the input string it
+          is necessary to use only letter characters.  For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyytMMtDDt')</literal> or
+          <literal>to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"')</literal> will
+          skip <literal>y</literal>, <literal>m</literal> and
+          <literal>d</literal> now.
+        </para>
+      </tip>
      </listitem>
 
      <listitem>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 30696e3575d..f5b692a0d9a 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -165,6 +165,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -955,6 +957,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -1044,6 +1047,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1319,7 +1332,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2987,25 +3007,69 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	int			len,
 				value;
 	bool		fx_mode = false;
+	/* number of extra skipped charaters (more than given in format string) */
+	int			extra_skip = 0;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		/*
+		 * Ignore spaces in the beginning of the string and before fields when
+		 * not in FX (fixed width) mode.
+		 */
+		if (!fx_mode && (n->type != NODE_TYPE_ACTION || n->key->id != DCH_FX) &&
+			(n->type == NODE_TYPE_ACTION || n == node))
+		{
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
+
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * In non FX (fixed format) mode we don't insist that the consumed
+			 * character matches the format's character.
 			 */
-			s += pg_mblen(s);
+			if (!fx_mode)
+			{
+				extra_skip--;
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+				{
+					s++;
+					extra_skip++;
+				}
+			}
+			else
+			{
+				/*
+				 * In FX mode we insist that whitespaces and separator
+				 * characters from the format string match the same in input
+				 * string.
+				 */
+				if ((n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s)) ||
+					(n->type == NODE_TYPE_SEPARATOR && *n->character != *s))
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+							 errmsg("unexpected character \"%.*s\", expected space character \"%s\"",
+									pg_mblen(s), s, n->character),
+							 errhint("In FX mode, punctuation in the input string "
+									 "must exactly match the format string.")));
+				s++;
+			}
 			continue;
 		}
-
-		/* Ignore spaces before fields when not in FX (fixed width) mode */
-		if (!fx_mode && n->key->id != DCH_FX)
+		else if (n->type != NODE_TYPE_ACTION)
 		{
-			while (*s != '\0' && isspace((unsigned char) *s))
-				s++;
+			/*
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
+			 */
+			s += pg_mblen(s);
+			continue;
 		}
 
 		from_char_set_mode(out, n->key->date_mode);
@@ -3086,10 +3150,24 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 								n->key->name)));
 				break;
 			case DCH_TZH:
-				out->tzsign = *s == '-' ? -1 : +1;
-
+				/*
+				 * Value of TZH might be negative.  And the issue is that we
+				 * might swallow minus sign as the separator.  So, if we have
+				 * skipped more charaters than specified in the format string,
+				 * then we consider prepending last skipped minus to TZH.
+				 */
 				if (*s == '+' || *s == '-' || *s == ' ')
+				{
+					out->tzsign = *s == '-' ? -1 : +1;
 					s++;
+				}
+				else
+				{
+					if (extra_skip > 0 && *(s - 1) == '-')
+						out->tzsign = -1;
+					else
+						out->tzsign = +1;
+				}
 
 				from_char_parse_int_len(&out->tzh, &s, 2, n);
 				break;
@@ -3261,6 +3339,17 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+
+		/* Ignore all spaces after fields */
+		if (!fx_mode)
+		{
+			extra_skip = 0;
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
 	}
 }
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e39198e68..63fb1c44fe9 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected space character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2966,7 +2999,7 @@ SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2996,7 +3029,64 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('  2000 +JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 + JUN', 'YYYY MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 ++ JUN', 'YYYY  MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 + + JUN', 'YYYY  MON');
+ERROR:  invalid value "+ J" for "MON"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 -10', 'YYYY TZH');
+         to_timestamp         
+------------------------------
+ Sat Jan 01 02:00:00 2000 PST
+(1 row)
+
+SELECT to_timestamp('2000 -10', 'YYYY  TZH');
+         to_timestamp         
+------------------------------
+ Fri Dec 31 06:00:00 1999 PST
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -3014,13 +3104,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index ebb196a1cfc..807037be76e 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -464,6 +476,17 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+SELECT to_timestamp('  2000 +JUN', 'YYYY/MON');
+SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000 + JUN', 'YYYY MON');
+SELECT to_timestamp('2000 ++ JUN', 'YYYY  MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY  MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+SELECT to_timestamp('2000 -10', 'YYYY TZH');
+SELECT to_timestamp('2000 -10', 'YYYY  TZH');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#110

Arthur Zakirov

a.zakirov@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#109)

Re: [HACKERS] Bug in to_timestamp().

On Tue, Aug 14, 2018 at 06:38:56PM +0300, Alexander Korotkov wrote:

BTW, I've also revised documentation and regression tests. Patch is attached.

I looked at the patch. It applies without errors.

The document looks good. It compiles.
The code looks good too. It compiles and tests are passed.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#111

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Arthur Zakirov (#110)

Re: [HACKERS] Bug in to_timestamp().

Hi, David!

You can notice that:

1) We identified downside of changes in to_timestamp() function and
documented them [1].
2) We found 4 more differences between between patches behavior and
Oracle behavior [2][3]. One of them was assumed to be ridiculous and
wasn't implemented. So, I think this answers your original concern
that we shouldn't copy Oracle behavior bug to bug. So, it's depends
on particular case. For me, if function was introduced for Oracle
compatibility, then it's OK to copy aspects of Oracle behavior that
look reasonable. But if aspect doesn't look reasonable, then we don't
copy it.

Do you have any feedback on current state of patch?

1. /messages/by-id/20180723141254.GA10168@zakirov.localdomain
2. /messages/by-id/CAPpHfdtqOSniGJRvJ2zaaE8=eMB8XDnzvVS-9c3Xufaw=iPA+Q@mail.gmail.com
3. /messages/by-id/CAPpHfdso_Yvbo-EXKD8t3cuAeR7wszPyuWNBdjQLi1NrMt3O5w@mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#112

Liudmila Mantrova

l.mantrova@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#109)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

On 08/14/2018 06:38 PM, Alexander Korotkov wrote:

On Thu, Aug 2, 2018 at 9:06 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Thu, Aug 2, 2018 at 6:17 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

After some experiments I found that when you mix spaces and separators
between two fields, then Oracle takes into account only length of last
group of spaces/separators.

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM
dual2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM dual
2018-01-01 00:00:00 -10:00
(length of last spaces/separators group is 3)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 2)

Ooops... I'm sorry, but I've posted wrong results here. Correct
version is here.

# SELECT to_timestamp('2018- -01 02', 'YYYY---- --- --MM-DD') FROM dual
ORA-01843: not a valid month
(length of last spaces/separators group is 2)

# SELECT to_timestamp('2018- -01 02', 'YYYY---- -- ---MM-DD') FROM dual
02.01.2018 00:00:00
(length of last spaces/separators group is 3)

So length of last group of spaces/separators in the pattern should be
greater or equal to length of spaces/separators in the input string.
Other previous groups are ignored in Oracle. And that seems
ridiculous for me.

BTW, I've also revised documentation and regression tests. Patch is attached.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Hi,
Please consider some further documentation improvements in the attached
patch.

--
Liudmila Mantrova
Technical writer at Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v16.patchtext/x-patch; name=0001-to-timestamp-format-checking-v16.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index edc9be9..a8bbafc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6262,11 +6262,12 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
-       skip multiple blank spaces in the input string unless the
-       <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       skip multiple blank spaces at the beginning of the input string and
+       around date and time values unless the <literal>FX</literal> option is used.  For example,
+       <literal>to_timestamp('&nbsp;2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000 - JUN', 'YYYY-MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
-       because <function>to_timestamp</function> expects one space only.
+       because <function>to_timestamp</function> expects a single space only.
        <literal>FX</literal> must be specified as the first item in
        the template.
       </para>
@@ -6274,6 +6275,43 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
 
      <listitem>
       <para>
+       A separator (a space or a non-letter/non-digit character) in the template string of
+       <function>to_timestamp</function> and <function>to_date</function>
+       matches any single separator in the input string or is skipped,
+       unless the <literal>FX</literal> option is used.
+       For example, <literal>to_timestamp('2000JUN', 'YYYY///MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because the number of separators in the input string
+       exceeds the number of separators in the template.
+      </para>
+      <para>
+       If <literal>FX</literal> is specified, separators in the
+       input and template strings must match exactly. For example,
+       <literal>to_timestamp('2000/JUN', 'FXYYYY MON')</literal>
+       returns an error because a space is expected in the input string.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       <literal>TZH</literal> template pattern can match a signed number.
+       Without the <literal>FX</literal> option, it may lead to ambiguity in
+       interpretation of the minus sign, which can also be interpreted as a separator.
+       This ambiguity is resolved as follows.  If the number of separators before
+       <literal>TZH</literal> in the template string is less than the number of
+       separators before the minus sign in the input string, the minus sign
+       is interpreted as part of <literal>TZH</literal>.
+       Otherwise, the minus sign is considered to be a separator between values.
+       For example, <literal>to_timestamp('2000 -10', 'YYYY TZH')</literal> matches
+       <literal>-10</literal> to <literal>TZH</literal>, but
+       <literal>to_timestamp('2000 -10', 'YYYY&nbsp;&nbsp;TZH')</literal>
+       matches <literal>10</literal> to <literal>TZH</literal>.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
        Ordinary text is allowed in <function>to_char</function>
        templates and will be output literally.  You can put a substring
        in double quotes to force it to be interpreted as literal text
@@ -6287,6 +6325,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        string; for example <literal>"XX"</literal> skips two input characters
        (whether or not they are <literal>XX</literal>).
       </para>
+      <tip>
+        <para>
+          Prior to <productname>PostgreSQL</productname> 12, it was possible to
+          skip arbitrary text in the input string using non-letter or non-digit
+          characters. For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyy-MM-DD')</literal> used to
+          work.  Now you can only use letter characters for this purpose.  For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyytMMtDDt')</literal> and
+          <literal>to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"')</literal>
+          skip <literal>y</literal>, <literal>m</literal>, and
+          <literal>d</literal>.
+        </para>
+      </tip>
      </listitem>
 
      <listitem>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 30696e3..bf4e606 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -165,6 +165,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -955,6 +957,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -1044,6 +1047,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1319,7 +1332,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2987,25 +3007,69 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	int			len,
 				value;
 	bool		fx_mode = false;
+	/* number of extra skipped characters (more than given in format string) */
+	int			extra_skip = 0;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		/*
+		 * Ignore spaces at the beginning of the string and before fields when
+		 * not in FX (fixed width) mode.
+		 */
+		if (!fx_mode && (n->type != NODE_TYPE_ACTION || n->key->id != DCH_FX) &&
+			(n->type == NODE_TYPE_ACTION || n == node))
+		{
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
+
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * In non FX (fixed format) mode we don't insist that the consumed
+			 * character matches the format's character.
 			 */
-			s += pg_mblen(s);
+			if (!fx_mode)
+			{
+				extra_skip--;
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+				{
+					s++;
+					extra_skip++;
+				}
+			}
+			else
+			{
+				/*
+				 * In FX mode we insist that whitespaces and separator
+				 * characters from the format string match the same in input
+				 * string.
+				 */
+				if ((n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s)) ||
+					(n->type == NODE_TYPE_SEPARATOR && *n->character != *s))
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+							 errmsg("unexpected character \"%.*s\", expected character \"%s\"",
+									pg_mblen(s), s, n->character),
+							 errhint("In FX mode, punctuation in the input string "
+									 "must exactly match the format string.")));
+				s++;
+			}
 			continue;
 		}
-
-		/* Ignore spaces before fields when not in FX (fixed width) mode */
-		if (!fx_mode && n->key->id != DCH_FX)
+		else if (n->type != NODE_TYPE_ACTION)
 		{
-			while (*s != '\0' && isspace((unsigned char) *s))
-				s++;
+			/*
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
+			 */
+			s += pg_mblen(s);
+			continue;
 		}
 
 		from_char_set_mode(out, n->key->date_mode);
@@ -3086,10 +3150,24 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 								n->key->name)));
 				break;
 			case DCH_TZH:
-				out->tzsign = *s == '-' ? -1 : +1;
-
+				/*
+				 * Value of TZH might be negative.  And the issue is that we
+				 * might swallow minus sign as the separator.  So, if we have
+				 * skipped more characters than specified in the format string,
+				 * then we consider prepending last skipped minus to TZH.
+				 */
 				if (*s == '+' || *s == '-' || *s == ' ')
+				{
+					out->tzsign = *s == '-' ? -1 : +1;
 					s++;
+				}
+				else
+				{
+					if (extra_skip > 0 && *(s - 1) == '-')
+						out->tzsign = -1;
+					else
+						out->tzsign = +1;
+				}
 
 				from_char_parse_int_len(&out->tzh, &s, 2, n);
 				break;
@@ -3261,6 +3339,17 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+
+		/* Ignore all spaces after fields */
+		if (!fx_mode)
+		{
+			extra_skip = 0;
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
 	}
 }
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e3919..976f478 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,21 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2966,7 +2999,7 @@ SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2996,7 +3029,64 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('  2000 +JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 + JUN', 'YYYY MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 ++ JUN', 'YYYY  MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 + + JUN', 'YYYY  MON');
+ERROR:  invalid value "+ J" for "MON"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 -10', 'YYYY TZH');
+         to_timestamp         
+------------------------------
+ Sat Jan 01 02:00:00 2000 PST
+(1 row)
+
+SELECT to_timestamp('2000 -10', 'YYYY  TZH');
+         to_timestamp         
+------------------------------
+ Fri Dec 31 06:00:00 1999 PST
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -3014,13 +3104,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index ebb196a..807037b 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -464,6 +476,17 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+SELECT to_timestamp('  2000 +JUN', 'YYYY/MON');
+SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000 + JUN', 'YYYY MON');
+SELECT to_timestamp('2000 ++ JUN', 'YYYY  MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY  MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+SELECT to_timestamp('2000 -10', 'YYYY TZH');
+SELECT to_timestamp('2000 -10', 'YYYY  TZH');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#113

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Liudmila Mantrova (#112)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Aug 16, 2018 at 4:57 PM Liudmila Mantrova
<l.mantrova@postgrespro.ru> wrote:

On 08/14/2018 06:38 PM, Alexander Korotkov wrote:

On Thu, Aug 2, 2018 at 9:06 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
BTW, I've also revised documentation and regression tests. Patch is attached.

Please consider some further documentation improvements in the attached
patch.

Thank you very much. Your editing looks good for me.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#114

David G. Johnston

david.g.johnston@gmail.com

over 7 years ago

In reply to: Alexander Korotkov (#111)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Aug 16, 2018 at 3:53 AM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:

Hi, David!

You can notice that:

1) We identified downside of changes in to_timestamp() function and
documented them [1].
2) We found 4 more differences between between patches behavior and
Oracle behavior [2][3]. One of them was assumed to be ridiculous and
wasn't implemented. So, I think this answers your original concern
that we shouldn't copy Oracle behavior bug to bug. So, it's depends
on particular case. For me, if function was introduced for Oracle
compatibility, then it's OK to copy aspects of Oracle behavior that
look reasonable. But if aspect doesn't look reasonable, then we don't
copy it.

Do you have any feedback on current state of patch?

1. /messages/by-id/20180723141254.
GA10168%40zakirov.localdomain
2. /messages/by-id/CAPpHfdtqOSniGJRvJ2zaaE8%25
3DeMB8XDnzvVS-9c3Xufaw%3DiPA%2BQ%40mail.gmail.com
3. /messages/by-id/CAPpHfdso_Yvbo-
EXKD8t3cuAeR7wszPyuWNBdjQLi1NrMt3O5w%40mail.gmail.com

If the new behavior is an error I don't really have a problem since the
need to fix one's queries will be obvious.

"So length of last group of spaces/separators in the pattern should be
greater or equal to length of spaces/separators in the input string.
Other previous groups are ignored in Oracle. And that seems
ridiculous for me."

What do you believe should (or does) happen? Multiple groups always fail
or something else? How does this interplay with the detection of the
negative timezone offset? I'm not finding the behavior ridiculous at first
blush; not to the extent to avoid emulating it in a function whose purpose
is emulation. Being more lenient than Oracle seems undesirable.
Regardless of the choice made here it should be memorialized in the
regression tests.

The couple of regression tests that change do so for the better. It would
be illuminating to set this up as two patches though, one introducing all
of the new regression tests against the current code and then a second
patch with the changed behavior with only the affected tests.

David J.

#115

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: David G. Johnston (#114)

2 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

Hi!

Sorry for very long reply.

On Thu, Aug 16, 2018 at 11:44 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:

If the new behavior is an error I don't really have a problem since the need to fix one's queries will be obvious.

"So length of last group of spaces/separators in the pattern should be
greater or equal to length of spaces/separators in the input string.
Other previous groups are ignored in Oracle. And that seems
ridiculous for me."

What do you believe should (or does) happen? Multiple groups always fail or something else? How does this interplay with the detection of the negative timezone offset? I'm not finding the behavior ridiculous at first blush; not to the extent to avoid emulating it in a function whose purpose is emulation. Being more lenient than Oracle seems undesirable. Regardless of the choice made here it should be memorialized in the regression tests.

The current version of patch doesn't really distinguish spaces and
delimiters in format string in non-FX mode. So, spaces and delimiters
are forming single group. For me Oracle behavior is ridiculous at
least because it doesn't allow cases when input string exactly matches
format string.

This one fails:
SELECT to_timestamp('2018- -01 02', 'YYYY- -MM DD') FROM dual

But both this two are working:
SELECT to_timestamp('2018- -01 02', 'YYYY---MM DD') FROM dual
SELECT to_timestamp('2018- -01 02', 'YYYY MM DD') FROM dual

Regarding TZH, Oracle takes into account total number of characters
between placeholders as we do. So, there is no change in this aspect.

The couple of regression tests that change do so for the better. It would be illuminating to set this up as two patches though, one introducing all of the new regression tests against the current code and then a second patch with the changed behavior with only the affected tests.

OK, here you go. 0001-to_timestamp-regression-test-v17.patch
introduces changes in regression tests and their output for current
master, while 0002-to_timestamp-format-checking-v17.patch contain
changes to to_timestamp itself.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-to_timestamp-regression-test-v17.patchapplication/x-patch; name=0001-to_timestamp-regression-test-v17.patchDownload

diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e39198e68..e585b07c18c 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Wed Jan 02 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,24 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2999,6 +3035,48 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
  Sun Dec 18 03:38:15 2011 PST
 (1 row)
 
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('  2000 +JUN', 'YYYY/MON');
+ERROR:  invalid value "+JU" for "MON"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+ERROR:  invalid value "+JU" for "MON"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('2000 + JUN', 'YYYY MON');
+ERROR:  invalid value "+ J" for "MON"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('2000 ++ JUN', 'YYYY  MON');
+ERROR:  invalid value "+ J" for "MON"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('2000 + + JUN', 'YYYY  MON');
+ERROR:  invalid value "+ J" for "MON"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+ERROR:  invalid value "+ J" for "MON"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('2000 -10', 'YYYY TZH');
+         to_timestamp         
+------------------------------
+ Sat Jan 01 02:00:00 2000 PST
+(1 row)
+
+SELECT to_timestamp('2000 -10', 'YYYY  TZH');
+         to_timestamp         
+------------------------------
+ Fri Dec 31 06:00:00 1999 PST
+(1 row)
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
   to_date   
 ------------
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index ebb196a1cfc..807037be76e 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -464,6 +476,17 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+SELECT to_timestamp('  2000 +JUN', 'YYYY/MON');
+SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000 + JUN', 'YYYY MON');
+SELECT to_timestamp('2000 ++ JUN', 'YYYY  MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY  MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+SELECT to_timestamp('2000 -10', 'YYYY TZH');
+SELECT to_timestamp('2000 -10', 'YYYY  TZH');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

0002-to_timestamp-format-checking-v17.patchapplication/x-patch; name=0002-to_timestamp-format-checking-v17.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index bb794e044f4..6f28fe1e753 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6262,16 +6262,54 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
-       skip multiple blank spaces in the input string unless the
-       <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       skip multiple blank spaces at the beginning of the input string and
+       around date and time values unless the <literal>FX</literal> option is used.  For example,
+       <literal>to_timestamp('&nbsp;2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000 - JUN', 'YYYY-MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
-       because <function>to_timestamp</function> expects one space only.
+       because <function>to_timestamp</function> expects a single space only.
        <literal>FX</literal> must be specified as the first item in
        the template.
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+       A separator (a space or a non-letter/non-digit character) in the template string of
+       <function>to_timestamp</function> and <function>to_date</function>
+       matches any single separator in the input string or is skipped,
+       unless the <literal>FX</literal> option is used.
+       For example, <literal>to_timestamp('2000JUN', 'YYYY///MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because the number of separators in the input string
+       exceeds the number of separators in the template.
+      </para>
+      <para>
+       If <literal>FX</literal> is specified, separators in the
+       input and template strings must match exactly. For example,
+       <literal>to_timestamp('2000/JUN', 'FXYYYY MON')</literal>
+       returns an error because a space is expected in the input string.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       <literal>TZH</literal> template pattern can match a signed number.
+       Without the <literal>FX</literal> option, it may lead to ambiguity in
+       interpretation of the minus sign, which can also be interpreted as a separator.
+       This ambiguity is resolved as follows.  If the number of separators before
+       <literal>TZH</literal> in the template string is less than the number of
+       separators before the minus sign in the input string, the minus sign
+       is interpreted as part of <literal>TZH</literal>.
+       Otherwise, the minus sign is considered to be a separator between values.
+       For example, <literal>to_timestamp('2000 -10', 'YYYY TZH')</literal> matches
+       <literal>-10</literal> to <literal>TZH</literal>, but
+       <literal>to_timestamp('2000 -10', 'YYYY&nbsp;&nbsp;TZH')</literal>
+       matches <literal>10</literal> to <literal>TZH</literal>.
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        Ordinary text is allowed in <function>to_char</function>
@@ -6287,6 +6325,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        string; for example <literal>"XX"</literal> skips two input characters
        (whether or not they are <literal>XX</literal>).
       </para>
+      <tip>
+        <para>
+          Prior to <productname>PostgreSQL</productname> 12, it was possible to
+          skip arbitrary text in the input string using non-letter or non-digit
+          characters. For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyy-MM-DD')</literal> used to
+          work.  Now you can only use letter characters for this purpose.  For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyytMMtDDt')</literal> and
+          <literal>to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"')</literal>
+          skip <literal>y</literal>, <literal>m</literal>, and
+          <literal>d</literal>.
+        </para>
+      </tip>
      </listitem>
 
      <listitem>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 30696e3575d..bf4e6067e85 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -165,6 +165,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -955,6 +957,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -1044,6 +1047,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1319,7 +1332,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2987,25 +3007,69 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	int			len,
 				value;
 	bool		fx_mode = false;
+	/* number of extra skipped characters (more than given in format string) */
+	int			extra_skip = 0;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		/*
+		 * Ignore spaces at the beginning of the string and before fields when
+		 * not in FX (fixed width) mode.
+		 */
+		if (!fx_mode && (n->type != NODE_TYPE_ACTION || n->key->id != DCH_FX) &&
+			(n->type == NODE_TYPE_ACTION || n == node))
+		{
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
+
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * In non FX (fixed format) mode we don't insist that the consumed
+			 * character matches the format's character.
 			 */
-			s += pg_mblen(s);
+			if (!fx_mode)
+			{
+				extra_skip--;
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+				{
+					s++;
+					extra_skip++;
+				}
+			}
+			else
+			{
+				/*
+				 * In FX mode we insist that whitespaces and separator
+				 * characters from the format string match the same in input
+				 * string.
+				 */
+				if ((n->type == NODE_TYPE_SPACE && !isspace((unsigned char) *s)) ||
+					(n->type == NODE_TYPE_SEPARATOR && *n->character != *s))
+					ereport(ERROR,
+							(errcode(ERRCODE_INVALID_DATETIME_FORMAT),
+							 errmsg("unexpected character \"%.*s\", expected character \"%s\"",
+									pg_mblen(s), s, n->character),
+							 errhint("In FX mode, punctuation in the input string "
+									 "must exactly match the format string.")));
+				s++;
+			}
 			continue;
 		}
-
-		/* Ignore spaces before fields when not in FX (fixed width) mode */
-		if (!fx_mode && n->key->id != DCH_FX)
+		else if (n->type != NODE_TYPE_ACTION)
 		{
-			while (*s != '\0' && isspace((unsigned char) *s))
-				s++;
+			/*
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
+			 */
+			s += pg_mblen(s);
+			continue;
 		}
 
 		from_char_set_mode(out, n->key->date_mode);
@@ -3086,10 +3150,24 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 								n->key->name)));
 				break;
 			case DCH_TZH:
-				out->tzsign = *s == '-' ? -1 : +1;
-
+				/*
+				 * Value of TZH might be negative.  And the issue is that we
+				 * might swallow minus sign as the separator.  So, if we have
+				 * skipped more characters than specified in the format string,
+				 * then we consider prepending last skipped minus to TZH.
+				 */
 				if (*s == '+' || *s == '-' || *s == ' ')
+				{
+					out->tzsign = *s == '-' ? -1 : +1;
 					s++;
+				}
+				else
+				{
+					if (extra_skip > 0 && *(s - 1) == '-')
+						out->tzsign = -1;
+					else
+						out->tzsign = +1;
+				}
 
 				from_char_parse_int_len(&out->tzh, &s, 2, n);
 				break;
@@ -3261,6 +3339,17 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+
+		/* Ignore all spaces after fields */
+		if (!fx_mode)
+		{
+			extra_skip = 0;
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
 	}
 }
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index e585b07c18c..976f4783e1e 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2790,7 +2790,7 @@ SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
 SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
          to_timestamp         
 ------------------------------
- Wed Jan 02 00:00:00 1985 PST
+ Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
@@ -2835,11 +2835,8 @@ SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
 (1 row)
 
 SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
-         to_timestamp         
-------------------------------
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
 SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
          to_timestamp         
 ------------------------------
@@ -3002,7 +2999,7 @@ SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -3032,7 +3029,7 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
@@ -3042,8 +3039,11 @@ SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
 (1 row)
 
 SELECT to_timestamp('  2000 +JUN', 'YYYY/MON');
-ERROR:  invalid value "+JU" for "MON"
-DETAIL:  The given value did not match any of the allowed values for this field.
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
 SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
          to_timestamp         
 ------------------------------
@@ -3051,20 +3051,32 @@ SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
 (1 row)
 
 SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
-ERROR:  invalid value "+JU" for "MON"
-DETAIL:  The given value did not match any of the allowed values for this field.
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
 SELECT to_timestamp('2000 + JUN', 'YYYY MON');
-ERROR:  invalid value "+ J" for "MON"
-DETAIL:  The given value did not match any of the allowed values for this field.
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
 SELECT to_timestamp('2000 ++ JUN', 'YYYY  MON');
-ERROR:  invalid value "+ J" for "MON"
-DETAIL:  The given value did not match any of the allowed values for this field.
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
 SELECT to_timestamp('2000 + + JUN', 'YYYY  MON');
 ERROR:  invalid value "+ J" for "MON"
 DETAIL:  The given value did not match any of the allowed values for this field.
 SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
-ERROR:  invalid value "+ J" for "MON"
-DETAIL:  The given value did not match any of the allowed values for this field.
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
 SELECT to_timestamp('2000 -10', 'YYYY TZH');
          to_timestamp         
 ------------------------------
@@ -3092,13 +3104,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');

#116

David G. Johnston

david.g.johnston@gmail.com

over 7 years ago

In reply to: Alexander Korotkov (#115)

Re: [HACKERS] Bug in to_timestamp().

On Mon, Sep 3, 2018 at 2:30 PM, Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:

The current version of patch doesn't really distinguish spaces and
delimiters in format string in non-FX mode. So, spaces and delimiters
are forming single group. For me Oracle behavior is ridiculous at
least because it doesn't allow cases when input string exactly matches
format string.

This one fails:
SELECT to_timestamp('2018- -01 02', 'YYYY- -MM DD') FROM dual

Related to below - since this works today it should continue to work. I
was under the wrong impression that we followed their behavior today and
were pondering deviating from it because of its ridiculousness.

The couple of regression tests that change do so for the better. It

would be illuminating to set this up as two patches though, one introducing
all of the new regression tests against the current code and then a second
patch with the changed behavior with only the affected tests.

OK, here you go. 0001-to_timestamp-regression-test-v17.patch
introduces changes in regression tests and their output for current
master, while 0002-to_timestamp-format-checking-v17.patch contain
changes to to_timestamp itself.

From those results the question is how important is it to force the
following breakage on our users (i.e., introduce FX exact symbol matching):

SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
-         to_timestamp
-------------------------------
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the
format string.

There seemed to be some implicit approvals of this breakage some 30 emails
and 10 months ago but given that this is the only change from a correct
result to a failure I'd like to officially put it out there for
opinion/vote gathering. Mine is a -1; though keeping the distinction
between space and non-alphanumeric characters is expected.

David J.

#117

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: David G. Johnston (#116)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Mon, Sep 3, 2018 at 2:30 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

The current version of patch doesn't really distinguish spaces and
delimiters in format string in non-FX mode. So, spaces and delimiters
are forming single group. For me Oracle behavior is ridiculous at
least because it doesn't allow cases when input string exactly matches
format string.

This one fails:
SELECT to_timestamp('2018- -01 02', 'YYYY- -MM DD') FROM dual

Related to below - since this works today it should continue to work. I was under the wrong impression that we followed their behavior today and were pondering deviating from it because of its ridiculousness.

Right, that's just incompatibility with Oracle behavior, not with our
previous behavior.

The couple of regression tests that change do so for the better. It would be illuminating to set this up as two patches though, one introducing all of the new regression tests against the current code and then a second patch with the changed behavior with only the affected tests.

OK, here you go. 0001-to_timestamp-regression-test-v17.patch
introduces changes in regression tests and their output for current
master, while 0002-to_timestamp-format-checking-v17.patch contain
changes to to_timestamp itself.

From those results the question is how important is it to force the following breakage on our users (i.e., introduce FX exact symbol matching):
SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
-         to_timestamp
-------------------------------
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
There seemed to be some implicit approvals of this breakage some 30 emails and 10 months ago but given that this is the only change from a correct result to a failure I'd like to officially put it out there for opinion/vote gathering. Mine is a -1; though keeping the distinction between space and non-alphanumeric characters is expected.

Do I understand correctly that you're -1 to changes to FX mode, but no
objection to changes in non-FX mode?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#118

amul sul

sulamul@gmail.com

over 7 years ago

In reply to: Alexander Korotkov (#117)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

On Mon, Sep 3, 2018 at 2:30 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

The current version of patch doesn't really distinguish spaces and
delimiters in format string in non-FX mode. So, spaces and delimiters
are forming single group. For me Oracle behavior is ridiculous at
least because it doesn't allow cases when input string exactly matches
format string.

This one fails:
SELECT to_timestamp('2018- -01 02', 'YYYY- -MM DD') FROM dual

Related to below - since this works today it should continue to work. I was under the wrong impression that we followed their behavior today and were pondering deviating from it because of its ridiculousness.

Right, that's just incompatibility with Oracle behavior, not with our
previous behavior.
The couple of regression tests that change do so for the better. It would be illuminating to set this up as two patches though, one introducing all of the new regression tests against the current code and then a second patch with the changed behavior with only the affected tests.

OK, here you go. 0001-to_timestamp-regression-test-v17.patch
introduces changes in regression tests and their output for current
master, while 0002-to_timestamp-format-checking-v17.patch contain
changes to to_timestamp itself.

From those results the question is how important is it to force the following breakage on our users (i.e., introduce FX exact symbol matching):
SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
-         to_timestamp
-------------------------------
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
There seemed to be some implicit approvals of this breakage some 30 emails and 10 months ago but given that this is the only change from a correct result to a failure I'd like to officially put it out there for opinion/vote gathering. Mine is a -1; though keeping the distinction between space and non-alphanumeric characters is expected.
Do I understand correctly that you're -1 to changes to FX mode, but no
objection to changes in non-FX mode?

Ditto.

Regards,
Amul Sul.

#119

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: amul sul (#118)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Sep 5, 2018 at 3:10 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
From those results the question is how important is it to force the following breakage on our users (i.e., introduce FX exact symbol matching):
SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
-         to_timestamp
-------------------------------
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
There seemed to be some implicit approvals of this breakage some 30 emails and 10 months ago but given that this is the only change from a correct result to a failure I'd like to officially put it out there for opinion/vote gathering. Mine is a -1; though keeping the distinction between space and non-alphanumeric characters is expected.
Do I understand correctly that you're -1 to changes to FX mode, but no
objection to changes in non-FX mode?
Ditto.

So, if no objections for non-FX mode changes, then I'll extract that
part and commit it separately.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#120

amul sul

sulamul@gmail.com

over 7 years ago

In reply to: Alexander Korotkov (#119)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov <a.korotkov@postgrespro.ru>
wrote:

On Wed, Sep 5, 2018 at 3:10 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

From those results the question is how important is it to force the

following breakage on our users (i.e., introduce FX exact symbol matching):
SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
-         to_timestamp
-------------------------------
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must exactly
match the format string.

There seemed to be some implicit approvals of this breakage some 30

emails and 10 months ago but given that this is the only change from a
correct result to a failure I'd like to officially put it out there for
opinion/vote gathering. Mine is a -1; though keeping the distinction
between space and non-alphanumeric characters is expected.

Do I understand correctly that you're -1 to changes to FX mode, but no
objection to changes in non-FX mode?

Ditto.

So, if no objections for non-FX mode changes, then I'll extract that
part and commit it separately.

Yeah, that make sense to me, thank you.

Regards,
Amul

Show quoted text

#121

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: amul sul (#120)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Sep 5, 2018 at 7:28 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Wed, Sep 5, 2018 at 3:10 PM amul sul <sulamul@gmail.com> wrote:
On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
From those results the question is how important is it to force the following breakage on our users (i.e., introduce FX exact symbol matching):
SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
-         to_timestamp
-------------------------------
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
There seemed to be some implicit approvals of this breakage some 30 emails and 10 months ago but given that this is the only change from a correct result to a failure I'd like to officially put it out there for opinion/vote gathering. Mine is a -1; though keeping the distinction between space and non-alphanumeric characters is expected.
Do I understand correctly that you're -1 to changes to FX mode, but no
objection to changes in non-FX mode?
Ditto.
So, if no objections for non-FX mode changes, then I'll extract that
part and commit it separately.
Yeah, that make sense to me, thank you.

OK! I've removed FX changes from the patch. The result is attached.
I'm going to commit this if no objections.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v18.patchapplication/octet-stream; name=0001-to-timestamp-format-checking-v18.patchDownload

commit a5f933a84627f442d1850ca6da642be7fdb660ea
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date:   Tue Sep 4 00:02:30 2018 +0300

    Improve behavior of to_timestamp()/to_date() functions
    
    to_timestamp()/to_date() functions were introduced mainly for Oracle
    compatibility, and became very popular among PostgreSQL users.  However, some
    behavior of to_timestamp()/to_date() functions are both incompatible with Oracle
    and confusing for our users.  This behavior is related to handling of spaces and
    separators in non FX (fixed format) mode.  This commit reworks this behavior
    making less confusing, better documented and more compatible with Oracle.
    
    Nevertheless, there are still following incompatibilities with Oracle.
    1) We don't insist that there are no format string patterns unmatched to
       input string.
    2) In FX mode we don't exist space and separators in format string to exactly
       match input string.
    3) When format string patterns are divided by mix of spaces and separators, we
       don't distinguish them, while Oracle takes into account only last group of
       spaces/separators.
    
    Discussion: https://postgr.es/m/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com
    Author: Artur Zakirov, Alexander Korotkov, Liudmila Mantrova
    Review: Amul Sul, Robert Haas, Tom Lane, Dmitry Dolgov, David G. Johnston

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index bb794e044f4..3f15dc57282 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6262,16 +6262,57 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
-       skip multiple blank spaces in the input string unless the
-       <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       skip multiple blank spaces at the beginning of the input string and
+       around date and time values unless the <literal>FX</literal> option is used.  For example,
+       <literal>to_timestamp('&nbsp;2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000 - JUN', 'YYYY-MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
-       because <function>to_timestamp</function> expects one space only.
+       because <function>to_timestamp</function> expects a single space only.
        <literal>FX</literal> must be specified as the first item in
        the template.
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+       A separator (a space or a non-letter/non-digit character) in the template string of
+       <function>to_timestamp</function> and <function>to_date</function>
+       matches any single separator in the input string or is skipped,
+       unless the <literal>FX</literal> option is used.
+       For example, <literal>to_timestamp('2000JUN', 'YYYY///MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because the number of separators in the input string
+       exceeds the number of separators in the template.
+      </para>
+      <para>
+       If <literal>FX</literal> is specified, separator in template string
+       matches to exactly one character in input string.  Notice we don't insist
+       input string character to be the same as template string separator.
+       For example, <literal>to_timestamp('2000/JUN', 'FXYYYY MON')</literal>
+       works, but <literal>to_timestamp('2000/JUN', 'FXYYYY  MON')</literal>
+       returns an error because a space second template string space consumed
+       letter <literal>J</literal> from the input string.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       <literal>TZH</literal> template pattern can match a signed number.
+       Without the <literal>FX</literal> option, it may lead to ambiguity in
+       interpretation of the minus sign, which can also be interpreted as a separator.
+       This ambiguity is resolved as follows.  If the number of separators before
+       <literal>TZH</literal> in the template string is less than the number of
+       separators before the minus sign in the input string, the minus sign
+       is interpreted as part of <literal>TZH</literal>.
+       Otherwise, the minus sign is considered to be a separator between values.
+       For example, <literal>to_timestamp('2000 -10', 'YYYY TZH')</literal> matches
+       <literal>-10</literal> to <literal>TZH</literal>, but
+       <literal>to_timestamp('2000 -10', 'YYYY&nbsp;&nbsp;TZH')</literal>
+       matches <literal>10</literal> to <literal>TZH</literal>.
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        Ordinary text is allowed in <function>to_char</function>
@@ -6287,6 +6328,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        string; for example <literal>"XX"</literal> skips two input characters
        (whether or not they are <literal>XX</literal>).
       </para>
+      <tip>
+        <para>
+          Prior to <productname>PostgreSQL</productname> 12, it was possible to
+          skip arbitrary text in the input string using non-letter or non-digit
+          characters. For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyy-MM-DD')</literal> used to
+          work.  Now you can only use letter characters for this purpose.  For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyytMMtDDt')</literal> and
+          <literal>to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"')</literal>
+          skip <literal>y</literal>, <literal>m</literal>, and
+          <literal>d</literal>.
+        </para>
+      </tip>
      </listitem>
 
      <listitem>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 30696e3575d..2ed8ca675bd 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -165,6 +165,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -955,6 +957,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -1044,6 +1047,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1319,7 +1332,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2987,27 +3007,66 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	int			len,
 				value;
 	bool		fx_mode = false;
+	/* number of extra skipped characters (more than given in format string) */
+	int			extra_skip = 0;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		/*
+		 * Ignore spaces at the beginning of the string and before fields when
+		 * not in FX (fixed width) mode.
+		 */
+		if (!fx_mode && (n->type != NODE_TYPE_ACTION || n->key->id != DCH_FX) &&
+			(n->type == NODE_TYPE_ACTION || n == node))
+		{
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
+
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
+		{
+			if (!fx_mode)
+			{
+				/*
+				 * In non FX (fixed format) mode one format string space or
+				 * separator match to one space or separator in input string.
+				 * Or match nothing if there is no space or separator in
+				 * the current position of input string.
+				 */
+				extra_skip--;
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+				{
+					s++;
+					extra_skip++;
+				}
+			}
+			else
+			{
+				/*
+				 * In FX mode, on format string space or separator we consume
+				 * exactly one character from input string.  Notice we don't
+				 * insist that the consumed character match the format's
+				 * character.
+				 */
+				s += pg_mblen(s);
+			}
+			continue;
+		}
+		else if (n->type != NODE_TYPE_ACTION)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
 			 */
 			s += pg_mblen(s);
 			continue;
 		}
 
-		/* Ignore spaces before fields when not in FX (fixed width) mode */
-		if (!fx_mode && n->key->id != DCH_FX)
-		{
-			while (*s != '\0' && isspace((unsigned char) *s))
-				s++;
-		}
-
 		from_char_set_mode(out, n->key->date_mode);
 
 		switch (n->key->id)
@@ -3086,10 +3145,24 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 								n->key->name)));
 				break;
 			case DCH_TZH:
-				out->tzsign = *s == '-' ? -1 : +1;
-
+				/*
+				 * Value of TZH might be negative.  And the issue is that we
+				 * might swallow minus sign as the separator.  So, if we have
+				 * skipped more characters than specified in the format string,
+				 * then we consider prepending last skipped minus to TZH.
+				 */
 				if (*s == '+' || *s == '-' || *s == ' ')
+				{
+					out->tzsign = *s == '-' ? -1 : +1;
 					s++;
+				}
+				else
+				{
+					if (extra_skip > 0 && *(s - 1) == '-')
+						out->tzsign = -1;
+					else
+						out->tzsign = +1;
+				}
 
 				from_char_parse_int_len(&out->tzh, &s, 2, n);
 				break;
@@ -3261,6 +3334,17 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+
+		/* Ignore all spaces after fields */
+		if (!fx_mode)
+		{
+			extra_skip = 0;
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
 	}
 }
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e39198e68..7d11f251584 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,24 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2966,7 +3002,7 @@ SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2996,7 +3032,64 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('  2000 +JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 + JUN', 'YYYY MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 ++ JUN', 'YYYY  MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 + + JUN', 'YYYY  MON');
+ERROR:  invalid value "+ J" for "MON"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 -10', 'YYYY TZH');
+         to_timestamp         
+------------------------------
+ Sat Jan 01 02:00:00 2000 PST
+(1 row)
+
+SELECT to_timestamp('2000 -10', 'YYYY  TZH');
+         to_timestamp         
+------------------------------
+ Fri Dec 31 06:00:00 1999 PST
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -3014,13 +3107,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index ebb196a1cfc..807037be76e 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -464,6 +476,17 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+SELECT to_timestamp('  2000 +JUN', 'YYYY/MON');
+SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000 + JUN', 'YYYY MON');
+SELECT to_timestamp('2000 ++ JUN', 'YYYY  MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY  MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+SELECT to_timestamp('2000 -10', 'YYYY TZH');
+SELECT to_timestamp('2000 -10', 'YYYY  TZH');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#122

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#121)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Sep 6, 2018 at 2:40 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Wed, Sep 5, 2018 at 7:28 PM amul sul <sulamul@gmail.com> wrote:
On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Wed, Sep 5, 2018 at 3:10 PM amul sul <sulamul@gmail.com> wrote:
On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
From those results the question is how important is it to force the following breakage on our users (i.e., introduce FX exact symbol matching):
SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
-         to_timestamp
-------------------------------
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must exactly match the format string.
There seemed to be some implicit approvals of this breakage some 30 emails and 10 months ago but given that this is the only change from a correct result to a failure I'd like to officially put it out there for opinion/vote gathering. Mine is a -1; though keeping the distinction between space and non-alphanumeric characters is expected.
Do I understand correctly that you're -1 to changes to FX mode, but no
objection to changes in non-FX mode?
Ditto.
So, if no objections for non-FX mode changes, then I'll extract that
part and commit it separately.
Yeah, that make sense to me, thank you.
OK! I've removed FX changes from the patch. The result is attached.
I'm going to commit this if no objections.

Attached revision fixes usage of two subsequent spaces in the documentation.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

0001-to-timestamp-format-checking-v19.patchapplication/octet-stream; name=0001-to-timestamp-format-checking-v19.patchDownload

commit 9b27411b6bbd9161ddddc3b6aa7253816df0e175
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date:   Tue Sep 4 00:02:30 2018 +0300

    Improve behavior of to_timestamp()/to_date() functions
    
    to_timestamp()/to_date() functions were introduced mainly for Oracle
    compatibility, and became very popular among PostgreSQL users.  However, some
    behavior of to_timestamp()/to_date() functions are both incompatible with Oracle
    and confusing for our users.  This behavior is related to handling of spaces and
    separators in non FX (fixed format) mode.  This commit reworks this behavior
    making less confusing, better documented and more compatible with Oracle.
    
    Nevertheless, there are still following incompatibilities with Oracle.
    1) We don't insist that there are no format string patterns unmatched to
       input string.
    2) In FX mode we don't exist space and separators in format string to exactly
       match input string.
    3) When format string patterns are divided by mix of spaces and separators, we
       don't distinguish them, while Oracle takes into account only last group of
       spaces/separators.
    
    Discussion: https://postgr.es/m/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com
    Author: Artur Zakirov, Alexander Korotkov, Liudmila Mantrova
    Review: Amul Sul, Robert Haas, Tom Lane, Dmitry Dolgov, David G. Johnston

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index bb794e044f4..9b8a0134846 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6262,16 +6262,57 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
      <listitem>
       <para>
        <function>to_timestamp</function> and <function>to_date</function>
-       skip multiple blank spaces in the input string unless the
-       <literal>FX</literal> option is used. For example,
-       <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> works, but
+       skip multiple blank spaces at the beginning of the input string and
+       around date and time values unless the <literal>FX</literal> option is used.  For example,
+       <literal>to_timestamp('&nbsp;2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'YYYY MON')</literal> and
+       <literal>to_timestamp('2000 - JUN', 'YYYY-MON')</literal> work, but
        <literal>to_timestamp('2000&nbsp;&nbsp;&nbsp;&nbsp;JUN', 'FXYYYY MON')</literal> returns an error
-       because <function>to_timestamp</function> expects one space only.
+       because <function>to_timestamp</function> expects a single space only.
        <literal>FX</literal> must be specified as the first item in
        the template.
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+       A separator (a space or a non-letter/non-digit character) in the template string of
+       <function>to_timestamp</function> and <function>to_date</function>
+       matches any single separator in the input string or is skipped,
+       unless the <literal>FX</literal> option is used.
+       For example, <literal>to_timestamp('2000JUN', 'YYYY///MON')</literal> and
+       <literal>to_timestamp('2000/JUN', 'YYYY MON')</literal> work, but
+       <literal>to_timestamp('2000//JUN', 'YYYY/MON')</literal>
+       returns an error because the number of separators in the input string
+       exceeds the number of separators in the template.
+      </para>
+      <para>
+       If <literal>FX</literal> is specified, separator in template string
+       matches to exactly one character in input string.  Notice we don't insist
+       input string character to be the same as template string separator.
+       For example, <literal>to_timestamp('2000/JUN', 'FXYYYY MON')</literal>
+       works, but <literal>to_timestamp('2000/JUN', 'FXYYYY&nbsp;&nbsp;MON')</literal>
+       returns an error because a space second template string space consumed
+       letter <literal>J</literal> from the input string.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+       <literal>TZH</literal> template pattern can match a signed number.
+       Without the <literal>FX</literal> option, it may lead to ambiguity in
+       interpretation of the minus sign, which can also be interpreted as a separator.
+       This ambiguity is resolved as follows.  If the number of separators before
+       <literal>TZH</literal> in the template string is less than the number of
+       separators before the minus sign in the input string, the minus sign
+       is interpreted as part of <literal>TZH</literal>.
+       Otherwise, the minus sign is considered to be a separator between values.
+       For example, <literal>to_timestamp('2000 -10', 'YYYY TZH')</literal> matches
+       <literal>-10</literal> to <literal>TZH</literal>, but
+       <literal>to_timestamp('2000 -10', 'YYYY&nbsp;&nbsp;TZH')</literal>
+       matches <literal>10</literal> to <literal>TZH</literal>.
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        Ordinary text is allowed in <function>to_char</function>
@@ -6287,6 +6328,19 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        string; for example <literal>"XX"</literal> skips two input characters
        (whether or not they are <literal>XX</literal>).
       </para>
+      <tip>
+        <para>
+          Prior to <productname>PostgreSQL</productname> 12, it was possible to
+          skip arbitrary text in the input string using non-letter or non-digit
+          characters. For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyy-MM-DD')</literal> used to
+          work.  Now you can only use letter characters for this purpose.  For example,
+          <literal>to_timestamp('2000y6m1d', 'yyyytMMtDDt')</literal> and
+          <literal>to_timestamp('2000y6m1d', 'yyyy"y"MM"m"DD"d"')</literal>
+          skip <literal>y</literal>, <literal>m</literal>, and
+          <literal>d</literal>.
+        </para>
+      </tip>
      </listitem>
 
      <listitem>
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 30696e3575d..2ed8ca675bd 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -165,6 +165,8 @@ typedef struct
 #define NODE_TYPE_END		1
 #define NODE_TYPE_ACTION	2
 #define NODE_TYPE_CHAR		3
+#define NODE_TYPE_SEPARATOR	4
+#define NODE_TYPE_SPACE		5
 
 #define SUFFTYPE_PREFIX		1
 #define SUFFTYPE_POSTFIX	2
@@ -955,6 +957,7 @@ typedef struct NUMProc
 static const KeyWord *index_seq_search(const char *str, const KeyWord *kw,
 				 const int *index);
 static const KeySuffix *suff_search(const char *str, const KeySuffix *suf, int type);
+static bool is_separator_char(const char *str);
 static void NUMDesc_prepare(NUMDesc *num, FormatNode *n);
 static void parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 			 const KeySuffix *suf, const int *index, int ver, NUMDesc *Num);
@@ -1044,6 +1047,16 @@ suff_search(const char *str, const KeySuffix *suf, int type)
 	return NULL;
 }
 
+static bool
+is_separator_char(const char *str)
+{
+	/* ASCII printable character, but not letter or digit */
+	return (*str > 0x20 && *str < 0x7F &&
+			!(*str >= 'A' && *str <= 'Z') &&
+			!(*str >= 'a' && *str <= 'z') &&
+			!(*str >= '0' && *str <= '9'));
+}
+
 /* ----------
  * Prepare NUMDesc (number description struct) via FormatNode struct
  * ----------
@@ -1319,7 +1332,14 @@ parse_format(FormatNode *node, const char *str, const KeyWord *kw,
 				if (*str == '\\' && *(str + 1) == '"')
 					str++;
 				chlen = pg_mblen(str);
-				n->type = NODE_TYPE_CHAR;
+
+				if (ver == DCH_TYPE && is_separator_char(str))
+					n->type = NODE_TYPE_SEPARATOR;
+				else if (isspace((unsigned char) *str))
+					n->type = NODE_TYPE_SPACE;
+				else
+					n->type = NODE_TYPE_CHAR;
+
 				memcpy(n->character, str, chlen);
 				n->character[chlen] = '\0';
 				n->key = NULL;
@@ -2987,27 +3007,66 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 	int			len,
 				value;
 	bool		fx_mode = false;
+	/* number of extra skipped characters (more than given in format string) */
+	int			extra_skip = 0;
 
 	for (n = node, s = in; n->type != NODE_TYPE_END && *s != '\0'; n++)
 	{
-		if (n->type != NODE_TYPE_ACTION)
+		/*
+		 * Ignore spaces at the beginning of the string and before fields when
+		 * not in FX (fixed width) mode.
+		 */
+		if (!fx_mode && (n->type != NODE_TYPE_ACTION || n->key->id != DCH_FX) &&
+			(n->type == NODE_TYPE_ACTION || n == node))
+		{
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
+
+		if (n->type == NODE_TYPE_SPACE || n->type == NODE_TYPE_SEPARATOR)
+		{
+			if (!fx_mode)
+			{
+				/*
+				 * In non FX (fixed format) mode one format string space or
+				 * separator match to one space or separator in input string.
+				 * Or match nothing if there is no space or separator in
+				 * the current position of input string.
+				 */
+				extra_skip--;
+				if (isspace((unsigned char) *s) || is_separator_char(s))
+				{
+					s++;
+					extra_skip++;
+				}
+			}
+			else
+			{
+				/*
+				 * In FX mode, on format string space or separator we consume
+				 * exactly one character from input string.  Notice we don't
+				 * insist that the consumed character match the format's
+				 * character.
+				 */
+				s += pg_mblen(s);
+			}
+			continue;
+		}
+		else if (n->type != NODE_TYPE_ACTION)
 		{
 			/*
-			 * Separator, so consume one character from input string.  Notice
-			 * we don't insist that the consumed character match the format's
-			 * character.
+			 * Text character, so consume one character from input string.
+			 * Notice we don't insist that the consumed character match the
+			 * format's character.
+			 * Text field ignores FX mode.
 			 */
 			s += pg_mblen(s);
 			continue;
 		}
 
-		/* Ignore spaces before fields when not in FX (fixed width) mode */
-		if (!fx_mode && n->key->id != DCH_FX)
-		{
-			while (*s != '\0' && isspace((unsigned char) *s))
-				s++;
-		}
-
 		from_char_set_mode(out, n->key->date_mode);
 
 		switch (n->key->id)
@@ -3086,10 +3145,24 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 								n->key->name)));
 				break;
 			case DCH_TZH:
-				out->tzsign = *s == '-' ? -1 : +1;
-
+				/*
+				 * Value of TZH might be negative.  And the issue is that we
+				 * might swallow minus sign as the separator.  So, if we have
+				 * skipped more characters than specified in the format string,
+				 * then we consider prepending last skipped minus to TZH.
+				 */
 				if (*s == '+' || *s == '-' || *s == ' ')
+				{
+					out->tzsign = *s == '-' ? -1 : +1;
 					s++;
+				}
+				else
+				{
+					if (extra_skip > 0 && *(s - 1) == '-')
+						out->tzsign = -1;
+					else
+						out->tzsign = +1;
+				}
 
 				from_char_parse_int_len(&out->tzh, &s, 2, n);
 				break;
@@ -3261,6 +3334,17 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 				SKIP_THth(s, n->suffix);
 				break;
 		}
+
+		/* Ignore all spaces after fields */
+		if (!fx_mode)
+		{
+			extra_skip = 0;
+			while (*s != '\0' && isspace((unsigned char) *s))
+			{
+				s++;
+				extra_skip++;
+			}
+		}
 	}
 }
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 63e39198e68..7d11f251584 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -2769,14 +2769,32 @@ SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
  Sat Feb 16 08:14:30 0097 PST
 (1 row)
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+         to_timestamp         
+------------------------------
+ Fri Mar 18 23:38:15 2011 PDT
+(1 row)
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
          to_timestamp         
 ------------------------------
  Sat Jan 12 00:00:00 1985 PST
 (1 row)
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+         to_timestamp         
+------------------------------
+ Sat Jan 12 00:00:00 1985 PST
+(1 row)
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
          to_timestamp         
 ------------------------------
  Sun May 16 00:00:00 1976 PDT
@@ -2789,7 +2807,7 @@ SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 (1 row)
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
          to_timestamp         
 ------------------------------
  Thu Jan 01 15:54:45 1998 PST
@@ -2810,6 +2828,24 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 ERROR:  invalid value "/Fe" for "Mon"
 DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+         to_timestamp         
+------------------------------
+ Sun Feb 16 00:00:00 1997 PST
+(1 row)
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
          to_timestamp         
 ------------------------------
@@ -2966,7 +3002,7 @@ SELECT to_timestamp('2011-12-18 11:38 20',     'YYYY-MM-DD HH12:MI TZM');
 SELECT to_timestamp('2011-12-18 23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
 (1 row)
 
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
@@ -2996,7 +3032,64 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
          to_timestamp         
 ------------------------------
- Sun Dec 18 03:38:15 2011 PST
+ Sun Dec 18 23:38:15 2011 PST
+(1 row)
+
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('  2000 +JUN', 'YYYY/MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 + JUN', 'YYYY MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 ++ JUN', 'YYYY  MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 + + JUN', 'YYYY  MON');
+ERROR:  invalid value "+ J" for "MON"
+DETAIL:  The given value did not match any of the allowed values for this field.
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+         to_timestamp         
+------------------------------
+ Thu Jun 01 00:00:00 2000 PDT
+(1 row)
+
+SELECT to_timestamp('2000 -10', 'YYYY TZH');
+         to_timestamp         
+------------------------------
+ Sat Jan 01 02:00:00 2000 PST
+(1 row)
+
+SELECT to_timestamp('2000 -10', 'YYYY  TZH');
+         to_timestamp         
+------------------------------
+ Fri Dec 31 06:00:00 1999 PST
 (1 row)
 
 SELECT to_date('2011 12  18', 'YYYY MM DD');
@@ -3014,13 +3107,13 @@ SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');
   to_date   
 ------------
- 12-08-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011 12 18', 'YYYY  MM DD');
   to_date   
 ------------
- 02-18-2011
+ 12-18-2011
 (1 row)
 
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index ebb196a1cfc..807037be76e 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -392,15 +392,21 @@ SELECT to_timestamp('0097/Feb/16 --> 08:14:30', 'YYYY/Mon/DD --> HH:MI:SS');
 
 SELECT to_timestamp('97/2/16 8:14:30', 'FMYYYY/FMMM/FMDD FMHH:FMMI:FMSS');
 
+SELECT to_timestamp('2011$03!18 23_38_15', 'YYYY-MM-DD HH24:MI:SS');
+
 SELECT to_timestamp('1985 January 12', 'YYYY FMMonth DD');
 
+SELECT to_timestamp('1985 FMMonth 12', 'YYYY "FMMonth" DD');
+
+SELECT to_timestamp('1985 \ 12', 'YYYY \\ DD');
+
 SELECT to_timestamp('My birthday-> Year: 1976, Month: May, Day: 16',
-                    '"My birthday-> Year" YYYY, "Month:" FMMonth, "Day:" DD');
+                    '"My birthday-> Year:" YYYY, "Month:" FMMonth, "Day:" DD');
 
 SELECT to_timestamp('1,582nd VIII 21', 'Y,YYYth FMRM DD');
 
 SELECT to_timestamp('15 "text between quote marks" 98 54 45',
-                    E'HH24 "\\text between quote marks\\"" YY MI SS');
+                    E'HH24 "\\"text between quote marks\\"" YY MI SS');
 
 SELECT to_timestamp('05121445482000', 'MMDDHH24MISSYYYY');
 
@@ -408,6 +414,12 @@ SELECT to_timestamp('2000January09Sunday', 'YYYYFMMonthDDFMDay');
 
 SELECT to_timestamp('97/Feb/16', 'YYMonDD');
 
+SELECT to_timestamp('97/Feb/16', 'YY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
+
+SELECT to_timestamp('97/Feb/16', 'FXYY/Mon/DD');
+
 SELECT to_timestamp('19971116', 'YYYYMMDD');
 
 SELECT to_timestamp('20000-1116', 'YYYY-MMDD');
@@ -464,6 +476,17 @@ SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD  HH24:MI:SS');
 SELECT to_timestamp('2011-12-18  23:38:15', 'YYYY-MM-DD   HH24:MI:SS');
 
+SELECT to_timestamp('2000+   JUN', 'YYYY/MON');
+SELECT to_timestamp('  2000 +JUN', 'YYYY/MON');
+SELECT to_timestamp(' 2000 +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000  +JUN', 'YYYY//MON');
+SELECT to_timestamp('2000 + JUN', 'YYYY MON');
+SELECT to_timestamp('2000 ++ JUN', 'YYYY  MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY  MON');
+SELECT to_timestamp('2000 + + JUN', 'YYYY   MON');
+SELECT to_timestamp('2000 -10', 'YYYY TZH');
+SELECT to_timestamp('2000 -10', 'YYYY  TZH');
+
 SELECT to_date('2011 12  18', 'YYYY MM DD');
 SELECT to_date('2011 12  18', 'YYYY MM  DD');
 SELECT to_date('2011 12  18', 'YYYY MM   DD');

#123

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#122)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Sep 6, 2018 at 3:58 PM Alexander Korotkov <a.korotkov@postgrespro.ru>
wrote:

On Thu, Sep 6, 2018 at 2:40 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Wed, Sep 5, 2018 at 7:28 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov <

a.korotkov@postgrespro.ru> wrote:

On Wed, Sep 5, 2018 at 3:10 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

From those results the question is how important is it to force

the following breakage on our users (i.e., introduce FX exact symbol
matching):
SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
-         to_timestamp
-------------------------------
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must
exactly match the format string.

There seemed to be some implicit approvals of this breakage

some 30 emails and 10 months ago but given that this is the only change
from a correct result to a failure I'd like to officially put it out there
for opinion/vote gathering. Mine is a -1; though keeping the distinction
between space and non-alphanumeric characters is expected.

Do I understand correctly that you're -1 to changes to FX mode,

but no

objection to changes in non-FX mode?

Ditto.

So, if no objections for non-FX mode changes, then I'll extract that
part and commit it separately.

Yeah, that make sense to me, thank you.

OK! I've removed FX changes from the patch. The result is attached.
I'm going to commit this if no objections.

Attached revision fixes usage of two subsequent spaces in the
documentation.

So, pushed! Thanks to every thread participant for review and feedback.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#124

Arthur Zakirov

a.zakirov@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#123)

Re: [HACKERS] Bug in to_timestamp().

On Sun, Sep 09, 2018 at 09:52:46PM +0300, Alexander Korotkov wrote:

So, pushed! Thanks to every thread participant for review and feedback.

Great! Should we close the commitfest entry? There is FX part of the
patch though. But it seems that nobody is happy with it. It could be
done with a separate patch anyway.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

#125

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Arthur Zakirov (#124)

Re: [HACKERS] Bug in to_timestamp().

On Tue, Sep 11, 2018 at 6:18 PM Arthur Zakirov <a.zakirov@postgrespro.ru>
wrote:

On Sun, Sep 09, 2018 at 09:52:46PM +0300, Alexander Korotkov wrote:

So, pushed! Thanks to every thread participant for review and feedback.

Great! Should we close the commitfest entry? There is FX part of the
patch though. But it seems that nobody is happy with it. It could be
done with a separate patch anyway.

I've closed commitfest entry. I think we can add new commitfest entry if
required. Regarding FX part, it easy to extract it as separate patch, but
it's hard to find consensus. I think there are at least three possible
decisions.

1) Change FX mode to require separators to be the same.
2) Leave FX mode "as is".
3) Introduce GUC variable controlling behavior of FX mode.

Any thoughts?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#126

Tom Lane

tgl@sss.pgh.pa.us

over 7 years ago

In reply to: Alexander Korotkov (#125)

Re: [HACKERS] Bug in to_timestamp().

Alexander Korotkov <a.korotkov@postgrespro.ru> writes:

I've closed commitfest entry. I think we can add new commitfest entry if
required. Regarding FX part, it easy to extract it as separate patch, but
it's hard to find consensus. I think there are at least three possible
decisions.

1) Change FX mode to require separators to be the same.
2) Leave FX mode "as is".
3) Introduce GUC variable controlling behavior of FX mode.

Any thoughts?

A GUC variable is a horrid solution. Years ago we thought it'd be OK
to have GUCs that change query behavior, but we've regretted it every
time we did that, and often removed them again later (e.g. regex_flavor,
sql_inheritance). Applications that want to be portable have to contend
with all possible values of the GUC, and that's no fun for anybody.

Given the lack of consensus, it's hard to make a case for breaking
backwards compatibility, so I'd have to vote for option 2.

regards, tom lane

#127

Prabhat Sahu

prabhat.sahu@enterprisedb.com

over 7 years ago

In reply to: Alexander Korotkov (#123)

Re: [HACKERS] Bug in to_timestamp().

Hi All,

Few more findings on to_timestamp() test with HEAD.

postgres[3493]=# select to_timestamp('15-07-1984 23:30:32',' dd- mm- yyyy
hh24: mi: ss');
to_timestamp
---------------------------
1984-07-15 23:30:32+05:30
(1 row)

postgres[3493]=# select to_timestamp('15-07-*1984* 23:30:32','*9*dd-*9*mm-
*99*yyyy *9*hh24:*9*mi:*9*ss');
to_timestamp
------------------------------
*0084*-07-05 03:00:02+05:53:28
(1 row)

If there are spaces before any formate then output is fine(1st output) but
instead of spaces if we have *digit* then we are getting wrong output.

On Mon, Sep 10, 2018 at 12:23 AM Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:

On Thu, Sep 6, 2018 at 3:58 PM Alexander Korotkov <
a.korotkov@postgrespro.ru> wrote:
On Thu, Sep 6, 2018 at 2:40 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Wed, Sep 5, 2018 at 7:28 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov <

a.korotkov@postgrespro.ru> wrote:

On Wed, Sep 5, 2018 at 3:10 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:

From those results the question is how important is it to

force the following breakage on our users (i.e., introduce FX exact symbol
matching):
SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
-         to_timestamp
-------------------------------
- Sun Feb 16 00:00:00 1997 PST
-(1 row)
-
+ERROR:  unexpected character "/", expected character ":"
+HINT:  In FX mode, punctuation in the input string must
exactly match the format string.

There seemed to be some implicit approvals of this breakage

some 30 emails and 10 months ago but given that this is the only change
from a correct result to a failure I'd like to officially put it out there
for opinion/vote gathering. Mine is a -1; though keeping the distinction
between space and non-alphanumeric characters is expected.

Do I understand correctly that you're -1 to changes to FX mode,

but no

objection to changes in non-FX mode?

Ditto.

So, if no objections for non-FX mode changes, then I'll extract that
part and commit it separately.

Yeah, that make sense to me, thank you.

OK! I've removed FX changes from the patch. The result is attached.
I'm going to commit this if no objections.

Attached revision fixes usage of two subsequent spaces in the
documentation.
So, pushed! Thanks to every thread participant for review and feedback.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

With Regards,

Prabhat Kumar Sahu
Skype ID: prabhat.sahu1984
EnterpriseDB Corporation

The Postgres Database Company

#128

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Prabhat Sahu (#127)

Re: [HACKERS] Bug in to_timestamp().

On Tue, Sep 18, 2018 at 2:08 PM Prabhat Sahu
<prabhat.sahu@enterprisedb.com> wrote:

Few more findings on to_timestamp() test with HEAD.

postgres[3493]=# select to_timestamp('15-07-1984 23:30:32',' dd- mm- yyyy hh24: mi: ss');
to_timestamp
---------------------------
1984-07-15 23:30:32+05:30
(1 row)

postgres[3493]=# select to_timestamp('15-07-1984 23:30:32','9dd-9mm-99yyyy 9hh24:9mi:9ss');
to_timestamp
------------------------------
0084-07-05 03:00:02+05:53:28
(1 row)

If there are spaces before any formate then output is fine(1st output) but instead of spaces if we have digit then we are getting wrong output.

This behavior might look strange, but it wasn't introduced by
cf9846724. to_timestamp() behaves so, because it takes digit have
NODE_TYPE_CHAR type. And for NODE_TYPE_CHAR we just "eat" single
character of input string regardless what is it.

But, I found related issue in cf9846724. Before it was:

# select to_timestamp('2018 01 01', 'YYYY9MM9DD');
to_timestamp
------------------------
2018-01-01 00:00:00+03
(1 row)

But after it becomes so.

# select to_timestamp('2018 01 01', 'YYYY9MM9DD');
ERROR: invalid value "1 " for "MM"
DETAIL: Field requires 2 characters, but only 1 could be parsed.
HINT: If your source string is not fixed-width, try using the "FM" modifier.

That happens because we've already skipped space "for free", and then
NODE_TYPE_CHAR eats digit. I've checked that Oracle doesn't allow
random charaters/digits to appear in format string.

select to_timestamp('2018 01 01', 'YYYY9MM9DD') from dual
ORA-01821: date format not recognized

So, Oracle compatibility isn't argument here. Therefore I'm going to
propose following fix for that: let NODE_TYPE_CHAR eat characters only
if we didn't skip input string characters more than it was in format
string. I'm sorry for vague explanation. I'll come up with patch
later, and it should be clear then.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#129

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#128)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

On Tue, Sep 18, 2018 at 4:42 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

But, I found related issue in cf9846724. Before it was:

# select to_timestamp('2018 01 01', 'YYYY9MM9DD');
to_timestamp
------------------------
2018-01-01 00:00:00+03
(1 row)

But after it becomes so.

# select to_timestamp('2018 01 01', 'YYYY9MM9DD');
ERROR: invalid value "1 " for "MM"
DETAIL: Field requires 2 characters, but only 1 could be parsed.
HINT: If your source string is not fixed-width, try using the "FM" modifier.

That happens because we've already skipped space "for free", and then
NODE_TYPE_CHAR eats digit. I've checked that Oracle doesn't allow
random charaters/digits to appear in format string.

select to_timestamp('2018 01 01', 'YYYY9MM9DD') from dual
ORA-01821: date format not recognized

So, Oracle compatibility isn't argument here. Therefore I'm going to
propose following fix for that: let NODE_TYPE_CHAR eat characters only
if we didn't skip input string characters more than it was in format
string. I'm sorry for vague explanation. I'll come up with patch
later, and it should be clear then.

Please find attached patch for fixing this issue. It makes handling
of format string text characters be similar to pre cf984672 behavior.
See the examples in regression tests and explanation in the commit
message. I'm going to commit this if no objections.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

to_timestamp_fix.patchapplication/octet-stream; name=to_timestamp_fix.patchDownload

commit eb75659624d76144ae6c6b304c9c2715e7ce722a
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date:   Wed Sep 19 12:21:31 2018 +0300

    Fix handling of format string text characters in to_timestamp()/to_date()
    
    cf984672 introduced improvement of handling of spaces and separators in
    to_timestamp()/to_date() functions.  In particular, now we're skipping spaces
    both before and after fields.  That may cause format string text character to
    consume part of field in the situations, when it didn't happen before cf984672.
    This commit cause format string text character consume input string characters
    only when since previous field (or string beginning) number of skipped input
    string characters is not greater than number of format string characters
    (that is we didn't skip any extra characters in input string).

diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 2ed8ca675bd..5001af32f60 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -3061,9 +3061,24 @@ DCH_from_char(FormatNode *node, char *in, TmFromChar *out)
 			 * Text character, so consume one character from input string.
 			 * Notice we don't insist that the consumed character match the
 			 * format's character.
-			 * Text field ignores FX mode.
 			 */
-			s += pg_mblen(s);
+			if (!fx_mode)
+			{
+				/*
+				 * In non FX mode we might have skipped some extra characters
+				 * (more than specified in format string) before.  In this case
+				 * we don't skip input string character, because it might be
+				 * part of field.
+				 */
+				if (extra_skip > 0)
+					extra_skip--;
+				else
+					s += pg_mblen(s);
+			}
+			else
+			{
+				s += pg_mblen(s);
+			}
 			continue;
 		}
 
diff --git a/src/test/regress/expected/horology.out b/src/test/regress/expected/horology.out
index 7d11f251584..b00ab0f142c 100644
--- a/src/test/regress/expected/horology.out
+++ b/src/test/regress/expected/horology.out
@@ -3128,6 +3128,21 @@ SELECT to_date('2011   12 18', 'YYYY  MM DD');
  12-18-2011
 (1 row)
 
+SELECT to_date('2011 12 18', 'YYYYxMMxDD');
+  to_date   
+------------
+ 12-18-2011
+(1 row)
+
+SELECT to_date('2011x 12x 18', 'YYYYxMMxDD');
+  to_date   
+------------
+ 12-18-2011
+(1 row)
+
+SELECT to_date('2011 x12 x18', 'YYYYxMMxDD');
+ERROR:  invalid value "x1" for "MM"
+DETAIL:  Value must be an integer.
 --
 -- Check errors for some incorrect usages of to_timestamp() and to_date()
 --
diff --git a/src/test/regress/sql/horology.sql b/src/test/regress/sql/horology.sql
index 807037be76e..ca34e8753de 100644
--- a/src/test/regress/sql/horology.sql
+++ b/src/test/regress/sql/horology.sql
@@ -495,6 +495,10 @@ SELECT to_date('2011 12 18', 'YYYY  MM DD');
 SELECT to_date('2011  12 18', 'YYYY  MM DD');
 SELECT to_date('2011   12 18', 'YYYY  MM DD');
 
+SELECT to_date('2011 12 18', 'YYYYxMMxDD');
+SELECT to_date('2011x 12x 18', 'YYYYxMMxDD');
+SELECT to_date('2011 x12 x18', 'YYYYxMMxDD');
+
 --
 -- Check errors for some incorrect usages of to_timestamp() and to_date()
 --

#130

amul sul

sulamul@gmail.com

over 7 years ago

In reply to: Alexander Korotkov (#129)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Sep 19, 2018 at 2:57 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Tue, Sep 18, 2018 at 4:42 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

But, I found related issue in cf9846724. Before it was:

# select to_timestamp('2018 01 01', 'YYYY9MM9DD');
to_timestamp
------------------------
2018-01-01 00:00:00+03
(1 row)

But after it becomes so.

# select to_timestamp('2018 01 01', 'YYYY9MM9DD');
ERROR: invalid value "1 " for "MM"
DETAIL: Field requires 2 characters, but only 1 could be parsed.
HINT: If your source string is not fixed-width, try using the "FM" modifier.

That happens because we've already skipped space "for free", and then
NODE_TYPE_CHAR eats digit. I've checked that Oracle doesn't allow
random charaters/digits to appear in format string.

select to_timestamp('2018 01 01', 'YYYY9MM9DD') from dual
ORA-01821: date format not recognized

So, Oracle compatibility isn't argument here. Therefore I'm going to
propose following fix for that: let NODE_TYPE_CHAR eat characters only
if we didn't skip input string characters more than it was in format
string. I'm sorry for vague explanation. I'll come up with patch
later, and it should be clear then.

Please find attached patch for fixing this issue. It makes handling
of format string text characters be similar to pre cf984672 behavior.
See the examples in regression tests and explanation in the commit
message. I'm going to commit this if no objections.

With this patch, to_date and to_timestamp behaving differently, see this:

edb=# SELECT to_date('18 12 2011', 'xDDxMMxYYYY');
to_date
--------------------
18-DEC-11 00:00:00
(1 row)

edb=# SELECT to_timestamp('18 12 2011', 'xDDxMMxYYYY');
to_timestamp
---------------------------
08-DEC-11 00:00:00 -05:00 <=========== Incorrect output.
(1 row)

Regards,
Amul

#131

amul sul

sulamul@gmail.com

over 7 years ago

In reply to: amul sul (#130)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Sep 19, 2018 at 3:51 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 19, 2018 at 2:57 PM Alexander Korotkov

[...]

With this patch, to_date and to_timestamp behaving differently, see this:

edb=# SELECT to_date('18 12 2011', 'xDDxMMxYYYY');
to_date
--------------------
18-DEC-11 00:00:00
(1 row)

edb=# SELECT to_timestamp('18 12 2011', 'xDDxMMxYYYY');
to_timestamp
---------------------------
08-DEC-11 00:00:00 -05:00 <=========== Incorrect output.
(1 row)

Sorry, this was wrong info -- with this patch, I had some mine trial changes.

Both to_date and to_timestamp behaving same with your patch -- the
wrong output, we are expecting that?

postgres =# SELECT to_date('18 12 2011', 'xDDxMMxYYYY');
to_date
------------
2011-12-08
(1 row)

postgres =# SELECT to_timestamp('18 12 2011', 'xDDxMMxYYYY');
to_timestamp
------------------------
2011-12-08 00:00:00-05
(1 row)

Regards,
Amul

#132

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: amul sul (#131)

Re: [HACKERS] Bug in to_timestamp().

On Wed, Sep 19, 2018 at 1:38 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 19, 2018 at 3:51 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 19, 2018 at 2:57 PM Alexander Korotkov

[...]

With this patch, to_date and to_timestamp behaving differently, see this:

edb=# SELECT to_date('18 12 2011', 'xDDxMMxYYYY');
to_date
--------------------
18-DEC-11 00:00:00
(1 row)

edb=# SELECT to_timestamp('18 12 2011', 'xDDxMMxYYYY');
to_timestamp
---------------------------
08-DEC-11 00:00:00 -05:00 <=========== Incorrect output.
(1 row)

Sorry, this was wrong info -- with this patch, I had some mine trial changes.

Both to_date and to_timestamp behaving same with your patch -- the
wrong output, we are expecting that?

postgres =# SELECT to_date('18 12 2011', 'xDDxMMxYYYY');
to_date
------------
2011-12-08
(1 row)
ma
postgres =# SELECT to_timestamp('18 12 2011', 'xDDxMMxYYYY');
to_timestamp
------------------------
2011-12-08 00:00:00-05
(1 row)

It's hard to understand whether it was expected, because it wasn't
properly documented. More important that it's the same behavior we
have before cf984672, and purpose of cf984672 was not to change this.

But from the code comments, it's intentional. If you put digits or
text characters into format string, you can skip non-separator input
string characters. For instance you may do.

# SELECT to_date('y18y12y2011', 'xDDxMMxYYYY');
to_date
------------
2011-12-18
(1 row)

It's even more interesting that letters and digits are handled in
different manner.

# SELECT to_date('01801202011', 'xDDxMMxYYYY');
ERROR: date/time field value out of range: "01801202011"
Time: 0,453 ms

# SELECT to_date('01801202011', '9DD9MM9YYYY');
to_date
------------
2011-12-18
(1 row)

So, letters in format string doesn't allow you to extract fields at
particular positions of digit sequence, but digits in format string
allows you to. That's rather undocumented, but from the code you can
get that it's intentional. Thus, I think it would be nice to improve
the documentation here. But I still propose to commit the patch I
propose to bring back unintentional behavior change in cf984672.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#133

amul sul

sulamul@gmail.com

over 7 years ago

In reply to: Alexander Korotkov (#132)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Sep 20, 2018, 3:22 AM Alexander Korotkov <a.korotkov@postgrespro.ru>
wrote:

On Wed, Sep 19, 2018 at 1:38 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 19, 2018 at 3:51 PM amul sul <sulamul@gmail.com> wrote:

On Wed, Sep 19, 2018 at 2:57 PM Alexander Korotkov

[...]

With this patch, to_date and to_timestamp behaving differently, see

this:

edb=# SELECT to_date('18 12 2011', 'xDDxMMxYYYY');
to_date
--------------------
18-DEC-11 00:00:00
(1 row)

edb=# SELECT to_timestamp('18 12 2011', 'xDDxMMxYYYY');
to_timestamp
---------------------------
08-DEC-11 00:00:00 -05:00 <=========== Incorrect output.
(1 row)

Sorry, this was wrong info -- with this patch, I had some mine trial

changes.

Both to_date and to_timestamp behaving same with your patch -- the
wrong output, we are expecting that?

postgres =# SELECT to_date('18 12 2011', 'xDDxMMxYYYY');
to_date
------------
2011-12-08
(1 row)
ma
postgres =# SELECT to_timestamp('18 12 2011', 'xDDxMMxYYYY');
to_timestamp
------------------------
2011-12-08 00:00:00-05
(1 row)

It's hard to understand whether it was expected, because it wasn't
properly documented. More important that it's the same behavior we
have before cf984672, and purpose of cf984672 was not to change this.

But from the code comments, it's intentional. If you put digits or
text characters into format string, you can skip non-separator input
string characters. For instance you may do.

# SELECT to_date('y18y12y2011', 'xDDxMMxYYYY');
to_date
------------
2011-12-18
(1 row)

It's even more interesting that letters and digits are handled in
different manner.

# SELECT to_date('01801202011', 'xDDxMMxYYYY');
ERROR: date/time field value out of range: "01801202011"
Time: 0,453 ms

# SELECT to_date('01801202011', '9DD9MM9YYYY');
to_date
------------
2011-12-18
(1 row)

So, letters in format string doesn't allow you to extract fields at
particular positions of digit sequence, but digits in format string
allows you to. That's rather undocumented, but from the code you can
get that it's intentional. Thus, I think it would be nice to improve
the documentation here. But I still propose to commit the patch I
propose to bring back unintentional behavior change in cf984672.

Agreed, thanks for working on this.

Regards,
Amul

Show quoted text

#134

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: amul sul (#133)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Sep 20, 2018 at 6:09 AM amul sul <sulamul@gmail.com> wrote:

On Thu, Sep 20, 2018, 3:22 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

It's hard to understand whether it was expected, because it wasn't
properly documented. More important that it's the same behavior we
have before cf984672, and purpose of cf984672 was not to change this.

But from the code comments, it's intentional. If you put digits or
text characters into format string, you can skip non-separator input
string characters. For instance you may do.

# SELECT to_date('y18y12y2011', 'xDDxMMxYYYY');
to_date
------------
2011-12-18
(1 row)

It's even more interesting that letters and digits are handled in
different manner.

# SELECT to_date('01801202011', 'xDDxMMxYYYY');
ERROR: date/time field value out of range: "01801202011"
Time: 0,453 ms

# SELECT to_date('01801202011', '9DD9MM9YYYY');
to_date
------------
2011-12-18
(1 row)

So, letters in format string doesn't allow you to extract fields at
particular positions of digit sequence, but digits in format string
allows you to. That's rather undocumented, but from the code you can
get that it's intentional. Thus, I think it would be nice to improve
the documentation here. But I still propose to commit the patch I
propose to bring back unintentional behavior change in cf984672.

Agreed, thanks for working on this.

Pushed, thanks.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

#135

Alexander Korotkov

a.korotkov@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#134)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

On Thu, Sep 20, 2018 at 3:52 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Thu, Sep 20, 2018 at 6:09 AM amul sul <sulamul@gmail.com> wrote:

Agreed, thanks for working on this.

Pushed, thanks.

Please, find attached patch improving documentation about
letters/digits in to_date()/to_timestamp() template string. I think
it needs review from native English speaker.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

to_timestamp_letters_digits.patchapplication/octet-stream; name=to_timestamp_letters_digits.patchDownload

commit c716d5ccdf9e768d7ab9dc100fff43bf1ea06c6c
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date:   Sat Sep 22 21:51:44 2018 +0300

    Documentation about letters/digits in to_date()/to_timestamp()

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c44417d868d..dce81eecf89 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6296,6 +6296,35 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
       </para>
      </listitem>
 
+     <listitem>
+      <para>
+        Template string can contain letters/digits, which are not part of some
+        template pattern.  Such template string letters/digits mach any
+        characters in input string including letters and digits.
+        For example, <literal>to_timestamp('2000yJUN', 'YYYYxMON')</literal>
+        works.
+      </para>
+      <para>
+        Without the <literal>FX</literal> option, spaces are skipped in the
+        beginning of the input string and around date and time values.  In order
+        to prevent template string letters/digits from consuming data and time
+        values characters followed by spaces, they do consume character only
+        when number of characters after previous value (or beginning of the
+        string) in input string is less or equal than number of such
+        characters in template string.  For example,
+        <literal>to_timestamp('2000y JUN', 'YYYYxMON')</literal> works, but
+        <literal>to_timestamp('2000 yJUN', 'YYYYxMON')</literal> returns an
+        error.
+      </para>
+      <para>
+        In spite of letters, digits in template string can prevent previous
+        template pattern from being greedy.  For example,
+        <literal>to_timestamp('2000906901', 'YYYY0MM0DD')</literal> works, but
+        <literal>to_timestamp('2000906901', 'YYYYxMMxDD')</literal> returns an
+        error.
+      </para>
+     </listitem>
+
      <listitem>
       <para>
        <literal>TZH</literal> template pattern can match a signed number.

#136

Liudmila Mantrova

l.mantrova@postgrespro.ru

over 7 years ago

In reply to: Alexander Korotkov (#135)

1 attachment(s)

Re: [HACKERS] Bug in to_timestamp().

On 09/22/2018 10:00 PM, Alexander Korotkov wrote:

On Thu, Sep 20, 2018 at 3:52 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Thu, Sep 20, 2018 at 6:09 AM amul sul <sulamul@gmail.com> wrote:

Agreed, thanks for working on this.

Pushed, thanks.

Please, find attached patch improving documentation about
letters/digits in to_date()/to_timestamp() template string. I think
it needs review from native English speaker.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Hi Alexander,

I'm not a native speaker, but let me try to help. A new doc version is
attached.

Liudmila Mantrova
Technical writer at Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachments:

to_timestamp_letters_digits-0001.patchtext/x-patch; name=to_timestamp_letters_digits-0001.patchDownload

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9a7f683..1532bcc 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -6286,13 +6286,46 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
        exceeds the number of separators in the template.
       </para>
       <para>
-       If <literal>FX</literal> is specified, separator in template string
-       matches to exactly one character in input string.  Notice we don't insist
-       input string character to be the same as template string separator.
+       If <literal>FX</literal> is specified, a separator in the template
+       string matches exactly one character in the input string.  The
+       input string character does not need to be the same as the template
+       string separator.
        For example, <literal>to_timestamp('2000/JUN', 'FXYYYY MON')</literal>
        works, but <literal>to_timestamp('2000/JUN', 'FXYYYY&nbsp;&nbsp;MON')</literal>
-       returns an error because a space second template string space consumed
-       letter <literal>J</literal> from the input string.
+       returns an error because the second space in the template string
+       consumes letter <literal>J</literal> of the input string.
+      </para>
+     </listitem>
+
+     <listitem>
+      <para>
+        Template strings of <function>to_timestamp</function> and
+        <function>to_date</function> functions can also contain arbitrary
+        letters/digits between patterns.  Such letters/digits can match
+        any characters in the input string.  For example,
+        <literal>to_timestamp('2000yJUN', 'YYYYxMON')</literal> works.
+      </para>
+      <para>
+        Letters/digits consume an input string character only if the number
+        of extra characters at the beginning of the input string or between
+        the identified date/time values is less than or equal to the number
+        of the corresponding characters in the template string. This ensures
+        that the template string does not consume any characters of date/time
+        values when used without the <literal>FX</literal> option, even if
+        a letter/digit separator in the input string appears after a space.
+        For example, <literal>to_timestamp('2000y JUN', 'YYYYxMON')</literal>
+        works, but <literal>to_timestamp('2000 yJUN', 'YYYYxMON')</literal>
+        returns an error.
+      </para>
+      <para>
+        Note that if the template string contains an arbitrary letter,
+        the pattern that precedes this letter becomes greedy and tries
+        to match as many characters as possible. For example,
+        <literal>to_timestamp('2000906901', 'YYYYxMMxDD')</literal>
+        fails because the <literal>YYYY</literal> pattern matches
+        the whole input string instead of the first four characters.
+        Patterns separated by digits are non-greedy, so
+        <literal>to_timestamp('2000906901', 'YYYY0MM0DD')</literal> works fine.
       </para>
      </listitem>