to_date_valid()

Started by Andreas 'ads' Scherbaumover 9 years ago37 messages
#1Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
1 attachment(s)

Hello,

we have customers complaining that to_date() accepts invalid dates, and
returns a different date instead. This is a known issue:

http://sql-info.de/postgresql/notes/to_date-to_timestamp-gotchas.html

On the other hand this leads to wrong dates when loading dates into the
database, because the database happily accepts invalid dates and ends up
writing something completely different into the table.

The attached patch adds a new function "to_date_valid()" which will
validate the date and return an error if the input and output date do
not match. Tests included, documentation update as well.

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

Attachments:

to_date_valid.patch.gzapplication/gzip; name=to_date_valid.patch.gzDownload
#2Euler Taveira
euler@timbira.com.br
In reply to: Andreas 'ads' Scherbaum (#1)
Re: to_date_valid()

On 02-07-2016 22:04, Andreas 'ads' Scherbaum wrote:

The attached patch adds a new function "to_date_valid()" which will
validate the date and return an error if the input and output date do
not match. Tests included, documentation update as well.

Why don't you add a third parameter (say, validate = true | false)
instead of creating another function? The new parameter could default to
false to not break compatibility.

--
Euler Taveira Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Jaime Casanova
jaime.casanova@2ndquadrant.com
In reply to: Euler Taveira (#2)
Re: to_date_valid()

El 2/7/2016 20:33, "Euler Taveira" <euler@timbira.com.br> escribió:

On 02-07-2016 22:04, Andreas 'ads' Scherbaum wrote:

The attached patch adds a new function "to_date_valid()" which will
validate the date and return an error if the input and output date do
not match. Tests included, documentation update as well.

Why don't you add a third parameter (say, validate = true | false)
instead of creating another function? The new parameter could default to
false to not break compatibility.

Shouldn't we fix this instead? Sounds like a bug to me. We don't usually
want to be bug compatible so it doesn't matter if we break something.

--
Jaime Casanova http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: Jaime Casanova (#3)
Re: to_date_valid()

On 03.07.2016 07:05, Jaime Casanova wrote:

El 2/7/2016 20:33, "Euler Taveira" <euler@timbira.com.br
<mailto:euler@timbira.com.br>> escribió:

On 02-07-2016 22:04, Andreas 'ads' Scherbaum wrote:

The attached patch adds a new function "to_date_valid()" which will
validate the date and return an error if the input and output date do
not match. Tests included, documentation update as well.

Why don't you add a third parameter (say, validate = true | false)
instead of creating another function? The new parameter could default to
false to not break compatibility.

Shouldn't we fix this instead? Sounds like a bug to me. We don't usually
want to be bug compatible so it doesn't matter if we break something.

There are previous discussions about such a change, and this was rejected:

/messages/by-id/lbjf1v$a2v$1@ger.gmane.org
/messages/by-id/A737B7A37273E048B164557ADEF4A58B17C9140E@ntex2010i.host.magwien.gv.at

Hence the new function, which does not collide with the existing
implementation.

Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Craig Ringer
craig@2ndquadrant.com
In reply to: Euler Taveira (#2)
Re: to_date_valid()

On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br> wrote:

On 02-07-2016 22:04, Andreas 'ads' Scherbaum wrote:

The attached patch adds a new function "to_date_valid()" which will
validate the date and return an error if the input and output date do
not match. Tests included, documentation update as well.

Why don't you add a third parameter (say, validate = true | false)
instead of creating another function? The new parameter could default to
false to not break compatibility.

because

SELECT to_date('blah', 'pattern', true)

is less clear to read than

SELECT to_date_valid('blah', 'pattern')

and offers no advantage. It's likely faster to use a separate function too.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#6Pavel Stehule
pavel.stehule@gmail.com
In reply to: Craig Ringer (#5)
Re: to_date_valid()

2016-07-04 4:25 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br> wrote:

On 02-07-2016 22:04, Andreas 'ads' Scherbaum wrote:

The attached patch adds a new function "to_date_valid()" which will
validate the date and return an error if the input and output date do
not match. Tests included, documentation update as well.

Why don't you add a third parameter (say, validate = true | false)
instead of creating another function? The new parameter could default to
false to not break compatibility.

because

SELECT to_date('blah', 'pattern', true)

is less clear to read than

SELECT to_date_valid('blah', 'pattern')

and offers no advantage. It's likely faster to use a separate function too.

personally I prefer first variant - this is same function with stronger
check.

The name to_date_valid sounds little bit strange - maybe to_date_strict
should be better.

Regards

Pavel

Show quoted text

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#7Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Pavel Stehule (#6)
Re: to_date_valid()

On 04/07/16 15:19, Pavel Stehule wrote:

2016-07-04 4:25 GMT+02:00 Craig Ringer <craig@2ndquadrant.com
<mailto:craig@2ndquadrant.com>>:

On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br
<mailto:euler@timbira.com.br>> wrote:

On 02-07-2016 22 <tel:02-07-2016%2022>:04, Andreas 'ads'
Scherbaum wrote:

The attached patch adds a new function "to_date_valid()"

which will

validate the date and return an error if the input and

output date do

not match. Tests included, documentation update as well.

Why don't you add a third parameter (say, validate = true | false)
instead of creating another function? The new parameter could
default to
false to not break compatibility.

because

SELECT to_date('blah', 'pattern', true)

is less clear to read than

SELECT to_date_valid('blah', 'pattern')

and offers no advantage. It's likely faster to use a separate
function too.

personally I prefer first variant - this is same function with
stronger check.

The name to_date_valid sounds little bit strange - maybe
to_date_strict should be better.

Regards

Pavel

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Yeah, my feeling too, is that 'to_date_strict' would be better!

Cheers,
Gavin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Pavel Stehule
pavel.stehule@gmail.com
In reply to: Pavel Stehule (#6)
Re: to_date_valid()

2016-07-04 5:19 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:

2016-07-04 4:25 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:

On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br> wrote:

On 02-07-2016 22:04, Andreas 'ads' Scherbaum wrote:

The attached patch adds a new function "to_date_valid()" which will
validate the date and return an error if the input and output date do
not match. Tests included, documentation update as well.

Why don't you add a third parameter (say, validate = true | false)
instead of creating another function? The new parameter could default to
false to not break compatibility.

because

SELECT to_date('blah', 'pattern', true)

is less clear to read than

SELECT to_date_valid('blah', 'pattern')

and offers no advantage. It's likely faster to use a separate function
too.

personally I prefer first variant - this is same function with stronger
check.

Currently probably we have not two similar function - one fault tolerant
and second stricter. There is only one example of similar behave -
parse_ident with "strict" option.

The three parameters are ok still - so I don't see a reason why we have to
implement new function. If you need to emphasize the fact so behave should
be strict, you can use named parameters

select to_date('blah', 'patter', strict => true)

Regards

Pavel

Show quoted text

The name to_date_valid sounds little bit strange - maybe to_date_strict
should be better.

Regards

Pavel

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#9Amit Kapila
amit.kapila16@gmail.com
In reply to: Andreas 'ads' Scherbaum (#1)
Re: to_date_valid()

On Sun, Jul 3, 2016 at 6:34 AM, Andreas 'ads' Scherbaum
<adsmail@wars-nicht.de> wrote:

Hello,

we have customers complaining that to_date() accepts invalid dates, and
returns a different date instead. This is a known issue:

http://sql-info.de/postgresql/notes/to_date-to_timestamp-gotchas.html

On the other hand this leads to wrong dates when loading dates into the
database, because the database happily accepts invalid dates and ends up
writing something completely different into the table.

The attached patch adds a new function "to_date_valid()" which will validate
the date and return an error if the input and output date do not match.
Tests included, documentation update as well.

It seems that you are calling many additional function calls
(date_out, timestamp_in, etc.) to validate the date. Won't the
additional function calls make to_date much costlier than its current
implementation? I don't know if there is a better way, but I think it
is worth to consider, if we can find a cheaper way to detect validity
of date.

Note - Your patch is small (~13KB) enough that it doesn't need to zipped.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: Amit Kapila (#9)
Re: to_date_valid()

On 04.07.2016 16:33, Amit Kapila wrote:

On Sun, Jul 3, 2016 at 6:34 AM, Andreas 'ads' Scherbaum
<adsmail@wars-nicht.de> wrote:

Hello,

we have customers complaining that to_date() accepts invalid dates, and
returns a different date instead. This is a known issue:

http://sql-info.de/postgresql/notes/to_date-to_timestamp-gotchas.html

On the other hand this leads to wrong dates when loading dates into the
database, because the database happily accepts invalid dates and ends up
writing something completely different into the table.

The attached patch adds a new function "to_date_valid()" which will validate
the date and return an error if the input and output date do not match.
Tests included, documentation update as well.

It seems that you are calling many additional function calls
(date_out, timestamp_in, etc.) to validate the date. Won't the
additional function calls make to_date much costlier than its current
implementation? I don't know if there is a better way, but I think it
is worth to consider, if we can find a cheaper way to detect validity
of date.

It certainly is costlier, and I'm open to suggestions how to improve the
performance. That is one of the reasons why I considered a separate
function, instead of adding this to to_date() and add even more overhead
there.

Note - Your patch is small (~13KB) enough that it doesn't need to zipped.

It's not about the small size, it's about websites like Gmail which
mingle up the linebreaks and then git fails. Ran into this problem on
the pgAdminIII list a while ago, ever since I just zip it and avoid the
trouble.

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: Pavel Stehule (#8)
Re: to_date_valid()

On 04.07.2016 05:51, Pavel Stehule wrote:

2016-07-04 5:19 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com
<mailto:pavel.stehule@gmail.com>>:

2016-07-04 4:25 GMT+02:00 Craig Ringer <craig@2ndquadrant.com
<mailto:craig@2ndquadrant.com>>:

On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br
<mailto:euler@timbira.com.br>> wrote:

On 02-07-2016 22 <tel:02-07-2016%2022>:04, Andreas 'ads'
Scherbaum wrote:

The attached patch adds a new function "to_date_valid()" which will
validate the date and return an error if the input and output date do
not match. Tests included, documentation update as well.

Why don't you add a third parameter (say, validate = true |
false)
instead of creating another function? The new parameter
could default to
false to not break compatibility.

because

SELECT to_date('blah', 'pattern', true)

is less clear to read than

SELECT to_date_valid('blah', 'pattern')

and offers no advantage. It's likely faster to use a separate
function too.

personally I prefer first variant - this is same function with
stronger check.

Currently probably we have not two similar function - one fault
tolerant and second stricter. There is only one example of similar
behave - parse_ident with "strict" option.

The three parameters are ok still - so I don't see a reason why we have
to implement new function. If you need to emphasize the fact so behave
should be strict, you can use named parameters

select to_date('blah', 'patter', strict => true)

The new function is not "strict", it just adds a validation step:

postgres=# select to_date_valid(NULL, NULL);
to_date_valid
---------------

(1 row)

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andreas 'ads' Scherbaum (#11)
Re: to_date_valid()

2016-07-04 18:24 GMT+02:00 Andreas 'ads' Scherbaum <adsmail@wars-nicht.de>:

On 04.07.2016 05:51, Pavel Stehule wrote:

2016-07-04 5:19 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com
<mailto:pavel.stehule@gmail.com>>:

2016-07-04 4:25 GMT+02:00 Craig Ringer <craig@2ndquadrant.com
<mailto:craig@2ndquadrant.com>>:

On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br
<mailto:euler@timbira.com.br>> wrote:

On 02-07-2016 22 <tel:02-07-2016%2022>:04, Andreas 'ads'
Scherbaum wrote:

The attached patch adds a new function "to_date_valid()"

which will

validate the date and return an error if the input and

output date do

not match. Tests included, documentation update as well.

Why don't you add a third parameter (say, validate = true |
false)
instead of creating another function? The new parameter
could default to
false to not break compatibility.

because

SELECT to_date('blah', 'pattern', true)

is less clear to read than

SELECT to_date_valid('blah', 'pattern')

and offers no advantage. It's likely faster to use a separate
function too.

personally I prefer first variant - this is same function with
stronger check.

Currently probably we have not two similar function - one fault
tolerant and second stricter. There is only one example of similar
behave - parse_ident with "strict" option.

The three parameters are ok still - so I don't see a reason why we have
to implement new function. If you need to emphasize the fact so behave
should be strict, you can use named parameters

select to_date('blah', 'patter', strict => true)

The new function is not "strict", it just adds a validation step:

I understand - I know, so this has zero relation to function flag STRICT

I don't know if the name "strict" is best, but the name "validate" is not
good too. Current to_date does some validations too.

Regards

Pavel

Show quoted text

postgres=# select to_date_valid(NULL, NULL);
to_date_valid
---------------

(1 row)

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

#13Andreas Karlsson
andreas@proxel.se
In reply to: Andreas 'ads' Scherbaum (#4)
Re: to_date_valid()

On 07/03/2016 12:36 PM, Andreas 'ads' Scherbaum wrote:

On 03.07.2016 07:05, Jaime Casanova wrote:

Shouldn't we fix this instead? Sounds like a bug to me. We don't usually
want to be bug compatible so it doesn't matter if we break something.

There are previous discussions about such a change, and this was rejected:

/messages/by-id/lbjf1v$a2v$1@ger.gmane.org
/messages/by-id/A737B7A37273E048B164557ADEF4A58B17C9140E@ntex2010i.host.magwien.gv.at

Hence the new function, which does not collide with the existing
implementation.

I do not see a clear conclusion in the linked threads. For example Bruce
calls it a bug in one of the emails
(/messages/by-id/201107200103.p6K13ix10517@momjian.us).

I think we should fix to_date() to throw an error. Personally I would be
happy if my code broke due to this kind of change since the exception
would reveal an old bug which has been there a long time eating my data.
I cannot see a case where I would have wanted the current behavior.

If there is any legitimate use for the current behavior then we can add
it back as another function.

Andreas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andreas Karlsson (#13)
Re: to_date_valid()

2016-07-04 22:15 GMT+02:00 Andreas Karlsson <andreas@proxel.se>:

On 07/03/2016 12:36 PM, Andreas 'ads' Scherbaum wrote:

On 03.07.2016 07:05, Jaime Casanova wrote:

Shouldn't we fix this instead? Sounds like a bug to me. We don't usually
want to be bug compatible so it doesn't matter if we break something.

There are previous discussions about such a change, and this was rejected:

/messages/by-id/lbjf1v$a2v$1@ger.gmane.org

/messages/by-id/A737B7A37273E048B164557ADEF4A58B17C9140E@ntex2010i.host.magwien.gv.at

Hence the new function, which does not collide with the existing
implementation.

I do not see a clear conclusion in the linked threads. For example Bruce
calls it a bug in one of the emails (
/messages/by-id/201107200103.p6K13ix10517@momjian.us
).

I think we should fix to_date() to throw an error. Personally I would be
happy if my code broke due to this kind of change since the exception would
reveal an old bug which has been there a long time eating my data. I cannot
see a case where I would have wanted the current behavior.

If I remember, this implementation is based on Oracle's behave. It is
pretty old and documented - so it is hard to speak about it like the bug. I
understand, so the behave is strange, but it was designed in different
time. You can enter not 100% valid string, but you get correct date

postgres=# select to_date('2016-12-40','YYYY-MM-DD');

┌────────────┐
│ to_date │
╞════════════╡
│ 2017-01-09 │
└────────────┘
(1 row)

It can be used for some date calculations. In old age the applications was
designed in this style. I am against to change of default behave. We can
introduce new new different function with different name with better
designed format and rules, or we can add new options to this function, or
we can live with current state.

Now, to_date function should not be used - functions make_date,
make_timestamp are faster and safe.

postgres=# select make_date(2017,01,40);
ERROR: date field value out of range: 2017-01-40

Regards

Pavel

Show quoted text

If there is any legitimate use for the current behavior then we can add it
back as another function.

Andreas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15Andreas Karlsson
andreas@proxel.se
In reply to: Pavel Stehule (#14)
Re: to_date_valid()

On 07/04/2016 10:55 PM, Pavel Stehule wrote:

2016-07-04 22:15 GMT+02:00 Andreas Karlsson <andreas@proxel.se
<mailto:andreas@proxel.se>>:
I do not see a clear conclusion in the linked threads. For example
Bruce calls it a bug in one of the emails
(/messages/by-id/201107200103.p6K13ix10517@momjian.us).

I think we should fix to_date() to throw an error. Personally I
would be happy if my code broke due to this kind of change since the
exception would reveal an old bug which has been there a long time
eating my data. I cannot see a case where I would have wanted the
current behavior.

If I remember, this implementation is based on Oracle's behave.

In the thread I linked above they claim that Oracle (at least 10g) does
not work like this.

Thomas Kellerer wrote:

Oracle throws an error for the above example:

SQL> select to_date('20110231', 'YYYYMMDD') from dual;
select to_date('20110231', 'YYYYMMDD') from dual
*
ERROR at line 1:
ORA-01839: date not valid for month specified

I do not have access to an Oracle installation so I cannot confirm this
myself.

It is
pretty old and documented - so it is hard to speak about it like the
bug. I understand, so the behave is strange, but it was designed in
different time. You can enter not 100% valid string, but you get correct
date

postgres=# select to_date('2016-12-40','YYYY-MM-DD');

┌────────────┐
│ to_date │
╞════════════╡
│ 2017-01-09 │
└────────────┘
(1 row)

It can be used for some date calculations. In old age the applications
was designed in this style. I am against to change of default behave. We
can introduce new new different function with different name with better
designed format and rules, or we can add new options to this function,
or we can live with current state.

While clever, I think this behavior is a violation of the principle of
least surprise. It is not obvious to someone reading a query that
to_date() would behave like this. Especially when Oracle's to_date()
works differently.

Now, to_date function should not be used - functions make_date,
make_timestamp are faster and safe.

Yeah, I personally know of this behavior and therefore would never use
to_date(), but I am far from the average PostgreSQL user.

Andreas

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: Pavel Stehule (#12)
Re: to_date_valid()

On 04.07.2016 18:37, Pavel Stehule wrote:

I don't know if the name "strict" is best, but the name "validate" is
not good too. Current to_date does some validations too.

Obviously not enough, because it allows invalid dates. I'd say that the
current to_date() merely validates the input format for string parsing,
and that the date is in range. But there is not much validation on the
date itself.

So the name can't be "strict" because of the conflict with "NULL"
handling, and you don't like "valid" - what other options do you offer?

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17David G. Johnston
david.g.johnston@gmail.com
In reply to: Andreas 'ads' Scherbaum (#16)
Re: to_date_valid()

On Mon, Jul 4, 2016 at 8:39 PM, Andreas 'ads' Scherbaum <
adsmail@wars-nicht.de> wrote:

On 04.07.2016 18:37, Pavel Stehule wrote:

I don't know if the name "strict" is best, but the name "validate" is
not good too. Current to_date does some validations too.

Obviously not enough, because it allows invalid dates. I'd say that the
current to_date() merely validates the input format for string parsing, and
that the date is in range. But there is not much validation on the date
itself.

So the name can't be "strict" because of the conflict with "NULL"
handling, and you don't like "valid" - what other options do you offer?

​We don't have to change the name...we could do something like how
RegularExpressions work - like (?i) - and just add a new modifier ​code.

​'~YYYY-MI-DD' --that's a leading tilde, could be anything - even something
like "HMYYYY-MI-DD" for "historical mode"

​Also, see this thread of a few weeks ago for related material:

/messages/by-id/1873520224.1784572.1465833145330.JavaMail.yahoo@mail.yahoo.com

It seems that fixing it is back on the table, possibly even for 9.6 since
this is such a hideous bug - one that closely resembles a cockroach ;)

WRT to the 2014 "reject out-of-range dates" thread, I'm kinda surprised
that we didn't just set the date to be the minimum or maximum allowable
date back in 9.2 instead of rejecting it...

I'd be more inclined to add another argument if it was basically an enum.

to_date(text, format, format_style)

This at least opens the door for future enhancements that are associated
with named styles. I could imagine an "exact" format that also allows for
something like regexp character classes so that one can write:
"YYYY[:SEP:]MM[:SEP:]DD" to keep exact matches but accommodate variations
on what people type as a separator e.g. [.-/]

format_style=>historical would invoke today's behaviors

David J.

#18Pavel Stehule
pavel.stehule@gmail.com
In reply to: David G. Johnston (#17)
Re: to_date_valid()

2016-07-05 4:33 GMT+02:00 David G. Johnston <david.g.johnston@gmail.com>:

On Mon, Jul 4, 2016 at 8:39 PM, Andreas 'ads' Scherbaum <
adsmail@wars-nicht.de> wrote:

On 04.07.2016 18:37, Pavel Stehule wrote:

I don't know if the name "strict" is best, but the name "validate" is
not good too. Current to_date does some validations too.

Obviously not enough, because it allows invalid dates. I'd say that the
current to_date() merely validates the input format for string parsing, and
that the date is in range. But there is not much validation on the date
itself.

So the name can't be "strict" because of the conflict with "NULL"
handling, and you don't like "valid" - what other options do you offer?

​We don't have to change the name...we could do something like how
RegularExpressions work - like (?i) - and just add a new modifier ​code.

​'~YYYY-MI-DD' --that's a leading tilde, could be anything - even
something like "HMYYYY-MI-DD" for "historical mode"

​Also, see this thread of a few weeks ago for related material:

/messages/by-id/1873520224.1784572.1465833145330.JavaMail.yahoo@mail.yahoo.com

It seems that fixing it is back on the table, possibly even for 9.6 since
this is such a hideous bug - one that closely resembles a cockroach ;)

WRT to the 2014 "reject out-of-range dates" thread, I'm kinda surprised
that we didn't just set the date to be the minimum or maximum allowable
date back in 9.2 instead of rejecting it...

I'd be more inclined to add another argument if it was basically an enum.

to_date(text, format, format_style)

This at least opens the door for future enhancements that are associated
with named styles. I could imagine an "exact" format that also allows for
something like regexp character classes so that one can write:
"YYYY[:SEP:]MM[:SEP:]DD" to keep exact matches but accommodate variations
on what people type as a separator e.g. [.-/]

format_style=>historical would invoke today's behaviors

this is compatibility break - the question is "can we break it?"

Pavel

Show quoted text

David J.

#19Pavel Stehule
pavel.stehule@gmail.com
In reply to: Andreas 'ads' Scherbaum (#16)
Re: to_date_valid()

2016-07-05 2:39 GMT+02:00 Andreas 'ads' Scherbaum <adsmail@wars-nicht.de>:

On 04.07.2016 18:37, Pavel Stehule wrote:

I don't know if the name "strict" is best, but the name "validate" is
not good too. Current to_date does some validations too.

Obviously not enough, because it allows invalid dates. I'd say that the
current to_date() merely validates the input format for string parsing, and
that the date is in range. But there is not much validation on the date
itself.

So the name can't be "strict" because of the conflict with "NULL"
handling, and you don't like "valid" - what other options do you offer?

I have not - so third option looks best for me - it can be long name
"only_correct_date", "only_valid_date", "only_valid_date_on_input" ...

Pavel

Show quoted text

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

#20Albe Laurenz
laurenz.albe@wien.gv.at
In reply to: Andreas Karlsson (#15)
Re: to_date_valid()

Andreas Karlsson wrote:

On 07/04/2016 10:55 PM, Pavel Stehule wrote:

2016-07-04 22:15 GMT+02:00 Andreas Karlsson wrote:

I do not see a clear conclusion in the linked threads. For example
Bruce calls it a bug in one of the emails
(/messages/by-id/201107200103.p6K13ix10517@momjian.us).

I think we should fix to_date() to throw an error. Personally I
would be happy if my code broke due to this kind of change since the
exception would reveal an old bug which has been there a long time
eating my data. I cannot see a case where I would have wanted the
current behavior.

If I remember, this implementation is based on Oracle's behave.

In the thread I linked above they claim that Oracle (at least 10g) does
not work like this.

[...]

I do not have access to an Oracle installation so I cannot confirm this
myself.

Oracle 12.1:

SQL> SELECT to_date('2016-12-40','YYYY-MM-DD') FROM dual;
SELECT to_date('2016-12-40','YYYY-MM-DD') FROM dual
*
ERROR at line 1:
ORA-01847: day of month must be between 1 and last day of month

SQL> SELECT to_date('2017-02-29','YYYY-MM-DD') FROM dual;
SELECT to_date('2017-02-29','YYYY-MM-DD') FROM dual
*
ERROR at line 1:
ORA-01839: date not valid for month specified

So no, compatibility with Oracle is certainly not the reason
to leave it as it is.

But notwithstanding your feeling that you would like your application
to break if it makes use of this behaviour, it is a change that might
make some people pretty unhappy - nobody can tell how many.

Yours,
Laurenz Albe

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: Pavel Stehule (#19)
Re: to_date_valid()

On 05.07.2016 06:05, Pavel Stehule wrote:

2016-07-05 2:39 GMT+02:00 Andreas 'ads' Scherbaum <adsmail@wars-nicht.de
<mailto:adsmail@wars-nicht.de>>:

On 04.07.2016 18:37, Pavel Stehule wrote:

I don't know if the name "strict" is best, but the name
"validate" is
not good too. Current to_date does some validations too.

Obviously not enough, because it allows invalid dates. I'd say that
the current to_date() merely validates the input format for string
parsing, and that the date is in range. But there is not much
validation on the date itself.

So the name can't be "strict" because of the conflict with "NULL"
handling, and you don't like "valid" - what other options do you offer?

I have not - so third option looks best for me - it can be long name
"only_correct_date", "only_valid_date", "only_valid_date_on_input" ...

Then you don't have "to_date" in the function name, but still use
"valid" in the name. How is that useful to remember the function? Where
"to_date_valid" already gives you the idea that it is "to_date" with an
additional "valid"ator.

Don't make it overly complicated.

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: David G. Johnston (#17)
Re: to_date_valid()

On 05.07.2016 04:33, David G. Johnston wrote:

On Mon, Jul 4, 2016 at 8:39 PM, Andreas 'ads' Scherbaum
<adsmail@wars-nicht.de <mailto:adsmail@wars-nicht.de>>wrote:

On 04.07.2016 18:37, Pavel Stehule wrote:

I don't know if the name "strict" is best, but the name
"validate" is
not good too. Current to_date does some validations too.

Obviously not enough, because it allows invalid dates. I'd say that
the current to_date() merely validates the input format for string
parsing, and that the date is in range. But there is not much
validation on the date itself.

So the name can't be "strict" because of the conflict with "NULL"
handling, and you don't like "valid" - what other options do you offer?

​We don't have to change the name...we could do something like how
RegularExpressions work - like (?i) - and just add a new modifier ​code.

​'~YYYY-MI-DD' --that's a leading tilde, could be anything - even
something like "HMYYYY-MI-DD" for "historical mode"

Where to_timestamp() already uses HH for the hour? If you add another
"H", that surely is confusing.

It seems that fixing it is back on the table, possibly even for 9.6
since this is such a hideous bug - one that closely resembles a cockroach ;)

9.6 is already in Beta, people are testing their applications against
it. This would be a huge break, plus an API change - something you don't
add in a Beta.

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#23David G. Johnston
david.g.johnston@gmail.com
In reply to: Andreas 'ads' Scherbaum (#22)
Re: to_date_valid()

On Tue, Jul 5, 2016 at 5:22 AM, Andreas 'ads' Scherbaum <
adsmail@wars-nicht.de> wrote:

On 05.07.2016 04:33, David G. Johnston wrote:

On Mon, Jul 4, 2016 at 8:39 PM, Andreas 'ads' Scherbaum
<adsmail@wars-nicht.de <mailto:adsmail@wars-nicht.de>>wrote:

On 04.07.2016 18:37, Pavel Stehule wrote:

I don't know if the name "strict" is best, but the name
"validate" is
not good too. Current to_date does some validations too.

Obviously not enough, because it allows invalid dates. I'd say that
the current to_date() merely validates the input format for string
parsing, and that the date is in range. But there is not much
validation on the date itself.

So the name can't be "strict" because of the conflict with "NULL"
handling, and you don't like "valid" - what other options do you
offer?

​We don't have to change the name...we could do something like how
RegularExpressions work - like (?i) - and just add a new modifier ​code.

​'~YYYY-MI-DD' --that's a leading tilde, could be anything - even
something like "HMYYYY-MI-DD" for "historical mode"

Where to_timestamp() already uses HH for the hour? If you add another "H",
that surely is confusing.

​I don't really try to pick final names for things until the idea has taken
hold...

It seems that fixing it is back on the table, possibly even for 9.6

since this is such a hideous bug - one that closely resembles a cockroach
;)

9.6 is already in Beta, people are testing their applications against it.
This would be a huge break, plus an API change - something you don't add in
a Beta.

​Surely these beta testers would test against the RC before putting it into
production so I don't see an issue. I tend to agree generally but the
point of beta is to find bugs and solicit suggestions for improvement to
features. Having found a bug it doesn't make sense to avoid patching our
current unstable release. This applies moreso because of our annual
release cycle. On the topic of whether this becomes an exception to the
rule I'm largely ambivalent.

David J.

#24Bruce Momjian
bruce@momjian.us
In reply to: David G. Johnston (#23)
Re: to_date_valid()

On Tue, Jul 5, 2016 at 07:41:15AM -0400, David G. Johnston wrote:

​Surely these beta testers would test against the RC before putting it into
production so I don't see an issue.  I tend to agree generally but the point of
beta is to find bugs and solicit suggestions for improvement to features. 
Having found a bug it doesn't make sense to avoid patching our current unstable
release.  This applies moreso because of our annual release cycle.  On the
topic of whether this becomes an exception to the rule I'm largely ambivalent.

We don't assume users re-test everything when the RC1 arrives --- we
assume the testing is cumulative from when we started beta.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Albe Laurenz (#20)
Re: to_date_valid()

On 7/5/16 4:24 AM, Albe Laurenz wrote:

But notwithstanding your feeling that you would like your application
to break if it makes use of this behaviour, it is a change that might
make some people pretty unhappy - nobody can tell how many.

What is the use of the existing behavior? You get back an arbitrary
implementation dependent value. We don't even guarantee what the value
will be. If we changed it to return a different implementation
dependent value, would users get upset?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26Joshua D. Drake
jd@commandprompt.com
In reply to: Peter Eisentraut (#25)
Re: to_date_valid()

On 07/26/2016 06:25 PM, Peter Eisentraut wrote:

On 7/5/16 4:24 AM, Albe Laurenz wrote:

But notwithstanding your feeling that you would like your application
to break if it makes use of this behaviour, it is a change that might
make some people pretty unhappy - nobody can tell how many.

What is the use of the existing behavior? You get back an arbitrary
implementation dependent value. We don't even guarantee what the value
will be. If we changed it to return a different implementation
dependent value, would users get upset?

No they would not get upset because they wouldn't know.

Can we just do the right thing?

JD

--
Command Prompt, Inc. http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: Joshua D. Drake (#26)
Re: to_date_valid()

On 27.07.2016 05:00, Joshua D. Drake wrote:

On 07/26/2016 06:25 PM, Peter Eisentraut wrote:

On 7/5/16 4:24 AM, Albe Laurenz wrote:

But notwithstanding your feeling that you would like your application
to break if it makes use of this behaviour, it is a change that might
make some people pretty unhappy - nobody can tell how many.

What is the use of the existing behavior? You get back an arbitrary
implementation dependent value. We don't even guarantee what the value
will be. If we changed it to return a different implementation
dependent value, would users get upset?

No they would not get upset because they wouldn't know.

Can we just do the right thing?

I'm in favour of fixing this, and update the documentation. But given
the discussions in the past, it seemed like people actually depend on
this behaviour. Hence the additional function.

if this is fixed, it's too late for the current beta. But it's a good
time to add a note in the release notes, and advise people that it will
be changed in the next release.

A workaround can be to rename the current function to something like
"to_date_legacy", or "to_date_oracle". And implement the checks in to_date.

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Andreas 'ads' Scherbaum (#27)
Re: to_date_valid()

On 7/29/16 1:33 PM, Andreas 'ads' Scherbaum wrote:

On 27.07.2016 05:00, Joshua D. Drake wrote:

On 07/26/2016 06:25 PM, Peter Eisentraut wrote:

On 7/5/16 4:24 AM, Albe Laurenz wrote:

But notwithstanding your feeling that you would like your application
to break if it makes use of this behaviour, it is a change that might
make some people pretty unhappy - nobody can tell how many.

What is the use of the existing behavior? You get back an arbitrary
implementation dependent value. We don't even guarantee what the value
will be. If we changed it to return a different implementation
dependent value, would users get upset?

No they would not get upset because they wouldn't know.

Can we just do the right thing?

I'm in favour of fixing this, and update the documentation.

+1. I'd say that if users complain we can always create an extension (on 
PGXN) that offers the old behavior. Users could even put that function 
before pg_catalog in search_path and get the old behavior back.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: Joshua D. Drake (#26)
1 attachment(s)
Re: to_date_valid()

On 27.07.2016 05:00, Joshua D. Drake wrote:

On 07/26/2016 06:25 PM, Peter Eisentraut wrote:

On 7/5/16 4:24 AM, Albe Laurenz wrote:

But notwithstanding your feeling that you would like your application
to break if it makes use of this behaviour, it is a change that might
make some people pretty unhappy - nobody can tell how many.

What is the use of the existing behavior? You get back an arbitrary
implementation dependent value. We don't even guarantee what the value
will be. If we changed it to return a different implementation
dependent value, would users get upset?

No they would not get upset because they wouldn't know.

Can we just do the right thing?

Attached is a patch to "do the right thing". The verification is in
"to_date()" now, the extra function is removed. Regression tests are
updated - two or three of them returned a wrong date before, and still
passed. They fail now. Documentation is also updated.

Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

Attachments:

to_date_valid.patch.gzapplication/gzip; name=to_date_valid.patch.gzDownload
#30Artur Zakirov
a.zakirov@postgrespro.ru
In reply to: Andreas 'ads' Scherbaum (#29)
1 attachment(s)
Re: to_date_valid()

On 14.08.2016 01:52, Andreas 'ads' Scherbaum wrote:

Attached is a patch to "do the right thing". The verification is in
"to_date()" now, the extra function is removed. Regression tests are
updated - two or three of them returned a wrong date before, and still
passed. They fail now. Documentation is also updated.

Regards,

Is it right and "true" way to validate date by extra transforming and
comparison?

Maybe validate date by using ValidateDate(). Attached sample patch.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachments:

to-date-valid.patchtext/x-patch; name=to-date-valid.patchDownload
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index bbd97dc..c0048c9 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -3563,7 +3563,9 @@ do_to_timestamp(text *date_txt, text *fmt,
 {
 	FormatNode *format;
 	TmFromChar	tmfc;
-	int			fmt_len;
+	int			fmt_len,
+				fmask = 0;		/* Bit mask for ValidateDate() */
+	char	   *date_str = NULL;
 
 	ZERO_tmfc(&tmfc);
 	ZERO_tm(tm);
@@ -3574,7 +3576,6 @@ do_to_timestamp(text *date_txt, text *fmt,
 	if (fmt_len)
 	{
 		char	   *fmt_str;
-		char	   *date_str;
 		bool		incache;
 
 		fmt_str = text_to_cstring(fmt);
@@ -3630,7 +3631,6 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 		DCH_from_char(format, date_str, &tmfc);
 
-		pfree(date_str);
 		pfree(fmt_str);
 		if (!incache)
 			pfree(format);
@@ -3706,6 +3706,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 			if (tmfc.bc && tm->tm_year > 0)
 				tm->tm_year = -(tm->tm_year - 1);
 		}
+		fmask |= DTK_M(YEAR);
 	}
 	else if (tmfc.cc)			/* use first year of century */
 	{
@@ -3717,10 +3718,14 @@ do_to_timestamp(text *date_txt, text *fmt,
 		else
 			/* +1 because year == 599 is 600 BC */
 			tm->tm_year = tmfc.cc * 100 + 1;
+		fmask |= DTK_M(YEAR);
 	}
 
 	if (tmfc.j)
+	{
 		j2date(tmfc.j, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+		fmask |= DTK_DATE_M;
+	}
 
 	if (tmfc.ww)
 	{
@@ -3734,6 +3739,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 				isoweekdate2date(tmfc.ww, tmfc.d, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
 			else
 				isoweek2date(tmfc.ww, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			fmask |= DTK_DATE_M;
 		}
 		else
 			tmfc.ddd = (tmfc.ww - 1) * 7 + 1;
@@ -3744,11 +3750,17 @@ do_to_timestamp(text *date_txt, text *fmt,
 	if (tmfc.d)
 		tm->tm_wday = tmfc.d - 1;		/* convert to native numbering */
 	if (tmfc.dd)
+	{
 		tm->tm_mday = tmfc.dd;
+		fmask |= DTK_M(DAY);
+	}
 	if (tmfc.ddd)
 		tm->tm_yday = tmfc.ddd;
 	if (tmfc.mm)
+	{
 		tm->tm_mon = tmfc.mm;
+		fmask |= DTK_M(MONTH);
+	}
 
 	if (tmfc.ddd && (tm->tm_mon <= 1 || tm->tm_mday <= 1))
 	{
@@ -3771,6 +3783,7 @@ do_to_timestamp(text *date_txt, text *fmt,
 			j0 = isoweek2j(tm->tm_year, 1) - 1;
 
 			j2date(j0 + tmfc.ddd, &tm->tm_year, &tm->tm_mon, &tm->tm_mday);
+			fmask |= DTK_DATE_M;
 		}
 		else
 		{
@@ -3793,9 +3806,24 @@ do_to_timestamp(text *date_txt, text *fmt,
 
 			if (tm->tm_mday <= 1)
 				tm->tm_mday = tmfc.ddd - y[i - 1];
+
+			fmask |= DTK_M(MONTH) | DTK_M(DAY);
 		}
 	}
 
+	/* Validate date with bit mask received above */
+	if (fmask != 0 && date_str)
+	{
+		if (ValidateDate(fmask, false, false, false, tm) != 0)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATETIME_FIELD_OVERFLOW),
+					 errmsg("date/time field value out of range: \"%s\"",
+							date_str)));
+	}
+
+	if (date_str)
+		pfree(date_str);
+
 #ifdef HAVE_INT64_TIMESTAMP
 	if (tmfc.ms)
 		*fsec += tmfc.ms * 1000;
#31Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: Artur Zakirov (#30)
Re: to_date_valid()

On 15.08.2016 10:24, Artur Zakirov wrote:

On 14.08.2016 01:52, Andreas 'ads' Scherbaum wrote:

Attached is a patch to "do the right thing". The verification is in
"to_date()" now, the extra function is removed. Regression tests are
updated - two or three of them returned a wrong date before, and still
passed. They fail now. Documentation is also updated.

Regards,

Is it right and "true" way to validate date by extra transforming and
comparison?

Maybe validate date by using ValidateDate(). Attached sample patch.

This does not solve the problem at hand, and let's wrong dates/formats
slip through:

./buildclient.py -v -c demo-config-pg.yaml --run-configure --run-make
--run-install --no-clean-at-all --patch
'/messages/by-id/95738e12-6ed6-daf5-9dcf-6336072e6b15@postgrespro.ru&

postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD');
to_date
------------
2011-12-08
(1 row)

That is from the regression tests, and obviously handles the date
transformation wrong. My attempt catches this, because I compare the
date with the input date, and do not rely on a valid date only.

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32Artur Zakirov
a.zakirov@postgrespro.ru
In reply to: Andreas 'ads' Scherbaum (#31)
Re: to_date_valid()

On 15.08.2016 14:33, Andreas 'ads' Scherbaum wrote:

Is it right and "true" way to validate date by extra transforming and
comparison?

Maybe validate date by using ValidateDate(). Attached sample patch.

This does not solve the problem at hand, and let's wrong dates/formats
slip through:

./buildclient.py -v -c demo-config-pg.yaml --run-configure --run-make
--run-install --no-clean-at-all --patch
'/messages/by-id/95738e12-6ed6-daf5-9dcf-6336072e6b15@postgrespro.ru&

postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD');
to_date
------------
2011-12-08
(1 row)

That is from the regression tests, and obviously handles the date
transformation wrong. My attempt catches this, because I compare the
date with the input date, and do not rely on a valid date only.

I suppose that your sample query is an another issue, not date validate
task. I sent the patch to the thread
/messages/by-id/b2a39359-3282-b402-f4a3-057aae500ee7@postgrespro.ru
. It fixes formatting issues.

I thought that it is better to distinguish this issues to:
- validation of input date/timestmap string and input format string
- result date/timestamp validation

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#33Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: Artur Zakirov (#32)
Re: to_date_valid()

On 15.08.2016 13:44, Artur Zakirov wrote:

On 15.08.2016 14:33, Andreas 'ads' Scherbaum wrote:

Is it right and "true" way to validate date by extra transforming and
comparison?

Maybe validate date by using ValidateDate(). Attached sample patch.

This does not solve the problem at hand, and let's wrong dates/formats
slip through:

./buildclient.py -v -c demo-config-pg.yaml --run-configure --run-make
--run-install --no-clean-at-all --patch
'/messages/by-id/95738e12-6ed6-daf5-9dcf-6336072e6b15@postgrespro.ru&

postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD');
to_date
------------
2011-12-08
(1 row)

That is from the regression tests, and obviously handles the date
transformation wrong. My attempt catches this, because I compare the
date with the input date, and do not rely on a valid date only.

I suppose that your sample query is an another issue, not date validate
task. I sent the patch to the thread
/messages/by-id/b2a39359-3282-b402-f4a3-057aae500ee7@postgrespro.ru
. It fixes formatting issues.

I thought that it is better to distinguish this issues to:
- validation of input date/timestmap string and input format string
- result date/timestamp validation

I was a bit confused when I've seen that you modified another function
in your patch, but tried it out nevertheless. After all, you answered
one of my emails, and I let my tool download your patch directly from
the website. The other thread you are mentioning is a different topic.

Anyway, what I'm trying to solve is validate that to_date() produces
valid and correct output. It does not, as of today.

Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Andreas 'ads' Scherbaum (#31)
Re: to_date_valid()

On 8/15/16 7:33 AM, Andreas 'ads' Scherbaum wrote:

postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD');
to_date
------------
2011-12-08
(1 row)

That is from the regression tests, and obviously handles the date
transformation wrong. My attempt catches this, because I compare the
date with the input date, and do not rely on a valid date only.

It's debatable what is correct here.

Using to_number, the behavior appears to be that a space in the pattern
ignores one character. For example:

test=# select to_number('123 456', '999 999');
to_number
-----------
123456

test=# select to_number('123 456', '999 999');
to_number
-----------
12356

Considering that, the above to_date result is not incorrect.

So just squashing the spaces and converting the value back is not a
correct approach to detecting overflow.

I think using ValidateDate() was the right idea. That is what we use
for checking date validity everywhere else.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35Andreas 'ads' Scherbaum
adsmail@wars-nicht.de
In reply to: Peter Eisentraut (#34)
Re: to_date_valid()

On 08.09.2016 17:31, Peter Eisentraut wrote:

On 8/15/16 7:33 AM, Andreas 'ads' Scherbaum wrote:

postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD');
to_date
------------
2011-12-08
(1 row)

That is from the regression tests, and obviously handles the date
transformation wrong. My attempt catches this, because I compare the
date with the input date, and do not rely on a valid date only.

It's debatable what is correct here.

Using to_number, the behavior appears to be that a space in the pattern
ignores one character. For example:

test=# select to_number('123 456', '999 999');
to_number
-----------
123456

test=# select to_number('123 456', '999 999');
to_number
-----------
12356

Considering that, the above to_date result is not incorrect.

So just squashing the spaces and converting the value back is not a
correct approach to detecting overflow.

I think using ValidateDate() was the right idea. That is what we use
for checking date validity everywhere else.

ValidateDate() will tell you if it's a valid date. But not if the
transformation was correct:

postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD');
to_date
------------
2011-12-08
(1 row)

(with the patch from Artur)

Any idea how to solve this problem?

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Andreas 'ads' Scherbaum (#35)
Re: to_date_valid()

On 9/9/16 4:48 AM, Andreas 'ads' Scherbaum wrote:

ValidateDate() will tell you if it's a valid date. But not if the
transformation was correct:

postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD');
to_date
------------
2011-12-08
(1 row)

(with the patch from Artur)

Any idea how to solve this problem?

I think we need to do overflow checking as we read the various values,
in do_to_timestamp() and functions called from it. This would be very
similar to overflow checking in input functions.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37Tom Lane
tgl@sss.pgh.pa.us
In reply to: Peter Eisentraut (#34)
Re: to_date_valid()

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:

I think using ValidateDate() was the right idea. That is what we use
for checking date validity everywhere else.

Note that we've got two different CF entries, and threads, covering
fundamentally the same territory here, ie making to_timestamp et al
behave more sanely. The other thread is
https://commitfest.postgresql.org/10/713/
/messages/by-id/1873520224.1784572.1465833145330.JavaMail.yahoo@mail.yahoo.com

Robert pointed out several times in the other thread that to_timestamp
and friends were intended to be Oracle-compatible, which I agree with.
It was also pointed out that Oracle does range validity checks on the
timestamp component fields, which I verified just now using
http://rextester.com/l/oracle_online_compiler
(which is what I found by googling after discovering that sqlfiddle
isn't working very well, sigh). I got these results:

select banner as "oracle version" from v$version
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE 11.2.0.2.0 Production
TNS for 64-bit Windows: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01850: hour must be between 0 and 23

SELECT TO_TIMESTAMP('2016-06-13 19:99:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01851: minutes must be between 0 and 59

SELECT TO_TIMESTAMP('2016-06-13 19:59:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01852: seconds must be between 0 and 59

SELECT TO_TIMESTAMP('2016-06-13 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
13.06.2016 19:59:59

SELECT TO_TIMESTAMP('2016-06-13 19:59:59', 'YYYY-MM-DD HH:MI:SS') FROM dual
ORA-01849: hour must be between 1 and 12

SELECT TO_TIMESTAMP('2016-02-30 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01839: date not valid for month specified

SELECT TO_TIMESTAMP('2016-02-29 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
29.02.2016 19:59:59

SELECT TO_TIMESTAMP('2015-02-29 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual
ORA-01839: date not valid for month specified

I think this is sufficient evidence to show that we ought to change
to_timestamp et al. to apply range checking on the component fields.
And while I've not attempted to characterize exactly what Oracle
does with extra spaces, non-matching punctuation, etc, it's also
clear to me that their behavior is different and probably saner
than ours.

So I vote for fixing these functions to behave more like Oracle, and
forgetting about having a separate family of to_date_valid() functions
or optional parameters or anything of the sort. I might've been in
favor of that, until I saw how far down the rabbit hole this thread
had gotten. Let's just call it a bug and fix it.

Having range checking after the field scanning phase also changes the
terms of discussion about what we need to do in the scanning phase, since
it would catch many (of course not all) of the problems that arise from
field boundary issues. So I think we should get the range changing
committed first and then fool with the scanner.

The 0002 patch that Artur sent for this purpose in
/messages/by-id/22dbe4e0-b550-ca86-8634-adcda0faa159@postgrespro.ru
seems like the right approach to me, though I've not read it in detail
yet. I propose to work through that and commit it.

I'm much less sold on his 0001 patch, but I tend to agree that what
we want is to adjust the scanner behavior. I do not like the
reverse-conversion wrapper that Andreas proposes here; quite aside from
micro-efficiency issues, it seems to me that that just begs the question
of how you know what the input is supposed to mean.

In short, I think we should reject this thread + CF entry altogether
and instead move forward with the approach in the other thread.
It's probably too late in the September CF to close out the other
submission entirely, but we could get the range checking committed
and then RWF the other part for reconsideration later.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers