A creepy story about dates. How to prevent it?
Hi.
I will explain my experience to prevent other to fall in the same problem.
We upgraded from 7.2 to 7.3 few days ago. We reload the data via pg_dump an
psql -i <file_dumped>.
The pg_dump was made with PGDATESTYLE=SQL, European (dd/mm/yyyy).
When we load the database we didn't configure te pgdestyle yet, then it was
the default (mm/dd/yy). We populate the data successfully.
Some days later we catch on that some dates are correct and some not. Some
dates appears with day and month exchanged. In this case Postgres
flexibility turn against us.
The load proces did this:
INSERT INTO taula VALUES('12/01/2003,..
INSERT INTO taula VALUES('13/01/2003,..
SELECT fecha FROM taula:
fecha
--------
12/01/03
01/13/03 !!!!!
I suggest that maybe the pg_dump has to store the datestyle in order to
prevent this problems, no?
Regards,
Conxita Mar�n
=?iso-8859-1?Q?Conxita_Mar=EDn?= <comarin@telefonica.net> writes:
I suggest that maybe the pg_dump has to store the datestyle in order to
prevent this problems, no?
In 7.3.3 and later, pg_dump forces ISO datestyle while dumping, to
forestall exactly this problem. Sorry that you got bit.
When you're doing an update, it's often a good idea to use the newer
version's pg_dump to extract the data from the older database. We
keep pg_dump compatible with back versions (presently it handles
anything back to 7.0) so that you can take advantage of bugfixes in
newer pg_dumps. Like this one ...
regards, tom lane
On Wed, 18 Jun 2003, Tom Lane wrote:
=?iso-8859-1?Q?Conxita_Mar=EDn?= <comarin@telefonica.net> writes:
I suggest that maybe the pg_dump has to store the datestyle in order to
prevent this problems, no?In 7.3.3 and later, pg_dump forces ISO datestyle while dumping, to
forestall exactly this problem. Sorry that you got bit.When you're doing an update, it's often a good idea to use the newer
version's pg_dump to extract the data from the older database. We
keep pg_dump compatible with back versions (presently it handles
anything back to 7.0) so that you can take advantage of bugfixes in
newer pg_dumps. Like this one ...
That reminds me, did we get the date parsing fixed so that you can't
insert 22/03/2003 into a european database (or conversely, 03/22/2003 into
a US database) ? I.e the problem where the date parser assumed you meant
it the other way...
"scott.marlowe" <scott.marlowe@ihs.com> writes:
That reminds me, did we get the date parsing fixed so that you can't
insert 22/03/2003 into a european database (or conversely, 03/22/2003 into
a US database) ? I.e the problem where the date parser assumed you meant
it the other way...
IIRC, there was no consensus that that's a bug.
regards, tom lane
On Wed, 18 Jun 2003, Tom Lane wrote:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
That reminds me, did we get the date parsing fixed so that you can't
insert 22/03/2003 into a european database (or conversely, 03/22/2003 into
a US database) ? I.e the problem where the date parser assumed you meant
it the other way...IIRC, there was no consensus that that's a bug.
I thought there was, and someone had said they were gonna fix it.
IMHO it is a bug. We don't let postgresql "guess" about a lot of more
obvious things (i.e. int4 to int8 casting, etc...) and letting it guess
about dates makes it non-ACID compliant.
If it isn't a bug, how do I implement a check constraint to stop it from
happening? I'd like to know my database accepts properly formatted input
and rejects the rest. That's what the C in ACID means, right?
"scott.marlowe" <scott.marlowe@ihs.com> writes:
IMHO it is a bug. We don't let postgresql "guess" about a lot of more
obvious things (i.e. int4 to int8 casting, etc...) and letting it guess
about dates makes it non-ACID compliant.
How do you arrive at that conclusion?
If it isn't a bug, how do I implement a check constraint to stop it from
happening? I'd like to know my database accepts properly formatted input
and rejects the rest. That's what the C in ACID means, right?
Do the checking in your application. Something you think is improperly
formatted probably shouldn't get to the database in the first place.
If you aren't doing any format checking at all, you're possibly
vulnerable to SQL injection attacks.
I do now seem to recall an agreement that a GUC switch to disable
date-interpretation guessing would be okay, though.
regards, tom lane
On Wed, 18 Jun 2003, Tom Lane wrote:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
IMHO it is a bug. We don't let postgresql "guess" about a lot of more
obvious things (i.e. int4 to int8 casting, etc...) and letting it guess
about dates makes it non-ACID compliant.How do you arrive at that conclusion?
The same way I come to all my conclusions, logic. :-) but seriously...
Why not accept a date of 04/44/2003 and just wrap it into May? It's
the same kind of thing. I told my database where I live, and expect it to
only accept dates that are valid in my locale. If a user feeds it a date
that isn't right, I expect the database to error out.
If it isn't a bug, how do I implement a check constraint to stop it from
happening? I'd like to know my database accepts properly formatted input
and rejects the rest. That's what the C in ACID means, right?Do the checking in your application.
I do. I make sure it's ##/##/#### (i.e. a simple regex works)
The database already does the rest of the checking for me, it just happens
to think it might be helpful to coerce some bad dates for me, but others
that are obviously wrong are tossed out.
Here's a scenario for how we can wind up teaching a user to enter dates
the wrong way. The day is 22 feb. They enter this date, in the US, where
mm/dd/yyyy is standard:
22/02/2003
The database converts it to
02/22/2003 silently.
Next day, they enter
23/02/2003
Again, it takes it silently.
So on and so forth. On the first day of march they put in:
01/03/2003 which the database takes as January 03, and happily puts it in.
with a couple of weeks of "training" the user now believes they are
putting the date right, but it is wrong. No error.
Next day, we get
02/03/2003. The database puts in Feb 03. Again, the user doesn't know.
We continue the rest of the year this way. Somewhere along the line, the
user notices all their reports have the wrong date. Which ones were for
feb 03 and which ones were for march 02? We don't know.
Why bother checking for wrong dates sometimes (i.e. 03/33/2003) but not
all the time (i.e. 13/03/2003)?
Something you think is improperly
formatted probably shouldn't get to the database in the first place.
Agreed. But that's not the point. It is properly formatted, i.e.
mm/dd/yyyy, it's just out of range. That's not the same at all.
If you aren't doing any format checking at all, you're possibly
vulnerable to SQL injection attacks.
I do plenty of format checking, this isn't the same. This is range
checking. I expect my database to do that for me.
I do now seem to recall an agreement that a GUC switch to disable
date-interpretation guessing would be okay, though.
I'm pretty sure it was the other way around, make strict locale / date
checking the standard and a GUC to turn it off for folks who really want
to use a broken database. :-)
"scott.marlowe" <scott.marlowe@ihs.com> writes:
On Wed, 18 Jun 2003, Tom Lane wrote:
I do now seem to recall an agreement that a GUC switch to disable
date-interpretation guessing would be okay, though.
I'm pretty sure it was the other way around, make strict locale / date
checking the standard and a GUC to turn it off for folks who really want
to use a broken database. :-)
This is definitely a case where what is "broken" is in the eye of the
beholder. If the current behavior is broken, why have we had so few
complaints about it? It's been like that for quite a few years now.
I think that on grounds of backwards compatibility alone, we should
leave the out-of-the-box default behavior as it is.
regards, tom lane
This is an old postgres "gotcha" that I've lived with since a long, long
time ago. It forced me to always use the datestyle YYYY-MM-DD so there
is never any confusion in my programming.
I then convert the date to whatever the current user preference is, at
the application level or with an SQL query call to to_char().
I suggest that this is a good practice to follow.
- Andrew
On 18/06/2003 22:09 scott.marlowe wrote:
[snip]
Why bother checking for wrong dates sometimes (i.e. 03/33/2003) but not
all the time (i.e. 13/03/2003)?
ISTR that DB2 used to do the same (maybe still does - haven't used it for
a few years). Still, IMHO, it is frustrating. I would rather have the db
be ultra-strict, forcing me to handle dates correctly and rigorously in my
app.
[snip]
I do plenty of format checking, this isn't the same. This is range
checking. I expect my database to do that for me.
Agreed, although if the user gets his data thrown back at him by the db
then I feel that this is a bug in my program. YMMV.
regards
--
Paul Thomas
+------------------------------+---------------------------------------------+
| Thomas Micro Systems Limited | Software Solutions for the Smaller
Business |
| Computer Consultants |
http://www.thomas-micro-systems-ltd.co.uk |
+------------------------------+---------------------------------------------+
On Wed, 2003-06-18 at 18:09, Paul Thomas wrote:
On 18/06/2003 22:09 scott.marlowe wrote:
[snip]
Why bother checking for wrong dates sometimes (i.e. 03/33/2003) but not
all the time (i.e. 13/03/2003)?ISTR that DB2 used to do the same (maybe still does - haven't used it for
a few years). Still, IMHO, it is frustrating. I would rather have the db
be ultra-strict, forcing me to handle dates correctly and rigorously in my
app.[snip]
I do plenty of format checking, this isn't the same. This is range
checking. I expect my database to do that for me.Agreed, although if the user gets his data thrown back at him by the db
then I feel that this is a bug in my program. YMMV.
I hate to say, "Me too!", but I do agree with Paul and Scott.
--
+-----------------------------------------------------------+
| Ron Johnson, Jr. Home: ron.l.johnson@cox.net |
| Jefferson, LA USA http://members.cox.net/ron.l.johnson |
| |
| "Oh, great altar of passive entertainment, bestow upon me |
| thy discordant images at such speed as to render linear |
| thought impossible" (Calvin, regarding TV) |
+-----------------------------------------------------------
(Comments from the peanut gallery here)
IMHO it is a bug. We don't let postgresql "guess" about a lot of more
obvious things (i.e. int4 to int8 casting, etc...) and letting it guess
about dates makes it non-ACID compliant.How do you arrive at that conclusion?
The same way I come to all my conclusions, logic. :-) but seriously...
Why not accept a date of 04/44/2003 and just wrap it into May? It's
the same kind of thing.
Is it? Similar, perhaps, but at least you can be pretty sure that 44 and
2003 are not valid months. (Not that I want the database fixing that for
me, either.)
I told my database where I live, and expect it to
only accept dates that are valid in my locale.
I wouldn't suggest that. Locale is not dependable because there is
simply no dependable way of mapping, for example, IP address to a
physical location, much less to a cultural location.
Also, even if you had a truly accurate way to determine that your user
was Japanese, you wouldn't know whether the user intended western era or
Japanese era, whether the user was doing year-month-day according
tradition here, or whether the user was trying to anticipate a western
order because the web site was in English.
(I'd give about a 90% odds that the date 01.02.03 entered into a web
site by someone Japanese is going to mean 3 Feb 2001, 8% that it's going
to mean 3rd of February of the year Heisei 1, and 2% that the user is
going to try to second-guess and enter it in one of the western orders.)
In the US, you will also have users who may be accustomed to using
military or genealogy order, as well.
If a user feeds it a date
that isn't right, I expect the database to error out.
I personally wouldn't want the order checked in the database. I think
I'd prefer that the application set the order and that the database
limit itself to range-checking the elements.
If it isn't a bug, how do I implement a check constraint to stop it from
happening?
Good question. Another good question is how the database would implement
the check.
I'd like to know my database accepts properly formatted input
and rejects the rest. That's what the C in ACID means, right?Do the checking in your application.
I do. I make sure it's ##/##/#### (i.e. a simple regex works)
That's not much of a check, of course.
The database already does the rest of the checking for me, it just happens
to think it might be helpful to coerce some bad dates for me, but others
that are obviously wrong are tossed out.
If this means what it appears to mean, that, when the database is
expecting day-month-year, it "fixes" 02/22/2003 to 2003.FEB.22, that
would worry me. I'd prefer that sort of behavior to be off by default.
I'd much rather have it just cough on a month 22. If attempted, I'd want
it in the application layer.
Here's a scenario for how we can wind up teaching a user to enter dates
the wrong way. The day is 22 feb. They enter this date, in the US, where
mm/dd/yyyy is standard:22/02/2003
The database converts it to
02/22/2003 silently.
The application layer should report, explicitly, how it interpreted what
was entered. Explicity feedback is the only way to make reasonably sure
the user and the database are on the same wavelength.
...
We continue the rest of the year this way. Somewhere along the line, the
user notices all their reports have the wrong date. Which ones were for
feb 03 and which ones were for march 02? We don't know.Why bother checking for wrong dates sometimes (i.e. 03/33/2003) but not
all the time (i.e. 13/03/2003)?Something you think is improperly
formatted probably shouldn't get to the database in the first place.Agreed. But that's not the point. It is properly formatted, i.e.
mm/dd/yyyy, it's just out of range. That's not the same at all.
I think order would still be format, which is why I wouldn't want the
database checking it, and especially not trying to fix it.
...
I do now seem to recall an agreement that a GUC switch to disable
date-interpretation guessing would be okay, though.I'm pretty sure it was the other way around, make strict locale / date
checking the standard and a GUC to turn it off for folks who really want
to use a broken database. :-)
I would not want the database guessing the order from the locale, either.
My JPY 2.
--
Joel Rees, programmer, Kansai Systems Group
Altech Corporation (Alpsgiken), Osaka, Japan
http://www.alpsgiken.co.jp
On Wed, 18 Jun 2003, Tom Lane wrote:
I'm pretty sure it was the other way around, make strict locale / date
checking the standard and a GUC to turn it offI think that on grounds of backwards compatibility alone, we should
leave the out-of-the-box default behavior as it is.
Who can rely on pg to accept dates that are "wrong"?
If some program (or person) generates a date where the month is 22
according to the DateStyle, then pg can guess that it's a day and correct
the date. But if someone relies on this then they will most likely also
generate other dates that will be accepted and intepreted by pg in a
different way. And you got broken data in the database.
The reason it have worked before is because everyone who had it wrong
before had to fix it anyway or get broken data into the database.
--
/Dennis
On Thu, 2003-06-19 at 01:06, Joel Rees wrote:
(Comments from the peanut gallery here)
IMHO it is a bug. We don't let postgresql "guess" about a lot of more
obvious things (i.e. int4 to int8 casting, etc...) and letting it guess
about dates makes it non-ACID compliant.How do you arrive at that conclusion?
The same way I come to all my conclusions, logic. :-) but seriously...
Why not accept a date of 04/44/2003 and just wrap it into May? It's
the same kind of thing.Is it? Similar, perhaps, but at least you can be pretty sure that 44 and
2003 are not valid months. (Not that I want the database fixing that for
me, either.)I told my database where I live, and expect it to
only accept dates that are valid in my locale.I wouldn't suggest that. Locale is not dependable because there is
simply no dependable way of mapping, for example, IP address to a
physical location, much less to a cultural location.
The locale specified by the SysAdmin should be canononical.
[snip]
Good question. Another good question is how the database would implement
the check.
Other databases do it. It can't be *that* hard to do.
OTOH, Andrew Snow's method (alway use ANSI standard YYYY-MM-DD)
is guaranteed to work. Have your app convert to that format before
inserting, and then PostgreSQL is guaranteed to puke if there's
a problem.
--
+-----------------------------------------------------------+
| Ron Johnson, Jr. Home: ron.l.johnson@cox.net |
| Jefferson, LA USA http://members.cox.net/ron.l.johnson |
| |
| "Oh, great altar of passive entertainment, bestow upon me |
| thy discordant images at such speed as to render linear |
| thought impossible" (Calvin, regarding TV) |
+-----------------------------------------------------------
On Thu, Jun 19, 2003 at 02:43:12 -0500,
Ron Johnson <ron.l.johnson@cox.net> wrote:
OTOH, Andrew Snow's method (alway use ANSI standard YYYY-MM-DD)
is guaranteed to work. Have your app convert to that format before
inserting, and then PostgreSQL is guaranteed to puke if there's
a problem.
No it isn't. In 7.4:
area=> select '2003-20-02'::date;
date
------------
2003-02-20
(1 row)
On Thu, 2003-06-19 at 05:35, Bruno Wolff III wrote:
On Thu, Jun 19, 2003 at 02:43:12 -0500,
Ron Johnson <ron.l.johnson@cox.net> wrote:OTOH, Andrew Snow's method (alway use ANSI standard YYYY-MM-DD)
is guaranteed to work. Have your app convert to that format before
inserting, and then PostgreSQL is guaranteed to puke if there's
a problem.No it isn't. In 7.4:
area=> select '2003-20-02'::date;
date
------------
2003-02-20
(1 row)
Whoa...
It shouldn't be difficult, though, to create a function to validate
dates, and have it be an implicit CHECK on date fields. Should it?
--
+-----------------------------------------------------------+
| Ron Johnson, Jr. Home: ron.l.johnson@cox.net |
| Jefferson, LA USA http://members.cox.net/ron.l.johnson |
| |
| "Oh, great altar of passive entertainment, bestow upon me |
| thy discordant images at such speed as to render linear |
| thought impossible" (Calvin, regarding TV) |
+-----------------------------------------------------------
The problem is that you can't do the check on the _field_ since it has
already been converted to a date.
Jon
On 19 Jun 2003, Ron Johnson wrote:
Show quoted text
On Thu, 2003-06-19 at 05:35, Bruno Wolff III wrote:
On Thu, Jun 19, 2003 at 02:43:12 -0500,
Ron Johnson <ron.l.johnson@cox.net> wrote:OTOH, Andrew Snow's method (alway use ANSI standard YYYY-MM-DD)
is guaranteed to work. Have your app convert to that format before
inserting, and then PostgreSQL is guaranteed to puke if there's
a problem.No it isn't. In 7.4:
area=> select '2003-20-02'::date;
date
------------
2003-02-20
(1 row)Whoa...
It shouldn't be difficult, though, to create a function to validate
dates, and have it be an implicit CHECK on date fields. Should it?-- +-----------------------------------------------------------+ | Ron Johnson, Jr. Home: ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "Oh, great altar of passive entertainment, bestow upon me | | thy discordant images at such speed as to render linear | | thought impossible" (Calvin, regarding TV) | +--------------------------------------------------------------------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
On Wed, 18 Jun 2003 15:09:44 -0600 (MDT), scott.marlowe wrote:
Why not accept a date of 04/44/2003 and just wrap it into May?
Unbelievably, to_date() already does that in 7.3.3:
pmh=> select to_date('2003-04-44','yyyy-mm-dd');
to_date
------------
2003-05-14
(1 row)
I raised this issue along with the date component order switching in April.
Someone (possibly Karel Zak) did say that they were going to look into it,
but it doesn't look like anything got done about it in 7.3. Maybe it's
better in 7.4?
--
Peter Haworth pmh@edison.ioppublishing.com
"you have been evaluated. you have a negative reference count.
prepare to be garbage collected. persistence is futile."
-- Erik Naggum
scott.marlowe writes:
That reminds me, did we get the date parsing fixed so that you can't
insert 22/03/2003 into a european database (or conversely, 03/22/2003 into
a US database) ? I.e the problem where the date parser assumed you meant
it the other way...
I vaguely recall that we wanted to provide some kind of configuration
option to make the date format checking more strict. If I'm mistaken, I
would certainly be in favor of it now. But exactly how strict it ought to
be is up for discussion.
--
Peter Eisentraut peter_e@gmx.net
On Wed, 18 Jun 2003, Tom Lane wrote:
"scott.marlowe" <scott.marlowe@ihs.com> writes:
On Wed, 18 Jun 2003, Tom Lane wrote:
I do now seem to recall an agreement that a GUC switch to disable
date-interpretation guessing would be okay, though.I'm pretty sure it was the other way around, make strict locale / date
checking the standard and a GUC to turn it off for folks who really want
to use a broken database. :-)This is definitely a case where what is "broken" is in the eye of the
beholder. If the current behavior is broken, why have we had so few
complaints about it? It's been like that for quite a few years now.I think that on grounds of backwards compatibility alone, we should
leave the out-of-the-box default behavior as it is.
I thought of another "silent failure" scenario.
Imports. If you have say 10,000 rows to import, and all the dates are in
euro style format going into a us style database, then all the ones where
the "month" is <13 will be put in wrong, and all the ones with a "month"
12 will be silently converted to be right. Now half the dates are right,
and half are wrong, and there's no error.
That's the worst of possibilities. Better to fail grandly than silently
corrupt data.