TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

Started by Aleksander Alekseevover 1 year ago12 messagesbugs
Jump to latest
#1Aleksander Alekseev
aleksander@timescale.com

Hi hackers,

We discovered one strange edge case with TimestampTz:

```
=# set datestyle to 'Postgres';
SET
=# SELECT '1000-01-01'::timestamptz::text;
text
------------------------------
Wed Jan 01 00:00:00 1000 LMT

=# SELECT '1000-01-01'::timestamptz::text::timestamptz;
ERROR: invalid input syntax for type timestamp with time zone: "Wed
Jan 01 00:00:00 1000 LMT"
```

When DateStyle is set to 'ISO' everything works fine:

```
=# set datestyle to 'ISO';
SET
=# SELECT '1000-01-01'::timestamptz::text;
text
------------------------------
1000-01-01 00:00:00+02:30:17

=# SELECT '1000-01-01'::timestamptz::text::timestamptz;
timestamptz
------------------------------
1000-01-01 00:00:00+02:30:17
```

If I understand correctly, text->timestamptz doesn't understand the
'LMT' timezone. Until 1879 it doesn't work, but in 1880 we switch to
'MMT' and then it works:

```
eax=# SELECT '1879-01-01'::timestamptz::text::timestamptz;
ERROR: invalid input syntax for type timestamp with time zone: "Wed
Jan 01 00:00:00 1879 LMT"
eax=# SELECT '1880-01-01'::timestamptz::text::timestamptz;
timestamptz
------------------------------
Wed Dec 31 20:00:17 1879 LMT
(1 row)

eax=# SELECT '1880-01-01'::timestamptz::text;
text
------------------------------
Thu Jan 01 00:00:00 1880 MMT
```

It seems to me that in the first case we should either accept "Wed Jan
01 00:00:00 1000 LMT" (since we just generated it) or alternatively
produce something else when casting timestamptz to text.

Thoughts?

--
Best regards,
Aleksander Alekseev

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Aleksander Alekseev (#1)
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

Aleksander Alekseev <aleksander@timescale.com> writes:

=# set datestyle to 'Postgres';
SET
=# SELECT '1000-01-01'::timestamptz::text::timestamptz;
ERROR: invalid input syntax for type timestamp with time zone: "Wed
Jan 01 00:00:00 1000 LMT"

If I understand correctly, text->timestamptz doesn't understand the
'LMT' timezone.

So it seems. In a quick experiment, it seemed like just treating LMT
as a noise word on input might be enough, but I don't have time today
to poke at it further.

regards, tom lane

#3Aleksander Alekseev
aleksander@timescale.com
In reply to: Tom Lane (#2)
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

Hi Tom,

If I understand correctly, text->timestamptz doesn't understand the
'LMT' timezone.

So it seems. In a quick experiment, it seemed like just treating LMT
as a noise word on input might be enough, but I don't have time today
to poke at it further.

Thanks for the hint. I've found the corresponding piece of code and
will submit a proper patch shortly.

--
Best regards,
Aleksander Alekseev

#4Aleksander Alekseev
aleksander@timescale.com
In reply to: Aleksander Alekseev (#3)
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

Hi,

If I understand correctly, text->timestamptz doesn't understand the
'LMT' timezone.

So it seems. In a quick experiment, it seemed like just treating LMT
as a noise word on input might be enough, but I don't have time today
to poke at it further.

Thanks for the hint. I've found the corresponding piece of code and
will submit a proper patch shortly.

Here is the patch.

--
Best regards,
Aleksander Alekseev

Attachments:

v1-0001-Make-LMT-an-ignored-date-time-keyword.patchapplication/octet-stream; name=v1-0001-Make-LMT-an-ignored-date-time-keyword.patchDownload+24-1
#5Michael Paquier
michael@paquier.xyz
In reply to: Aleksander Alekseev (#4)
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

On Thu, Nov 07, 2024 at 06:09:15PM +0300, Aleksander Alekseev wrote:

Here is the patch.

@@ -139,6 +139,7 @@ static const datetkn datetktbl[] = {
 	{"july", MONTH, 7},
 	{"jun", MONTH, 6},
 	{"june", MONTH, 6},
+	{"lmt", IGNORE_DTF, 0},		/* "lmt" (throwaway) */
 	{"m", UNITS, DTK_MONTH},	/* "month" for ISO input */
 	{"mar", MONTH, 3},
 	{"march", MONTH, 3},

FWIW, I have a hard time understanding why ignoring "LMT", which is
a timezone abbreviation meaning "Local Mean Time", in datetktbl[],
which is a table for date/time keywords, is the right thing to do.

The comment at the top of datetktbl also mentions that this includes
no TZ, DTZ, or DYNTZ entries, and that we should rely on zoneabbrevtbl
instead, which is pushed by InstallTimeZoneAbbrevs() depending on the
GUC assign callback of timezone_abbreviations. So that seems just
inconsistent to me. Perhaps that's OK at the end, but this ignores
the existing comment on top of the table.

Worth noting that we may want to do something in ECPG's dt_common.c as
well, that holds a similar table with TZ abbreviation data and LMT is
missing there?
--
Michael

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#5)
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

Michael Paquier <michael@paquier.xyz> writes:

FWIW, I have a hard time understanding why ignoring "LMT", which is
a timezone abbreviation meaning "Local Mean Time", in datetktbl[],
which is a table for date/time keywords, is the right thing to do.

I agree. We should be mapping it to whatever GMT offset "LMT" means
in the selected zone (assuming there is an entry, which it seems like
there is in most of them). I've not gotten around to looking at what
that would take, but it's probably not entirely trivial, since it
would mean different GMT offsets in different zones.

One thing that might be interesting to look at is whether the same
mechanism could be used for other TZ abbreviations defined by the tzdb
data, instead of relying on our hard-wired lists.

Worth noting that we may want to do something in ECPG's dt_common.c as
well, that holds a similar table with TZ abbreviation data and LMT is
missing there?

ECPG's datetime support is so far behind the server's that it's kind
of sad. But bringing it up to speed would be a large investment of
effort, something I for one am not prepared to put into it. I sure
don't think we should hold up fixing this on the server side pending
making that happen.

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

I wrote:

I agree. We should be mapping it to whatever GMT offset "LMT" means
in the selected zone (assuming there is an entry, which it seems like
there is in most of them). I've not gotten around to looking at what
that would take, but it's probably not entirely trivial, since it
would mean different GMT offsets in different zones.

Here's a draft patch for that. It turns out to be simpler than I
thought, because we can mostly piggy-back on the existing DYNTZ
support. The difference between LMT and one of our existing dynamic
abbreviations is that LMT means whatever it means in the prevailing
session timezone, while a dynamic abbreviation specifies which TZDB
timezone it refers to.

One thing that might be interesting to look at is whether the same
mechanism could be used for other TZ abbreviations defined by the tzdb
data, instead of relying on our hard-wired lists.

As I set it up here, we first check the timezone abbreviation list,
then look into the session timezone to see if it has an entry.
I don't expect that this lookup will succeed for anything except LMT,
because every other abbreviation that TZDB knows about is already
listed in our standard abbreviation list. But in the future we
could imagine removing entries from the abbreviation list so that
this code path takes more of the burden. That'd be particularly
attractive for abbreviations that have conflicts, because then
our interpretation of the abbreviation would automatically adapt
based on the timezone setting. That's something to pursue another
day though. We might need to work a bit harder on optimizing
this code path before we let it take anything more common than
LMT, too.

This is not committable because I didn't think about documentation
yet. We probably want to explain this in Appendix B, and also
there are assorted comments about what a "dynamic abbreviation"
is that will need some adjustment.

regards, tom lane

Attachments:

v1-handle-LMT-as-a-dynamic-abbreviation.patchtext/x-diff; charset=us-ascii; name=v1-handle-LMT-as-a-dynamic-abbreviation.patchDownload+120-9
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

I wrote:

As I set it up here, we first check the timezone abbreviation list,
then look into the session timezone to see if it has an entry.
I don't expect that this lookup will succeed for anything except LMT,
because every other abbreviation that TZDB knows about is already
listed in our standard abbreviation list. But in the future we
could imagine removing entries from the abbreviation list so that
this code path takes more of the burden.

I had a second thought here. The original problem is not really
restricted to "LMT", though that case tends to give an obvious
"no such timezone abbreviation" failure. There is a more insidious
possibility that the cast-back-and-forth succeeds but yields a changed
timestamp value, because the abbreviation emitted by timestamptz_out
is recognized but interpreted differently by timestamptz_in.
I believe that that's at least theoretically possible today, because
what timestamptz_out prints for the abbreviation comes out of the
prevailing zone's TZDB entry, but what timestamptz_in consults is the
timezone_abbreviations list. If our list is out of sync with TZDB,
even for just part of the history of a zone, we've got problems.

It may well be that there are no live problems today, especially
since TZDB has gotten rid of a lot of abbreviations that they
decided were made-up rather than in real-world usage. I don't
feel like that's a great bet though, and even if it's true today
it might not be in the future.

So that leads me to wonder if we should flip the lookup order
and consult the prevailing zone's TZDB entry first, falling back
to timezone_abbreviations only if the abbrev is not known in the
current zone.

There are certainly reasons not to do that, one being that there
would be no way to override TZDB's opinion if you don't like it.
And I'd not propose it for back-patch. But it's something to
consider.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

I wrote:

I had a second thought here. The original problem is not really
restricted to "LMT", though that case tends to give an obvious
"no such timezone abbreviation" failure. There is a more insidious
possibility that the cast-back-and-forth succeeds but yields a changed
timestamp value, because the abbreviation emitted by timestamptz_out
is recognized but interpreted differently by timestamptz_in.
I believe that that's at least theoretically possible today, because
what timestamptz_out prints for the abbreviation comes out of the
prevailing zone's TZDB entry, but what timestamptz_in consults is the
timezone_abbreviations list. If our list is out of sync with TZDB,
even for just part of the history of a zone, we've got problems.

I decided to troll through TZDB to see whether this is a live issue
or not, and it didn't take me more than a couple minutes to find
such a case:

regression=# set timezone to 'America/Montevideo';
SET
regression=# set datestyle to postgres;
SET
regression=# select '1912-01-01'::timestamptz;
timestamptz
------------------------------
Mon Jan 01 00:00:00 1912 MMT
(1 row)

regression=# select '1912-01-01'::timestamptz::text::timestamptz;
timestamptz
------------------------------
Sun Dec 31 13:45:09 1911 MMT
(1 row)

That's because our default timezone_abbreviations list thinks MMT
means

MMT 23400 # Myanmar Time (obsolete)

whereas TZDB has this for Montevideo:

Zone America/Montevideo -3:44:51 - LMT 1908 Jun 10
-3:44:51 - MMT 1920 May 1 # Montevideo MT
-4:00 - %z 1923 Oct 1
...

Other examples are not hard to come by, eg the same date
in zone Europe/Athens. So this shows that the problem is
real and it can result in sizable errors.

I now recall that our timezone abbreviation list was built by
considering *current* zone abbreviations that appeared in TZDB
whenever we made that list, a decade or two back. We never
thought of trying to cope with historical values. There are
many more conflicting abbreviations if you include the
historical entries.

So it feels like we'd better do something about this, if we
don't want to deprecate the Postgres datestyle entirely.

I find it scary that the solution involves changing the
behavior of timestamptz_in, because that's surely got hazards
of unexpected side-effects; but it's clear that this is a mess.

regards, tom lane

#10Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Tom Lane (#9)
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

On Wed, 2024-12-11 at 17:40 -0500, Tom Lane wrote:

So it feels like we'd better do something about this, if we
don't want to deprecate the Postgres datestyle entirely.

I agree that something should be done, but I'd have no problem
with deprecating the Postgres datestyle. People who depend on
it can switch to using to_date() and to_timestamp(), which
shouldn't impose too much hardship if we give them enough time.

Yours,
Laurenz Albe

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Laurenz Albe (#10)
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

Laurenz Albe <laurenz.albe@cybertec.at> writes:

I agree that something should be done, but I'd have no problem
with deprecating the Postgres datestyle. People who depend on
it can switch to using to_date() and to_timestamp(), which
shouldn't impose too much hardship if we give them enough time.

It's not like to_timestamp doesn't have the same problem...

Anyway, I've started a -hackers thread about this issue at [1]/messages/by-id/957280.1734046384@sss.pgh.pa.us.
Whatever we do probably deserves wider discussion than it will
get on pgsql-bugs.

regards, tom lane

[1]: /messages/by-id/957280.1734046384@sss.pgh.pa.us

#12Aleksander Alekseev
aleksander@timescale.com
In reply to: Tom Lane (#11)
Re: TimestampTz->Text->TimestampTz casting fails with DateStyle 'Postgres'

Hi,

Anyway, I've started a -hackers thread about this issue at [1].
Whatever we do probably deserves wider discussion than it will
get on pgsql-bugs.

regards, tom lane

[1] /messages/by-id/957280.1734046384@sss.pgh.pa.us

Thanks, Tom. For the record, I'm closing the CF entry for my patch as
RwF since clearly the patch was wrong and the issue needs more
discussion.

--
Best regards,
Aleksander Alekseev