Calendar support in localization

Started by Surafel Temesgenalmost 5 years ago19 messages
#1Surafel Temesgen
surafel3000@gmail.com

Hi all,
My country(Ethiopia) is one of the nations that uses different kind of
calendar than what PostgreSQL have so we are deprived from the benefit of
data datatype. We just uses String to store date that limits our
application quality greatly. The lag became even worst once application and
system time support is available and it seems to me it is not fair to
suggest to add other date data type kind and implementation for just
different calendar that even not minor user group. Having calendar support
to localization will be very very very very exciting feature for none
Gregorian calendar user group and make so loved. As far as i can see the
difficult thing is understanding different calendar. I can prepare a patch
for Ethiopian calendar once we have consensus.

I cc Thomas Munro and Vik because they have interest on this area

Please don't suggests to fork from PostgreSQL just for this feature

regards
Surafel

#2Thomas Munro
thomas.munro@gmail.com
In reply to: Surafel Temesgen (#1)
Re: Calendar support in localization

Hi Surafel,

On Tue, Mar 16, 2021 at 3:48 AM Surafel Temesgen <surafel3000@gmail.com> wrote:

My country(Ethiopia) is one of the nations that uses different kind of calendar than what PostgreSQL have so we are deprived from the benefit of data datatype. We just uses String to store date that limits our application quality greatly. The lag became even worst once application and system time support is available and it seems to me it is not fair to suggest to add other date data type kind and implementation for just different calendar that even not minor user group. Having calendar support to localization will be very very very very exciting feature for none Gregorian calendar user group and make so loved. As far as i can see the difficult thing is understanding different calendar. I can prepare a patch for Ethiopian calendar once we have consensus.

One key question here is whether you need a different date type or
just different operations (functions, operators etc) on the existing
types.

I cc Thomas Munro and Vik because they have interest on this area

Last time it came up[1]/messages/by-id/CA+hUKG+ybW0LJuLtj3yAUsbOw3DrzK00pGk8JyfpCREzi_LSsg@mail.gmail.com, I got as far as wondering if the best way
would be to write a set of ICU-based calendar functions. Would it be
enough for your needs to have Ethiopic calendar-aware date arithmetic
(add, subtract a month etc), date part extraction (get the current
Ethiopic day/month/year of a date), display and parsing, and have all
of these as functions that you have to call explicitly, but have them
take the standard built-in date and timestamp types, so that your
tables would store regular date and timestamp values? If not, what
else do you need?

ICU is very well maintained and widely used software, and PostgreSQL
already depends on it optionally, and that's enabled in all common
distributions. In other words, maybe all the logic you want exists
already in your process's memory, we just have to figure out how to
reach it from SQL. Another reason to use ICU is that we can solve
this problem once and then it'll work for many other calendars.

Please don't suggests to fork from PostgreSQL just for this feature

I would start with an extension, and I'd try to do a small set of
simple functions, to let me write things like:

icu_format(now(), 'fr_FR@calendar=buddhist') to get a Buddhist
calendar with French words

icu_date_part('year', current_date, 'am_ET@calendar=traditional') to
get the current year in the Ethiopic calendar (2013 apparently)

Well, the first one probably also needs a format string too, actual
details to be worked out by reading the ICU manual...

Maybe instead of making a new extension, I might try to start from
https://github.com/dverite/icu_ext and see if it makes sense to extend
it to cover calendars.

Maybe one day ICU will become a hard dependency of PostgreSQL and
someone will propose all that stuff into core, and then maybe we could
start to think about the possibility of tighter integration with the
built-in date/time functions (and LC_TIME setting? seems complicated,
see also problems with setting LC_COLLATE/datcollate to an ICU
collation name, but I digress and that's a far off problem). I would
also study the SQL standard and maybe DB2 (highly subjective comment:
at a wild guess, the most likely commercial RDBMS to have done a good
job of this if anyone has) to see if they contemplate non-Gregorian
calendars, to get some feel for whether that would eventually be a
possibility to conform with whatever the standard says.

In summary, getting something of very high quality by using a widely
used open source library that we already use seems like a better plan
than trying to write and maintain our own specialist knowledge about
individual calendars. If there's something you need that can't be
done with its APIs working on top of our regular date and timestamp
types, could you elaborate?

[1]: /messages/by-id/CA+hUKG+ybW0LJuLtj3yAUsbOw3DrzK00pGk8JyfpCREzi_LSsg@mail.gmail.com

#3Surafel Temesgen
surafel3000@gmail.com
In reply to: Thomas Munro (#2)
Re: Calendar support in localization

Hi Thomas

On Mon, Mar 15, 2021 at 2:58 PM Thomas Munro <thomas.munro@gmail.com> wrote:

One key question here is whether you need a different date type or
just different operations (functions, operators etc) on the existing
types.

I am thinking of having a converter to a specific calendar after each
operation and function for display or storage. It works on
Ethiopice calendar and i expect it will work on other calendar too

I cc Thomas Munro and Vik because they have interest on this area

Last time it came up[1], I got as far as wondering if the best way
would be to write a set of ICU-based calendar functions. Would it be
enough for your needs to have Ethiopic calendar-aware date arithmetic
(add, subtract a month etc), date part extraction (get the current
Ethiopic day/month/year of a date), display and parsing, and have all
of these as functions that you have to call explicitly, but have them
take the standard built-in date and timestamp types, so that your
tables would store regular date and timestamp values? If not, what
else do you need?

Ethiopice calendar have 13 months so it can not be stored as date and
timestamp type and you approach seems more complicated and i suggest to
have this feature on the purpose of PostgreSQL popularities too not only
for my need

ICU is very well maintained and widely used software, and PostgreSQL
already depends on it optionally, and that's enabled in all common
distributions. In other words, maybe all the logic you want exists
already in your process's memory, we just have to figure out how to
reach it from SQL. Another reason to use ICU is that we can solve
this problem once and then it'll work for many other calendars.

Each calendar-aware date arithmetic is different so solving one calendar
problem didn't help on other calendar

Please don't suggests to fork from PostgreSQL just for this feature

I would start with an extension, and I'd try to do a small set of
simple functions, to let me write things like:

icu_format(now(), 'fr_FR@calendar=buddhist') to get a Buddhist
calendar with French words

icu_date_part('year', current_date, 'am_ET@calendar=traditional') to
get the current year in the Ethiopic calendar (2013 apparently)

Well, the first one probably also needs a format string too, actual
details to be worked out by reading the ICU manual...

I think you suggesting this by expecting the implementation is difficult
but it's not that much difficult once you fully understand Gregorian
calendar and the calendar you work on

Maybe instead of making a new extension, I might try to start from
https://github.com/dverite/icu_ext and see if it makes sense to extend
it to cover calendars.

Maybe one day ICU will become a hard dependency of PostgreSQL and
someone will propose all that stuff into core, and then maybe we could
start to think about the possibility of tighter integration with the
built-in date/time functions (and LC_TIME setting? seems complicated,
see also problems with setting LC_COLLATE/datcollate to an ICU
collation name, but I digress and that's a far off problem). I would
also study the SQL standard and maybe DB2 (highly subjective comment:
at a wild guess, the most likely commercial RDBMS to have done a good
job of this if anyone has) to see if they contemplate non-Gregorian
calendars, to get some feel for whether that would eventually be a
possibility to conform with whatever the standard says.

In summary, getting something of very high quality by using a widely
used open source library that we already use seems like a better plan
than trying to write and maintain our own specialist knowledge about
individual calendars. If there's something you need that can't be
done with its APIs working on top of our regular date and timestamp
types, could you elaborate?

[1]
/messages/by-id/CA+hUKG+ybW0LJuLtj3yAUsbOw3DrzK00pGk8JyfpCREzi_LSsg@mail.gmail.com

I don't know how you see this but for me the feature deserves a specialist
and it is not that much difficult to have one because i guess every majore
calendar have english documentation

regards
Surafel

#4Thomas Munro
thomas.munro@gmail.com
In reply to: Surafel Temesgen (#3)
Re: Calendar support in localization

On Wed, Mar 17, 2021 at 6:31 AM Surafel Temesgen <surafel3000@gmail.com> wrote:

Ethiopice calendar have 13 months so it can not be stored as date and timestamp type and you approach seems more complicated and i suggest to have this feature on the purpose of PostgreSQL popularities too not only for my need

I know, but the DATE and TIMESTAMPTZ datatypes don't intrinsically
know anything about months or other calendar concepts. Internally,
they are just a single number that counts the number of days or
seconds since an arbitrary epoch time. We are all in agreement about
how many times the Earth has rotated since then*. The calendar
concepts such as "day", "month", "year", whether Gregorian, Ethiopic,
Islamic, ... are all derivable from those numbers, if you know the
rules.

So I think you should seriously consider using the same types.

Each calendar-aware date arithmetic is different so solving one calendar problem didn't help on other calendar

They have a *lot* in common though. They have similar "fields" (day,
month, year etc), based on the Earth, moon, sun etc, so it is possible
to use a common abstraction to interact with them. I haven't studied
it too closely, but it looks like ICU can give you a "Calendar" object
for a given Locale (which you create from a string like
"am_ET@calendar=traditional") and timezone ("Africa/Addis_Ababa").
Then you can set the object's time to X seconds since an epoch, based
on UTC seconds without leap seconds -- which is exactly like our
TIMESTAMPTZ's internal value -- and then you can query it to get
fields like month etc. Or do the opposite, or use formatting and
parsing routines etc. Internally, ICU has a C++ class for each
calendar with a name like EthiopicCalendar, IslamicCalendar etc which
encapsulates all the logic, but it's not necessary to use them
directly: we could just look them up with names via the C API and then
treat them all the same.

I think you suggesting this by expecting the implementation is difficult but it's not that much difficult once you fully understand Gregorian calendar and the calendar you work on

Yeah, I am sure it's all just a bunch of simple integer maths. But
I'm talking about things like software architecture, maintainability,
cohesion, and getting maximum impact for the work we do.

I may be missing some key detail though: why do you think it should be
a different type? The two reasons I can think of are: (1) the
slightly tricky detail that the date apparently changes at 1:00am
(which I don't think is a show stopper for this approach, I could
elaborate), (2) you may want dates to be formatted on the screen with
the Ethiopic calendar in common software like psql and GUI clients,
which may be easier to arrange with different types, but that seems to
be a cosmetic thing that could eventually be handled with tighter
locale integration with ICU. In the early stages you'd access
calendar logic though special functions with names like
icu_format_date(), or whatever.

Maybe I'm totally wrong about all of this, but this is the first way
I'd probably try to tackle this problem, and I suspect it has the
highest chance of eventually being included in core PostgreSQL.

*I mean, we can discuss the different "timelines" like UT, UTC, TAI
etc, but that's getting into the weeds, the usual timeline for
computer software outside specialist scientific purposes is UTC
without leap seconds.

#5Surafel Temesgen
surafel3000@gmail.com
In reply to: Thomas Munro (#4)
Re: Calendar support in localization

On Tue, Mar 16, 2021 at 12:20 PM Thomas Munro <thomas.munro@gmail.com>
wrote:

On Wed, Mar 17, 2021 at 6:31 AM Surafel Temesgen <surafel3000@gmail.com>
wrote:

Ethiopice calendar have 13 months so it can not be stored as date and

timestamp type and you approach seems more complicated and i suggest to
have this feature on the purpose of PostgreSQL popularities too not only
for my need

I know, but the DATE and TIMESTAMPTZ datatypes don't intrinsically
know anything about months or other calendar concepts. Internally,
they are just a single number that counts the number of days or
seconds since an arbitrary epoch time. We are all in agreement about
how many times the Earth has rotated since then*. The calendar
concepts such as "day", "month", "year", whether Gregorian, Ethiopic,
Islamic, ... are all derivable from those numbers, if you know the
rules.

Okay

I think you suggesting this by expecting the implementation is difficult

but it's not that much difficult once you fully understand Gregorian
calendar and the calendar you work on

Yeah, I am sure it's all just a bunch of simple integer maths. But
I'm talking about things like software architecture, maintainability,
cohesion, and getting maximum impact for the work we do.

I may be missing some key detail though: why do you think it should be
a different type? The two reasons I can think of are: (1) the
slightly tricky detail that the date apparently changes at 1:00am
(which I don't think is a show stopper for this approach, I could
elaborate), (2) you may want dates to be formatted on the screen with
the Ethiopic calendar in common software like psql and GUI clients,
which may be easier to arrange with different types, but that seems to
be a cosmetic thing that could eventually be handled with tighter
locale integration with ICU. In the early stages you'd access
calendar logic though special functions with names like
icu_format_date(), or whatever.

As you mention above whatever the calendar type is we ended up storing an
integer that represent the date so rather than re-implementing every
function and operation for every calendar we can use existing Gerigorian
implementation as a base implementation and if new calendar want to perform
same function or operation it translate to Gregorian one and use the
existing function and operation and translate to back to working calendar.
In this approach the only function we want for supporting a new calendar is
a translator from the new calendar to Gregorian one and from Gerigorian
calendar to the new calendar and may be input/ output function. What do you
think of this implementation?

regards
Surafel

#6Robert Haas
robertmhaas@gmail.com
In reply to: Surafel Temesgen (#5)
Re: Calendar support in localization

On Wed, Mar 17, 2021 at 9:54 AM Surafel Temesgen <surafel3000@gmail.com> wrote:

As you mention above whatever the calendar type is we ended up storing an integer that represent the date so rather than re-implementing every function and operation for every calendar we can use existing Gerigorian implementation as a base implementation and if new calendar want to perform same function or operation it translate to Gregorian one and use the existing function and operation and translate to back to working calendar. In this approach the only function we want for supporting a new calendar is a translator from the new calendar to Gregorian one and from Gerigorian calendar to the new calendar and may be input/ output function. What do you think of this implementation?

I'm not sure what the best way of tackling this problem is, but I
wanted to mention another possible approach: instead of actually using
the timestamptz data type, have another data type that is
binary-compatible with timestamptz - that is, it's the same thing on
disk, so you can cast between the two data types for free. Then have
separate input/output functions for it, separate operators and
functions and so forth.

It's not very obvious how to scale this kind of approach to a wide
variety of calendar types, and as Thomas says, it would much cooler to
be able to handle all of the ones that ICU knows how to support rather
than just one. But, the problem I see with using timestamptz is that
it's not so obvious how to get a different output format ... unless, I
guess, we can cram it into DateStyle. And it's also much less obvious
how you get the other functions and operators to do what you want, if
it's different.

--
Robert Haas
EDB: http://www.enterprisedb.com

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Robert Haas (#6)
Re: Calendar support in localization

Robert Haas <robertmhaas@gmail.com> writes:

It's not very obvious how to scale this kind of approach to a wide
variety of calendar types, and as Thomas says, it would much cooler to
be able to handle all of the ones that ICU knows how to support rather
than just one. But, the problem I see with using timestamptz is that
it's not so obvious how to get a different output format ... unless, I
guess, we can cram it into DateStyle. And it's also much less obvious
how you get the other functions and operators to do what you want, if
it's different.

Yeah, I'm afraid that it probably is different. The most obvious
example is in operations involving type interval:
select now() + '1 month'::interval;
That should almost certainly give a different answer when using a
different calendar --- indeed the units of interest might not even
be the same. (Do all human calendars use the concept of months?)

I don't feel like DateStyle is chartered to affect the behavior
of datetime operators; it's understood as tweaking the I/O behavior
only. There might be more of a case for letting LC_TIME choose
this behavior, but I bet the relevant standards only contemplate
Gregorian calendars. Also, the SQL spec says in so many words
that the SQL-defined datetime types follow the Gregorian calendar.

So on the whole, new datatypes and operators seem like the way
to go. I concur that if ICU has solved this problem, piggybacking
on it seems more promising than writing our own code.

regards, tom lane

#8Thomas Munro
thomas.munro@gmail.com
In reply to: Tom Lane (#7)
Re: Calendar support in localization

On Thu, Mar 18, 2021 at 3:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

It's not very obvious how to scale this kind of approach to a wide
variety of calendar types, and as Thomas says, it would much cooler to
be able to handle all of the ones that ICU knows how to support rather
than just one. But, the problem I see with using timestamptz is that
it's not so obvious how to get a different output format ... unless, I
guess, we can cram it into DateStyle. And it's also much less obvious
how you get the other functions and operators to do what you want, if
it's different.

Yeah, I'm afraid that it probably is different. The most obvious
example is in operations involving type interval:
select now() + '1 month'::interval;
That should almost certainly give a different answer when using a
different calendar --- indeed the units of interest might not even
be the same. (Do all human calendars use the concept of months?)

Right, so if this is done by trying to extend Daniel Verite's icu_ext
extension (link given earlier) and Robert's idea of a fast-castable
type, I suppose you might want now()::icu_date + '1 month'::internal
to advance you by one Ethiopic month if you have done SET
icu_ext.ICU_LC_TIME = 'am_ET@calendar=traditional'. Or if using my
first idea of just sticking with the core types, perhaps you'd have to
replace stuff via search path... I admit that sounds rather error
prone and fragile (I was thinking mainly of different functions, not
operators). Either way, I suppose there'd also be more explicit
functions for various operations including ones that take an extra
argument if you want to use an explicit locale instead of relying on
the ICU_LC_TIME setting. I dunno.

As for whether all calendars have months, it looks like ICU's model
has just the familiar looking standardised fields; whether some of
them make no sense in some calendars, I don't know, but it has stuff
like x.get(field, &error), x.set(field, &error), x.add(field, amount,
&error) and if it fails for some field on your particular calendar, or
for some value (you can't set a Gregorian date's month to 13
(apparently we call this month "undecember", hah), but you can for a
Hebrew or Ethiopic one) I suppose we'd just report the error?

I don't feel like DateStyle is chartered to affect the behavior
of datetime operators; it's understood as tweaking the I/O behavior
only. There might be more of a case for letting LC_TIME choose
this behavior, but I bet the relevant standards only contemplate

About LC_TIME... I suppose in one possible future we eventually use
ICU for more core stuff, and someone proposes to merge hypothetical
icu_date etc types into the core date etc types, and then LC_TIME
controls that. But then you might have a version of the problem that
Peter E ran into in attempts so far to use ICU collations as the
default: if you put ICU's funky extensible locale names into the
LC_XXX environment variables, then your libc will see it too, and
might get upset, since PostgreSQL uses the en. I suspect that ICU
will understand typical libc locale names, but common libcs won't
understand ICU's highly customisable syntax, but I haven't looked into
it. If that's generally true, then perhaps the solution to both
problems is a kind of partial separation: regular LC_XXX, and then
also ICU_LC_XXX which defaults to the same value but can be changed to
access more advanced stuff, and is used only for interacting with ICU.

Gregorian calendars. Also, the SQL spec says in so many words
that the SQL-defined datetime types follow the Gregorian calendar.

:-(

#9Vik Fearing
vik@postgresfriends.org
In reply to: Tom Lane (#7)
Re: Calendar support in localization

On 3/17/21 3:48 PM, Tom Lane wrote:

Also, the SQL spec says in so many words
that the SQL-defined datetime types follow the Gregorian calendar.

We already don't follow the SQL spec for timestamps (and I, for one,
think we are better for it) so I don't think we should worry about that.
--
Vik Fearing

#10Surafel Temesgen
surafel3000@gmail.com
In reply to: Thomas Munro (#8)
Re: Calendar support in localization

On Wed, Mar 17, 2021 at 3:39 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Thu, Mar 18, 2021 at 3:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Right, so if this is done by trying to extend Daniel Verite's icu_ext
extension (link given earlier) and Robert's idea of a fast-castable
type, I suppose you might want now()::icu_date + '1 month'::internal
to advance you by one Ethiopic month if you have done SET
icu_ext.ICU_LC_TIME = 'am_ET@calendar=traditional'. Or if using my
first idea of just sticking with the core types, perhaps you'd have to
replace stuff via search path... I admit that sounds rather error
prone and fragile (I was thinking mainly of different functions, not
operators). Either way, I suppose there'd also be more explicit
functions for various operations including ones that take an extra
argument if you want to use an explicit locale instead of relying on
the ICU_LC_TIME setting. I dunno.

As you know internally timestamptz data type does't existe instead it
stored as integer kind and we depend on operating system and external
library for our date data type support so i think that put as on the
position for not be the first one to implement timestamptz data type thing
and i don't know who give as the casting for free?

regards
Surafel

#11Daniel Verite
daniel@manitou-mail.org
In reply to: Thomas Munro (#8)
Re: Calendar support in localization

Thomas Munro wrote:

Right, so if this is done by trying to extend Daniel Verite's icu_ext
extension (link given earlier) and Robert's idea of a fast-castable
type, I suppose you might want now()::icu_date + '1 month'::internal
to advance you by one Ethiopic month if you have done SET
icu_ext.ICU_LC_TIME = 'am_ET@calendar=traditional'.

I've pushed a calendar branch on icu_ext [1]https://github.com/dverite/icu_ext/tree/calendar with preliminary support
for non-gregorian calendars through ICU, so far with only format and parse
of timetamptz.
The ICU locale drives both the localization of field names (language) and the
choice of calendar.

It looks like this:

\set fmt 'dd/MMMM/yyyy GGGG HH:mm:ss.SSS zz'

WITH list(cal) AS ( values
('gregorian'),
('japanese'),
('buddhist'),
('roc'),
('persian'),
('islamic-civil'),
('islamic'),
('hebrew'),
('chinese'),
('indian'),
('coptic'),
('ethiopic'),
('ethiopic-amete-alem'),
('iso8601'),
('dangi')
),
fmt AS (
select
cal,
icu_format_date(now(), :'fmt', 'fr@calendar='||cal) as now_str,
icu_format_date(now()+'1 month'::interval, :'fmt', 'fr@calendar='||cal) as
plus_1m
from list
)
SELECT
cal,
now_str,
icu_parse_date(now_str, :'fmt', 'fr@calendar='||cal) as now_parsed,
plus_1m,
icu_parse_date(plus_1m, :'fmt', 'fr@calendar='||cal) as plus_1m_parsed
FROM fmt;

-[ RECORD 1 ]--+-------------------------------------------------------
cal | gregorian
now_str | 26/mars/2021 après Jésus-Christ 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 26/avril/2021 après Jésus-Christ 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 2 ]--+-------------------------------------------------------
cal | japanese
now_str | 26/mars/0033 Heisei 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 26/avril/0033 Heisei 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 3 ]--+-------------------------------------------------------
cal | buddhist
now_str | 26/mars/2564 ère bouddhique 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 26/avril/2564 ère bouddhique 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 4 ]--+-------------------------------------------------------
cal | roc
now_str | 26/mars/0110 RdC 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 26/avril/0110 RdC 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 5 ]--+-------------------------------------------------------
cal | persian
now_str | 06/farvardin/1400 Anno Persico 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 06/ordibehešt/1400 Anno Persico 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 6 ]--+-------------------------------------------------------
cal | islamic-civil
now_str | 12/chaabane/1442 ère de l’Hégire 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 14/ramadan/1442 ère de l’Hégire 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 7 ]--+-------------------------------------------------------
cal | islamic
now_str | 13/chaabane/1442 ère de l’Hégire 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 14/ramadan/1442 ère de l’Hégire 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 8 ]--+-------------------------------------------------------
cal | hebrew
now_str | 13/nissan/5781 Anno Mundi 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 14/iyar/5781 Anno Mundi 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 9 ]--+-------------------------------------------------------
cal | chinese
now_str | 14/èryuè/0038 78 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 15/sānyuè/0038 78 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 10 ]-+-------------------------------------------------------
cal | indian
now_str | 05/chaitra/1943 ère Saka 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 06/vaishākh/1943 ère Saka 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 11 ]-+-------------------------------------------------------
cal | coptic
now_str | 17/barmahât/1737 après Dioclétien 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 18/barmoudah/1737 après Dioclétien 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 12 ]-+-------------------------------------------------------
cal | ethiopic
now_str | 17/mägabit/2013 après l’Incarnation 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 18/miyazya/2013 après l’Incarnation 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 13 ]-+-------------------------------------------------------
cal | ethiopic-amete-alem
now_str | 17/mägabit/7513 ERA0 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 18/miyazya/7513 ERA0 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 14 ]-+-------------------------------------------------------
cal | iso8601
now_str | 26/mars/2021 après Jésus-Christ 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 26/avril/2021 après Jésus-Christ 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 15 ]-+-------------------------------------------------------
cal | dangi
now_str | 14/èryuè/0038 78 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 15/sānyuè/0038 78 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02

I understand that adding months or years with some of the non-gregorian
calendars should lead to different points in time than with the gregorian
calendar.

For instance with the ethiopic calendar, the query above displays today as
17/mägabit/2013 and 1 month from now as 18/miyazya/2013,
while the correct result is probably 17/miyazya/2013 (?)

I'm not sure at this point that there should be a new set of
data/interval/timestamp types though, especially if considering
the integration in core.

About intervals, if there were locale-aware functions like
add_interval(timestamptz, interval [, locale]) returns timestamptz
or
sub_timestamp(timestamptz, timestamptz [,locale]) returns interval
that would use ICU to compute the results according to the locale,
wouldn't it be good enough?

Another argument for new datatypes could be that getting the
localized-by-ICU display/parsing without function calls around the dates
means new I/O functions. In the context of the extension, probably,
but in core, if DateStyle is extended to divert the I/O of date/timestamp[tz]
to ICU, I guess it could work with the existing types.

Another reason to have new datatypes could be that users would like
to use a localized calendar only on specific fields. I don't know if that
makes sense.

[1]: https://github.com/dverite/icu_ext/tree/calendar

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

#12Thomas Munro
thomas.munro@gmail.com
In reply to: Daniel Verite (#11)
Re: Calendar support in localization

On Sat, Mar 27, 2021 at 6:51 AM Daniel Verite <daniel@manitou-mail.org> wrote:

now_str | 17/mägabit/2013 après l’Incarnation 18:22:07.566 UTC+1

Very nice!

For instance with the ethiopic calendar, the query above displays today as
17/mägabit/2013 and 1 month from now as 18/miyazya/2013,
while the correct result is probably 17/miyazya/2013 (?)

I'm not sure at this point that there should be a new set of
data/interval/timestamp types though, especially if considering
the integration in core.

About intervals, if there were locale-aware functions like
add_interval(timestamptz, interval [, locale]) returns timestamptz
or
sub_timestamp(timestamptz, timestamptz [,locale]) returns interval
that would use ICU to compute the results according to the locale,
wouldn't it be good enough?

+1, I'd probably do that next if I were hacking on this...

#13Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Daniel Verite (#11)
Re: Calendar support in localization

On Fri, 26 Mar 2021 at 18:51, Daniel Verite <daniel@manitou-mail.org> wrote:

[...]
-[ RECORD 2 ]--+-------------------------------------------------------
cal | japanese
now_str | 26/mars/0033 Heisei 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 26/avril/0033 Heisei 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 3 ]--+-------------------------------------------------------
[...]
-[ RECORD 12 ]-+-------------------------------------------------------
cal | ethiopic
now_str | 17/mägabit/2013 après l’Incarnation 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 18/miyazya/2013 après l’Incarnation 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
-[ RECORD 13 ]-+-------------------------------------------------------
cal | ethiopic-amete-alem
now_str | 17/mägabit/7513 ERA0 18:22:07.566 UTC+1
now_parsed | 2021-03-26 18:22:07.566+01
plus_1m | 18/miyazya/7513 ERA0 18:22:07.566 UTC+2
plus_1m_parsed | 2021-04-26 18:22:07.566+02
[...]
I understand that adding months or years with some of the non-gregorian
calendars should lead to different points in time than with the gregorian
calendar.

For instance with the ethiopic calendar, the query above displays today as
17/mägabit/2013 and 1 month from now as 18/miyazya/2013,
while the correct result is probably 17/miyazya/2013 (?)

Seeing the results for Japanese locale, you might want to update your
ICU library, which could fix this potential inconsistency.

The results for the Japanese locale should be "0003 Reiwa" instead of
"0033 Heisei", as the era changed in 2019. ICU releases have since
implemented this and other corrections; this specific change was
implemented in the batched release of ICU versions on 2019-04-12.

With regards,

Matthias van de Meent

#14Daniel Verite
daniel@manitou-mail.org
In reply to: Matthias van de Meent (#13)
Re: Calendar support in localization

Matthias van de Meent wrote:

The results for the Japanese locale should be "0003 Reiwa" instead of
"0033 Heisei", as the era changed in 2019. ICU releases have since
implemented this and other corrections; this specific change was
implemented in the batched release of ICU versions on 2019-04-12.

Right. I've run this test on an Ubuntu 18.04 desktop which comes with
libicu60 . The current version for my system is 60.2-3ubuntu3.1.
Ubuntu maintainers did not pick up the change of the new Japanese era.
As a guess, it's because it's not a security fix.
This contrasts with the baseline maintainers, who did an
exceptional effort to backpatch this down to ICU 53
(exceptional in the sense that they don't do that for bugfixes).

For instance with the ethiopic calendar, the query above displays today as
17/mägabit/2013 and 1 month from now as 18/miyazya/2013,
while the correct result is probably 17/miyazya/2013 (?)

Seeing the results for Japanese locale, you might want to update your
ICU library, which could fix this potential inconsistency.

I agree it's always best to have the latest ICU version, but in the
context of Postgres, we have to work with the versions that are
typically installed on users systems. People who have pre-2019
versions will simply be stuck with the previous Japanese era.

Anyway, for the specific problem that the interval datatype cannot be
used seamlessly across all calendars, it's essentially about how days
are mapped into calendars, and it's unrelated to ICU updates AFAIU.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

#15Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Daniel Verite (#14)
Re: Calendar support in localization

On Mon, 29 Mar 2021 at 14:33, Daniel Verite <daniel@manitou-mail.org> wrote:

Matthias van de Meent wrote:

The results for the Japanese locale should be "0003 Reiwa" instead of
"0033 Heisei", as the era changed in 2019. ICU releases have since
implemented this and other corrections; this specific change was
implemented in the batched release of ICU versions on 2019-04-12.

Right. I've run this test on an Ubuntu 18.04 desktop which comes with
libicu60 . The current version for my system is 60.2-3ubuntu3.1.
Ubuntu maintainers did not pick up the change of the new Japanese era.
As a guess, it's because it's not a security fix.
This contrasts with the baseline maintainers, who did an
exceptional effort to backpatch this down to ICU 53
(exceptional in the sense that they don't do that for bugfixes).

For instance with the ethiopic calendar, the query above displays today as
17/mägabit/2013 and 1 month from now as 18/miyazya/2013,
while the correct result is probably 17/miyazya/2013 (?)

Seeing the results for Japanese locale, you might want to update your
ICU library, which could fix this potential inconsistency.

I agree it's always best to have the latest ICU version, but in the
context of Postgres, we have to work with the versions that are
typically installed on users systems. People who have pre-2019
versions will simply be stuck with the previous Japanese era.

Anyway, for the specific problem that the interval datatype cannot be
used seamlessly across all calendars, it's essentially about how days
are mapped into calendars, and it's unrelated to ICU updates AFAIU.

Ah, yes, I only glanced over the supplied query and misunderstood it
due to not taking enough time. I understood it as 'use icu locale info
to add 1 month to the current date', which would use ICU knowledge
about months in the locale and would be consistent with the question
mark, instead of 'use icu to interpret the result of adding one
non-icu-locale-dependent month to the current non-icu-locale-dependent
date'. If it were the former, my response would have made more sense,
but it doesn't in this case. So, sorry for the noise.

About intervals, if there were locale-aware functions like
add_interval(timestamptz, interval [, locale]) returns timestamptz
or
sub_timestamp(timestamptz, timestamptz [,locale]) returns interval
that would use ICU to compute the results according to the locale,
wouldn't it be good enough?

I agree, that should fix the issues at hand / grant a workable path
for locale-aware timestamp manipulation.

With regards,

Matthias van de Meent

#16Surafel Temesgen
surafel3000@gmail.com
In reply to: Daniel Verite (#11)
Re: Calendar support in localization

Hi Daniel,

On Fri, Mar 26, 2021 at 8:51 PM Daniel Verite <daniel@manitou-mail.org>
wrote:

Thomas Munro wrote:

Right, so if this is done by trying to extend Daniel Verite's icu_ext
extension (link given earlier) and Robert's idea of a fast-castable
type, I suppose you might want now()::icu_date + '1 month'::internal
to advance you by one Ethiopic month if you have done SET
icu_ext.ICU_LC_TIME = 'am_ET@calendar=traditional'.

I've pushed a calendar branch on icu_ext [1] with preliminary support
for non-gregorian calendars through ICU, so far with only format and parse
of timetamptz.

Thanks

I understand that adding months or years with some of the non-gregorian
calendars should lead to different points in time than with the gregorian
calendar.

For instance with the ethiopic calendar, the query above displays today as
17/mägabit/2013 and 1 month from now as 18/miyazya/2013,
while the correct result is probably 17/miyazya/2013 (?)

yes it should be 17/miyazya/2013 (?)

I'm not sure at this point that there should be a new set of
data/interval/timestamp types though, especially if considering
the integration in core.

About intervals, if there were locale-aware functions like
add_interval(timestamptz, interval [, locale]) returns timestamptz
or
sub_timestamp(timestamptz, timestamptz [,locale]) returns interval
that would use ICU to compute the results according to the locale,
wouldn't it be good enough?

Yes it can be enough for now but there are patches proposed to support the
system and application time period which are in SQL standard and if we have
that feature the calendar has to be in core and It doesn't appear hard for
me to support the calendar locally. Postgresql itself does't store
Gregorian date it stores julian date(which is more accurate than gregorian
calendar) and almost all of function and operator is done using julian date
converted to second(TimestampTz) so what it takes to support calendar
locally is input/output function and a converter from and to julian
calendar and that may not be that much hard since most of the world
calendar is based on julian or gregorian calendar[0]. Thought?

0.https://en.wikipedia.org/wiki/List_of_calendars

regards
Surafel

#17Daniel Verite
daniel@manitou-mail.org
In reply to: Surafel Temesgen (#16)
Re: Calendar support in localization

Surafel Temesgen wrote:

About intervals, if there were locale-aware functions like
add_interval(timestamptz, interval [, locale]) returns timestamptz
or
sub_timestamp(timestamptz, timestamptz [,locale]) returns interval
that would use ICU to compute the results according to the locale,
wouldn't it be good enough?

Yes it can be enough for now but there are patches proposed to support the
system and application time period which are in SQL standard

To clarify, these function signatures are not meant to oppose
a core vs extension implementation, nor an ICU vs non-ICU
implementation. They're meant to illustrate the case of using
specific functions instead of adding specific data types.
AFAIU, adding data types come from the idea that since
(non-gregorian-date + interval) doesn't have the same result as
(gregorian-date + interval), we could use a different type for
non-gregorian-date and so a different "+" operator, maybe
even a specific interval type.

For the case of temporal tables, I'm not quite familiar with the
feature, but I notice that the patch says:

+    When system versioning is specified two columns are added which
+    record the start timestamp and end timestamp of each row verson.
+    The data type of these columns will be TIMESTAMP WITH TIME ZONE.

The user doesn't get to choose the data type, so if we'd require to
use specific data types for non-gregorian calendars, that would
seemingly complicate things for this feature. This is consistent
with the remark upthread that the SQL standard assumes the
gregorian calendar.

what it takes to support calendar locally is input/output function
and a converter from and to julian calendar and that may not be that
much hard since most of the world calendar is based on julian or
gregorian calendar[0]

The conversions from julian dates are not necessarily hard, but the
I/O functions means having localized names for all days, months, eras
of all calendars in all supported languages. If you're thinking of
implementing this from scratch (without the ICU dependency), where
would these names come from? OTOH if we're using ICU, then why
bother reinventing the julian-to-calendars conversions that ICU
already does?

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

#18Surafel Temesgen
surafel3000@gmail.com
In reply to: Daniel Verite (#17)
Re: Calendar support in localization

On Tue, Mar 30, 2021 at 11:16 AM Daniel Verite <daniel@manitou-mail.org>
wrote:

The conversions from julian dates are not necessarily hard, but the
I/O functions means having localized names for all days, months, eras
of all calendars in all supported languages. If you're thinking of
implementing this from scratch (without the ICU dependency), where
would these names come from? OTOH if we're using ICU, then why
bother reinventing the julian-to-calendars conversions that ICU
already does?

i donno why but currently we are using our own function for
converting (see j2date and date2j) maybe it's written before ICU but i
think ICU helps in adding other calendar support easly. Regarding I/O
functions postgresql hard coded days and months names on array and just
parse and string compare, if it is not on the list then error(see
datetime.c) and it will be the same for other calendar but i think we don't
need all that if we use ICU

regards
Surafel

#19Thomas Munro
thomas.munro@gmail.com
In reply to: Thomas Munro (#4)
Re: Calendar support in localization

On Wed, Mar 17, 2021 at 8:20 AM Thomas Munro <thomas.munro@gmail.com> wrote:

*I mean, we can discuss the different "timelines" like UT, UTC, TAI
etc, but that's getting into the weeds, the usual timeline for
computer software outside specialist scientific purposes is UTC
without leap seconds.

(Erm, rereading this thread, I meant to write "time scales" there.)