Extract epoch from Interval weird behavior
Hi all,
I noticed something odd when going through some
of the Interval code. The DAYS_PER_YEAR constant
is defined in src/include/datatype/timestamp.h.
#define DAYS_PER_YEAR 365.25 /* assumes leap year every four years */
We execute the EXTRACT and date_part functions in
src/backend/utils/adt/timestamp.c in
static Datum
interval_part_common(PG_FUNCTION_ARGS, bool retnumeric)
When executing date_part we multiply the total
years in the Interval by DAYS_PER_YEAR
result += ((double) DAYS_PER_YEAR * SECS_PER_DAY) * (interval->month / MONTHS_PER_YEAR);
result += ((double) DAYS_PER_MONTH * SECS_PER_DAY) * (interval->month % MONTHS_PER_YEAR);
result += ((double) SECS_PER_DAY) * interval->day;
However when executing EXTRACT we first truncate
DAYS_PER_YEAR to an integer, and then multiply it
by the total years in the Interval
/* this always fits into int64 */
secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) +
(int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) +
interval->day) * SECS_PER_DAY;
Is this truncation on purpose? It seems like
EXTRACT is not accounting for leap years in
it's calculation.
- Joe Koshakow
On Wed, Feb 23, 2022 at 7:42 PM Joseph Koshakow <koshy44@gmail.com> wrote:
Hi all,
I noticed something odd when going through some
of the Interval code. The DAYS_PER_YEAR constant
is defined in src/include/datatype/timestamp.h.#define DAYS_PER_YEAR 365.25 /* assumes leap year every four years */
We execute the EXTRACT and date_part functions in
src/backend/utils/adt/timestamp.c instatic Datum
interval_part_common(PG_FUNCTION_ARGS, bool retnumeric)When executing date_part we multiply the total
years in the Interval by DAYS_PER_YEARresult += ((double) DAYS_PER_YEAR * SECS_PER_DAY) * (interval->month / MONTHS_PER_YEAR);
result += ((double) DAYS_PER_MONTH * SECS_PER_DAY) * (interval->month % MONTHS_PER_YEAR);
result += ((double) SECS_PER_DAY) * interval->day;However when executing EXTRACT we first truncate
DAYS_PER_YEAR to an integer, and then multiply it
by the total years in the Interval
/* this always fits into int64 */secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) +
(int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) +
interval->day) * SECS_PER_DAY;Is this truncation on purpose? It seems like
EXTRACT is not accounting for leap years in
it's calculation.- Joe Koshakow
Oops I sent that to the wrong email. If this isn't intented I've created a patch
that fixes it, with the following two open questions
* DAYS_PER_YEAR_NUM is recalculated every time. Is there anyway
to convert a float directly to a numeric to avoid this?
* For some reason the change adds a lot of trailing zeros to the result. I'm
not sure why that is.
- Joe Koshakow
Attachments:
v1-0001-Consider-leap-years-when-extracting-epoch-from-inter.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Consider-leap-years-when-extracting-epoch-from-inter.patchDownload
From 9ac8bc7e0fcd56343a9b659aa5c6624ec81c5ff2 Mon Sep 17 00:00:00 2001
From: Joseph Koshakow <koshy44@gmail.com>
Date: Wed, 23 Feb 2022 21:25:13 -0500
Subject: [PATCH] Consider leap years when extracting epoch from interval
---
src/backend/utils/adt/timestamp.c | 28 ++++++++++++++++++--------
src/include/datatype/timestamp.h | 5 +++--
src/test/regress/expected/interval.out | 18 ++++++++++++++---
src/test/regress/sql/interval.sql | 3 +++
4 files changed, 41 insertions(+), 13 deletions(-)
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 36f8a84bcc..84396e385a 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -5264,13 +5264,17 @@ interval_part_common(PG_FUNCTION_ARGS, bool retnumeric)
if (retnumeric)
{
Numeric result;
+ Numeric secs_from_year;
int64 secs_from_day_month;
int64 val;
/* this always fits into int64 */
- secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) +
- (int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) +
- interval->day) * SECS_PER_DAY;
+ secs_from_day_month = ((int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) +
+ interval->day) * SECS_PER_DAY;
+ secs_from_year = numeric_mul_opt_error(
+ DAYS_PER_YEAR_NUM,
+ int64_to_numeric(interval->month / MONTHS_PER_YEAR * SECS_PER_DAY),
+ NULL);
/*---
* result = secs_from_day_month + interval->time / 1'000'000
@@ -5284,12 +5288,20 @@ interval_part_common(PG_FUNCTION_ARGS, bool retnumeric)
*/
if (!pg_mul_s64_overflow(secs_from_day_month, 1000000, &val) &&
!pg_add_s64_overflow(val, interval->time, &val))
- result = int64_div_fast_to_numeric(val, 6);
+ {
+ result = numeric_add_opt_error(
+ secs_from_year,
+ int64_div_fast_to_numeric(val,6),
+ NULL);
+ }
else
- result =
- numeric_add_opt_error(int64_div_fast_to_numeric(interval->time, 6),
- int64_to_numeric(secs_from_day_month),
- NULL);
+ {
+ result = numeric_add_opt_error(
+ int64_div_fast_to_numeric(interval->time, 6),
+ int64_to_numeric(secs_from_day_month),
+ NULL);
+ result = numeric_add_opt_error(result, secs_from_year, NULL);
+ }
PG_RETURN_NUMERIC(result);
}
diff --git a/src/include/datatype/timestamp.h b/src/include/datatype/timestamp.h
index 5fa38d20d8..457f10dafc 100644
--- a/src/include/datatype/timestamp.h
+++ b/src/include/datatype/timestamp.h
@@ -65,8 +65,9 @@ typedef struct
* Assorted constants for datetime-related calculations
*/
-#define DAYS_PER_YEAR 365.25 /* assumes leap year every four years */
-#define MONTHS_PER_YEAR 12
+#define DAYS_PER_YEAR 365.25 /* assumes leap year every four years */
+#define DAYS_PER_YEAR_NUM (numeric_add_opt_error(int64_to_numeric(365), numeric_div_opt_error(int64_to_numeric(1), int64_to_numeric(4), NULL), NULL))
+#define MONTHS_PER_YEAR 12
/*
* DAYS_PER_MONTH is very imprecise. The more accurate value is
* 365.2425/12 = 30.436875, or '30 days 10:29:06'. Right now we only
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index accd4a7d90..98e5cb6967 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -1038,8 +1038,20 @@ SELECT f1,
-- internal overflow test case
SELECT extract(epoch from interval '1000000000 days');
- extract
------------------------
- 86400000000000.000000
+ extract
+-------------------------------------
+ 86400000000000.00000000000000000000
+(1 row)
+
+SELECT extract(epoch from interval '4 years');
+ extract
+--------------------------------
+ 126230400.00000000000000000000
+(1 row)
+
+SELECT date_part('epoch', interval '4 years');
+ date_part
+-----------
+ 126230400
(1 row)
diff --git a/src/test/regress/sql/interval.sql b/src/test/regress/sql/interval.sql
index 6d532398bd..f2abf1f7e4 100644
--- a/src/test/regress/sql/interval.sql
+++ b/src/test/regress/sql/interval.sql
@@ -355,3 +355,6 @@ SELECT f1,
-- internal overflow test case
SELECT extract(epoch from interval '1000000000 days');
+
+SELECT extract(epoch from interval '4 years');
+SELECT date_part('epoch', interval '4 years');
--
2.25.1
Hi Joseph,
Is this truncation on purpose? It seems like
EXTRACT is not accounting for leap years in
it's calculation.
Extracting an epoch from an interval is quite a strange case since
intervals are not connected to any specific dates.
For instance:
select extract('epoch' from interval '1 month')
.. returns 2592000 = 30*24*60*60. But what if the month is February? Should
we account for the different number of days for intervals like 6 months or
24 months?
Also, leap years don't just happen every 4 years. Here is the actual logic:
bool is_leap_year(int Y) {
if(Y % 400 == 0) return true;
else if(Y % 100 == 0) return false;
else if(Y % 4 == 0) return true;
else return false;
}
And what about leap seconds?
All in all, I don't think that the benefit of the proposed change outweighs
the fact that it will break the previous behavior for the users who may
rely on it. I suggest keeping it simple, i.e. the way it is now. What I
think we could do instead is explicitly document this behavior in [1]https://www.postgresql.org/docs/current/functions-datetime.html.
[1]: https://www.postgresql.org/docs/current/functions-datetime.html
--
Best regards,
Aleksander Alekseev
On Thu, Feb 24, 2022 at 4:47 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
Extracting an epoch from an interval is quite a strange case since intervals are not connected to any specific dates.
I agree, I think it's a weird use case and that it's probably not
worth fixing. Though it was fun for me to try.
All in all, I don't think that the benefit of the proposed change outweighs the fact that it will break the previous behavior for the users who may rely on it. I suggest keeping it simple, i.e. the way it is now. What I think we could do instead is explicitly document this behavior in [1].
[1]: https://www.postgresql.org/docs/current/functions-datetime.html
I do want to briefly mention, if I'm understanding the history of
EXTRACT correctly, that the previous behavior
actually was to multiply by 365.25, not 365. However The commit that
changed the return type from numeric [1]https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2da77cdb4661826482ebf2ddba1f953bc74afe4
changed that behavior. Looking through the discussions [2]/messages/by-id/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu, I don't
see any mention of it, which makes me think
it was a mistake. However there is a lot of discussion around numeric
performance and being able to optimize
numeric division because every divisor was a power of 10. Fixing this
issue would break that assumption and
cause some performance degradations which probably isn't worth it.
[1]: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=a2da77cdb4661826482ebf2ddba1f953bc74afe4
[2]: /messages/by-id/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
- Joe Koshakow
Joseph Koshakow <koshy44@gmail.com> writes:
I do want to briefly mention, if I'm understanding the history of
EXTRACT correctly, that the previous behavior
actually was to multiply by 365.25, not 365. However The commit that
changed the return type from numeric [1]
changed that behavior. Looking through the discussions [2], I don't
see any mention of it, which makes me think
it was a mistake.
Indeed ... Peter?
regards, tom lane
On 24.02.22 03:35, Joseph Koshakow wrote:
However when executing EXTRACT we first truncate
DAYS_PER_YEAR to an integer, and then multiply it
by the total years in the Interval
/* this always fits into int64 */secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) +
(int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) +
interval->day) * SECS_PER_DAY;Is this truncation on purpose? It seems like
EXTRACT is not accounting for leap years in
it's calculation.
This was not intentional. The cast is only to make the multiplication
happen in int64; it didn't mean to drop any fractional parts.
Oops I sent that to the wrong email. If this isn't intented I've created a patch
that fixes it, with the following two open questions
* DAYS_PER_YEAR_NUM is recalculated every time. Is there anyway
to convert a float directly to a numeric to avoid this?
We really wanted to avoid doing calculations in numeric as much as
possible. So we should figure out a different way to write this. The
attached patch works for me. It's a bit ugly since it hardcodes some
factors. Maybe we can rephrase it a bit more elegantly.
Attachments:
v1-0001-Fix-extract-epoch-from-interval-calculation.patchtext/plain; charset=UTF-8; name=v1-0001-Fix-extract-epoch-from-interval-calculation.patchDownload
From 0b7222beb5260c710d79c9a0573c3b39a64acf1b Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <peter@eisentraut.org>
Date: Fri, 8 Apr 2022 13:16:25 +0200
Subject: [PATCH v1] Fix extract epoch from interval calculation
The new numeric code for extract epoch from interval accidentally
truncated the DAYS_PER_YEAR value to an integer, leading to results
that mismatched the floating-point interval_part calculations.
The commit a2da77cdb4661826482ebf2ddba1f953bc74afe4 that introduced
this actually contains the regression test change that this reverts.
I suppose this was missed at the time.
Reported-by: Joseph Koshakow <koshy44@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAAvxfHd5n%3D13NYA2q_tUq%3D3%3DSuWU-CufmTf-Ozj%3DfrEgt7pXwQ%40mail.gmail.com
---
src/backend/utils/adt/timestamp.c | 6 +++---
src/test/regress/expected/interval.out | 4 ++--
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/src/backend/utils/adt/timestamp.c b/src/backend/utils/adt/timestamp.c
index 1c0bf0aa5c..2677d27632 100644
--- a/src/backend/utils/adt/timestamp.c
+++ b/src/backend/utils/adt/timestamp.c
@@ -5288,9 +5288,9 @@ interval_part_common(PG_FUNCTION_ARGS, bool retnumeric)
int64 val;
/* this always fits into int64 */
- secs_from_day_month = ((int64) DAYS_PER_YEAR * (interval->month / MONTHS_PER_YEAR) +
- (int64) DAYS_PER_MONTH * (interval->month % MONTHS_PER_YEAR) +
- interval->day) * SECS_PER_DAY;
+ secs_from_day_month = ((int64) (4 * DAYS_PER_YEAR) * (interval->month / MONTHS_PER_YEAR) +
+ (int64) (4 * DAYS_PER_MONTH) * (interval->month % MONTHS_PER_YEAR) +
+ (int64) 4 * interval->day) * SECS_PER_DAY/4;
/*---
* result = secs_from_day_month + interval->time / 1'000'000
diff --git a/src/test/regress/expected/interval.out b/src/test/regress/expected/interval.out
index e4b1246f45..8e2d535543 100644
--- a/src/test/regress/expected/interval.out
+++ b/src/test/regress/expected/interval.out
@@ -953,11 +953,11 @@ SELECT f1,
@ 1 min | 0 | 0.000 | 0.000000 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 60.000000
@ 5 hours | 0 | 0.000 | 0.000000 | 0 | 5 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 18000.000000
@ 10 days | 0 | 0.000 | 0.000000 | 0 | 0 | 10 | 0 | 1 | 0 | 0 | 0 | 0 | 864000.000000
- @ 34 years | 0 | 0.000 | 0.000000 | 0 | 0 | 0 | 0 | 1 | 34 | 3 | 0 | 0 | 1072224000.000000
+ @ 34 years | 0 | 0.000 | 0.000000 | 0 | 0 | 0 | 0 | 1 | 34 | 3 | 0 | 0 | 1072958400.000000
@ 3 mons | 0 | 0.000 | 0.000000 | 0 | 0 | 0 | 3 | 2 | 0 | 0 | 0 | 0 | 7776000.000000
@ 14 secs ago | -14000000 | -14000.000 | -14.000000 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | -14.000000
@ 1 day 2 hours 3 mins 4 secs | 4000000 | 4000.000 | 4.000000 | 3 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 93784.000000
- @ 6 years | 0 | 0.000 | 0.000000 | 0 | 0 | 0 | 0 | 1 | 6 | 0 | 0 | 0 | 189216000.000000
+ @ 6 years | 0 | 0.000 | 0.000000 | 0 | 0 | 0 | 0 | 1 | 6 | 0 | 0 | 0 | 189345600.000000
@ 5 mons | 0 | 0.000 | 0.000000 | 0 | 0 | 0 | 5 | 2 | 0 | 0 | 0 | 0 | 12960000.000000
@ 5 mons 12 hours | 0 | 0.000 | 0.000000 | 0 | 12 | 0 | 5 | 2 | 0 | 0 | 0 | 0 | 13003200.000000
(10 rows)
base-commit: 9a7229948c70945ca6ef0b36adfe61b74f4fdaf5
--
2.35.1
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
We really wanted to avoid doing calculations in numeric as much as
possible. So we should figure out a different way to write this. The
attached patch works for me. It's a bit ugly since it hardcodes some
factors. Maybe we can rephrase it a bit more elegantly.
I think it's fine but needs some commentary. Maybe about like
"To do this calculation in integer arithmetic even though
DAYS_PER_YEAR is fractional, multiply everything by 4
and then divide by 4 again at the end. This relies on
DAYS_PER_YEAR being a multiple of 0.25 and on SECS_PER_DAY
being a multiple of 4."
BTW, it might be good to parenthesize as
(... big calculation ...) * (SECS_PER_DAY/4)
to eliminate any question of whether the value could overflow
before the final division by 4.
regards, tom lane
On 08.04.22 15:10, Tom Lane wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
We really wanted to avoid doing calculations in numeric as much as
possible. So we should figure out a different way to write this. The
attached patch works for me. It's a bit ugly since it hardcodes some
factors. Maybe we can rephrase it a bit more elegantly.I think it's fine but needs some commentary. Maybe about like
"To do this calculation in integer arithmetic even though
DAYS_PER_YEAR is fractional, multiply everything by 4
and then divide by 4 again at the end. This relies on
DAYS_PER_YEAR being a multiple of 0.25 and on SECS_PER_DAY
being a multiple of 4."BTW, it might be good to parenthesize as
(... big calculation ...) * (SECS_PER_DAY/4)
to eliminate any question of whether the value could overflow
before the final division by 4.
done that way