Infinite loop for generate_series with timestamp arguments
Hi all,
I noticed, that the following query will send Postgres into an infinite
loop:
(a) select generate_series(timestamp '2025-01-30', timestamp
'2025-02-01', interval '1 month -29 days');
Adding one month to 2023-01-30 will result in 2023-02-28 due to February
having no 31st day, and then subtracting 29 days will result in
2025-01-30 again, resulting in a repetition from there. Postgres does
not detect this, and gets caught in an infinite loop. This query is not
the only way this can happen, one can imagine many different setups,
that, at some point, will lead to this problem.
One could certainly argue that “go and do an infinite loop” is plainly
the intended semantics of this query, but in other cases, Postgres tries
to guard against these types of looping generate_series statements:
(b) select generate_series(timestamp '2025-03-31', timestamp
'2025-04-01', interval '1 month -30 days');
→ ERROR: step size cannot equal zero
(c) select generate_series(timestamp '2024-12-31', timestamp
'2025-01-01', interval '1 month -31 days');
→ (0 rows)
It is a bit surprising to me, that all these three cases are handled
differently, as they seem exactly analogous. The issue with all the
expression is that the specified interval, when added to the start date,
results in the start date again.
select (timestamp '2025-01-30' + interval '1 month -29 days') as A,
(timestamp '2025-03-31' + interval '1 month -30 days') as B,
(timestamp '2024-12-31' + interval '1 month -31 days') as C;
→
a | b | c
---------------------+---------------------+---------------------
2025-01-31 00:00:00 | 2025-03-31 00:00:00 | 2024-12-31
00:00:00
Query (b) seems to be the most straight forward: Postgres seems to flag
the interval '1 month -30 days' as “empty” no matter the context.
select generate_series(timestamp '2024-12-31', timestamp
'2025-01-01', interval '1 month -30 days');
→ psql:commands.sql:1: ERROR: step size cannot equal zero
select timestamp '2024-12-31' + interval '1 month -30 days';
→ 2025-01-01 00:00:00
This should theoretically be an unproblematic query resulting in
[2024-12-31 00:00:00, 2025-01-01 00:00:00], but it seems like Postgres
treats '1 month -30 days' as an “zero-size interval”, despite there
being contexts, where it is not of size zero, and contexts, where other
intervals that aren’t flagged, are of size zero.
For query (c), the empty result makes it seem like Postgres interprets
it as having a negative step size for a date in the future; the same
result would happen with e. g. the interval '1 month -40 days' or
generate_series(10, 5, +1). This would fit with the above, where
Postgres assumes that a month always has 30 days for the purpose of
determining how generate_series should run.
Is this behavior intentional or should it be considered to be erroneous?
Naturally, working with human time representation can introduce many
subtleties and maybe it is beyond the scope of Postgres to correctly
handle them (whatever that means), but at least failing in a consistent
way shouldn’t be to much to ask for? If this is intentional, is there a
system to this that I have failed to see?
Best,
Jakob
(PS: as this is not really a bug report, but rather an inquiry to what
extent this behavior is intended, I hope `postgres-general` is the
correct choice of lists. If not, feel free to redirect me to a more
appropriate one.)
---
Jakob Teuber
jakob.teuber@tum.de
Jakob Teuber <jakob.teuber@tum.de> writes:
I noticed, that the following query will send Postgres into an infinite
loop:
(a) select generate_series(timestamp '2025-01-30', timestamp
'2025-02-01', interval '1 month -29 days');
One could certainly argue that “go and do an infinite loop” is plainly
the intended semantics of this query, but in other cases, Postgres tries
to guard against these types of looping generate_series statements:
(b) select generate_series(timestamp '2025-03-31', timestamp
'2025-04-01', interval '1 month -30 days');
→ ERROR: step size cannot equal zero
It does what it can, but there's no chance of being entirely perfect.
The core problem here is to figure out which direction the series
is intended to advance, so as to know whether the termination test
should be "less than" or "greater than" the end timestamp. The
way generate_series_timestamp does that is to see whether
interval comparison thinks the interval value is less than or
greater than a zero interval --- and if it happens to be exactly
equal, you get the whine about zero step size. But interval
comparison is far from bright:
* Interval comparison is based on converting interval values to a linear
* representation expressed in the units of the time field (microseconds,
* in the case of integer timestamps) with days assumed to be always 24 hours
* and months assumed to be always 30 days.
We could perhaps do better by doing the initial addition of the
interval and seeing if that produces a value greater than, less
than, or equal to the start timestamp. But I'm afraid that
doesn't move the goalposts very far, because as this example
shows, we might get different results in different months.
Another idea is to check, after doing each addition, to make
sure that the timestamp actually advanced in the expected
direction. But should we error out if not, or just stop?
regards, tom lane
I wrote:
We could perhaps do better by doing the initial addition of the
interval and seeing if that produces a value greater than, less
than, or equal to the start timestamp. But I'm afraid that
doesn't move the goalposts very far, because as this example
shows, we might get different results in different months.
Another idea is to check, after doing each addition, to make
sure that the timestamp actually advanced in the expected
direction. But should we error out if not, or just stop?
Here's a very POC-y patch using both of these ideas (and
choosing to error out if the interval changes sign).
If we go this way, generate_series_timestamptz would need
similar changes, and some regression test cases would be
appropriate. I'm not sure if the documentation needs
adjustment; it doesn't talk about the difficulty of
identifying the sign of an interval.
regards, tom lane