Query generates infinite loop
Hi All -
I was implementing the infinity time constants in DuckDB when I ran into an infinite loop. It seems that PG has the same problem for the same reason (adding an interval to an infinite timestamp produces the same timestamp, so the increment operation never goes anywhere.) Here is the query:
select COUNT(*)
FROM generate_series('-infinity'::TIMESTAMP, 'epoch'::TIMESTAMP, INTERVAL '1 DAY');
This seems like a DoS great attack, so we are disallowing infinities as bounds for both table and scalar series generation. As an upper bound, it eventually gives an error, so it seems there is not much utility anyway.
Met vriendelijke groet, best regards, mit freundlichen Grüßen,
Richard Wesley
Group-By Therapist
richard@duckdblabs.com <mailto:richard@duckdblabs.com>
Hi
st 20. 4. 2022 v 18:42 odesílatel Richard Wesley <richard@duckdblabs.com>
napsal:
Hi All -
I was implementing the infinity time constants in DuckDB when I ran into
an infinite loop. It seems that PG has the same problem for the same reason
(adding an interval to an infinite timestamp produces the same timestamp,
so the increment operation never goes anywhere.) Here is the query:1.
select COUNT(*) FROM generate_series('-infinity'::TIMESTAMP, 'epoch'::TIMESTAMP, INTERVAL '1 DAY');
This seems like a DoS great attack, so we are disallowing infinities as
bounds for both table and scalar series generation. As an upper bound, it
eventually gives an error, so it seems there is not much utility anyway.
There are more ways to achieve the same effect. The protection is safe
setting of temp_file_limit
2022-04-20 09:59:54) postgres=# set temp_file_limit to '1MB';
SET
(2022-04-20 18:51:48) postgres=# select COUNT(*)
FROM generate_series('-infinity'::TIMESTAMP, 'epoch'::TIMESTAMP, INTERVAL
'1 DAY');
ERROR: temporary file size exceeds temp_file_limit (1024kB)
(2022-04-20 18:51:50) postgres=#
Regards
Pavel
Show quoted text
Met vriendelijke groet, best regards, mit freundlichen Grüßen,
*Richard Wesley*
Group-By Therapist
richard@duckdblabs.com
Pavel Stehule <pavel.stehule@gmail.com> writes:
st 20. 4. 2022 v 18:42 odesílatel Richard Wesley <richard@duckdblabs.com>
napsal:select COUNT(*) FROM generate_series('-infinity'::TIMESTAMP, 'epoch'::TIMESTAMP, INTERVAL '1 DAY');
This seems like a DoS great attack, so we are disallowing infinities as
bounds for both table and scalar series generation. As an upper bound, it
eventually gives an error, so it seems there is not much utility anyway.
There are more ways to achieve the same effect. The protection is safe
setting of temp_file_limit
Well, there are any number of ways to DOS a database you can issue
arbitrary queries to. For instance, cross joining a number of very
large tables. So I'm not excited about that aspect of it. Still,
it's true that infinities as generate_series endpoints are going
to work pretty oddly, so I agree with the idea of forbidding 'em.
Numeric has infinity as of late, so the numeric variant would
need to do this too.
I think we can allow infinity as the step, though.
regards, tom lane
I wrote:
it's true that infinities as generate_series endpoints are going
to work pretty oddly, so I agree with the idea of forbidding 'em.
Numeric has infinity as of late, so the numeric variant would
need to do this too.
Oh --- looks like numeric generate_series() already throws error for
this, so we should just make the timestamp variants do the same.
regards, tom lane
On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
it's true that infinities as generate_series endpoints are going
to work pretty oddly, so I agree with the idea of forbidding 'em.Numeric has infinity as of late, so the numeric variant would
need to do this too.Oh --- looks like numeric generate_series() already throws error for
this, so we should just make the timestamp variants do the same.
The regression test you added for this change causes an infinite loop when
run against an unpatched server with --install-check. That is a bit
unpleasant. Is there something we can and should do about that? I was
expecting regression test failures of course but not an infinite loop
leading towards disk exhaustion.
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes:
The regression test you added for this change causes an infinite loop when
run against an unpatched server with --install-check. That is a bit
unpleasant. Is there something we can and should do about that? I was
expecting regression test failures of course but not an infinite loop
leading towards disk exhaustion.
We very often add regression test cases that will cause unpleasant
failures on unpatched code. I categorically reject the idea that
that's not a good thing, and question why you think that running
known-broken code against a regression suite is an important use case.
regards, tom lane
On Wed, May 4, 2022 at 3:01 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
it's true that infinities as generate_series endpoints are going
to work pretty oddly, so I agree with the idea of forbidding 'em.Numeric has infinity as of late, so the numeric variant would
need to do this too.Oh --- looks like numeric generate_series() already throws error for
this, so we should just make the timestamp variants do the same.The regression test you added for this change causes an infinite loop when
run against an unpatched server with --install-check. That is a bit
unpleasant. Is there something we can and should do about that? I was
expecting regression test failures of course but not an infinite loop
leading towards disk exhaustion.Cheers,
Jeff
This came up once before
/messages/by-id/CAB7nPqQUuUh_W3s55eSiMnt901Ud3meF7f_96yPkKcqfd1ZaMg@mail.gmail.com
Corey Huinker <corey.huinker@gmail.com> writes:
On Wed, May 4, 2022 at 3:01 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Oh --- looks like numeric generate_series() already throws error for
this, so we should just make the timestamp variants do the same.
This came up once before
/messages/by-id/CAB7nPqQUuUh_W3s55eSiMnt901Ud3meF7f_96yPkKcqfd1ZaMg@mail.gmail.com
Oh! I'd totally forgotten that thread, but given that discussion,
and particularly the counterexample at
/messages/by-id/16807.1456091547@sss.pgh.pa.us
it now feels to me like maybe this change was a mistake. Perhaps
instead of the committed change, we ought to go the other way and
rip out the infinity checks in numeric generate_series().
In view of tomorrow's minor-release wrap, there is not time for
the sort of more leisured discussion that I now think this topic
needs. I propose to revert eafdf9de0 et al before the wrap,
and think about this at more length before doing anything.
regards, tom lane
On Mon, May 9, 2022 at 12:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Corey Huinker <corey.huinker@gmail.com> writes:
On Wed, May 4, 2022 at 3:01 PM Jeff Janes <jeff.janes@gmail.com> wrote:
On Wed, Apr 20, 2022 at 5:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Oh --- looks like numeric generate_series() already throws error for
this, so we should just make the timestamp variants do the same.This came up once before
/messages/by-id/CAB7nPqQUuUh_W3s55eSiMnt901Ud3meF7f_96yPkKcqfd1ZaMg@mail.gmail.com
Oh! I'd totally forgotten that thread, but given that discussion,
and particularly the counterexample at/messages/by-id/16807.1456091547@sss.pgh.pa.us
it now feels to me like maybe this change was a mistake. Perhaps
instead of the committed change, we ought to go the other way and
rip out the infinity checks in numeric generate_series().
The infinite-upper-bound-withlimit-pushdown counterexample makes sense, but
seems like we're using generate_series() only because we lack a function
that generates a series of N elements, without a specified upper bound,
something like
generate_finite_series( start, step, num_elements )
And if we did that, I'd lobby that we have one that takes dates as well as
one that takes timestamps, because that was my reason for starting the
thread above.
Corey Huinker <corey.huinker@gmail.com> writes:
The infinite-upper-bound-withlimit-pushdown counterexample makes sense, but
seems like we're using generate_series() only because we lack a function
that generates a series of N elements, without a specified upper bound,
something like
generate_finite_series( start, step, num_elements )
Yeah, that could be a reasonable thing to add.
And if we did that, I'd lobby that we have one that takes dates as well as
one that takes timestamps, because that was my reason for starting the
thread above.
Less sure about that. ISTM the reason that the previous proposal failed
was that it introduced too much ambiguity about how to resolve
unknown-type arguments. Wouldn't the same problems arise here?
regards, tom lane
Less sure about that. ISTM the reason that the previous proposal failed
was that it introduced too much ambiguity about how to resolve
unknown-type arguments. Wouldn't the same problems arise here?
If I recall, the problem was that the lack of a date-specific
generate_series function would result in a date value being coerced to
timestamp, and thus adding generate_series(date, date, step) would change
behavior of existing code, and that was a POLA violation (among other bad
things).
By adding a different function, there is no prior behavior to worry about.
So we should be safe with the following signatures doing the right thing,
yes?:
generate_finite_series(start timestamp, step interval, num_elements
integer)
generate_finite_series(start date, step integer, num_elements integer)
generate_finite_series(start date, step interval year to month,
num_elements integer)
Corey Huinker <corey.huinker@gmail.com> writes:
Less sure about that. ISTM the reason that the previous proposal failed
was that it introduced too much ambiguity about how to resolve
unknown-type arguments. Wouldn't the same problems arise here?
By adding a different function, there is no prior behavior to worry about.
True, that's one less thing to worry about.
So we should be safe with the following signatures doing the right thing,
yes?:
generate_finite_series(start timestamp, step interval, num_elements
integer)
generate_finite_series(start date, step integer, num_elements integer)
generate_finite_series(start date, step interval year to month,
num_elements integer)
No. You can experiment with it easily enough using stub functions:
regression=# create function generate_finite_series(start timestamp, step interval, num_elements
regression(# integer) returns timestamp as 'select $1' language sql;
CREATE FUNCTION
regression=# create function generate_finite_series(start date, step integer, num_elements integer) returns timestamp as 'select $1' language sql;
CREATE FUNCTION
regression=# create function generate_finite_series(start date, step interval year to month,
regression(# num_elements integer) returns timestamp as 'select $1' language sql;;
CREATE FUNCTION
regression=# select generate_finite_series(current_date, '1 day', 10);
ERROR: function generate_finite_series(date, unknown, integer) is not unique
LINE 1: select generate_finite_series(current_date, '1 day', 10);
^
HINT: Could not choose a best candidate function. You might need to add explicit type casts.
It's even worse if the first argument is also an unknown-type literal.
Sure, you could add explicit casts to force the choice of variant,
but then ease of use went out the window somewhere --- and IMO this
proposal is mostly about ease of use, since there's no fundamentally
new functionality.
It looks like you could make it work with just these three variants:
regression=# \df generate_finite_series
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+------------------------+-----------------------------+------------------------------------------------------------------------+------
public | generate_finite_series | timestamp without time zone | start date, step interval, num_elements integer | func
public | generate_finite_series | timestamp with time zone | start timestamp with time zone, step interval, num_elements integer | func
public | generate_finite_series | timestamp without time zone | start timestamp without time zone, step interval, num_elements integer | func
(3 rows)
I get non-error results with these:
regression=# select generate_finite_series(current_date, '1 day', 10);
generate_finite_series
------------------------
2022-05-10 00:00:00
(1 row)
regression=# select generate_finite_series('now', '1 day', 10);
generate_finite_series
-------------------------------
2022-05-10 19:35:33.773738-04
(1 row)
That shows that an unknown-type literal in the first argument will default
to timestamptz given these choices, which seems like a sane default.
BTW, you don't get to say "interval year to month" as a function argument,
or at least it won't do anything useful. If you want to restrict the
contents of the interval it'll have to be a runtime check inside the
function.
regards, tom lane