statement_timeout affects query results fetching?
Hi everyone, I'm seeing some strange behavior and wanted to confirm it.
When executing a query that selects a long result set, if the code
processing the results takes its time (e.g. more than statement_timeout),
a timeout occurs. My expectation was that statement_timeout only affects
query *processing*, and does not cover the frontend actually processing the
result.
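For what it's worth, the behavior is reproducible with plain libpq in single-row mode; here's a minimal sketch (not my actual driver code, and it assumes a local "postgres" database) that deliberately reads the rows slowly:

    #include <stdio.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("dbname=postgres");
        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        /* Timeout much shorter than the time we'll spend consuming rows. */
        PQclear(PQexec(conn, "SET statement_timeout = '2s'"));

        PQsendQuery(conn, "SELECT generate_series(1, 1000000)");
        PQsetSingleRowMode(conn);

        PGresult *res;
        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
                usleep(10000);      /* simulate slow per-row processing (10 ms) */
            else if (PQresultStatus(res) == PGRES_FATAL_ERROR)
                /* "canceling statement due to statement timeout" shows up here */
                fprintf(stderr, "%s", PQresultErrorMessage(res));
            PQclear(res);
        }

        PQfinish(conn);
        return 0;
    }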
First, I wanted to confirm that this is the case (and not some sort of bug
in my code).
If this is the case, is this intended? Here are some points:
* It makes statement_timeout very difficult to use; clients do very diverse
things with database results, so it may be very difficult (or impossible) to
estimate how much time they will take (e.g. due to varying load on the
client machine or on some other machine receiving the results).
* It makes it impossible to specifically detect problematic *queries* which
take too long to execute (as opposed to the time taken to process their
results).
If you do insist that this behavior is correct, a documentation update for
statement_timeout might make this clearer.
Thanks,
Shay
On Sat, Aug 8, 2015 at 5:31 AM, Shay Rojansky <roji@roji.org> wrote:
> When executing a query that selects a long result set, if the code
> processing the results takes its time (e.g. more than statement_timeout),
> a timeout occurs. My expectation was that statement_timeout only affects
> query *processing*, and does not cover the frontend actually processing
> the result.
I think the issue here is that we start returning rows to the client
while the query is still executing.
Suppose each row of output takes 100 ms of CPU time to generate and
there are 1,000,000 rows. Then it's quite conceivable that we could
be under the statement_timeout when we start returning rows to the
client, but over the statement_timeout by the time we finish -- and
the user would probably want statement_timeout to kick in in that
case, because that's a lotta CPU time.
I suppose we could try to toll statement_timeout while we're blocked
waiting for the client, but nobody wrote the code for that yet. And
it would mean that you can't use statement_timeout to prevent xmin
from lagging, which could be why you set it in the first place. There
might also be some usability difficulties: pg_stat_activity shows the
time the query started, so if you know what statement_timeout is you
can tell how close it is to being killed. If some of the time isn't
counted, then you can't tell any more.
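Today that check is simple enough; a sketch (the function name is mine), run
from a second connection, where the elapsed column is directly comparable to
statement_timeout:

    #include <stdio.h>
    #include <libpq-fe.h>

    /* How long has each active statement been running? Compare against
     * statement_timeout to see how close it is to being killed. */
    static void show_active_statements(PGconn *monconn)
    {
        PGresult *res = PQexec(monconn,
            "SELECT pid, now() - query_start AS elapsed, query "
            "FROM pg_stat_activity WHERE state = 'active'");

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            for (int i = 0; i < PQntuples(res); i++)
                printf("pid %s, running for %s: %s\n",
                       PQgetvalue(res, i, 0),
                       PQgetvalue(res, i, 1),
                       PQgetvalue(res, i, 2));

        PQclear(res);
    }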
Another approach (which I think might be better) is to have GUCs like
statement_cpu_limit and statement_io_limit that kill a query when it
uses more than the configured amount of that resource.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Shay Rojansky <roji@roji.org> writes:
> When executing a query that selects a long result set, if the code
> processing the results takes its time (e.g. more than statement_timeout),
> a timeout occurs. My expectation was that statement_timeout only affects
> query *processing*, and does not cover the frontend actually processing
> the result.
Are you using a cursor, or something like that?
libpq ordinarily absorbs the result set as fast as possible and then hands
it back to the application as one blob; the time subsequently spent by the
application looking at the blob doesn't count against statement_timeout.
As Robert says, statement_timeout *does* include time spent sending the
result set to the client, and we're not going to change that, because it
would be too hard to disentangle calculation from I/O. So if the client
isn't prompt about absorbing all the data then you have to factor that
into your setting. But it would be a slightly unusual usage pattern
AFAIK.
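To illustrate the ordinary pattern (a sketch; "big_table" is just a
placeholder):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* PQexec absorbs the entire result set before returning, so only the time
     * up to this call counts against statement_timeout; the loop below can
     * take as long as it likes. */
    static void dump_big_table(PGconn *conn)
    {
        PGresult *res = PQexec(conn, "SELECT * FROM big_table");

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            for (int i = 0; i < PQntuples(res); i++)
                printf("%s\n", PQgetvalue(res, i, 0));
        }

        PQclear(res);
    }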
regards, tom lane
Thanks for your responses.
I'm not using cursors or anything fancy. The expected behavior (as far as I
can tell) for a .NET database driver is to read one row at a time from the
database and make it available. There's even a standard API option for
fetching data on a column-by-column basis: this allows the user to not hold
the entire row in memory (imagine rows with megabyte-sized columns). This
makes sense to me; Tom, doesn't the libpq behavior you describe of
absorbing the result set as fast as possible mean that a lot of memory is
wasted on the client side? I'd be interested in your take on this.
I can definitely appreciate the complexity of changing this behavior. I
think that some use cases (such as mine) would benefit from a timeout on
the time until the first row is sent; this would make it possible to put an
upper cap on things like query complexity, for example.
Shay
I'd also recommend adding a sentence about this aspect of statement_timeout
in the docs to prevent confusion...
On Sat, Aug 8, 2015 at 11:30 AM, Shay Rojansky <roji@roji.org> wrote:
> the entire row in memory (imagine rows with megabyte-sized columns). This
> makes sense to me; Tom, doesn't the libpq behavior you describe of absorbing
> the result set as fast as possible mean that a lot of memory is wasted on
> the client side?
It sure does.
> I can definitely appreciate the complexity of changing this behavior. I
> think that some use cases (such as mine) would benefit from a timeout on
> the time until the first row is sent; this would make it possible to put
> an upper cap on things like query complexity, for example.
Unfortunately, it would not do any such thing. It's possible for the
first row to be returned really really fast and then for an arbitrary
amount of time to pass and computation to happen before all the rows
are returned. A plan can have a startup cost of zero and a run cost
of a billion (or a trillion). This kind of scenario isn't even
particularly uncommon. You just need a plan that looks like this:
Nested Loop
  ->  Nested Loop
        ->  Nested Loop
              ->  Seq Scan
              ->  Index Scan
        ->  Index Scan
  ->  Index Scan
You can just keep pushing more nested loop/index scans on there and
the first row will still pop out quite fast. But if the seq-scanned
table is large, generating the whole result set can take a long, long
time.
Even worse, you could have a case like this:
SELECT somefunc(a) FROM foo;
Now suppose somefunc is normally very quick, but if a = 42 then it
does pg_sleep() in a loop until the world ends. You're going to have
however many rows of foo have a != 42 pop out near-instantaneously,
and then it will go into the tank and not come out until the meaning
of life, the universe, and everything is finally revealed.
That second case is a little extreme, and a little artificial, but the
point is this: just as you don't want the client to have to buffer the
results in memory, the server doesn't either. It's not the case that
the server computes the rows and sends them to you. Each one is sent
to you as it is computed, and in the normal case, at the time the
first row is sent, only a small percentage of the total work of the
query has been performed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thanks for the explanation Robert, that makes total sense. However, it
seems like the utility of PG's statement_timeout is much more limited than
I thought.
In case you're interested, I dug a little further and it seems that
Microsoft's client for SQL Server implements the following timeout (source
<https://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlcommand.commandtimeout%28v=vs.110%29.aspx?f=255&MSPPError=-2147217396>
):
cumulative time-out (for all network packets that are read during the
invocation of a method) for all network reads during command execution or
processing of the results. A time-out can still occur after the first row
is returned, and does not include user processing time, only network read
time.
Since it doesn't seem possible to have a clean query-processing-only
timeout at the backend, we may be better off doing something similar to the
above and enforce timeouts on the client only. Any further thoughts on this
would be appreciated.
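Roughly what I have in mind, sketched here with libpq's async API rather than
our actual .NET code (illustrative only; a real driver would also cancel the
query and clean up the connection on timeout):

    #include <stddef.h>
    #include <sys/select.h>
    #include <libpq-fe.h>

    /* Wait up to timeout_secs for each chunk of data from the server instead
     * of relying on statement_timeout; returns NULL if nothing arrived in
     * time. */
    static PGresult *get_result_with_read_timeout(PGconn *conn, int timeout_secs)
    {
        while (PQisBusy(conn))
        {
            int sock = PQsocket(conn);
            fd_set readable;
            struct timeval tv = { timeout_secs, 0 };

            FD_ZERO(&readable);
            FD_SET(sock, &readable);

            if (select(sock + 1, &readable, NULL, NULL, &tv) <= 0)
                return NULL;            /* timed out, or select() failed */
            if (!PQconsumeInput(conn))
                return NULL;            /* connection trouble */
        }
        return PQgetResult(conn);       /* a complete result (or NULL) is ready */
    }

Used in place of a blocking PQgetResult() after PQsendQuery(), and combined
with single-row mode, this bounds the wait for each chunk of data rather than
for the statement as a whole.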
On Mon, Aug 10, 2015 at 5:25 AM, Shay Rojansky <roji@roji.org> wrote:
> Since it doesn't seem possible to have a clean query-processing-only timeout
> at the backend, we may be better off doing something similar to the above
> and enforce timeouts on the client only. Any further thoughts on this would
> be appreciated.
An alternative you may want to consider is using the Execute message
with a non-zero row count and reading all of the returned rows as they
come back, buffering them in memory. When those have all been
consumed, issue another Execute message and get some more rows.
AFAICS, the biggest problem with this is that there's no good way to
bound the number of rows returned by size rather than by number, which
has been complained about before by somebody else in a situation
similar to yours. Another problem is that I believe it will cause
cursor_tuple_fraction to kick in, which may change query plans. But
it does have the advantage that the query will be suspended from the
server's point of view, which I *think* will toll statement_timeout.
You might also consider exposing some knobs to the user, so that they
can set the number of rows fetched in one go, and let that be all the
rows or only some of them.
We really need a better way of doing this, but I think this is the way
other drivers are handling it now.
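Since libpq itself doesn't expose the Execute-message row count, here's a
rough sketch of the same idea expressed with a server-side cursor instead
(table name and chunk size are placeholders):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Fetch 1000 rows per round trip; between FETCHes the query is suspended
     * on the server, and each FETCH should get its own statement_timeout
     * window. */
    static void read_in_chunks(PGconn *conn)
    {
        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn,
            "DECLARE c NO SCROLL CURSOR FOR SELECT * FROM big_table"));

        for (;;)
        {
            PGresult *res = PQexec(conn, "FETCH FORWARD 1000 FROM c");
            int ntuples = PQntuples(res);

            for (int i = 0; i < ntuples; i++)
                printf("%s\n", PQgetvalue(res, i, 0));   /* consume the row */

            PQclear(res);
            if (ntuples == 0)
                break;                                   /* cursor exhausted */
        }

        PQclear(PQexec(conn, "CLOSE c"));
        PQclear(PQexec(conn, "COMMIT"));
    }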
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Thanks (once again!) for the valuable suggestions Robert.
The idea of chunking/buffering via cursors has been raised before for
another purpose: allowing multiple queries "concurrently" at the API level
(where "concurrently" means interleaving when reading the result sets). This
would imply exposing the number of rows fetched to the user like you
suggested. However, I don't think there's a way we can remove the API
option to *not* buffer (as I said before, ADO.NET even provides a standard
API feature for reading column-by-column), and therefore the general
problem remains...
I think the right solution for us at the driver level would be to switch to
driver-enforced timeouts, i.e. to no longer use statement_timeout but look
at socket read times instead. I'll look into doing that for our next
version.
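A driver-enforced timeout would presumably also want to cancel the statement
on the server rather than just abandoning the read; with libpq that would
look roughly like this (a sketch):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* On a client-side timeout, ask the server to abandon the statement, then
     * drain any remaining results so the connection stays usable. */
    static void cancel_running_statement(PGconn *conn)
    {
        char errbuf[256];
        PGcancel *cancel = PQgetCancel(conn);

        if (cancel != NULL)
        {
            if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
                fprintf(stderr, "cancel request failed: %s\n", errbuf);
            PQfreeCancel(cancel);
        }

        PGresult *res;
        while ((res = PQgetResult(conn)) != NULL)
            PQclear(res);
    }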
Thanks for all your thoughts!