Add a property to automatically suspend portals as they produce a given number of bytes

Started by Vladimir Sitnikov · 12 months ago · 4 messages
#1 Vladimir Sitnikov
sitnikov.vladimir@gmail.com

Hi,

Applications often face an "out of memory" condition as they try to fetch
"N rows" from the database.
If N is small, then the execution becomes inefficient due to many
roundtrips.
If N is high, there's a risk that many rows would overflow the client's
memory.

Note: the client can't stop reading the data at arbitrary points, as it might
want to execute another query while processing the resultset, so "fetch all
rows at once and read them from the socket as needed" is not really an option.

The problem is that the client can't tell how much memory it would take to
buffer the results of "fetch 100 rows": the row size might vary.
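
For illustration, here is a minimal JDBC-style sketch of what a client can do
today (the connection details and table name are made up; pgjdbc fetches in
batches through a portal when autocommit is off and a fetch size is set). The
only available knob is a row count, so the memory needed for one batch is
unbounded:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FetchSizeSketch {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection details.
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/test", "user", "secret")) {
            con.setAutoCommit(false);      // portal-based fetching in pgjdbc needs autocommit off
            try (Statement st = con.createStatement()) {
                st.setFetchSize(100);      // "fetch 100 rows" per round trip
                try (ResultSet rs = st.executeQuery("SELECT id, value FROM big_table")) {
                    while (rs.next()) {
                        // Each batch of 100 rows is buffered on the client before it is
                        // handed out here; 100 large rows may already exceed the heap.
                        long id = rs.getLong(1);
                        String value = rs.getString(2);
                        // ... application logic ...
                    }
                }
            }
        }
    }
}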

Currently, the only case in which the database returns PortalSuspended seems
to be when max_rows is reached while executing a select.
I wonder if the database could suspend a portal earlier when it knows it has
already produced a lot of data.

In other words:
1) If the user calls execute(fetch all rows), then follow the current
behavior.
2) If the user provides a non-zero max_rows, then optionally suspend the
portal once the result exceeds a pre-configured amount (a GUC?).

Then clients could configure the connection like "ok, please suspend
portals if a single fetch exceeds 50MiB".
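
Purely as an illustration of what the configuration might look like (the GUC
name portal_suspend_limit_bytes below is invented for this sketch; nothing
like it exists today), a client could set the limit per session:

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class SuspendBytesConfigSketch {
    static void configure(Connection con) throws SQLException {
        try (Statement st = con.createStatement()) {
            // Hypothetical GUC: suspend a portal once a single fetch exceeds ~50 MiB,
            // in addition to the existing max_rows row limit.
            st.execute("SET portal_suspend_limit_bytes = '50MB'");
        }
    }
}

With such a setting, a generous Statement.setFetchSize(10000) would stay safe
even if some rows turn out to be huge, because the per-batch memory would be
bounded by bytes rather than by a guessed row count.
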
To my understanding,
* it should be a backward-compatible change,
* it would require no protocol changes,
* it would make applications more robust when it comes to "out of memory
while fetching many rows",
* it might improve the performance overall, as the applications could
increase their pessimistic "max_rows" fetch size.

Is it ok to contribute a patch like that?

Vladimir

#2 Kirill Reshke
reshkekirill@gmail.com
In reply to: Vladimir Sitnikov (#1)
Re: Add a property to automatically suspend portals as they produce a given number of bytes

Hi, a client can use the CURSOR feature to process data in batches. What is a
case where the proposed feature solves a problem that a CURSOR does not?

https://www.postgresql.org/docs/current/plpgsql-cursors.html

#3 Vladimir Sitnikov
sitnikov.vladimir@gmail.com
In reply to: Kirill Reshke (#2)
Re: Add a property to automatically suspend portals as they produce a given number of bytes

Kirill,

A cursor does not provide a way to limit the fetch size based on memory
consumption.
Imagine a table like (id int8, value jsonb).
If we use "fetch 1000", it might require 1 GiB on the client if every row
contains a 1 MiB json.
If the client plays defensively and goes for "fetch 10", it might take a lot
of time if the jsons are small.

Neither a cursor nor the extended protocol solves the problem.

Vladimir

#4 Kirill Reshke
reshkekirill@gmail.com
In reply to: Vladimir Sitnikov (#3)
Re: Add a property to automatically suspend portals as they produce a given number of bytes

Hi!
Thank you for explaining this. I think you can propose your patch now;
I don't see any major show-stopper.

The only issue is that this would be a PostgreSQL extension, which will
impose extra maintenance pain on kernel hackers.

Also, note that we do not know an individual row's size in advance, because
tuple attributes may be toasted. So, your query will return the first time it
tries to allocate more than $limit bytes, not before. Or, at least, a
straightforward implementation of this feature would.
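
To make that concrete with a self-contained sketch (plain Java standing in
for the server-side loop; the row sizes and the limit are simulated, not real
backend code): since a row's size only becomes known once the row has been
produced and, possibly, detoasted, a byte-based limit can only cut the batch
after it has been exceeded, so the last batch may overshoot by up to one row:

import java.util.List;

public class SuspendAfterLimitSketch {
    public static void main(String[] args) {
        // Simulated row sizes in bytes; in reality they are unknown until each row
        // is materialized (attributes may need to be detoasted first).
        List<Long> rowSizes = List.of(10L, 2_000_000L, 300L, 60_000_000L, 500L);
        long limitBytes = 50L * 1024 * 1024;   // hypothetical 50 MiB per-fetch limit

        long sent = 0;
        int rows = 0;
        for (long size : rowSizes) {
            sent += size;                      // the row has already been sent by now
            rows++;
            if (sent >= limitBytes) {
                // The check fires only after the limit is exceeded, so the batch
                // may overshoot it by up to one (possibly very large) row.
                System.out.printf("suspend after %d rows, %d bytes sent%n", rows, sent);
                return;
            }
        }
        System.out.printf("all rows sent: %d rows, %d bytes%n", rows, sent);
    }
}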

Best regards,
Kirill Reshke