Protocol 3, Execute, maxrows to return, impact?
I was wondering if there is any real advantage to actually specifying, say,
64 for the maxrows parameter to the Execute message in the PostgreSQL
network protocol?
I.e.:
- Is it easier on the server, because it somehow uses less total memory when
batching the Executes with 64 rows at a time?
- Is it better for latency when expecting notices/notifies in between?
- Is it simply slowing down the protocol because every 64 rows I
introduce a round-trip delay for the next PortalSuspended/Execute combo?
Please note that we're only running the query once; I'm just batching
the results with the Execute maxrows parameter.
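For concreteness, the message in question is easy to frame by hand. A minimal Python sketch of the v3 wire format (type byte 'E', Int32 length, NUL-terminated portal name, Int32 maxrows; the helper name here is made up, not from any driver):

```python
import struct

def execute_message(portal: str = "", max_rows: int = 0) -> bytes:
    """Build a protocol-v3 Execute ('E') message: NUL-terminated portal
    name followed by an Int32 row limit; 0 means 'no limit'."""
    body = portal.encode("ascii") + b"\x00" + struct.pack("!i", max_rows)
    # The Int32 length field counts itself but not the leading type byte.
    return b"E" + struct.pack("!i", len(body) + 4) + body

# Unnamed portal, at most 64 rows before the server sends PortalSuspended:
msg = execute_message("", 64)
```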
--
Sincerely,
Stephen R. van den Berg.
Limited offer: Pacemakers, with a unique life-time warranty
"Stephen R. van den Berg" <srb@cuci.nl> writes:
I was wondering if there is any real advantage to actually specifying, say,
64 for the maxrows parameter to the Execute message in the PostgreSQL
network protocol?
There's no benefit in it from the server's perspective, if that's what
you meant. The point of the parameter is to allow the client to avoid
running out of memory to store all of a huge query result --- it can
pull it in sections, instead. (Think of it as a built-in cursor
FETCH facility.)
- Is it simply slowing down the protocol because every 64 rows I
introduce a round-trip delay for the next PortalSuspended/Execute combo?
Yup; there's no free lunch ...
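Putting rough numbers on that cost (a back-of-the-envelope model, not a measurement from the thread):

```python
import math

def added_latency(total_rows: int, max_rows: int, rtt_seconds: float) -> float:
    """Extra wall-clock time spent waiting on PortalSuspended/Execute
    round trips, relative to streaming the whole result in one Execute."""
    round_trips = math.ceil(total_rows / max_rows)
    return (round_trips - 1) * rtt_seconds  # the first Execute is needed anyway

# 1,000,000 rows in chunks of 64 over a 1 ms link:
print(added_latency(1_000_000, 64, 0.001))  # roughly 15.6 s of pure waiting
```

With maxrows=0 the same query pays a single round trip, which is why the limit only makes sense when the client genuinely cannot buffer the full result.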
regards, tom lane
On Thu, Jul 10, 2008 at 05:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Stephen R. van den Berg" <srb@cuci.nl> writes:
I was wondering if there is any real advantage to actually specifying, say,
64 for the maxrows parameter to the Execute message in the PostgreSQL
network protocol?
There's no benefit in it from the server's perspective, if that's what
you meant. The point of the parameter is to allow the client to avoid
running out of memory to store all of a huge query result --- it can
pull it in sections, instead. (Think of it as a built-in cursor
FETCH facility.)
Then, from a client perspective, there is no use at all, because the
client can actually pause reading the results at any time it wants,
when it wants to avoid storing all of the result rows. The network
will perform the cursor/fetch facility for it.
--
Sincerely,
Stephen R. van den Berg.
"Stephen R. van den Berg" <srb@cuci.nl> writes:
Then, from a client perspective, there is no use at all, because the
client can actually pause reading the results at any time it wants,
when it wants to avoid storing all of the result rows. The network
will perform the cursor/fetch facility for it.
[ shrug... ] In principle you could write a client library that would
act that way, but I think you'll find that none of the extant ones
will hand back an incomplete query result to the application.
A possibly more convincing argument is that with that approach, the
connection is completely tied up --- you cannot issue additional
database commands based on what you just read, nor pull rows from
multiple portals in an interleaved fashion.
regards, tom lane
Tom Lane wrote:
"Stephen R. van den Berg" <srb@cuci.nl> writes:
Then, from a client perspective, there is no use at all, because the
client can actually pause reading the results at any time it wants,
when it wants to avoid storing all of the result rows. The network
will perform the cursor/fetch facility for it.
[ shrug... ] In principle you could write a client library that would
act that way, but I think you'll find that none of the extant ones
will hand back an incomplete query result to the application.
A possibly more convincing argument is that with that approach, the
connection is completely tied up --- you cannot issue additional
database commands based on what you just read, nor pull rows from
multiple portals in an interleaved fashion.
I really think we need to get something like this into libpq. It's on my
TODO list after notification payloads and libpq support for arrays and
composites. We'll need to come up with an API before we do much else.
cheers
andrew
Tom Lane wrote:
"Stephen R. van den Berg" <srb@cuci.nl> writes:
Then, from a client perspective, there is no use at all, because the
client can actually pause reading the results at any time it wants,
when it wants to avoid storing all of the result rows. The network
will perform the cursor/fetch facility for it.
[ shrug... ] In principle you could write a client library that would
act that way, but I think you'll find that none of the extant ones
will hand back an incomplete query result to the application.
True. But I have written one just now. The language is called Pike,
it's a C/C++/Java lookalike. And I start returning rows as they arrive,
and pause reading from the network when the application wants to pause.
A possibly more convincing argument is that with that approach, the
connection is completely tied up --- you cannot issue additional
database commands based on what you just read, nor pull rows from
multiple portals in an interleaved fashion.
Interleaved retrieval using multiple portals is not what most libraries
support, I'd guess.
It can be supported at the application layer using multiple cursors, but
that works with my approach as well.
In practice, most applications that need that open multiple
connections to the same database (I'd think).
The only thing I could imagine is that *if*, at the server end,
notifications that arrive during the retrieval of one long-running
Execute are queued *after* all the data instead of being inserted into
the datastream, then it might be worth doing it differently.
Incidentally, the nice thing about my library is that it automatically
passes arguments in binary when they are easily processed in binary
(TEXT/BYTEA and all those others I mentioned earlier).
It automatically uses binary for *both* arguments and row results;
i.e. in one row I can have both text and binary columns, without the
application needing to specify which is which.
--
Sincerely,
Stephen R. van den Berg.
"If you can't explain it to an 8-year-old, you don't understand it."
(I don't really have much to add to the discussion here; I'm just
posting for the record on the question of client behaviour, since
I also wrote and maintain a client library in C++.)
At 2008-07-10 18:40:03 +0200, srb@cuci.nl wrote:
I start returning rows as they arrive, and pause reading from the
network when the application wants to pause.
My library also starts returning rows as they arrive, and in fact my
application makes heavy use of that feature. The data rows are read
from a non-blocking socket and the caller either does something for
each one, or waits until they've all arrived before proceeding.
Interleaved retrieval using multiple portals is not what most
libraries support, I'd guess.
My code did support that mode of operation in theory, but in practice
in the few situations where I have needed to use something like it, I
found it more convenient to open explicit cursors and FETCH from them
(but I usually needed this inside a transaction, and so did not open
multiple connections).
Thus my code always sets maxrows to 0 at the moment, and so...
The only thing I could imagine is that *if* at the server end, the
notifications that arrive during the retrieval of one long running
Execute, are queued *after* all the data, instead of inserted into
the datastream, then it might be worth doing it differently.
...I can't comment on this interesting observation.
i.e. in one row I can have both text and binary columns, without the
application needing to specify which is which.
Yes, that's nice. My first attempt to define an API for bind variables
set the data format to text by default and allowed it to be overridden,
but that was much too troublesome. Now the code decides by itself what
format is best to use for a given query.
(Again, though my library certainly supports mixing text and binary
format columns, my application has not needed to use this feature.)
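One way such a per-column decision can be sketched (illustrative only; the OID whitelist and function below are my own invention, not taken from any of the libraries discussed):

```python
# OIDs of types whose binary wire format is simple and stable enough
# to decode unconditionally (a hypothetical whitelist; real drivers
# each maintain their own).
BINARY_FRIENDLY_OIDS = {
    16,    # bool
    17,    # bytea
    20,    # int8
    21,    # int2
    23,    # int4
    25,    # text
    700,   # float4
    701,   # float8
}

def bind_format_codes(param_type_oids):
    """Per-parameter format codes for a Bind message:
    1 = binary, 0 = text (the protocol's own encoding)."""
    return [1 if oid in BINARY_FRIENDLY_OIDS else 0 for oid in param_type_oids]

# int4, numeric (OID 1700, left as text here), bytea:
print(bind_format_codes([23, 1700, 17]))  # → [1, 0, 1]
```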
-- ams
"Stephen R. van den Berg" <srb@cuci.nl> writes:
A possibly more convincing argument is that with that approach, the
connection is completely tied up --- you cannot issue additional
database commands based on what you just read, nor pull rows from
multiple portals in an interleaved fashion.
Interleaved retrieval using multiple portals is not what most libraries
support, I'd guess. It can be supported at the application layer using
multiple cursors, but that works with my approach as well.
In practice, most applications that need that open multiple
connections to the same database (I'd think).
Er? There's nothing particularly unusual about application logic like:
$sth->execute('huge select');
while ($sth->fetch('foreign_key')) {
... do some processing which is hard to do in server-side language ...
$sth->execute('insert resulting data');
}
Most drivers do support this kind of interface, but they may be reading the
entire result set for "huge select" in advance. However, if it's large enough
then this is only going to really work if you can start a new portal while the
outer portal is actually running on the backend. If the driver tries to cache
the whole result set, the programmer will be sad.
Back when I was doing PHP programming and I discovered that PHP's Postgres
driver didn't support this I thought it was an outrageous bug. (It didn't help
that the behaviour was to misbehave randomly rather than throw a meaningful
error.)
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!
Gregory Stark <stark@enterprisedb.com> writes:
"Stephen R. van den Berg" <srb@cuci.nl> writes:
In practice, most applications that need that, open multiple
connections to the same database (I'd think).
Er? There's nothing particularly unusual about application logic like:
$sth->execute('huge select');
while ($sth->fetch('foreign_key')) {
... do some processing which is hard to do in server-side language ...
$sth->execute('insert resulting data');
}
Moreover, there's often good reasons to do it all within one
transaction, which is impossible if you rely on a separate connection
to issue the inserts on.
regards, tom lane
"Abhijit Menon-Sen" <ams@oryx.com> writes:
Interleaved retrieval using multiple portals is not what most
libraries support, I'd guess.
My code did support that mode of operation in theory, but in practice
in the few situations where I have needed to use something like it, I
found it more convenient to open explicit cursors and FETCH from them
Note that using FETCH for each record means a round trip to the server for
each record. If you're dealing with a lot of records that could be a lot
slower than streaming them to the client as quickly as it can consume them.
Now I'm not sure anyone's actually done any experiments to optimize libpq or
other drivers to stream data efficiently, so I'm not sure how much you would
really lose in practice today.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning
Gregory Stark wrote:
"Abhijit Menon-Sen" <ams@oryx.com> writes:
Interleaved retrieval using multiple portals is not what most
libraries support, I'd guess.
My code did support that mode of operation in theory, but in practice
in the few situations where I have needed to use something like it, I
found it more convenient to open explicit cursors and FETCH from them
Note that using FETCH for each record means a round trip to the server for
each record. If you're dealing with a lot of records that could be a lot
slower than streaming them to the client as quickly as it can consume them.
Now I'm not sure anyone's actually done any experiments to optimize libpq or
other drivers to stream data efficiently, so I'm not sure how much you would
really lose in practice today.
My Pike drivers now support multiple simultaneous portals and
automatic streaming by pre-sending overlapping Execute statements with
a dynamically adapted fetchlimit, calculated per select as the query
progresses.
The only support still lacking is COPY.
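How the fetchlimit adapts isn't spelled out above; one plausible scheme (my own illustration, not the actual Pike code) grows the row limit toward a roughly constant number of bytes per chunk, based on the average row size observed so far:

```python
def next_fetch_limit(bytes_seen: int, rows_seen: int,
                     target_chunk_bytes: int = 65536,
                     minimum: int = 64, maximum: int = 100_000) -> int:
    """Pick the maxrows value for the next Execute so that each chunk
    is roughly target_chunk_bytes, clamped to [minimum, maximum]."""
    if rows_seen == 0:
        return minimum  # no data yet: start conservatively
    avg_row = max(1, bytes_seen // rows_seen)
    return max(minimum, min(maximum, target_chunk_bytes // avg_row))

print(next_fetch_limit(0, 0))      # first chunk: 64
print(next_fetch_limit(6400, 64))  # rows average 100 bytes → 655
```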
--
Sincerely,
Stephen R. van den Berg.
In this signature, the concluding three words `were left out'.
Stephen R. van den Berg wrote:
My Pike drivers now support multiple simultaneous portals and
automatic streaming by presending overlapping Execute statements with
a dynamically adapted fetchlimit calculated per select as the query
progresses.
They also support COPY now.
The driver beats libpq in speed by about 62%.
The memory consumption is on demand, by row, and not the whole result set.
Transport to and from the query is in binary and dynamically determined
per datatype, no quoting necessary.
Anyone interested in taking a peek at the (GPL copyright) driver, I
temporarily put up a small package which contains the working driver
in Pike at:
http://admin.cuci.nl/psgsql.pike.tar.gz
Pike is a C/C++/Java-like interpreted language.
The production driver uses a PGsql assist class which is written in C to
accelerate an (amazingly) small number of core functions (not included, but
the driver works fully without the PGsql assist class).
--
Sincerely,
Stephen R. van den Berg.
"There are 10 types of people in the world.
Those who understand binary and those who do not."
I replied to this post yesterday, yet I don't see my reply appear;
could it have been caught in a spam filter or something?
--
Sincerely,
Stephen R. van den Berg.
"Even if man could understand women, he still wouldn't believe it."
On 27-Jul-08, at 3:00 PM, Stephen R. van den Berg wrote:
Stephen R. van den Berg wrote:
My Pike drivers now support multiple simultaneous portals and
automatic streaming by presending overlapping Execute statements with
a dynamically adapted fetchlimit calculated per select as the query
progresses.
They also support COPY now.
The driver beats libpq in speed by about 62%.
The memory consumption is on demand, by row, and not the whole
result set.
Transport to and from the query is in binary and dynamically
determined
per datatype, no quoting necessary.
Anyone interested in taking a peek at the (GPL copyright) driver, I
temporarily put up a small package which contains the working driver
in Pike at:
This is very exciting news; I'd love to look at it. Is there any way
it could be re-licensed so that it can be incorporated into, say, the
JDBC driver?
Dave
Dave Cramer wrote:
On 27-Jul-08, at 3:00 PM, Stephen R. van den Berg wrote:
Stephen R. van den Berg wrote:
The driver beats libpq in speed by about 62%.
Anyone interested in taking a peek at the (GPL copyright) driver, I
temporarily put up a small package which contains the working driver
in Pike at:
This is very exciting news; I'd love to look at it. Is there any way
it could be re-licensed so that it can be incorporated into, say, the
JDBC driver?
Since I wrote it, I can relicense it any which way I want.
What kind of license would you like to have?
--
Sincerely,
Stephen R. van den Berg.
"Even if man could understand women, he still wouldn't believe it."
On Mon, 2008-07-28 at 18:45 +0200, Stephen R. van den Berg wrote:
Dave Cramer wrote:
On 27-Jul-08, at 3:00 PM, Stephen R. van den Berg wrote:
Stephen R. van den Berg wrote:
The driver beats libpq in speed by about 62%.
Anyone interested in taking a peek at the (GPL copyright) driver, I
temporarily put up a small package which contains the working driver
in Pike at:
This is very exciting news, I'd love to look at it, is there any way
it could be re-licensed so that it can be incorporated into say the
jdbc driver?
Since I wrote it, I can relicense it any which way I want.
What kind of license would you like to have?
The JDBC driver and PostgreSQL itself are BSD licensed:
http://jdbc.postgresql.org/license.html
http://www.postgresql.org/about/licence
Sincerely,
Joshua D. Drake
--
Sincerely,
Stephen R. van den Berg.
"Even if man could understand women, he still wouldn't believe it."
--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On 28-Jul-08, at 12:45 PM, Stephen R. van den Berg wrote:
Dave Cramer wrote:
On 27-Jul-08, at 3:00 PM, Stephen R. van den Berg wrote:
Stephen R. van den Berg wrote:
The driver beats libpq in speed by about 62%.
Anyone interested in taking a peek at the (GPL copyright) driver, I
temporarily put up a small package which contains the working driver
in Pike at:
This is very exciting news, I'd love to look at it, is there any way
it could be re-licensed so that it can be incorporated into say the
jdbc driver?
Since I wrote it, I can relicense it any which way I want.
What kind of license would you like to have?
As Joshua mentioned, BSD is the preferred PostgreSQL license. As I
understand it, I can't even look at your code and subsequently use
anything in the JDBC driver.
Dave
--
Sincerely,
Stephen R. van den Berg.
"Even if man could understand women, he still wouldn't believe it."
On Jul 28, 2008, at 1:54 PM, Dave Cramer wrote:
On 28-Jul-08, at 12:45 PM, Stephen R. van den Berg wrote:
Dave Cramer wrote:
On 27-Jul-08, at 3:00 PM, Stephen R. van den Berg wrote:
Stephen R. van den Berg wrote:
The driver beats libpq in speed by about 62%.
Anyone interested in taking a peek at the (GPL copyright) driver, I
temporarily put up a small package which contains the working
driver
in Pike at:
This is very exciting news, I'd love to look at it, is there any way
it could be re-licensed so that it can be incorporated into say the
jdbc driver?
Since I wrote it, I can relicense it any which way I want.
What kind of license would you like to have?
As Joshua mentioned BSD is the preferred postgresql license. As I
understand it I can't even look at your code and subsequently use
anything in the JDBC driver
The GPL does not cover implementation ideas, not to mention that the
author just described the implementation. Furthermore, one could not
"take" anything from the Pike driver for the JDBC driver because it is
a completely different language. It seems like you confused the GPL
with an NDA.
Anyway, what does "The driver beats libpq in speed by about 62%" mean?
Cheers,
M
Dave Cramer wrote:
Since I wrote it, I can relicense it any which way I want.
What kind of license would you like to have?
As Joshua mentioned BSD is the preferred postgresql license. As I
understand it I can't even look at your code and subsequently use
anything in the JDBC driver
I'll relicense it under a BSD license, so you can use it.
Watch this space.
--
Sincerely,
Stephen R. van den Berg.
"Even if man could understand women, he still wouldn't believe it."
On 7/27/08, Stephen R. van den Berg <srb@cuci.nl> wrote:
Stephen R. van den Berg wrote:
My Pike drivers now support multiple simultaneous portals and
automatic streaming by presending overlapping Execute statements with
a dynamically adapted fetchlimit calculated per select as the query
progresses.
They also support COPY now.
The driver beats libpq in speed by about 62%.
The memory consumption is on demand, by row, and not the whole result set.
Transport to and from the query is in binary and dynamically determined
per datatype, no quoting necessary.
Anyone interested in taking a peek at the (GPL copyright) driver, I
temporarily put up a small package which contains the working driver
in Pike at:
I'd love to take a look at the code, but the link seems to be 404'd at the
moment.