[patch] libpq one-row-at-a-time API

Started by Marko Kreenalmost 14 years ago55 messageshackers
Jump to latest
#1Marko Kreen
markokr@gmail.com

The row-processor API is now in 9.2, but it solves only the
"different-row-storage" problem, but not the "one-row-at-a-time"
problem, as libpq is still in control until all rows are received.

This means libpq cannet still be used to implement iterative
result processing that almost all high-level languages are using.

We discussed potential API for fetching on single row at a time,
but did not reach conclusion. Basic arguments were:

1) Tom: PQisBusy must keep current behaviour. Thus also PQgetResult()
must keep current behaviour:
* PQisBusy() -> 0: need to call PQgetResult(), which returns PGresult
* PQisBusy() -> 1: need to call PQconsumeInput()
* PQisBusy() must be callable several times in a row, thus be
stateless from clients POV.

2) Me: behaviour must not be controlled by callback, but client code
that uses PQgetResult() + PQisBusy().

Now, looking at the problem with some perspective, the solution
is obvious: when in single-row mode, the PQgetResult() must return
proper PGresult for that single row. And everything else follows that.

Such API is implemented in attached patch:

* PQsetSingleRowMode(conn): set's single-row mode.

* PQisBusy(): stops after each row in single-row mode, sets PGASYNC_ROW_READY.
Thus keeping the property of being repeatedly callable.

* PQgetResult(): returns copy of the row if PGASYNC_ROW_READY. Sets row
resultStatus to PGRES_SINGLE_TUPLE. This needs to be different from
PGRES_TUPLES_OK to detect resultset end.

* PQgetRowData(): can be called instead PQgetResult() to get raw row data
in buffer, for more efficient processing. This is optional feature
that provides the original row-callback promise of avoiding unnecessary
row data copy.

* Although PQgetRowData() makes callback API unnecessary, it is still
fully compatible with it - the callback should not see any difference
whether the resultset is processed in single-row mode or
old single-PGresult mode. Unless it wants to - it can check
PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

There is some duplicate code here that can be refactored (callback exec),
but I did not do it yet to avoid affecting existing code too much.

--
marko

PS. If a squint it seems like fix of exising API instead of new feature,
so perhaps it can still fit into 9.2?

Attachments:

single-row.difftext/x-diff; charset=us-asciiDownload+275-2
#2Merlin Moncure
mmoncure@gmail.com
In reply to: Marko Kreen (#1)
Re: [patch] libpq one-row-at-a-time API

On Fri, Jun 15, 2012 at 1:21 PM, Marko Kreen <markokr@gmail.com> wrote:

The row-processor API is now in 9.2, but it solves only the
"different-row-storage" problem, but not the "one-row-at-a-time"
problem, as libpq is still in control until all rows are received.

This means libpq cannet still be used to implement iterative
result processing that almost all high-level languages are using.

We discussed potential API for fetching on single row at a time,
but did not reach conclusion.  Basic arguments were:

1) Tom: PQisBusy must keep current behaviour.  Thus also PQgetResult()
  must keep current behaviour:
  * PQisBusy() -> 0: need to call PQgetResult(), which returns PGresult
  * PQisBusy() -> 1: need to call PQconsumeInput()
  * PQisBusy() must be callable several times in a row, thus be
    stateless from clients POV.

2) Me: behaviour must not be controlled by callback, but client code
  that uses PQgetResult() + PQisBusy().

Now, looking at the problem with some perspective, the solution
is obvious: when in single-row mode, the PQgetResult() must return
proper PGresult for that single row.  And everything else follows that.

Such API is implemented in attached patch:

* PQsetSingleRowMode(conn): set's single-row mode.

* PQisBusy(): stops after each row in single-row mode, sets PGASYNC_ROW_READY.
 Thus keeping the property of being repeatedly callable.

* PQgetResult(): returns copy of the row if PGASYNC_ROW_READY.  Sets row
 resultStatus to PGRES_SINGLE_TUPLE.  This needs to be different from
 PGRES_TUPLES_OK to detect resultset end.

* PQgetRowData(): can be called instead PQgetResult() to get raw row data
 in buffer, for more efficient processing.  This is optional feature
 that provides the original row-callback promise of avoiding unnecessary
 row data copy.

* Although PQgetRowData() makes callback API unnecessary, it is still
 fully compatible with it - the callback should not see any difference
 whether the resultset is processed in single-row mode or
 old single-PGresult mode.  Unless it wants to - it can check
 PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

There is some duplicate code here that can be refactored (callback exec),
but I did not do it yet to avoid affecting existing code too much.

--
marko

PS. If a squint it seems like fix of exising API instead of new feature,
so perhaps it can still fit into 9.2?

+1 on rushing in row processing for 9.2, but only if the API feels
right (i'll spend some time to review). I found the lack of iterative
row processing to be really unfortunate.

merlin

#3Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#1)
Re: [patch] libpq one-row-at-a-time API

Demos:

https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-sync.c
https://github.com/markokr/libpq-rowproc-demos/blob/master/demo-onerow-async.c

Few clarifications below.

On Fri, Jun 15, 2012 at 9:21 PM, Marko Kreen <markokr@gmail.com> wrote:

Now, looking at the problem with some perspective, the solution
is obvious: when in single-row mode, the PQgetResult() must return
proper PGresult for that single row.  And everything else follows that.

Such API is implemented in attached patch:

* PQsetSingleRowMode(conn): set's single-row mode.

The function can be called only after PQsend* and before any
rows have arrived. This guarantees there will be no surprises
to PQexec* users who expect full resultset at once. Also it
guarantees that user will process resultset with PQgetResult()
loop, either sync or async. Next PQexec/PQsend call will
reset the flag. So it is active only for duration of processing
results from one command.

Currently it returns FALSE if called in wrong place and does
nothing. Only question I see here is whether it should set
error state on connection or not. It does not seem to be
improvement.

* PQgetRowData(): can be called instead PQgetResult() to get raw row data
 in buffer, for more efficient processing.  This is optional feature
 that provides the original row-callback promise of avoiding unnecessary
 row data copy.

* Although PQgetRowData() makes callback API unnecessary, it is still
 fully compatible with it - the callback should not see any difference
 whether the resultset is processed in single-row mode or
 old single-PGresult mode.  Unless it wants to - it can check
 PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

The PQgetResult() is compatible with callbacks, the PQgetRowData()
bypasses them.

--
marko

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#3)
Re: [patch] libpq one-row-at-a-time API

Marko Kreen <markokr@gmail.com> writes:

Now, looking at the problem with some perspective, the solution
is obvious: when in single-row mode, the PQgetResult() must return
proper PGresult for that single row. And everything else follows that.

* PQgetRowData(): can be called instead PQgetResult() to get raw row data
in buffer, for more efficient processing. This is optional feature
that provides the original row-callback promise of avoiding unnecessary
row data copy.

* Although PQgetRowData() makes callback API unnecessary, it is still
fully compatible with it - the callback should not see any difference
whether the resultset is processed in single-row mode or
old single-PGresult mode. Unless it wants to - it can check
PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

I guess this raises the question of whether we ought to revert the
row-callback patch entirely and support only this approach. IMO
it is (barely) not too late to do that for 9.2, if we want to.
If we don't want to, then this is just another new feature and
should be considered for 9.3.

What I like about this is the greatly simpler and harder-to-misuse
API. The only arguable drawback is that there's still at least one
malloc/free cycle per tuple, imposed by the creation of a PGresult
for each one, whereas the callback approach avoids that. But worrying
about that could be considered to be vast overoptimization; the backend
has certainly spent a lot more overhead than that generating the tuple.

regards, tom lane

#5Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#4)
Re: [patch] libpq one-row-at-a-time API

On Sat, Jun 16, 2012 at 6:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I guess this raises the question of whether we ought to revert the
row-callback patch entirely and support only this approach.  IMO
it is (barely) not too late to do that for 9.2, if we want to.
If we don't want to, then this is just another new feature and
should be considered for 9.3.

I think row-callback is dangerous API that does not solve any
important problems.

But I do like the 2-phase processing the rowproc patch introduced
and having a way to bypass unnecessary malloc()+copy.

So my preference would be to simply remove the callback API
but keep the processing and provide PQgetRowData() instead.

Although the win that it brings is significantly smaller thanks
to single-row PQgetResult(). So if it does not sound interesting
to others, it can be dropped. Because the single-row processing
is the important feature we need, rest is extra.

--
marko

#6Simon Riggs
simon@2ndQuadrant.com
In reply to: Tom Lane (#4)
Re: [patch] libpq one-row-at-a-time API

On 16 June 2012 23:09, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Marko Kreen <markokr@gmail.com> writes:

Now, looking at the problem with some perspective, the solution
is obvious: when in single-row mode, the PQgetResult() must return
proper PGresult for that single row.  And everything else follows that.

* PQgetRowData(): can be called instead PQgetResult() to get raw row data
 in buffer, for more efficient processing.  This is optional feature
 that provides the original row-callback promise of avoiding unnecessary
 row data copy.

* Although PQgetRowData() makes callback API unnecessary, it is still
 fully compatible with it - the callback should not see any difference
 whether the resultset is processed in single-row mode or
 old single-PGresult mode.  Unless it wants to - it can check
 PGRES_TUPLES_OK vs. PGRES_SINGLE_TUPLE.

I guess this raises the question of whether we ought to revert the
row-callback patch entirely and support only this approach.  IMO
it is (barely) not too late to do that for 9.2, if we want to.
If we don't want to, then this is just another new feature and
should be considered for 9.3.

What I like about this is the greatly simpler and harder-to-misuse
API.  The only arguable drawback is that there's still at least one
malloc/free cycle per tuple, imposed by the creation of a PGresult
for each one, whereas the callback approach avoids that.  But worrying
about that could be considered to be vast overoptimization; the backend
has certainly spent a lot more overhead than that generating the tuple.

I prefer the description of Marko's API than the one we have now.

Adopting one API in 9.2 and another in 9.3 would be fairly bad.
Perhaps we can have both?

Can we see a performance test? "Add a row processor API to libpq for
better handling of large result sets". So idea is we do this many,
many times so we need to double check the extra overhead is not a
problem in cases where the dumping overhead is significant.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#7Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#5)
Re: [patch] libpq one-row-at-a-time API

On Sat, Jun 16, 2012 at 7:58 PM, Marko Kreen <markokr@gmail.com> wrote:

So my preference would be to simply remove the callback API
but keep the processing and provide PQgetRowData() instead.

This is implemented in attached patch. It also
converts dblink to use single-row API.

The patch should be applied on top of previous
single-row patch.

Both can be seen also here:

https://github.com/markokr/postgres/commits/single-row

--
marko

Attachments:

remove-rowproc.diff.gzapplication/x-gzip; name=remove-rowproc.diff.gzDownload
#8Marko Kreen
markokr@gmail.com
In reply to: Simon Riggs (#6)
Re: [patch] libpq one-row-at-a-time API

On Sun, Jun 17, 2012 at 2:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

I prefer the description of Marko's API than the one we have now.

Adopting one API in 9.2 and another in 9.3 would be fairly bad.
Perhaps we can have both?

I see no reason the keep the (public) callback API around,
except if we don't bother to remove it now.

Can we see a performance test? "Add a row processor API to libpq for
better handling of large result sets". So idea is we do this many,
many times so we need to double check the extra overhead is not a
problem in cases where the dumping overhead is significant.

Not sure what do to want to performance test.

PQgetRowData() uses exactly the same pipeline
that callbacks used. It will use few more C calls,
not sure it make sense to benchmark them.

Recent dblink change did change palloc() + copy
zero-termination dance to PQgetResult(), which
does malloc() + copy dance internally. This malloc
vs. palloc might be benchmarkable, but it seems to
go into micro-benchmarking world as the big win came
from avoiding buffering rows. So yeah, maybe using
PQgetRowData() might be tiny bit faster, but I don't
expect much difference.

But all this affects new users only. The thing that affects
everybody was the 2-step row processing change
that was done during rowproc patch.

I did benchmark it, and it seems there are column-size
+ column-count patterns where new way is faster,
and some patterns where old way is faster. But the
difference did not raise above test noise so I concluded
it is insignificant and the malloc+copy avoidance is worth it.

Ofcourse, additional any benchmarking is welcome, so feel
free to pick any situation you care about.

--
marko

#9Simon Riggs
simon@2ndQuadrant.com
In reply to: Marko Kreen (#8)
Re: [patch] libpq one-row-at-a-time API

On 17 June 2012 19:37, Marko Kreen <markokr@gmail.com> wrote:

On Sun, Jun 17, 2012 at 2:07 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

I prefer the description of Marko's API than the one we have now.

Adopting one API in 9.2 and another in 9.3 would be fairly bad.
Perhaps we can have both?

I see no reason the keep the (public) callback API around,
except if we don't bother to remove it now.

OK by me.

Can we see a performance test? "Add a row processor API to libpq for
better handling of large result sets". So idea is we do this many,
many times so we need to double check the extra overhead is not a
problem in cases where the dumping overhead is significant.

...

I did benchmark it, and it seems there are column-size
+ column-count patterns where new way is faster,
and some patterns where old way is faster.  But the
difference did not raise above test noise so I concluded
it is insignificant and the malloc+copy avoidance is worth it.

As long as we've checked that's fine.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#1)
Re: [patch] libpq one-row-at-a-time API

Marko Kreen <markokr@gmail.com> writes:

Now, looking at the problem with some perspective, the solution
is obvious: when in single-row mode, the PQgetResult() must return
proper PGresult for that single row. And everything else follows that.

Such API is implemented in attached patch:

I'm starting to look at this patch now. I think we could drop the
PQgetRowData() API: it complicates matters for little gain that I can
see. The argument for it was to avoid the cost of creating a PGresult
per row, but we're already going to pay the cost of creating a
PGresult in order to return the PGRES_SINGLE_TUPLE status. And as was
pointed out upthread, any per-tuple malloc costs are going to be in
the noise compared to the server-side effort expended to create the
tuple, anyway. The point of this feature is to avoid accumulating the
entire resultset in memory, not to micro-optimize linear-time costs.

Moreover, if the argument for changing 9.2 at this late date is to get
rid of a fragile, breakable API, surely an API that's designed around
returning pointers into the library's network buffer ought to be a
prime target.

And lastly, since the proposed patch for dblink doesn't use
PQgetRowData, there's not even much reason to think that it's
bug-free.

regards, tom lane

#11Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#10)
Re: [patch] libpq one-row-at-a-time API

On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Marko Kreen <markokr@gmail.com> writes:

Now, looking at the problem with some perspective, the solution
is obvious: when in single-row mode, the PQgetResult() must return
proper PGresult for that single row. And everything else follows that.

Such API is implemented in attached patch:

I'm starting to look at this patch now. I think we could drop the
PQgetRowData() API: it complicates matters for little gain that I can
see. The argument for it was to avoid the cost of creating a PGresult
per row, but we're already going to pay the cost of creating a
PGresult in order to return the PGRES_SINGLE_TUPLE status.

No. Please look again, it is supposed to be called instead of PGgetResult().

--
marko

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#11)
Re: [patch] libpq one-row-at-a-time API

Marko Kreen <markokr@gmail.com> writes:

On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm starting to look at this patch now. I think we could drop the
PQgetRowData() API: it complicates matters for little gain that I can
see. The argument for it was to avoid the cost of creating a PGresult
per row, but we're already going to pay the cost of creating a
PGresult in order to return the PGRES_SINGLE_TUPLE status.

No. Please look again, it is supposed to be called instead of PGgetResult().

Mm. I still think we should drop it, because it's still a dangerous API
that's not necessary for the principal benefit of this feature.

regards, tom lane

#13Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#12)
Re: [patch] libpq one-row-at-a-time API

On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Marko Kreen <markokr@gmail.com> writes:

On Mon, Jul 16, 2012 at 4:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I'm starting to look at this patch now. I think we could drop the
PQgetRowData() API: it complicates matters for little gain that I can
see. The argument for it was to avoid the cost of creating a PGresult
per row, but we're already going to pay the cost of creating a
PGresult in order to return the PGRES_SINGLE_TUPLE status.

No. Please look again, it is supposed to be called instead of PGgetResult().

Mm. I still think we should drop it, because it's still a dangerous API
that's not necessary for the principal benefit of this feature.

Yes, it is a secondary feature, but it fits the needs of the actual target
audience of the single-row feature - various high-level wrappers of libpq.

Also it is needed for high-performance situations, where the
single-row-mode fits well even for C clients, except the
advantage is negated by new malloc-per-row overhead.

--
marko

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#13)
Re: [patch] libpq one-row-at-a-time API

Marko Kreen <markokr@gmail.com> writes:

On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Mm. I still think we should drop it, because it's still a dangerous API
that's not necessary for the principal benefit of this feature.

Yes, it is a secondary feature, but it fits the needs of the actual target
audience of the single-row feature - various high-level wrappers of libpq.

Also it is needed for high-performance situations, where the
single-row-mode fits well even for C clients, except the
advantage is negated by new malloc-per-row overhead.

Absolutely no evidence has been presented that there's any useful
performance gain to be had there. Moreover, if there were, we could
probably work a bit harder at making PGresult creation cheaper, rather
than having to expose a dangerous API.

regards, tom lane

#15Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#14)
Re: [patch] libpq one-row-at-a-time API

On Mon, Jul 16, 2012 at 6:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Marko Kreen <markokr@gmail.com> writes:

On Mon, Jul 16, 2012 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Mm. I still think we should drop it, because it's still a dangerous API
that's not necessary for the principal benefit of this feature.

Yes, it is a secondary feature, but it fits the needs of the actual target
audience of the single-row feature - various high-level wrappers of libpq.

Also it is needed for high-performance situations, where the
single-row-mode fits well even for C clients, except the
advantage is negated by new malloc-per-row overhead.

Absolutely no evidence has been presented that there's any useful
performance gain to be had there. Moreover, if there were, we could
probably work a bit harder at making PGresult creation cheaper, rather
than having to expose a dangerous API.

Ok, I'm more interested in primary feature,
so no more objections from me.

--
marko

#16Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#14)
Re: [patch] libpq one-row-at-a-time API

Here is 2 approaches how to get to state where only PQsetSingleRowMode()
is available. Both on top of REL9_2_STABLE branch.

a) Remove callback hooks, keep rowBuf, implement single-row-mode on
top of that. This was posted before, I just additionally removed
the PQgetRowData() function.

git pull git://github.com/markokr/postgres.git single-row-mode1
https://github.com/markokr/postgres/commits/single-row-mode1

Commits:
libpq: Single-row based processing
libpq, dblink: Remove row processor API

Advantage: makes easier to play with PQgetRowData() or potatial
faster PGresult creation methods. Smaller change compared to
libpq from 9.2beta than b).

b) Revert row-callback changes completely, implement single-row-mode on
top of virgin libpq. Only problem here was keeping fixes implemented
as part of row-callback patch. Single-row-mode itself is quite simple.

git pull git://github.com/markokr/postgres.git single-row-mode1
https://github.com/markokr/postgres/commits/single-row-mode1

Feature patch:
https://github.com/markokr/postgres/commit/b5e822125c655f189875401c61317242705143b9

Commits:
dblink: revert conversion to row processor API patch
libpq: revert row processor API patch
libpq: random fixes
libpq: single-row mode
dblink: use single-row-mode

Advantage: smaller change compared to libpq from 9.1 than a).

As the patch has suffered a lot from trying to provide both macro- and
micro-optimization (on-the-fly row processing vs. more efficient row
processing), maybe b) is safer choice for 9.2?

In case somebody wants to look at the patches without git (or web),
I attaches them as tgz too.

--
marko

Attachments:

single-row.tgzapplication/x-gtarDownload
#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#16)
Re: [patch] libpq one-row-at-a-time API

Marko Kreen <markokr@gmail.com> writes:

Here is 2 approaches how to get to state where only PQsetSingleRowMode()
is available. Both on top of REL9_2_STABLE branch.

a) Remove callback hooks, keep rowBuf, implement single-row-mode on
top of that. This was posted before, I just additionally removed
the PQgetRowData() function.

b) Revert row-callback changes completely, implement single-row-mode on
top of virgin libpq. Only problem here was keeping fixes implemented
as part of row-callback patch. Single-row-mode itself is quite simple.

Yeah, I was wondering if we shouldn't revisit the whole patch given that
where we're ending up looks little like where we thought we were going.
I hadn't had time to pursue the idea, but I'm glad you did. Will look
at these.

One reason to stick with approach (a) is that it gives us the option
to easily add PQgetRowData(), in case future testing shows that that's
actually worth the risk and usage restrictions. While I'm a bit dubious
of that, I'm prepared to be proven wrong.

regards, tom lane

#18Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#17)
Re: [patch] libpq one-row-at-a-time API

Here is a simple test program that takes a SELECT
query, reads it and outputs a COPY-formatted stream
to standard output, to simulate some activity.

It operates on 3 modes, specified by commant-line switches:

-f Load full resultset at once - old way.
-s Single-Row mode using PQgetResult().
-z Single-Row mode using PQgetRowData().

It is compiled with 2 different libpqs that correspond to
single-row-modeX branches in my github repo:

rowdump1 - libpq with rowBuf + PQgetRowData(). rowBuf is
required for PQgetRowData.
[ https://github.com/markokr/postgres/tree/single-row-mode1 ]

rowdump2 - Plain libpq patched for single-row mode.
No PQgetRowData() here.
[ https://github.com/markokr/postgres/tree/single-row-mode2 ]

Notes:

* Hardest part is picking realistic queries that matter.
It's possible to construct artificial queries that make
results go either way.

* It does not make sense for compare -f with others. But it
does make sense to compare -f from differently patched libpqs
to detect any potential slowdowns.

* The time measured is User Time of client process.

-------------------------------------------------------
QUERY: select 10000,200,300000,rpad('x',30,'z') from
generate_series(1,5000000)
./rowdump1 -f: 3.90 3.90 3.93 avg: 3.91
./rowdump2 -f: 4.03 4.13 4.05 avg: 4.07
./rowdump1 -s: 6.26 6.33 6.49 avg: 6.36
./rowdump2 -s: 7.48 7.46 7.50 avg: 7.48
./rowdump1 -z: 2.88 2.90 2.79 avg: 2.86
QUERY: select
rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z')
from generate_series(1,3000000)
./rowdump1 -f: 6.29 6.36 6.14 avg: 6.26
./rowdump2 -f: 6.79 6.69 6.72 avg: 6.73
./rowdump1 -s: 7.71 7.72 7.80 avg: 7.74
./rowdump2 -s: 8.14 8.16 8.57 avg: 8.29
./rowdump1 -z: 6.45 5.15 5.16 avg: 5.59
QUERY: select
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
from generate_series(1,800000)
./rowdump1 -f: 5.68 5.66 5.72 avg: 5.69
./rowdump2 -f: 5.69 5.84 5.67 avg: 5.73
./rowdump1 -s: 7.68 7.76 7.67 avg: 7.70
./rowdump2 -s: 7.57 7.54 7.62 avg: 7.58
./rowdump1 -z: 2.78 2.82 2.72 avg: 2.77
QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from
generate_series(1,100000)
./rowdump1 -f: 2.71 2.66 2.58 avg: 2.65
./rowdump2 -f: 3.11 3.14 3.16 avg: 3.14
./rowdump1 -s: 2.64 2.61 2.64 avg: 2.63
./rowdump2 -s: 3.15 3.11 3.11 avg: 3.12
./rowdump1 -z: 2.53 2.51 2.46 avg: 2.50
-------------------------------------------------------

Test code attached. Please play with it.

By this test, both rowBuf and PQgetRowData() look good.

--
marko

Attachments:

Makefiletext/plain; charset=us-asciiDownload
xtest.shapplication/x-shDownload
rowdump.ctext/x-csrc; charset=us-asciiDownload
#19Merlin Moncure
mmoncure@gmail.com
In reply to: Marko Kreen (#18)
Re: [patch] libpq one-row-at-a-time API

On Mon, Jul 23, 2012 at 2:05 PM, Marko Kreen <markokr@gmail.com> wrote:

Here is a simple test program that takes a SELECT
query, reads it and outputs a COPY-formatted stream
to standard output, to simulate some activity.

It operates on 3 modes, specified by commant-line switches:

-f Load full resultset at once - old way.
-s Single-Row mode using PQgetResult().
-z Single-Row mode using PQgetRowData().

It is compiled with 2 different libpqs that correspond to
single-row-modeX branches in my github repo:

rowdump1 - libpq with rowBuf + PQgetRowData(). rowBuf is
required for PQgetRowData.
[ https://github.com/markokr/postgres/tree/single-row-mode1 ]

rowdump2 - Plain libpq patched for single-row mode.
No PQgetRowData() here.
[ https://github.com/markokr/postgres/tree/single-row-mode2 ]

Notes:

* Hardest part is picking realistic queries that matter.
It's possible to construct artificial queries that make
results go either way.

* It does not make sense for compare -f with others. But it
does make sense to compare -f from differently patched libpqs
to detect any potential slowdowns.

* The time measured is User Time of client process.

-------------------------------------------------------
QUERY: select 10000,200,300000,rpad('x',30,'z') from
generate_series(1,5000000)
./rowdump1 -f: 3.90 3.90 3.93 avg: 3.91
./rowdump2 -f: 4.03 4.13 4.05 avg: 4.07
./rowdump1 -s: 6.26 6.33 6.49 avg: 6.36
./rowdump2 -s: 7.48 7.46 7.50 avg: 7.48
./rowdump1 -z: 2.88 2.90 2.79 avg: 2.86
QUERY: select
rpad('x',10,'z'),rpad('x',20,'z'),rpad('x',30,'z'),rpad('x',40,'z'),rpad('x',50,'z'),rpad('x',60,'z')
from generate_series(1,3000000)
./rowdump1 -f: 6.29 6.36 6.14 avg: 6.26
./rowdump2 -f: 6.79 6.69 6.72 avg: 6.73
./rowdump1 -s: 7.71 7.72 7.80 avg: 7.74
./rowdump2 -s: 8.14 8.16 8.57 avg: 8.29
./rowdump1 -z: 6.45 5.15 5.16 avg: 5.59
QUERY: select
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100
from generate_series(1,800000)
./rowdump1 -f: 5.68 5.66 5.72 avg: 5.69
./rowdump2 -f: 5.69 5.84 5.67 avg: 5.73
./rowdump1 -s: 7.68 7.76 7.67 avg: 7.70
./rowdump2 -s: 7.57 7.54 7.62 avg: 7.58
./rowdump1 -z: 2.78 2.82 2.72 avg: 2.77
QUERY: select 1000,rpad('x', 400, 'z'),rpad('x', 4000, 'z') from
generate_series(1,100000)
./rowdump1 -f: 2.71 2.66 2.58 avg: 2.65
./rowdump2 -f: 3.11 3.14 3.16 avg: 3.14
./rowdump1 -s: 2.64 2.61 2.64 avg: 2.63
./rowdump2 -s: 3.15 3.11 3.11 avg: 3.12
./rowdump1 -z: 2.53 2.51 2.46 avg: 2.50
-------------------------------------------------------

Test code attached. Please play with it.

By this test, both rowBuf and PQgetRowData() look good.

I agree on performance grounds. It's important for libpq to be fast.

It seems odd (but maybe ok) that you have to set the single row mode
on the connection only to have the server reset it whenever you call a
send function: maybe rename to PQsetResultSingleRowMode?

Does PQgetRowData() break the ability to call PQgetvalue() against the
result as well as other functions like PQgetisnull()? If so, I
object.

merlin

#20Marko Kreen
markokr@gmail.com
In reply to: Merlin Moncure (#19)
Re: [patch] libpq one-row-at-a-time API

On Tue, Jul 24, 2012 at 12:27 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Mon, Jul 23, 2012 at 2:05 PM, Marko Kreen <markokr@gmail.com> wrote:
It seems odd (but maybe ok) that you have to set the single row mode
on the connection only to have the server reset it whenever you call a
send function: maybe rename to PQsetResultSingleRowMode?

Server does not reset it, it's purely client-side flag. It is reset
by next PQsend*/PQexec* call. If there are several resultsets
sent by server for one query, they all share the same setting.

I don't think extra-long function names that "describe exactly"
the function behavior are improvement over shorter but inexact
names, as you need to read the docs anyway to know
the real behavior. But its a matter of taste, so it can be
renamed if people want it.

Alternative is to create parallel PQsend* functions that set the flag.
It gets rid of the possibility of any extra activity between PQsend
and PQsetSingleRowMode(). But it seems messy to do that
just for single-row-mode, instead it makes sense to have PQsend*
that takes a flags argument to tune it's behavior. But that
is worth doing only if we have more flags than just one.

Does PQgetRowData() break the ability to call PQgetvalue() against the
result as well as other functions like PQgetisnull()? If so, I
object.

I don't get what are you objecting to. The PQgetRowData()
is called instead of PQgetResult() and it returns zero-row PGresult
for row header and list of PGdataValues that point to actual
data. You call the value functions, they don't crash, but as
the result has zero rows (the data is not copied to it)
they don't do anything useful either.

The whole point of the API is to avoid the unnecessary copying.

You can mix the calls to PQgetRowData() and PQgetResult()
during one resultset, it works fine although does not seem
that useful.

I guess you fear that some code can unexpectedly see
new behavior, and that exactly why the row-callback API
needs to be scrapped - it allowed such possibility.

But with PQsetSingleRowMode and PQgetRowData, the
new behavior is seen only by new code that clearly
expects it, so I don't see what the problem is.

--
marko

#21Merlin Moncure
mmoncure@gmail.com
In reply to: Marko Kreen (#20)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#21)
#23Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#22)
#24Marko Kreen
markokr@gmail.com
In reply to: Merlin Moncure (#21)
#25Merlin Moncure
mmoncure@gmail.com
In reply to: Marko Kreen (#24)
#26Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#24)
#27Marko Kreen
markokr@gmail.com
In reply to: Merlin Moncure (#25)
#28Merlin Moncure
mmoncure@gmail.com
In reply to: Marko Kreen (#27)
#29Marko Kreen
markokr@gmail.com
In reply to: Merlin Moncure (#28)
#30Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#28)
#31Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#30)
#32Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#31)
#33Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#32)
#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Merlin Moncure (#33)
#35Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#26)
#36Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#34)
#37Marko Kreen
markokr@gmail.com
In reply to: Merlin Moncure (#36)
#38Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#34)
#39Merlin Moncure
mmoncure@gmail.com
In reply to: Marko Kreen (#38)
#40Marko Kreen
markokr@gmail.com
In reply to: Merlin Moncure (#39)
#41Merlin Moncure
mmoncure@gmail.com
In reply to: Marko Kreen (#40)
#42Merlin Moncure
mmoncure@gmail.com
In reply to: Merlin Moncure (#41)
#43Marko Kreen
markokr@gmail.com
In reply to: Merlin Moncure (#41)
#44Merlin Moncure
mmoncure@gmail.com
In reply to: Marko Kreen (#43)
#45Leon Smith
leon.p.smith@gmail.com
In reply to: Merlin Moncure (#44)
#46Jan Wieck
JanWieck@Yahoo.com
In reply to: Leon Smith (#45)
#47Leon Smith
leon.p.smith@gmail.com
In reply to: Jan Wieck (#46)
#48Jan Wieck
JanWieck@Yahoo.com
In reply to: Leon Smith (#47)
#49Merlin Moncure
mmoncure@gmail.com
In reply to: Jan Wieck (#48)
#50Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#38)
#51Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#50)
#52Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#51)
#53Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#52)
#54Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#53)
#55Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#54)