Allow substitute allocators for PGresult.

Started by Kyotaro HORIGUCHI about 14 years ago, 123 messages
#1 Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
3 attachment(s)

Hello. This message proposes a pair of patches that enable the
memory allocator for PGresult in libpq to be replaced.

The comment at the beginning of pqexpbuffer.c says that libpq
should not rely on palloc(). Besides, Tom Lane said that palloc
should not be visible outside the backend(*1) and I agree with
that.

*1: http://archives.postgresql.org/pgsql-hackers/1999-02/msg00364.php

On the other hand, dblink, dblink-plus (our product!), and maybe
FDWs that connect to other PostgreSQL servers seem to copy the
query result contained in a PGresult into a tuplestore. I guess
that this is to avoid leaking memory when processing is
terminated halfway.

But it is rather expensive to copy a whole PGresult, and the cost
grows as the received data gets larger. Furthermore, it requires
about twice as much memory as the net size of the data. And it is
fruitless to copy and modify libpq or to reinvent it from
scratch. So we would be happy to be able to use palloc in libpq,
at least for PGresult in such cases, in spite of the policy.

For these reasons, I propose to make allocators for PGresult
replaceable.

The modifications are split into two patches.

1. dupEvents() and pqAddTuple() currently get new memory blocks
with malloc, but the acquired blocks end up linked into the
PGresult. So I think it is preferable to use pqResultAlloc()
or its descendants, consistent with the place the memory is
linked to.

But there is no PQresultRealloc() and it would be costly, so
pqAddTuple() is not modified in this patch.

2. Define three function pointers
PQpgresult_(malloc|realloc|free) and replace the calls to
malloc/realloc/free in the four functions below with these
pointers.

PQmakeEmptyPGresult()
pqResultAlloc()
PQclear()
pqAddTuple()

These patches make it possible for tools that run in a backend
process and use libpq to handle a PGresult as it is, with no copy
and no extra memory.

(Of course, someone who wants to use a custom allocator for
PGresult in standalone tools could do that using this feature.)
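
The hook idea in point 2 can be sketched outside libpq. This is only a minimal model, assuming a toy result type: result_malloc, result_free, ToyResult and the counting allocator are hypothetical names for illustration and are not part of libpq or of the attached patches. Only two of the three hooks are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hooks; like the pointers in the patch they default to libc. */
static void *(*result_malloc)(size_t size) = malloc;
static void (*result_free)(void *ptr) = free;

/* A toy "result" whose memory is obtained only through the hooks. */
typedef struct ToyResult
{
	char	   *data;
	size_t		len;
} ToyResult;

static ToyResult *
toy_result_create(size_t len)
{
	ToyResult  *res;

	assert(len > 0);
	res = result_malloc(sizeof(ToyResult));
	if (res == NULL)
		return NULL;
	res->data = result_malloc(len);
	if (res->data == NULL)
	{
		result_free(res);
		return NULL;
	}
	res->len = len;
	return res;
}

static void
toy_result_clear(ToyResult *res)
{
	if (res == NULL)
		return;
	result_free(res->data);
	result_free(res);
}

/* An application-supplied allocator that counts live blocks. */
static int	live_blocks = 0;

static void *
counting_malloc(size_t size)
{
	void	   *p = malloc(size);

	if (p != NULL)
		live_blocks++;
	return p;
}

static void
counting_free(void *ptr)
{
	if (ptr != NULL)
		live_blocks--;
	free(ptr);
}
```

An application would assign result_malloc = counting_malloc and result_free = counting_free before creating any result. Because the pointers are process-wide, every user of the library in the same process sees the replacement.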

Three files are attached to this message.

First, the patch with respect to "1" above.
Second, the patch with respect to "2" above.
Third, a very simple sample program.

I have built and briefly tested on CentOS6, with the sample
program mentioned above and valgrind, but not on Windows.

What do you think about this?

Regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_dupevents_alloc_r1.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 113aab0..8e32b18 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -49,7 +49,7 @@ static int	static_client_encoding = PG_SQL_ASCII;
 static bool static_std_strings = false;
 
 
-static PGEvent *dupEvents(PGEvent *events, int count);
+static PGEvent *dupEvents(PGresult *res, PGEvent *events, int count);
 static bool PQsendQueryStart(PGconn *conn);
 static int PQsendQueryGuts(PGconn *conn,
 				const char *command,
@@ -186,7 +186,7 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 		/* copy events last; result must be valid if we need to PQclear */
 		if (conn->nEvents > 0)
 		{
-			result->events = dupEvents(conn->events, conn->nEvents);
+			result->events = dupEvents(result, conn->events, conn->nEvents);
 			if (!result->events)
 			{
 				PQclear(result);
@@ -337,7 +337,7 @@ PQcopyResult(const PGresult *src, int flags)
 	/* Wants to copy PGEvents? */
 	if ((flags & PG_COPYRES_EVENTS) && src->nEvents > 0)
 	{
-		dest->events = dupEvents(src->events, src->nEvents);
+		dest->events = dupEvents(dest, src->events, src->nEvents);
 		if (!dest->events)
 		{
 			PQclear(dest);
@@ -374,7 +374,7 @@ PQcopyResult(const PGresult *src, int flags)
  * Also, the resultInitialized flags are all cleared.
  */
 static PGEvent *
-dupEvents(PGEvent *events, int count)
+dupEvents(PGresult *res, PGEvent *events, int count)
 {
 	PGEvent    *newEvents;
 	int			i;
@@ -382,7 +382,7 @@ dupEvents(PGEvent *events, int count)
 	if (!events || count <= 0)
 		return NULL;
 
-	newEvents = (PGEvent *) malloc(count * sizeof(PGEvent));
+	newEvents = (PGEvent *) pqResultAlloc(res, count * sizeof(PGEvent), TRUE);
 	if (!newEvents)
 		return NULL;
 
@@ -392,14 +392,9 @@ dupEvents(PGEvent *events, int count)
 		newEvents[i].passThrough = events[i].passThrough;
 		newEvents[i].data = NULL;
 		newEvents[i].resultInitialized = FALSE;
-		newEvents[i].name = strdup(events[i].name);
+		newEvents[i].name = pqResultStrdup(res, events[i].name);
 		if (!newEvents[i].name)
-		{
-			while (--i >= 0)
-				free(newEvents[i].name);
-			free(newEvents);
 			return NULL;
-		}
 	}
 
 	return newEvents;
@@ -661,12 +656,8 @@ PQclear(PGresult *res)
 			(void) res->events[i].proc(PGEVT_RESULTDESTROY, &evt,
 									   res->events[i].passThrough);
 		}
-		free(res->events[i].name);
 	}
 
-	if (res->events)
-		free(res->events);
-
 	/* Free all the subsidiary blocks */
 	while ((block = res->curBlock) != NULL)
 	{
libpq_replasable_alloc_r1.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..3b26c7c 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,6 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQpgresult_malloc	  161
+PQpgresult_realloc	  162
+PQpgresult_free		  163
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 8e32b18..a574848 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -67,6 +67,15 @@ static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
 
+/* ---
+ * malloc/realloc/free for PGresult is replaceable for in-backend use
+ * Note that the events having the event id PGEVT_RESULTDESTROY won't
+ * fire when you free the memory blocks for PGresult without
+ * PQclear().
+ */
+void *(*PQpgresult_malloc)(size_t size) = malloc;
+void *(*PQpgresult_realloc)(void *ptr, size_t size) = realloc;
+void (*PQpgresult_free)(void *ptr) = free;
 
 /* ----------------
  * Space management for PGresult.
@@ -138,7 +147,7 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 {
 	PGresult   *result;
 
-	result = (PGresult *) malloc(sizeof(PGresult));
+	result = (PGresult *) PQpgresult_malloc(sizeof(PGresult));
 	if (!result)
 		return NULL;
 
@@ -536,7 +545,8 @@ pqResultAlloc(PGresult *res, size_t nBytes, bool isBinary)
 	 */
 	if (nBytes >= PGRESULT_SEP_ALLOC_THRESHOLD)
 	{
-		block = (PGresult_data *) malloc(nBytes + PGRESULT_BLOCK_OVERHEAD);
+		block =
+			(PGresult_data *) PQpgresult_malloc(nBytes + PGRESULT_BLOCK_OVERHEAD);
 		if (!block)
 			return NULL;
 		space = block->space + PGRESULT_BLOCK_OVERHEAD;
@@ -560,7 +570,7 @@ pqResultAlloc(PGresult *res, size_t nBytes, bool isBinary)
 	}
 
 	/* Otherwise, start a new block. */
-	block = (PGresult_data *) malloc(PGRESULT_DATA_BLOCKSIZE);
+	block = (PGresult_data *) PQpgresult_malloc(PGRESULT_DATA_BLOCKSIZE);
 	if (!block)
 		return NULL;
 	block->next = res->curBlock;
@@ -662,12 +672,12 @@ PQclear(PGresult *res)
 	while ((block = res->curBlock) != NULL)
 	{
 		res->curBlock = block->next;
-		free(block);
+		PQpgresult_free(block);
 	}
 
 	/* Free the top-level tuple pointer array */
 	if (res->tuples)
-		free(res->tuples);
+		PQpgresult_free(res->tuples);
 
 	/* zero out the pointer fields to catch programming errors */
 	res->attDescs = NULL;
@@ -679,7 +689,7 @@ PQclear(PGresult *res)
 	/* res->curBlock was zeroed out earlier */
 
 	/* Free the PGresult structure itself */
-	free(res);
+	PQpgresult_free(res);
 }
 
 /*
@@ -844,10 +854,11 @@ pqAddTuple(PGresult *res, PGresAttValue *tup)
 
 		if (res->tuples == NULL)
 			newTuples = (PGresAttValue **)
-				malloc(newSize * sizeof(PGresAttValue *));
+				PQpgresult_malloc(newSize * sizeof(PGresAttValue *));
 		else
 			newTuples = (PGresAttValue **)
-				realloc(res->tuples, newSize * sizeof(PGresAttValue *));
+				PQpgresult_realloc(res->tuples,
+								 newSize * sizeof(PGresAttValue *));
 		if (!newTuples)
 			return FALSE;		/* malloc or realloc failed */
 		res->tupArrSize = newSize;
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index d13a5b9..c958df1 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -226,6 +226,14 @@ typedef struct pgresAttDesc
 } PGresAttDesc;
 
 /* ----------------
+ * malloc/realloc/free for PGresult is replaceable for in-backend use
+ * ----------------
+ */
+extern void *(*PQpgresult_malloc)(size_t size);
+extern void *(*PQpgresult_realloc)(void *ptr, size_t size);
+extern void (*PQpgresult_free)(void *ptr);
+
+/* ----------------
  * Exported functions of libpq
  * ----------------
  */
testlibpq.c.gz (application/octet-stream)
#2 Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Kyotaro HORIGUCHI (#1)
Re: Allow substitute allocators for PGresult.

On 11.11.2011 11:18, Kyotaro HORIGUCHI wrote:

The comment at the beginning of pqexpbuffer.c says that libpq
should not rely on palloc(). Besides, Tom Lane said that palloc
should not be visible outside the backend(*1) and I agree with
that.

*1: http://archives.postgresql.org/pgsql-hackers/1999-02/msg00364.php

On the other hand, dblink, dblink-plus (our product!), and maybe
FDWs that connect to other PostgreSQL servers seem to copy the
query result contained in a PGresult into a tuplestore. I guess
that this is to avoid leaking memory when processing is
terminated halfway.

But it is rather expensive to copy a whole PGresult, and the cost
grows as the received data gets larger. Furthermore, it requires
about twice as much memory as the net size of the data. And it is
fruitless to copy and modify libpq or to reinvent it from
scratch. So we would be happy to be able to use palloc in libpq,
at least for PGresult in such cases, in spite of the policy.

For these reasons, I propose to make allocators for PGresult
replaceable.

You could use the resource owner mechanism to track them. Register a
callback function with RegisterResourceReleaseCallback(). Whenever a
PGresult is returned from libpq, add it to e.g a linked list, kept in
TopMemoryContext, and also store a reference to CurrentResourceOwner in
the list element. In the callback function, scan through the list and
free all the PGresults associated with the resource owner that's being
released.
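
The bookkeeping described above can be modeled in a few lines. This is a stand-alone sketch, not backend code: Owner and Result are stand-ins for ResourceOwner and PGresult, result_clear() stands in for PQclear(), and release_results_for() plays the role of the function that would be registered with RegisterResourceReleaseCallback(). All of these names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins: in a backend these would be ResourceOwner and PGresult. */
typedef struct Owner
{
	int			id;
} Owner;

typedef struct Result
{
	int			dummy;
} Result;

/* One list element per tracked result, remembering its owner. */
typedef struct Tracked
{
	Result	   *res;
	Owner	   *owner;
	struct Tracked *next;
} Tracked;

static Tracked *tracked_head = NULL;
static int	results_freed = 0;

/* Stand-in for PQclear(). */
static void
result_clear(Result *res)
{
	free(res);
	results_freed++;
}

/* Record which owner was current when the result was obtained. */
static int
track_result(Result *res, Owner *owner)
{
	Tracked    *t;

	assert(res != NULL);
	t = malloc(sizeof(Tracked));
	if (t == NULL)
		return 0;
	t->res = res;
	t->owner = owner;
	t->next = tracked_head;
	tracked_head = t;
	return 1;
}

/* Release callback: free every result that belongs to the given owner. */
static void
release_results_for(Owner *owner)
{
	Tracked   **link = &tracked_head;

	while (*link != NULL)
	{
		Tracked    *t = *link;

		if (t->owner == owner)
		{
			*link = t->next;	/* unlink before freeing */
			result_clear(t->res);
			free(t);
		}
		else
			link = &t->next;
	}
}
```

In this model the results themselves still come from malloc; only the list that tracks them would live in a long-lived context.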

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#2)
Re: Allow substitute allocators for PGresult.

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

On 11.11.2011 11:18, Kyotaro HORIGUCHI wrote:

The comment at the beginning of pqexpbuffer.c says that libpq
should not rely on palloc(). Besides, Tom Lane said that palloc
should not be visible outside the backend(*1) and I agree with
that.

*1: http://archives.postgresql.org/pgsql-hackers/1999-02/msg00364.php

On the other hand, dblink, dblink-plus (our product!), and maybe
FDWs that connect to other PostgreSQL servers seem to copy the
query result contained in a PGresult into a tuplestore. I guess
that this is to avoid leaking memory when processing is
terminated halfway.

But it is rather expensive to copy a whole PGresult, and the cost
grows as the received data gets larger. Furthermore, it requires
about twice as much memory as the net size of the data. And it is
fruitless to copy and modify libpq or to reinvent it from
scratch. So we would be happy to be able to use palloc in libpq,
at least for PGresult in such cases, in spite of the policy.

For these reasons, I propose to make allocators for PGresult
replaceable.

You could use the resource owner mechanism to track them.

Heikki's idea is probably superior so far as PG backend usage is
concerned in isolation, but I wonder if there are scenarios where a
client application would like to be able to manage libpq's allocations.
If so, Kyotaro-san's approach would solve more problems than just
dblink's.

However, the bigger picture here is that I think Kyotaro-san's desire to
not have dblink return a tuplestore may be misplaced. Tuplestores can
spill to disk, while PGresults don't; so the larger the result, the
more important it is to push it into a tuplestore and PQclear it as soon
as possible.

Despite that worry, it'd likely be a good idea to adopt one or the other
of these solutions anyway, because I think there are corner cases where
dblink.c can leak a PGresult --- for instance, what if dblink_res_error
fails due to out-of-memory before reaching PQclear? And we could get
rid of the awkward and none-too-cheap PG_TRY blocks that it uses to try
to defend against such leaks in other places.

So I'm in favor of making a change along that line, although I'd want
to see more evidence before considering changing dblink to not return
tuplestores.

regards, tom lane

#4 Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#3)
Re: Allow substitute allocators for PGresult.

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Heikki's idea is probably superior so far as PG backend usage is
concerned in isolation, but I wonder if there are scenarios where a
client application would like to be able to manage libpq's allocations.

The answer to that is certainly 'yes'. It was one of the first things
that I complained about when moving from Oracle to PG. With OCI, you
can bulk load results directly into application-allocated memory areas.

Haven't been following the dblink discussion, so not going to comment
about that piece.

Thanks,

Stephen

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stephen Frost (#4)
Re: Allow substitute allocators for PGresult.

Stephen Frost <sfrost@snowman.net> writes:

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Heikki's idea is probably superior so far as PG backend usage is
concerned in isolation, but I wonder if there are scenarios where a
client application would like to be able to manage libpq's allocations.

The answer to that is certainly 'yes'. It was one of the first things
that I complained about when moving from Oracle to PG. With OCI, you
can bulk load results directly into application-allocated memory areas.

Well, loading data in a form whereby the application can access it
without going through the PGresult accessor functions would be an
entirely different (and vastly larger) project. I'm not sure I want
to open that can of worms --- it seems like you could write a huge
amount of code trying to provide every format someone might want,
and still find that there were impedance mismatches for many
applications.

AIUI Kyotaro-san is just suggesting that the app should be able to
provide a substitute malloc function for use in allocating PGresult
space (and not, I think, anything else that libpq allocates internally).
Basically this would allow PGresults to be cleaned up with methods other
than calling PQclear on each one. It wouldn't affect how you'd interact
with one while you had it. That seems like pretty much exactly what we
want for preventing memory leaks in the backend; but is it going to be
useful for other apps?

regards, tom lane

#6 Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#5)
Re: Allow substitute allocators for PGresult.

On Sat, Nov 12, 2011 at 12:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

AIUI Kyotaro-san is just suggesting that the app should be able to
provide a substitute malloc function for use in allocating PGresult
space (and not, I think, anything else that libpq allocates internally).
Basically this would allow PGresults to be cleaned up with methods other
than calling PQclear on each one.  It wouldn't affect how you'd interact
with one while you had it.  That seems like pretty much exactly what we
want for preventing memory leaks in the backend; but is it going to be
useful for other apps?

I think it will.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kyotaro HORIGUCHI (#1)
Re: Allow substitute allocators for PGresult.

Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp> writes:

Hello. This message proposes a pair of patches that enable the
memory allocator for PGresult in libpq to be replaced.

Since there seems to be rough consensus that something like this would
be a good idea, I looked more closely at the details of the patch.
I think the design could use some adjustment.

To start with, the patch proposes exposing some global variables that
affect the behavior of libpq process-wide. This seems like a pretty bad
design, because a single process could contain multiple usages of libpq
with different requirements. As an example, if dblink.c were to set
these variables inside a backend process, it would break usage of libpq
from PL/Perl via DBI. Global variables also tend to be a bad idea
whenever you think about multi-threaded applications --- they require
locking facilities, which are not in this patch.

I think it'd be better to consider the PGresult alloc/free functions to
be a property of a PGconn, which you'd set with a function call along the
lines of PQsetResultAllocator(conn, alloc_func, realloc_func, free_func)
after having successfully opened a connection. Then we just have some
more PGconn fields (and I guess PGresult will need a copy of the
free_func pointer) and no new global variables.

I am also feeling dubious about whether it's a good idea to expect the
functions to have exactly the signature of malloc/free. They are
essentially callbacks, and in most places where a library provides for
callbacks, it's customary to include a "void *" passthrough argument
in case the callback needs some context information. I am not sure that
dblink.c would need such a thing, but if we're trying to design a
general-purpose feature, then we probably should have it. The cost
would be having shim functions inside libpq for the default case, but
it doesn't seem likely that they'd cost enough to notice.
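
A rough model of the two suggestions above (allocator as a per-connection property, callbacks with a passthrough argument) might look like this. ToyConn, toy_set_result_allocator and the arena allocator are hypothetical names for illustration; no such function exists in libpq, and the realloc callback is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Callback types carrying a passthrough argument. */
typedef void *(*alloc_fn) (size_t size, void *arg);
typedef void (*release_fn) (void *ptr, void *arg);

/* Toy connection: the allocator is per-connection state, not global. */
typedef struct ToyConn
{
	alloc_fn	alloc;
	release_fn	release;
	void	   *arg;
} ToyConn;

/* Shim functions for the default case: ignore arg, use libc. */
static void *
default_alloc(size_t size, void *arg)
{
	(void) arg;
	return malloc(size);
}

static void
default_release(void *ptr, void *arg)
{
	(void) arg;
	free(ptr);
}

static void
toy_conn_init(ToyConn *conn)
{
	conn->alloc = default_alloc;
	conn->release = default_release;
	conn->arg = NULL;
}

static void
toy_set_result_allocator(ToyConn *conn, alloc_fn alloc,
						 release_fn release, void *arg)
{
	assert(alloc != NULL && release != NULL);
	conn->alloc = alloc;
	conn->release = release;
	conn->arg = arg;
}

/* Example application allocator: bump allocation from a caller's arena. */
typedef struct Arena
{
	char	   *base;
	size_t		size;
	size_t		used;
} Arena;

static void *
arena_alloc(size_t size, void *arg)
{
	Arena	   *a = arg;
	size_t		aligned = (size + 7) & ~(size_t) 7;
	void	   *p;

	if (aligned < size || aligned > a->size - a->used)
		return NULL;			/* overflow or arena exhausted */
	p = a->base + a->used;
	a->used += aligned;
	return p;
}

static void
arena_release(void *ptr, void *arg)
{
	/* Nothing to do: memory is reclaimed when the arena is discarded. */
	(void) ptr;
	(void) arg;
}
```

With this shape, one connection can allocate from an arena while another connection in the same process keeps the libc default, which is the case the global-variable design cannot handle.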

The patch lacks any user documentation, which it surely must have if
we are claiming this is a user-visible feature. And I think it could
use some attention to updating code comments, notably the large block
about PGresult space management near the top of fe-exec.c.

Usually, when writing a feature of this sort, it's a good idea to
implement a prototype use-case to make sure you've not overlooked
anything. So I'd feel happier about the patch if it came along with
a patch to make dblink.c use it to prevent memory leaks.

regards, tom lane

#8 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Heikki Linnakangas (#2)
Re: Allow substitute allocators for PGresult.

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

On 11.11.2011 11:18, Kyotaro HORIGUCHI wrote:

For these reasons, I propose to make allocators for PGresult
replaceable.

You could use the resource owner mechanism to track them.

BTW, I just thought of a potentially fatal objection to making PGresult
allocation depend on palloc: libpq is absolutely not prepared to handle
losing control on out-of-memory. While I'm not certain that its
behavior with malloc is entirely desirable either (it tends to loop in
hopes of getting the memory next time), we cannot just plop in palloc
in place of malloc and imagine that we're not breaking it.
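
The hazard can be illustrated with a small stand-alone model. Here jumping_alloc imitates an allocator that never returns NULL and instead transfers control elsewhere on failure, using setjmp/longjmp as a stand-in for the backend's error handling. ToyConn and all function names are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

static jmp_buf error_context;	/* plays the role of the error handler */

/* malloc-style failure: report it by returning NULL. */
static void *
failing_malloc(size_t size)
{
	(void) size;
	return NULL;
}

/* palloc-style failure: never return NULL, jump away instead. */
static void *
jumping_alloc(size_t size)
{
	(void) size;
	longjmp(error_context, 1);
	return NULL;				/* not reached */
}

typedef struct ToyConn
{
	int			busy;			/* set while a message is being processed */
	void	   *(*alloc) (size_t size);
} ToyConn;

/* Library code written with malloc semantics in mind. */
static int
toy_process_message(ToyConn *conn)
{
	void	   *buf;

	assert(conn->alloc != NULL);
	conn->busy = 1;
	buf = conn->alloc(128);
	if (buf == NULL)
	{
		conn->busy = 0;			/* clean up and report failure */
		return 0;
	}
	free(buf);
	conn->busy = 0;
	return 1;
}

static ToyConn shared_conn;

/* Process one message and report the connection's busy flag afterwards. */
static int
busy_after_call(void *(*alloc) (size_t size))
{
	shared_conn.busy = 0;
	shared_conn.alloc = alloc;
	if (setjmp(error_context) == 0)
		(void) toy_process_message(&shared_conn);
	/* otherwise control arrived here via longjmp, skipping the cleanup */
	return shared_conn.busy;
}
```

With failing_malloc the library resets its own state; with jumping_alloc the cleanup branch never runs and the connection is left marked busy.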

This makes me think that Heikki's approach is by far the more tenable
one, so far as dblink is concerned. Perhaps the substitute-malloc idea
is still useful for some other application, but I'm inclined to put that
idea on the back burner until we have a concrete use case for it.

regards, tom lane

#9 Matteo Beccati
php@beccati.com
In reply to: Robert Haas (#6)
Re: Allow substitute allocators for PGresult.

On 12/11/2011 07:36, Robert Haas wrote:

On Sat, Nov 12, 2011 at 12:48 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

AIUI Kyotaro-san is just suggesting that the app should be able to
provide a substitute malloc function for use in allocating PGresult
space (and not, I think, anything else that libpq allocates internally).
Basically this would allow PGresults to be cleaned up with methods other
than calling PQclear on each one. It wouldn't affect how you'd interact
with one while you had it. That seems like pretty much exactly what we
want for preventing memory leaks in the backend; but is it going to be
useful for other apps?

I think it will.

Maybe I'm just talking nonsense, as I have only a little experience
hacking the pgsql and pdo-pgsql extensions, but to me it seems like
something that could easily avoid an extra duplication of the data
returned by PQgetvalue. To me it seems a pretty nice win.

Cheers
--
Matteo Beccati

Development & Consulting - http://www.beccati.com/

#10 Stephen Frost
sfrost@snowman.net
In reply to: Tom Lane (#5)
Re: Allow substitute allocators for PGresult.

* Tom Lane (tgl@sss.pgh.pa.us) wrote:

Well, loading data in a form whereby the application can access it
without going through the PGresult accessor functions would be an
entirely different (and vastly larger) project.

Looking through the thread, I agree that it's a different thing than
what's being discussed here.

I'm not sure I want
to open that can of worms --- it seems like you could write a huge
amount of code trying to provide every format someone might want,
and still find that there were impedance mismatches for many
applications.

The OCI approach is actually very similar to how we handle our
catalogs internally. Imagine you define a C struct which matches your
table structure, then you allocate 5000 (or however many) of those, and
give the base pointer to the 'getResult' call along with an integer
array of offsets into that structure for each of the columns. There
might have been a few other minor things (like some notion of how to
handle NULLs), but it was pretty straight-forward from the C
perspective, imv.
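
A small sketch of that binding style, for illustration only. ColumnBinding, fetch_into and AppRow are hypothetical names, not OCI or libpq API; a per-column size is added alongside the offset so the copy is well defined, and NULL handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Application-defined row layout matching the table structure. */
typedef struct AppRow
{
	int			id;
	double		price;
} AppRow;

/* Where a column lives inside one row struct, and how big it is. */
typedef struct ColumnBinding
{
	size_t		offset;
	size_t		size;
} ColumnBinding;

/*
 * Copy nrows * ncols values into application-allocated memory.
 * values[r * ncols + c] points at the source value of row r, column c.
 */
static void
fetch_into(void *base, size_t stride,
		   const ColumnBinding *cols, int ncols,
		   const void *const *values, int nrows)
{
	int			r;
	int			c;

	assert(base != NULL && stride > 0);
	for (r = 0; r < nrows; r++)
	{
		char	   *row = (char *) base + (size_t) r * stride;

		for (c = 0; c < ncols; c++)
			memcpy(row + cols[c].offset,
				   values[r * ncols + c],
				   cols[c].size);
	}
}
```

The caller passes an array of AppRow as base, sizeof(AppRow) as stride, and bindings built with offsetof(AppRow, id) and offsetof(AppRow, price).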

Trying to provide alternative formats (I'm guessing you were referring
to something like XML..? Or some complex structure?) would certainly be
a whole different ballgame.

Thanks,

Stephen

#11 Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#1)
Re: Allow substitute allocators for PGresult.

Hello,

At Fri, 11 Nov 2011 11:29:30 +0200, Heikki Linnakangas wrote

You could use the resource owner mechanism to track
them. Register a callback function with
RegisterResourceReleaseCallback().

Thank you for letting me know about it. I have dug up a message
in pg-hackers referring to the mechanism in a discussion about
postgresql-fdw. I'll put further thought into dblink-plus taking
it into account.

By the way, memory management for results in libpq can be
considered as a separate issue.

At Sat, 12 Nov 2011 12:29:50 -0500, Tom Lane wrote

To start with, the patch proposes exposing some global
variables that affect the behavior of libpq process-wide. This
seems like a pretty bad design, because a single process could
contain multiple usages of libpq

You're right to say the design is bad. I designed it to have
minimal impact on libpq by limiting its usage and placing all
responsibility on the users, that is, the developers of the
modules using it. If there are other applications that want to
use their own allocators, there are some points to be considered.

Considering multi-threading, I think it is preferable to make
libpq write the PGresult into memory blocks passed from the
application, like OCI does, instead of letting libpq itself
request them.

This approach hands the responsibility of memory management to
the user and gives them the capability to avoid memory exhaustion
by their own measures.

On the other hand, this approach could produce a situation where
libpq cannot write all of the data received from the server into
the supplied memory block. So the API must be able to return
partial data to the caller.
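
One possible shape for such a partial-result interface is sketched below, under the assumption that rows are plain NUL-terminated strings. RowSource and fetch_chunk are hypothetical names, not libpq functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical source of rows still waiting to be delivered. */
typedef struct RowSource
{
	const char *const *rows;
	int			nrows;
	int			next;			/* index of the first undelivered row */
} RowSource;

/*
 * Copy as many whole rows as fit into the caller's buffer.
 * Returns the number of rows copied and sets *used to the bytes written.
 * Returns 0 when the source is exhausted or the next row cannot fit.
 */
static int
fetch_chunk(RowSource *src, char *buf, size_t bufsize, size_t *used)
{
	int			copied = 0;

	assert(buf != NULL && used != NULL);
	*used = 0;
	while (src->next < src->nrows)
	{
		size_t		need = strlen(src->rows[src->next]) + 1;

		if (need > bufsize - *used)
			break;				/* caller must drain the buffer first */
		memcpy(buf + *used, src->rows[src->next], need);
		*used += need;
		src->next++;
		copied++;
	}
	return copied;
}
```

The caller keeps calling fetch_chunk, consuming the buffer between calls, until it returns 0. A row larger than the whole buffer would need separate handling that this sketch does not attempt.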

Going further, if libpq could store the result directly into
user-allocated memory through a tuplestore-like interface,
performance would be better when the final storage is itself a
tuplestore.

I would be happy with part-by-part passing of results. So I
will think about this as the next issue.

So I'd feel happier about the patch if it came along with a
patch to make dblink.c use it to prevent memory leaks.

I take it this is about my original patch.

Mmm, I heard that dblink copies the received data in a PGresult
into a tuplestore not only because of the memory leak, but also
for lower memory usage (after the copy is finished). I think I
could show you a patch that ignores the latter, but it might take
some time, since I would have to start by understanding dblink
and tuplestore closely...

If I find that RegisterResourceReleaseCallback falls short of
our requirements, I will show it. If not, I will withdraw this
patch from the ongoing CF and propose another patch based on the
discussion above at another time.

Please let me have a little more time.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#12 Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#11)
Re: Allow substitute allocators for PGresult.

Hello,

me> I'll put further thought into dblink-plus taking it into
me> account.
..
me> Please let me have a little more time.

I've asked the developer of dblink-plus about
RegisterResourceReleaseCallback(). He said that the function is
poorly compatible with the current implementation. In addition,
storing into a tuplestore directly seems to me a better idea than
a palloc'ed PGresult.

So I tried to make libpq/PGresult able to handle an alternative
tuple store by hinting to PGconn, and modified dblink to use the
mechanism as the first sample code.

I will show it as a series of patches in the next message.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#13 Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#11)
2 attachment(s)
Re: Allow substitute allocators for PGresult.

Hello. This is the next version of "Allow substitute allocators
for PGresult".

Changing the concept completely from the previous one, this patch
allows libpq to use an alternative tuple store for received
tuples.

Design guidelines are shown below.

- No need to modify existing client code of libpq.

- Existing libpq clients run with roughly the same performance,
and the modified dblink runs somewhat faster and requires less
memory.

I have roughly measured run time and memory requirements for
three configurations on CentOS6 on Vbox with 2GB mem 4 cores
running on Win7-Corei7, transferring (30 bytes * 2 cols) *
2000000 tuples (120MB net) within this virtual machine. The
results are below.

                                 xfer time   Peak RSS
Original                       : 6.02s       850MB
libpq patch + Original dblink  : 6.11s       850MB
full patch                     : 4.44s       643MB

xfer time here is the mean of five 'real' times measured by
running an SQL script like this after a warmup run.

=== test.sql
select dblink_connect('c', 'host=localhost port=5432 dbname=test');
select * from dblink('c', 'select a,c from foo limit 2000000') as (a text, b bytea) limit 1;

select dblink_disconnect('c');
===
$ for i in $(seq 1 10); do time psql test -f t.sql; done
===

Peak RSS is measured by picking up heap Rss in /proc/[pid]/smaps.
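
For readers who want to repeat this kind of measurement, the sketch below sums the Rss lines of mappings labeled [heap] in text that follows the Linux smaps format. It is only an assumption about how such a number can be extracted, not the script used for the figures above; heap_rss_kb is a hypothetical helper, and reading /proc/[pid]/smaps into a buffer is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sum the Rss (in kB) of all mappings labeled [heap] in smaps-format text. */
static long
heap_rss_kb(const char *text)
{
	const char *line = text;
	int			in_heap = 0;
	long		total = 0;

	assert(text != NULL);
	while (*line != '\0')
	{
		const char *eol = strchr(line, '\n');
		size_t		len = eol ? (size_t) (eol - line) : strlen(line);
		size_t		copy = len;
		char		buf[512];
		unsigned long start;
		unsigned long end;
		long		kb;

		if (copy >= sizeof(buf))
			copy = sizeof(buf) - 1;		/* truncate overlong lines */
		memcpy(buf, line, copy);
		buf[copy] = '\0';

		/* A mapping header starts with "start-end" in hexadecimal. */
		if (sscanf(buf, "%lx-%lx", &start, &end) == 2)
			in_heap = (strstr(buf, "[heap]") != NULL);
		else if (in_heap && sscanf(buf, "Rss: %ld kB", &kb) == 1)
			total += kb;

		line += len;
		if (eol != NULL)
			line++;				/* skip the newline */
	}
	return total;
}
```

Since smaps reports the current value, a peak figure would have to come from sampling repeatedly while the query runs and keeping the maximum.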

Patched libpq with the original dblink seems somewhat slower, but
the difference looks to be within the error range. If this amount
of slowdown is not permissible, it might be improved by restoring
the previous static call route, at the cost of some redundancy in
the code.

On the other hand, the full patch version is clearly faster and
requires less memory. Isn't it nice?

This patch consists of two sub patches.

The first is a patch for libpq that allows the tuple storage
mechanism to be rewired. The default behavior is not changed, so
existing libpq clients should run with it.

The second modifies dblink to store received tuples directly into
a tuplestore using the mechanism above.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_subst_storage_v1.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index a360d78..1af8df6 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,7 +160,3 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
-PQregisterTupleAdder	  161
-PQgetAsCstring		  162
-PQgetAddTupleParam	  163
-PQsetAddTupleErrMes	  164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 437be26..50f3f83 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2692,7 +2692,6 @@ makeEmptyPGconn(void)
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
 #endif
-	conn->addTupleFunc = NULL;
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -5065,10 +5064,3 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
-
-void
-PQregisterTupleAdder(PGconn *conn, addTupleFunction func, void *param)
-{
-	conn->addTupleFunc = func;
-	conn->addTupleFuncParam = param;
-}
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index c8ec9bd..113aab0 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -48,6 +48,7 @@ char	   *const pgresStatus[] = {
 static int	static_client_encoding = PG_SQL_ASCII;
 static bool static_std_strings = false;
 
+
 static PGEvent *dupEvents(PGEvent *events, int count);
 static bool PQsendQueryStart(PGconn *conn);
 static int PQsendQueryGuts(PGconn *conn,
@@ -65,9 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
-static void *pqDefaultAddTupleFunc(PGresult *res, AddTupFunc func,
-								   int id, size_t len);
-static void *pqAddTuple(PGresult *res, PGresAttValue *tup);
+
 
 /* ----------------
  * Space management for PGresult.
@@ -161,9 +160,6 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
-	result->addTupleFunc = pqDefaultAddTupleFunc;
-	result->addTupleFuncParam = NULL;
-	result->addTupleFuncErrMes = NULL;
 
 	if (conn)
 	{
@@ -198,12 +194,6 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 			}
 			result->nEvents = conn->nEvents;
 		}
-
-		if (conn->addTupleFunc)
-		{
-			result->addTupleFunc = conn->addTupleFunc;
-			result->addTupleFuncParam = conn->addTupleFuncParam;
-		}
 	}
 	else
 	{
@@ -497,33 +487,6 @@ PQresultAlloc(PGresult *res, size_t nBytes)
 	return pqResultAlloc(res, nBytes, TRUE);
 }
 
-void *
-pqDefaultAddTupleFunc(PGresult *res, AddTupFunc func, int id, size_t len)
-{
-	void *p;
-
-	switch (func)
-	{
-		case ADDTUP_ALLOC_TEXT:
-			return pqResultAlloc(res, len, TRUE);
-
-		case ADDTUP_ALLOC_BINARY:
-			p = pqResultAlloc(res, len, FALSE);
-
-			if (id == -1)
-				res->addTupleFuncParam = p;
-
-			return p;
-
-		case ADDTUP_ADD_TUPLE:
-			return pqAddTuple(res, res->addTupleFuncParam);
-
-		default:
-			/* Ignore */
-			break;
-	}
-	return NULL;
-}
 /*
  * pqResultAlloc -
  *		Allocate subsidiary storage for a PGresult.
@@ -867,9 +830,9 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 /*
  * pqAddTuple
  *	  add a row pointer to the PGresult structure, growing it if necessary
- *	  Returns tup if OK, NULL if not enough memory to add the row.
+ *	  Returns TRUE if OK, FALSE if not enough memory to add the row
  */
-static void *
+int
 pqAddTuple(PGresult *res, PGresAttValue *tup)
 {
 	if (res->ntups >= res->tupArrSize)
@@ -895,13 +858,13 @@ pqAddTuple(PGresult *res, PGresAttValue *tup)
 			newTuples = (PGresAttValue **)
 				realloc(res->tuples, newSize * sizeof(PGresAttValue *));
 		if (!newTuples)
-			return NULL;		/* malloc or realloc failed */
+			return FALSE;		/* malloc or realloc failed */
 		res->tupArrSize = newSize;
 		res->tuples = newTuples;
 	}
 	res->tuples[res->ntups] = tup;
 	res->ntups++;
-	return tup;
+	return TRUE;
 }
 
 /*
@@ -2859,43 +2822,6 @@ PQgetisnull(const PGresult *res, int tup_num, int field_num)
 		return 0;
 }
 
-/* PQgetAsCString
- *	returns the field as C string.
- */
-char *
-PQgetAsCstring(PGresAttValue *attval)
-{
-	return attval->len == NULL_LEN ? NULL : attval->value;
-}
-
-/* PQgetAddTupleParam
- *	Get the pointer to the contextual parameter from PGresult which is
- *	registered to PGconn by PQregisterTupleAdder
- */
-void *
-PQgetAddTupleParam(const PGresult *res)
-{
-	if (!res)
-		return NULL;
-	return res->addTupleFuncParam;
-}
-
-/* PQsetAddTupleErrMes
- *	Set the error message pass back to the caller of addTupleFunc
- *  mes must be a malloc'ed memory block and it is released by the
- *  caller of addTupleFunc if set.
- *  You can replace the previous message by alternative mes, or clear
- *  it with NULL.
- */
-void
-PQsetAddTupleErrMes(PGresult *res, char *mes)
-{
-	/* Free existing message */
-	if (res->addTupleFuncErrMes)
-		free(res->addTupleFuncErrMes);
-	res->addTupleFuncErrMes = mes;
-}
-
 /* PQnparams:
  *	returns the number of input parameters of a prepared statement.
  */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index c7f74ae..77c4d5a 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -733,10 +733,9 @@ getAnotherTuple(PGconn *conn, bool binary)
 	if (conn->curTuple == NULL)
 	{
 		conn->curTuple = (PGresAttValue *)
-			result->addTupleFunc(result, ADDTUP_ALLOC_BINARY, -1,
-								 nfields * sizeof(PGresAttValue));
+			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
 		if (conn->curTuple == NULL)
-			goto addTupleError;
+			goto outOfMemory;
 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
 
 		/*
@@ -758,7 +757,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 	{
 		bitmap = (char *) malloc(nbytes);
 		if (!bitmap)
-			goto addTupleError;
+			goto outOfMemory;
 	}
 
 	if (pqGetnchar(bitmap, nbytes, conn))
@@ -788,12 +787,9 @@ getAnotherTuple(PGconn *conn, bool binary)
 				vlen = 0;
 			if (tup[i].value == NULL)
 			{
-				AddTupFunc func =
-					(binary ? ADDTUP_ALLOC_BINARY : ADDTUP_ALLOC_TEXT);
-				tup[i].value =
-					(char *) result->addTupleFunc(result, func, i, vlen + 1);
+				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
 				if (tup[i].value == NULL)
-					goto addTupleError;
+					goto outOfMemory;
 			}
 			tup[i].len = vlen;
 			/* read in the value */
@@ -816,9 +812,8 @@ getAnotherTuple(PGconn *conn, bool binary)
 	}
 
 	/* Success!  Store the completed tuple in the result */
-	if (!result->addTupleFunc(result, ADDTUP_ADD_TUPLE, 0, 0))
-		goto addTupleError;
-
+	if (!pqAddTuple(result, tup))
+		goto outOfMemory;
 	/* and reset for a new message */
 	conn->curTuple = NULL;
 
@@ -826,7 +821,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 		free(bitmap);
 	return 0;
 
-addTupleError:
+outOfMemory:
 	/* Replace partially constructed result with an error result */
 
 	/*
@@ -834,21 +829,8 @@ addTupleError:
 	 * there's not enough memory to concatenate messages...
 	 */
 	pqClearAsyncResult(conn);
-	resetPQExpBuffer(&conn->errorMessage);
-
-	/*
-	 * If error message is passed from addTupleFunc, set it into
-	 * PGconn, assume out of memory if not.
-	 */
-	appendPQExpBufferStr(&conn->errorMessage,
-						 libpq_gettext(result->addTupleFuncErrMes ?
-									   result->addTupleFuncErrMes :
-									   "out of memory for query result\n"));
-	if (result->addTupleFuncErrMes)
-	{
-		free(result->addTupleFuncErrMes);
-		result->addTupleFuncErrMes = NULL;
-	}
+	printfPQExpBuffer(&conn->errorMessage,
+					  libpq_gettext("out of memory for query result\n"));
 
 	/*
 	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index d14b57a..45a84d8 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -634,10 +634,9 @@ getAnotherTuple(PGconn *conn, int msgLength)
 	if (conn->curTuple == NULL)
 	{
 		conn->curTuple = (PGresAttValue *)
-			result->addTupleFunc(result, ADDTUP_ALLOC_BINARY, -1,
-								 nfields * sizeof(PGresAttValue));
+			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
 		if (conn->curTuple == NULL)
-			goto addTupleError;
+			goto outOfMemory;
 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
 	}
 	tup = conn->curTuple;
@@ -674,12 +673,11 @@ getAnotherTuple(PGconn *conn, int msgLength)
 			vlen = 0;
 		if (tup[i].value == NULL)
 		{
-			AddTupFunc func = (result->attDescs[i].format != 0 ?
-							   ADDTUP_ALLOC_BINARY : ADDTUP_ALLOC_TEXT);
-			tup[i].value =
-				(char *) result->addTupleFunc(result, func, i, vlen + 1);
+			bool		isbinary = (result->attDescs[i].format != 0);
+
+			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
 			if (tup[i].value == NULL)
-				goto addTupleError;
+				goto outOfMemory;
 		}
 		tup[i].len = vlen;
 		/* read in the value */
@@ -691,36 +689,22 @@ getAnotherTuple(PGconn *conn, int msgLength)
 	}
 
 	/* Success!  Store the completed tuple in the result */
-	if (!result->addTupleFunc(result, ADDTUP_ADD_TUPLE, 0, 0))
-		goto addTupleError;
-
+	if (!pqAddTuple(result, tup))
+		goto outOfMemory;
 	/* and reset for a new message */
 	conn->curTuple = NULL;
 
 	return 0;
 
-addTupleError:
+outOfMemory:
 
 	/*
 	 * Replace partially constructed result with an error result. First
 	 * discard the old result to try to win back some memory.
 	 */
 	pqClearAsyncResult(conn);
-	resetPQExpBuffer(&conn->errorMessage);
-
-	/*
-	 * If error message is passed from addTupleFunc, set it into
-	 * PGconn, assume out of memory if not.
-	 */
-	appendPQExpBufferStr(&conn->errorMessage,
-						 libpq_gettext(result->addTupleFuncErrMes ?
-									   result->addTupleFuncErrMes : 
-									   "out of memory for query result\n"));
-	if (result->addTupleFuncErrMes)
-	{
-		free(result->addTupleFuncErrMes);
-		result->addTupleFuncErrMes = NULL;
-	}
+	printfPQExpBuffer(&conn->errorMessage,
+					  libpq_gettext("out of memory for query result\n"));
 	pqSaveErrorResult(conn);
 
 	/* Discard the failed message by pretending we read it */
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index bdce294..d13a5b9 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -116,16 +116,6 @@ typedef enum
 	PQPING_NO_ATTEMPT			/* connection not attempted (bad params) */
 } PGPing;
 
-/* AddTupFunc is one of the parameters of addTupleFunc that decides
- * the function of the addTupleFunction. See addTupleFunction for
- * details */
-typedef enum 
-{
-	ADDTUP_ALLOC_TEXT,          /* Returns non-aligned memory for text value */
-	ADDTUP_ALLOC_BINARY,        /* Returns aligned memory for binary value */
-	ADDTUP_ADD_TUPLE            /* Adds tuple data into tuple storage */
-} AddTupFunc;
-
 /* PGconn encapsulates a connection to the backend.
  * The contents of this struct are not supposed to be known to applications.
  */
@@ -235,12 +225,6 @@ typedef struct pgresAttDesc
 	int			atttypmod;		/* type-specific modifier info */
 } PGresAttDesc;
 
-typedef struct pgresAttValue
-{
-	int			len;			/* length in bytes of the value */
-	char	   *value;			/* actual value, plus terminating zero byte */
-} PGresAttValue;
-
 /* ----------------
  * Exported functions of libpq
  * ----------------
@@ -432,52 +416,6 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
 			 const char *const * values, int expand_dbname);
 
-/*
- * Typedef for tuple storage function.
- *
- * This function pointer is used for tuple storage function in
- * PGresult and PGconn.
- *
- * addTupleFunction is called for four types of function designated by
- * the enum AddTupFunc.
- *
- * id is the identifier for allocated memory block. The caller sets -1
- * for PGresAttValue array, and 0 to number of cols - 1 for each
- * column.
- *
- * ADDTUP_ALLOC_TEXT requests the size bytes memory block for a text
- * value which may not be alingned to the word boundary.
- *
- * ADDTUP_ALLOC_BINARY requests the size bytes memory block for a
- * binary value which is aligned to the word boundary.
- *
- * ADDTUP_ADD_TUPLE requests to add tuple data into storage, and
- * free the memory blocks allocated by this function if necessary.
- * id and size are ignored.
- *
- * This function must return non-NULL value for success and must
- * return NULL for failure and may set error message by
- * PQsetAddTupleErrMes in malloc'ed memory. Assumed by caller as out
- * of memory if the error message is NULL on failure. This function is
- * assumed not to throw any exception.
- */
-	typedef void *(*addTupleFunction)(PGresult *res, AddTupFunc func,
-									  int id, size_t size);
-
-/*
- * Register alternative tuple storage function to PGconn.
- * 
- * By registering this function, pg_result disables its own tuple
- * storage and calls it to append rows one by one.
- *
- * func is tuple store function. See addTupleFunction.
- * 
- * addTupFuncParam is contextual storage that can be get with
- * PQgetAddTupleParam in func.
- */
-extern void PQregisterTupleAdder(PGconn *conn, addTupleFunction func,
-								 void *addTupFuncParam);
-
 /* Force the write buffer to be written (or at least try) */
 extern int	PQflush(PGconn *conn);
 
@@ -516,9 +454,6 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
-extern char *PQgetAsCstring(PGresAttValue *attdesc);
-extern void *PQgetAddTupleParam(const PGresult *res);
-extern void	PQsetAddTupleErrMes(PGresult *res, char *mes);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 45e4c93..64dfcb2 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -134,6 +134,12 @@ typedef struct pgresParamDesc
 
 #define NULL_LEN		(-1)	/* pg_result len for NULL value */
 
+typedef struct pgresAttValue
+{
+	int			len;			/* length in bytes of the value */
+	char	   *value;			/* actual value, plus terminating zero byte */
+} PGresAttValue;
+
 /* Typedef for message-field list entries */
 typedef struct pgMessageField
 {
@@ -203,11 +209,6 @@ struct pg_result
 	PGresult_data *curBlock;	/* most recently allocated block */
 	int			curOffset;		/* start offset of free space in block */
 	int			spaceLeft;		/* number of free bytes remaining in block */
-
-	addTupleFunction addTupleFunc; /* Tuple storage function. See
-									* addTupleFunction for details. */
-	void *addTupleFuncParam;       /* Contextual parameter for addTupleFunc */
-	char *addTupleFuncErrMes;      /* Error message returned from addTupFunc */
 };
 
 /* PGAsyncStatusType defines the state of the query-execution state machine */
@@ -442,13 +443,6 @@ struct pg_conn
 
 	/* Buffer for receiving various parts of messages */
 	PQExpBufferData workBuffer; /* expansible string */
-
-    /* Tuple store function. The two fields below is copied to newly
-	 * created PGresult if addTupleFunc is not NULL. Use default
-	 * function if addTupleFunc is NULL. */
-	addTupleFunction addTupleFunc; /* Tuple storage function. See
-									* addTupleFunction for details. */
-	void *addTupleFuncParam;       /* Contextual parameter for addTupFunc */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -513,6 +507,7 @@ extern void
 pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 /* This lets gcc check the format string for consistency. */
 __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
+extern int	pqAddTuple(PGresult *res, PGresAttValue *tup);
 extern void pqSaveMessageField(PGresult *res, char code,
 				   const char *value);
 extern void pqSaveParameterStatus(PGconn *conn, const char *name,
dblink_direct_tuplestore_v1.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 62c810a..fb2e10e 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	AttInMetadata *attinmeta;
+	MemoryContext oldcontext;
+	char *attrvalbuf;
+	void **valbuf;
+	size_t *valbufsize;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static void *addTuple(PGresult *res, AddTupFunc func, int id, size_t size);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,30 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterTupleAdder(conn, addTuple, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
 	res = PQexec(conn, buf.data);
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* This is only for backward compatibility */
+		if (storeinfo.nummismatch)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -580,7 +612,6 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +671,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +747,206 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterTupleAdder(conn, addTuple, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
 	if (!is_async)
 		res = PQexec(conn, sql);
 	else
-	{
 		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
-	}
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+	finishStoreInfo(&storeinfo);
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* This is only for backward compatibility */
+			if (storeinfo.nummismatch)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
-
-	Assert(rsinfo->returnMode == SFRM_Materialize);
-
-	PG_TRY();
+	TupleDesc	tupdesc;
+	int i;
+	
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
+	
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
+
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->valbuf = (void **)malloc(sinfo->nattrs * sizeof(void *));
+	sinfo->valbufsize = (size_t *)malloc(sinfo->nattrs * sizeof(size_t));
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbufsize[i] = 0;
+	}
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
-
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	/* Preallocate memory the size of a PGresAttValue array for values. */
+	sinfo->attrvalbuf = (char *) malloc(sinfo->nattrs * sizeof(PGresAttValue));
 
-			is_sql_cmd = false;
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		if (sinfo->valbuf[i])
+		{
+			free(sinfo->valbuf[i]);
+			sinfo->valbuf[i] = NULL;
 		}
+	}
+	if (sinfo->attrvalbuf)
+		free(sinfo->attrvalbuf);
+	sinfo->attrvalbuf = NULL;
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+static void *
+addTuple(PGresult *res, AddTupFunc  func, int id, size_t size)
+{
+	storeInfo *sinfo = (storeInfo *)PQgetAddTupleParam(res);
+	HeapTuple	tuple;
+	int fields = PQnfields(res);
+	int i;
+	PGresAttValue *attval;
+	char        **cstrs;
 
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+	if (sinfo->error_occurred)
+		return NULL;
 
-			values = (char **) palloc(nfields * sizeof(char *));
+	switch (func)
+	{
+		case ADDTUP_ALLOC_TEXT:
+		case ADDTUP_ALLOC_BINARY:
+			if (id == -1)
+				return sinfo->attrvalbuf;
+
+			if (id < 0 || id >= sinfo->nattrs)
+				return NULL;
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
+			if (sinfo->valbufsize[id] < size)
 			{
-				HeapTuple	tuple;
+				if (sinfo->valbuf[id] == NULL)
+					sinfo->valbuf[id] = malloc(size);
+				else
+					sinfo->valbuf[id] = realloc(sinfo->valbuf[id], size);
+				sinfo->valbufsize[id] = size;
+			}
+			return sinfo->valbuf[id];
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+		case ADDTUP_ADD_TUPLE:
+			break;   /* Go through */
+		default:
+			/* Ignore */
+			break;
+	}
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
-			}
+		PQsetAddTupleErrMes(res,
+							strdup("function returning record called in "
+								   "context that cannot accept type record"));
+		return NULL;
+	}
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
-		}
+	/*
+	 * Rewrite PGresAttValue[] to char *[] in place.
+	 */
+	Assert(sizeof(char*) <= sizeof(PGresAttValue));
+	attval = (PGresAttValue *)sinfo->attrvalbuf;
+	cstrs   = (char **)sinfo->attrvalbuf;
+	for(i = 0 ; i < fields ; i++)
+		cstrs[i] = PQgetAsCstring(attval++);
 
-		PQclear(res);
+	PG_TRY();
+	{
+		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+		tuplestore_puttuple(sinfo->tuplestore, tuple);
 	}
 	PG_CATCH();
 	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+		/*
+		 * Return the error message in the exception to the caller and
+		 * cancel the exception.
+		 */
+		ErrorData *edata;
+
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+
+		finishStoreInfo(sinfo);
+
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		PQsetAddTupleErrMes(res, strdup(edata->message));
+		return NULL;
 	}
 	PG_END_TRY();
+
+	return sinfo->attrvalbuf;
 }
 
 /*
#14Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#13)
2 attachment(s)
Re: Allow substitute allocators for PGresult.

Ouch! I'm sorry, the first patch in my previous message was generated in reverse.

This is an amendment of the message below. The body text is
copied into this message.

http://archives.postgresql.org/message-id/20111201.192419.103527179.horiguchi.kyotaro@oss.ntt.co.jp

=======
Hello. This is the next version of the patch "Allow substitute
allocators for PGresult".

Completely changing the concept from the previous version, this patch
allows libpq to use an alternative tuple store for received
tuples.

Design guidelines are shown below.

- No need to modify existing libpq client code.

- Existing libpq clients run with roughly the same performance, and
the modified dblink runs somewhat faster and requires less
memory.

I roughly measured run time and memory usage for three
configurations on CentOS 6 in a VirtualBox guest (2GB memory, 4 cores)
running on a Windows 7 / Core i7 host, transferring (30 bytes * 2 cols) *
2000000 tuples (120MB net) within this virtual machine. The
results are below.

                                xfer time   Peak RSS
Original                      :   6.02s      850MB
libpq patch + Original dblink :   6.11s      850MB
full patch                    :   4.44s      643MB
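
For reference, the relative differences implied by these figures can be worked out with a few lines of C. This is only an arithmetic sketch: percent_change and print_summary are names made up for illustration, and the inputs are the measured values listed above.

```c
#include <stdio.h>

/* Hypothetical helper: change from base to value, as a percentage of base. */
double
percent_change(double base, double value)
{
    return (value - base) / base * 100.0;
}

/* Print the relative differences against the unpatched configuration. */
void
print_summary(void)
{
    printf("libpq patch only, xfer time: %+.1f%%\n", percent_change(6.02, 6.11));
    printf("full patch, xfer time:       %+.1f%%\n", percent_change(6.02, 4.44));
    printf("full patch, peak RSS:        %+.1f%%\n", percent_change(850.0, 643.0));
}
```

Calling print_summary() from a main() gives roughly +1.5% for the libpq patch alone, and about -26.2% in time and -24.4% in peak RSS for the full patch.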

The xfer time here is the mean of five 'real' times measured by
running an SQL script like the following after a warm-up run.

=== test.sql
select dblink_connect('c', 'host=localhost port=5432 dbname=test');
select * from dblink('c', 'select a,c from foo limit 2000000') as (a text, b bytea) limit 1;

select dblink_disconnect('c');
===
$ for i in $(seq 1 10); do time psql test -f t.sql; done
===

Peak RSS is measured by picking up heap Rss in /proc/[pid]/smaps.
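
A rough sketch of how the heap Rss could be read out of smaps-formatted text is shown below. It is not the tool actually used for the numbers above; heap_rss_kb is a hypothetical helper, and it assumes the usual Linux layout in which each mapping starts with a "start-end" address line and is followed by "Name: value kB" lines.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum the Rss (in kB) of every mapping tagged [heap]
 * in an smaps-formatted stream.
 */
long
heap_rss_kb(FILE *fp)
{
    char        line[512];
    int         in_heap = 0;
    long        total = 0;

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        unsigned long long start, end;
        long        kb;

        /* A mapping header begins with "start-end" in hexadecimal. */
        if (sscanf(line, "%llx-%llx", &start, &end) == 2)
            in_heap = (strstr(line, "[heap]") != NULL);
        /* Inside the heap mapping, pick up only the Rss line. */
        else if (in_heap && sscanf(line, "Rss: %ld kB", &kb) == 1)
            total += kb;
    }
    return total;
}
```

For a real measurement the stream would come from fopen() on /proc/[pid]/smaps of the backend process, sampled repeatedly while the query runs so that the peak is not missed.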

The patched libpq with the original dblink seems slightly slower,
but the difference looks to be within measurement error. If this
slowdown is not acceptable, it might be reduced by restoring the
previous static call path, at the cost of some redundant code.
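
One way to read "restoring the static call path" is to keep a direct call for the default case and pay for the indirect call only when a substitute function is registered. The sketch below is a standalone illustration of that idea, not code from the patch: Result, alloc_hook, result_alloc and counting_hook are all hypothetical names, and plain malloc stands in for libpq's own allocator.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook type: a substitute allocator with a caller context. */
typedef void *(*alloc_hook) (void *ctx, size_t size);

/* Hypothetical stand-in for the result object that owns the hook. */
typedef struct
{
    alloc_hook  hook;           /* NULL means "use the built-in path" */
    void       *hook_ctx;       /* passed back to the hook unchanged */
    size_t      default_calls;  /* counts built-in allocations, for the demo */
} Result;

/* Built-in path; malloc stands in for the real allocator. */
static void *
default_alloc(Result *res, size_t size)
{
    res->default_calls++;
    return malloc(size);
}

/* Example substitute: counts its calls through ctx, then allocates. */
void *
counting_hook(void *ctx, size_t size)
{
    ++*(int *) ctx;
    return malloc(size);
}

/* Direct call when no hook is set, indirect call otherwise. */
void *
result_alloc(Result *res, size_t size)
{
    if (res->hook == NULL)
        return default_alloc(res, size);
    return res->hook(res->hook_ctx, size);
}
```

Whether the extra branch is actually cheaper than always going through the function pointer would have to be measured; the difference reported above is already close to the noise level.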

On the other hand, the full patch version is clearly faster and
requires less memory. Isn't it nice?

This patch consists of two sub patches.

The first is a patch for libpq that allows the tuple storage
mechanism to be replaced. The default behavior is not changed, so
existing libpq clients should run with it unmodified.

The second modifies dblink to store received tuples directly into
a tuplestore using the mechanism above.
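
The calling sequence that the libpq patch uses for each received row can be sketched without libpq at all. The following is a simplified, standalone model and not the patch itself: the operation names follow the patch's AddTupFunc enum, but AttValue, RowSink, sink_add_tuple and feed_row are hypothetical, a plain void *ctx replaces the PGresult argument, and the "store" merely counts rows and bytes where dblink would build a tuple for its tuplestore.

```c
#include <stdlib.h>
#include <string.h>

/* Operation codes, named after the enum in the libpq patch. */
typedef enum
{
    ADDTUP_ALLOC_TEXT,
    ADDTUP_ALLOC_BINARY,
    ADDTUP_ADD_TUPLE
} AddTupFunc;

/* Hypothetical stand-in for PGresAttValue. */
typedef struct
{
    int         len;
    char       *value;
} AttValue;

#define NCOLS 2

/* Hypothetical row sink: buffers reused for every row, plus counters. */
typedef struct
{
    AttValue    row[NCOLS];     /* handed out when id == -1 */
    char       *colbuf[NCOLS];  /* per-column buffers, grown on demand */
    size_t      colsize[NCOLS];
    int         nrows;
    size_t      nbytes;
} RowSink;

/* Callback in the style of the patch's addTupleFunction. */
void *
sink_add_tuple(void *ctx, AddTupFunc func, int id, size_t size)
{
    RowSink    *s = ctx;
    int         i;

    switch (func)
    {
        case ADDTUP_ALLOC_TEXT:
        case ADDTUP_ALLOC_BINARY:
            if (id == -1)
                return s->row;          /* the per-row value array */
            if (id < 0 || id >= NCOLS)
                return NULL;
            if (s->colsize[id] < size)
            {
                char       *p = realloc(s->colbuf[id], size);

                if (p == NULL)
                    return NULL;        /* old buffer is still valid */
                s->colbuf[id] = p;
                s->colsize[id] = size;
            }
            return s->colbuf[id];

        case ADDTUP_ADD_TUPLE:
            /* Consume the row; the buffers are reused for the next one. */
            for (i = 0; i < NCOLS; i++)
                if (s->row[i].len > 0)
                    s->nbytes += (size_t) s->row[i].len;
            s->nrows++;
            return s->row;
    }
    return NULL;
}

/* Driver imitating what getAnotherTuple() does for one row. */
int
feed_row(void *ctx, const char *const vals[NCOLS])
{
    AttValue   *tup = sink_add_tuple(ctx, ADDTUP_ALLOC_BINARY, -1,
                                     NCOLS * sizeof(AttValue));
    int         i;

    if (tup == NULL)
        return 0;
    for (i = 0; i < NCOLS; i++)
    {
        size_t      len = strlen(vals[i]);
        char       *dst = sink_add_tuple(ctx, ADDTUP_ALLOC_TEXT, i, len + 1);

        if (dst == NULL)
            return 0;
        memcpy(dst, vals[i], len + 1);
        tup[i].len = (int) len;
        tup[i].value = dst;
    }
    return sink_add_tuple(ctx, ADDTUP_ADD_TUPLE, 0, 0) != NULL;
}
```

Because the column buffers are reused from row to row, memory use in this model depends on the widest value seen rather than on the number of rows, which is the effect the dblink patch relies on.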

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_subst_storage_v1.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..a360d78 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,7 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQregisterTupleAdder	  161
+PQgetAsCstring		  162
+PQgetAddTupleParam	  163
+PQsetAddTupleErrMes	  164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 50f3f83..437be26 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2692,6 +2692,7 @@ makeEmptyPGconn(void)
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
 #endif
+	conn->addTupleFunc = NULL;
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -5064,3 +5065,10 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
+void
+PQregisterTupleAdder(PGconn *conn, addTupleFunction func, void *param)
+{
+	conn->addTupleFunc = func;
+	conn->addTupleFuncParam = param;
+}
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 113aab0..c8ec9bd 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -48,7 +48,6 @@ char	   *const pgresStatus[] = {
 static int	static_client_encoding = PG_SQL_ASCII;
 static bool static_std_strings = false;
 
-
 static PGEvent *dupEvents(PGEvent *events, int count);
 static bool PQsendQueryStart(PGconn *conn);
 static int PQsendQueryGuts(PGconn *conn,
@@ -66,7 +65,9 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
-
+static void *pqDefaultAddTupleFunc(PGresult *res, AddTupFunc func,
+								   int id, size_t len);
+static void *pqAddTuple(PGresult *res, PGresAttValue *tup);
 
 /* ----------------
  * Space management for PGresult.
@@ -160,6 +161,9 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->addTupleFunc = pqDefaultAddTupleFunc;
+	result->addTupleFuncParam = NULL;
+	result->addTupleFuncErrMes = NULL;
 
 	if (conn)
 	{
@@ -194,6 +198,12 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 			}
 			result->nEvents = conn->nEvents;
 		}
+
+		if (conn->addTupleFunc)
+		{
+			result->addTupleFunc = conn->addTupleFunc;
+			result->addTupleFuncParam = conn->addTupleFuncParam;
+		}
 	}
 	else
 	{
@@ -487,6 +497,33 @@ PQresultAlloc(PGresult *res, size_t nBytes)
 	return pqResultAlloc(res, nBytes, TRUE);
 }
 
+void *
+pqDefaultAddTupleFunc(PGresult *res, AddTupFunc func, int id, size_t len)
+{
+	void *p;
+
+	switch (func)
+	{
+		case ADDTUP_ALLOC_TEXT:
+			return pqResultAlloc(res, len, TRUE);
+
+		case ADDTUP_ALLOC_BINARY:
+			p = pqResultAlloc(res, len, FALSE);
+
+			if (id == -1)
+				res->addTupleFuncParam = p;
+
+			return p;
+
+		case ADDTUP_ADD_TUPLE:
+			return pqAddTuple(res, res->addTupleFuncParam);
+
+		default:
+			/* Ignore */
+			break;
+	}
+	return NULL;
+}
 /*
  * pqResultAlloc -
  *		Allocate subsidiary storage for a PGresult.
@@ -830,9 +867,9 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 /*
  * pqAddTuple
  *	  add a row pointer to the PGresult structure, growing it if necessary
- *	  Returns TRUE if OK, FALSE if not enough memory to add the row
+ *	  Returns tup if OK, NULL if not enough memory to add the row.
  */
-int
+static void *
 pqAddTuple(PGresult *res, PGresAttValue *tup)
 {
 	if (res->ntups >= res->tupArrSize)
@@ -858,13 +895,13 @@ pqAddTuple(PGresult *res, PGresAttValue *tup)
 			newTuples = (PGresAttValue **)
 				realloc(res->tuples, newSize * sizeof(PGresAttValue *));
 		if (!newTuples)
-			return FALSE;		/* malloc or realloc failed */
+			return NULL;		/* malloc or realloc failed */
 		res->tupArrSize = newSize;
 		res->tuples = newTuples;
 	}
 	res->tuples[res->ntups] = tup;
 	res->ntups++;
-	return TRUE;
+	return tup;
 }
 
 /*
@@ -2822,6 +2859,43 @@ PQgetisnull(const PGresult *res, int tup_num, int field_num)
 		return 0;
 }
 
+/* PQgetAsCstring
+ *	returns the field as C string.
+ */
+char *
+PQgetAsCstring(PGresAttValue *attval)
+{
+	return attval->len == NULL_LEN ? NULL : attval->value;
+}
+
+/* PQgetAddTupleParam
+ *	Get the pointer to the contextual parameter from PGresult which is
+ *	registered to PGconn by PQregisterTupleAdder
+ */
+void *
+PQgetAddTupleParam(const PGresult *res)
+{
+	if (!res)
+		return NULL;
+	return res->addTupleFuncParam;
+}
+
+/* PQsetAddTupleErrMes
+ *	Set the error message pass back to the caller of addTupleFunc
+ *  mes must be a malloc'ed memory block and it is released by the
+ *  caller of addTupleFunc if set.
+ *  You can replace the previous message by alternative mes, or clear
+ *  it with NULL.
+ */
+void
+PQsetAddTupleErrMes(PGresult *res, char *mes)
+{
+	/* Free existing message */
+	if (res->addTupleFuncErrMes)
+		free(res->addTupleFuncErrMes);
+	res->addTupleFuncErrMes = mes;
+}
+
 /* PQnparams:
  *	returns the number of input parameters of a prepared statement.
  */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index 77c4d5a..c7f74ae 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -733,9 +733,10 @@ getAnotherTuple(PGconn *conn, bool binary)
 	if (conn->curTuple == NULL)
 	{
 		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
+			result->addTupleFunc(result, ADDTUP_ALLOC_BINARY, -1,
+								 nfields * sizeof(PGresAttValue));
 		if (conn->curTuple == NULL)
-			goto outOfMemory;
+			goto addTupleError;
 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
 
 		/*
@@ -757,7 +758,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 	{
 		bitmap = (char *) malloc(nbytes);
 		if (!bitmap)
-			goto outOfMemory;
+			goto addTupleError;
 	}
 
 	if (pqGetnchar(bitmap, nbytes, conn))
@@ -787,9 +788,12 @@ getAnotherTuple(PGconn *conn, bool binary)
 				vlen = 0;
 			if (tup[i].value == NULL)
 			{
-				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
+				AddTupFunc func =
+					(binary ? ADDTUP_ALLOC_BINARY : ADDTUP_ALLOC_TEXT);
+				tup[i].value =
+					(char *) result->addTupleFunc(result, func, i, vlen + 1);
 				if (tup[i].value == NULL)
-					goto outOfMemory;
+					goto addTupleError;
 			}
 			tup[i].len = vlen;
 			/* read in the value */
@@ -812,8 +816,9 @@ getAnotherTuple(PGconn *conn, bool binary)
 	}
 
 	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
+	if (!result->addTupleFunc(result, ADDTUP_ADD_TUPLE, 0, 0))
+		goto addTupleError;
+
 	/* and reset for a new message */
 	conn->curTuple = NULL;
 
@@ -821,7 +826,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 		free(bitmap);
 	return 0;
 
-outOfMemory:
+addTupleError:
 	/* Replace partially constructed result with an error result */
 
 	/*
@@ -829,8 +834,21 @@ outOfMemory:
 	 * there's not enough memory to concatenate messages...
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	resetPQExpBuffer(&conn->errorMessage);
+
+	/*
+	 * If error message is passed from addTupleFunc, set it into
+	 * PGconn, assume out of memory if not.
+	 */
+	appendPQExpBufferStr(&conn->errorMessage,
+						 libpq_gettext(result->addTupleFuncErrMes ?
+									   result->addTupleFuncErrMes :
+									   "out of memory for query result\n"));
+	if (result->addTupleFuncErrMes)
+	{
+		free(result->addTupleFuncErrMes);
+		result->addTupleFuncErrMes = NULL;
+	}
 
 	/*
 	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 45a84d8..d14b57a 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -634,9 +634,10 @@ getAnotherTuple(PGconn *conn, int msgLength)
 	if (conn->curTuple == NULL)
 	{
 		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
+			result->addTupleFunc(result, ADDTUP_ALLOC_BINARY, -1,
+								 nfields * sizeof(PGresAttValue));
 		if (conn->curTuple == NULL)
-			goto outOfMemory;
+			goto addTupleError;
 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
 	}
 	tup = conn->curTuple;
@@ -673,11 +674,12 @@ getAnotherTuple(PGconn *conn, int msgLength)
 			vlen = 0;
 		if (tup[i].value == NULL)
 		{
-			bool		isbinary = (result->attDescs[i].format != 0);
-
-			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
+			AddTupFunc func = (result->attDescs[i].format != 0 ?
+							   ADDTUP_ALLOC_BINARY : ADDTUP_ALLOC_TEXT);
+			tup[i].value =
+				(char *) result->addTupleFunc(result, func, i, vlen + 1);
 			if (tup[i].value == NULL)
-				goto outOfMemory;
+				goto addTupleError;
 		}
 		tup[i].len = vlen;
 		/* read in the value */
@@ -689,22 +691,36 @@ getAnotherTuple(PGconn *conn, int msgLength)
 	}
 
 	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
+	if (!result->addTupleFunc(result, ADDTUP_ADD_TUPLE, 0, 0))
+		goto addTupleError;
+
 	/* and reset for a new message */
 	conn->curTuple = NULL;
 
 	return 0;
 
-outOfMemory:
+addTupleError:
 
 	/*
 	 * Replace partially constructed result with an error result. First
 	 * discard the old result to try to win back some memory.
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	resetPQExpBuffer(&conn->errorMessage);
+
+	/*
+	 * If error message is passed from addTupleFunc, set it into
+	 * PGconn, assume out of memory if not.
+	 */
+	appendPQExpBufferStr(&conn->errorMessage,
+						 libpq_gettext(result->addTupleFuncErrMes ?
+									   result->addTupleFuncErrMes : 
+									   "out of memory for query result\n"));
+	if (result->addTupleFuncErrMes)
+	{
+		free(result->addTupleFuncErrMes);
+		result->addTupleFuncErrMes = NULL;
+	}
 	pqSaveErrorResult(conn);
 
 	/* Discard the failed message by pretending we read it */
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index d13a5b9..bdce294 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -116,6 +116,16 @@ typedef enum
 	PQPING_NO_ATTEMPT			/* connection not attempted (bad params) */
 } PGPing;
 
+/* AddTupFunc is one of the parameters of addTupleFunc that decides
+ * the function of the addTupleFunction. See addTupleFunction for
+ * details */
+typedef enum 
+{
+	ADDTUP_ALLOC_TEXT,          /* Returns non-aligned memory for text value */
+	ADDTUP_ALLOC_BINARY,        /* Returns aligned memory for binary value */
+	ADDTUP_ADD_TUPLE            /* Adds tuple data into tuple storage */
+} AddTupFunc;
+
 /* PGconn encapsulates a connection to the backend.
  * The contents of this struct are not supposed to be known to applications.
  */
@@ -225,6 +235,12 @@ typedef struct pgresAttDesc
 	int			atttypmod;		/* type-specific modifier info */
 } PGresAttDesc;
 
+typedef struct pgresAttValue
+{
+	int			len;			/* length in bytes of the value */
+	char	   *value;			/* actual value, plus terminating zero byte */
+} PGresAttValue;
+
 /* ----------------
  * Exported functions of libpq
  * ----------------
@@ -416,6 +432,52 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
 			 const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for tuple storage function.
+ *
+ * This function pointer is used for tuple storage function in
+ * PGresult and PGconn.
+ *
+ * addTupleFunction is called for three types of function designated by
+ * the enum AddTupFunc.
+ *
+ * id is the identifier for allocated memory block. The caller sets -1
+ * for PGresAttValue array, and 0 to number of cols - 1 for each
+ * column.
+ *
+ * ADDTUP_ALLOC_TEXT requests a memory block of size bytes for a text
+ * value, which need not be aligned to the word boundary.
+ *
+ * ADDTUP_ALLOC_BINARY requests a memory block of size bytes for a
+ * binary value, which must be aligned to the word boundary.
+ *
+ * ADDTUP_ADD_TUPLE requests to add tuple data into storage, and
+ * free the memory blocks allocated by this function if necessary.
+ * id and size are ignored.
+ *
+ * This function must return a non-NULL value on success and NULL on
+ * failure. On failure it may set an error message in malloc'ed memory
+ * with PQsetAddTupleErrMes; if no message is set, the caller assumes
+ * out of memory. This function is assumed not to throw any
+ * exception.
+ */
+	typedef void *(*addTupleFunction)(PGresult *res, AddTupFunc func,
+									  int id, size_t size);
+
+/*
+ * Register alternative tuple storage function to PGconn.
+ * 
+ * By registering this function, pg_result disables its own tuple
+ * storage and calls it to append rows one by one.
+ *
+ * func is tuple store function. See addTupleFunction.
+ * 
+ * addTupFuncParam is a contextual parameter that can be retrieved with
+ * PQgetAddTupleParam in func.
+ */
+extern void PQregisterTupleAdder(PGconn *conn, addTupleFunction func,
+								 void *addTupFuncParam);
+
 /* Force the write buffer to be written (or at least try) */
 extern int	PQflush(PGconn *conn);
 
@@ -454,6 +516,9 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+extern char *PQgetAsCstring(PGresAttValue *attval);
+extern void *PQgetAddTupleParam(const PGresult *res);
+extern void	PQsetAddTupleErrMes(PGresult *res, char *mes);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 64dfcb2..45e4c93 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -134,12 +134,6 @@ typedef struct pgresParamDesc
 
 #define NULL_LEN		(-1)	/* pg_result len for NULL value */
 
-typedef struct pgresAttValue
-{
-	int			len;			/* length in bytes of the value */
-	char	   *value;			/* actual value, plus terminating zero byte */
-} PGresAttValue;
-
 /* Typedef for message-field list entries */
 typedef struct pgMessageField
 {
@@ -209,6 +203,11 @@ struct pg_result
 	PGresult_data *curBlock;	/* most recently allocated block */
 	int			curOffset;		/* start offset of free space in block */
 	int			spaceLeft;		/* number of free bytes remaining in block */
+
+	addTupleFunction addTupleFunc; /* Tuple storage function. See
+									* addTupleFunction for details. */
+	void *addTupleFuncParam;       /* Contextual parameter for addTupleFunc */
+	char *addTupleFuncErrMes;      /* Error message returned from addTupleFunc */
 };
 
 /* PGAsyncStatusType defines the state of the query-execution state machine */
@@ -443,6 +442,13 @@ struct pg_conn
 
 	/* Buffer for receiving various parts of messages */
 	PQExpBufferData workBuffer; /* expansible string */
+
+	/* Tuple store function. The two fields below are copied to a newly
+	 * created PGresult if addTupleFunc is not NULL. The default
+	 * function is used if addTupleFunc is NULL. */
+	addTupleFunction addTupleFunc; /* Tuple storage function. See
+									* addTupleFunction for details. */
+	void *addTupleFuncParam;       /* Contextual parameter for addTupFunc */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -507,7 +513,6 @@ extern void
 pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 /* This lets gcc check the format string for consistency. */
 __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
-extern int	pqAddTuple(PGresult *res, PGresAttValue *tup);
 extern void pqSaveMessageField(PGresult *res, char code,
 				   const char *value);
 extern void pqSaveParameterStatus(PGconn *conn, const char *name,
dblink_direct_tuplestore_v1.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 62c810a..fb2e10e 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	AttInMetadata *attinmeta;
+	MemoryContext oldcontext;
+	char *attrvalbuf;
+	void **valbuf;
+	size_t *valbufsize;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static void *addTuple(PGresult *res, AddTupFunc func, int id, size_t size);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,30 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * the PGresult returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterTupleAdder(conn, addTuple, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
 	res = PQexec(conn, buf.data);
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* This is only for backward compatibility */
+		if (storeinfo.nummismatch)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -580,7 +612,6 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +671,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +747,206 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * the PGresult returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterTupleAdder(conn, addTuple, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
 	if (!is_async)
 		res = PQexec(conn, sql);
 	else
-	{
 		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
-	}
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+	finishStoreInfo(&storeinfo);
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* This is only for backward compatibility */
+			if (storeinfo.nummismatch)
+			{
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
-
-	Assert(rsinfo->returnMode == SFRM_Materialize);
-
-	PG_TRY();
+	TupleDesc	tupdesc;
+	int i;
+	
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
+	
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
+
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->valbuf = (void **)malloc(sinfo->nattrs * sizeof(void *));
+	sinfo->valbufsize = (size_t *)malloc(sinfo->nattrs * sizeof(size_t));
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbufsize[i] = 0;
+	}
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
-
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	/* Preallocate memory of the same size as the PGresAttValue array for values. */
+	sinfo->attrvalbuf = (char *) malloc(sinfo->nattrs * sizeof(PGresAttValue));
 
-			is_sql_cmd = false;
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		if (sinfo->valbuf[i])
+		{
+			free(sinfo->valbuf[i]);
+			sinfo->valbuf[i] = NULL;
 		}
+	}
+	if (sinfo->attrvalbuf)
+		free(sinfo->attrvalbuf);
+	sinfo->attrvalbuf = NULL;
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+static void *
+addTuple(PGresult *res, AddTupFunc  func, int id, size_t size)
+{
+	storeInfo *sinfo = (storeInfo *)PQgetAddTupleParam(res);
+	HeapTuple	tuple;
+	int fields = PQnfields(res);
+	int i;
+	PGresAttValue *attval;
+	char        **cstrs;
 
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+	if (sinfo->error_occurred)
+		return NULL;
 
-			values = (char **) palloc(nfields * sizeof(char *));
+	switch (func)
+	{
+		case ADDTUP_ALLOC_TEXT:
+		case ADDTUP_ALLOC_BINARY:
+			if (id == -1)
+				return sinfo->attrvalbuf;
+
+			if (id < 0 || id >= sinfo->nattrs)
+				return NULL;
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
+			if (sinfo->valbufsize[id] < size)
 			{
-				HeapTuple	tuple;
+				if (sinfo->valbuf[id] == NULL)
+					sinfo->valbuf[id] = malloc(size);
+				else
+					sinfo->valbuf[id] = realloc(sinfo->valbuf[id], size);
+				sinfo->valbufsize[id] = size;
+			}
+			return sinfo->valbuf[id];
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+		case ADDTUP_ADD_TUPLE:
+			break;   /* Go through */
+		default:
+			/* Ignore */
+			break;
+	}
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
-			}
+		PQsetAddTupleErrMes(res,
+							strdup("function returning record called in "
+								   "context that cannot accept type record"));
+		return NULL;
+	}
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
-		}
+	/*
+	 * Rewrite PGresAttValue[] to char *[] in place.
+	 */
+	Assert(sizeof(char*) <= sizeof(PGresAttValue));
+	attval = (PGresAttValue *)sinfo->attrvalbuf;
+	cstrs   = (char **)sinfo->attrvalbuf;
+	for(i = 0 ; i < fields ; i++)
+		cstrs[i] = PQgetAsCstring(attval++);
 
-		PQclear(res);
+	PG_TRY();
+	{
+		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+		tuplestore_puttuple(sinfo->tuplestore, tuple);
 	}
 	PG_CATCH();
 	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+		/*
+		 * Return the error message in the exception to the caller and
+		 * cancel the exception.
+		 */
+		ErrorData *edata;
+
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+
+		finishStoreInfo(sinfo);
+
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		PQsetAddTupleErrMes(res, strdup(edata->message));
+		return NULL;
 	}
 	PG_END_TRY();
+
+	return sinfo->attrvalbuf;
 }
 
 /*
#15Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#14)
1 attachment(s)
Re: Allow substitute allocators for PGresult.

Hello,

The documentation had slipped my mind.

This patch adds the documentation for PGresult custom
storage. It appears as section '31.19. Alternative result
storage'.
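
For reference, the call sequence described in the documentation can be
sketched as a standalone fragment. This is only a minimal sketch under
stated assumptions and not the patch itself: RowSink, AttValue,
sink_add_tuple and feed_row are hypothetical names, the callback takes
the sink directly rather than a PGresult, and feed_row merely imitates
what getAnotherTuple does on the libpq side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins that mirror the patch's enum and PGresAttValue. */
typedef enum { ADDTUP_ALLOC_TEXT, ADDTUP_ALLOC_BINARY, ADDTUP_ADD_TUPLE } AddTupFunc;
typedef struct { int len; char *value; } AttValue;

#define MAX_COLS 8

/* Hypothetical user-side storage: one row array plus reusable column buffers. */
typedef struct RowSink
{
	int			ncols;
	AttValue	row[MAX_COLS];		/* handed out when id == -1 */
	char	   *colbuf[MAX_COLS];	/* per-column buffers, reused across rows */
	size_t		colcap[MAX_COLS];
	int			rows_stored;
	size_t		bytes_stored;
} RowSink;

static void
sink_init(RowSink *sink, int ncols)
{
	assert(ncols > 0 && ncols <= MAX_COLS);
	memset(sink, 0, sizeof(*sink));
	sink->ncols = ncols;
}

static void
sink_free(RowSink *sink)
{
	for (int i = 0; i < sink->ncols; i++)
		free(sink->colbuf[i]);
	memset(sink, 0, sizeof(*sink));
}

/* Callback in the shape of addTupleFunction: NULL means failure. */
static void *
sink_add_tuple(RowSink *sink, AddTupFunc func, int id, size_t size)
{
	switch (func)
	{
		case ADDTUP_ALLOC_TEXT:
		case ADDTUP_ALLOC_BINARY:
			if (id == -1)			/* request for the row array itself */
				return size <= sizeof(sink->row) ? sink->row : NULL;
			if (id < 0 || id >= sink->ncols)
				return NULL;
			if (sink->colcap[id] < size)
			{
				char	   *p = realloc(sink->colbuf[id], size);

				if (p == NULL)
					return NULL;	/* the old buffer is still valid */
				sink->colbuf[id] = p;
				sink->colcap[id] = size;
			}
			return sink->colbuf[id];

		case ADDTUP_ADD_TUPLE:		/* id and size are ignored here */
			for (int i = 0; i < sink->ncols; i++)
				if (sink->row[i].len >= 0)
					sink->bytes_stored += (size_t) sink->row[i].len;
			sink->rows_stored++;
			return sink->row;
	}
	return NULL;
}

/* Imitates the libpq side for one row; a NULL entry stands for SQL NULL. */
static int
feed_row(RowSink *sink, const char *const *vals)
{
	size_t		rowsize = (size_t) sink->ncols * sizeof(AttValue);
	AttValue   *tup = sink_add_tuple(sink, ADDTUP_ALLOC_BINARY, -1, rowsize);

	if (tup == NULL)
		return 0;
	memset(tup, 0, rowsize);
	for (int i = 0; i < sink->ncols; i++)
	{
		if (vals[i] == NULL)
		{
			tup[i].len = -1;		/* same value as NULL_LEN in the patch */
			continue;
		}
		size_t		n = strlen(vals[i]);
		char	   *p = sink_add_tuple(sink, ADDTUP_ALLOC_TEXT, i, n + 1);

		if (p == NULL)
			return 0;
		memcpy(p, vals[i], n + 1);
		tup[i].value = p;
		tup[i].len = (int) n;
	}
	return sink_add_tuple(sink, ADDTUP_ADD_TUPLE, 0, 0) != NULL;
}
```

A caller would run sink_init once, feed_row for each row, and sink_free
at the end. Because the column buffers are reused, the values must be
copied out during the ADDTUP_ADD_TUPLE call, which is what the dblink
patch does by building a HeapTuple at that point.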

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_subst_storage_doc_v1.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 252ff8c..dc2acb6 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7229,6 +7229,325 @@ int PQisthreadsafe();
  </sect1>
 
 
+ <sect1 id="libpq-alterstorage">
+  <title>Alternative result storage</title>
+
+  <indexterm zone="libpq-alterstorage">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   In standard usage, users get the result of command
+   execution from a <structname>PGresult</structname> acquired
+   with <function>PQgetResult</function>
+   from a <structname>PGconn</structname>. The memory areas for
+   the PGresult are allocated with malloc() internally within calls of
+   command execution functions such as <function>PQexec</function>
+   and <function>PQgetResult</function>. If it is difficult to
+   handle the result records in the form of PGresult, you can instruct
+   PGconn to store them into your own storage instead of PGresult.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-registertupleadder">
+    <term>
+     <function>PQregisterTupleAdder</function>
+     <indexterm>
+      <primary>PQregisterTupleAdder</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a function to allocate memory for each tuple and column
+       values, and add the completed tuple into your storage.
+<synopsis>
+void PQregisterTupleAdder(PGconn *conn,
+                          addTupleFunction func,
+                          void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object on which to set the tuple adder
+	       function. A PGresult created from this connection calls
+	       this function to store the result tuples instead of
+	       storing them in its internal storage.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>func</parameter></term>
+	   <listitem>
+	     <para>
+	       Tuple adder function to set. NULL means to use the
+	       default storage.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+	       A pointer to contextual parameter passed
+	       to <parameter>func</parameter>.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-addtuplefunction">
+    <term>
+     <type>addTupleFunction</type>
+     <indexterm>
+      <primary>addTupleFunction</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the callback function to serve memory blocks for
+       each tuple and its column values, and to add the constructed
+       tuple into your own storage.
+<synopsis>
+typedef enum 
+{
+  ADDTUP_ALLOC_TEXT,
+  ADDTUP_ALLOC_BINARY,
+  ADDTUP_ADD_TUPLE
+} AddTupFunc;
+
+void *(*addTupleFunction)(PGresult *res,
+                          AddTupFunc func,
+                          int id,
+                          size_t size);
+</synopsis>
+     </para>
+
+     <para>
+       Generally this function must return NULL on failure and should
+       set the error message
+       with <function>PQsetAddTupleErrMes</function> if the cause is
+       other than out of memory. This function must not throw any
+       exception. This function is called in the following sequence.
+
+       <itemizedlist spacing="compact">
+	 <listitem>
+	   <simpara>Call with <parameter>func</parameter>
+	   = <firstterm>ADDTUP_ALLOC_BINARY</firstterm>
+	   and <parameter>id</parameter> = -1 to request the memory
+	   for tuple used as an array
+	   of <type>PGresAttValue</type> </simpara>
+	 </listitem>
+	 <listitem>
+	   <simpara>Call with <parameter>func</parameter>
+	   = <firstterm>ADDTUP_ALLOC_TEXT</firstterm>
+	   or <firstterm>ADDTUP_ALLOC_BINARY</firstterm>
+	   and <parameter>id</parameter> zero or a positive number
+	   to request the memory for each column value in the current
+	   tuple.</simpara>
+	 </listitem>
+	 <listitem>
+	   <simpara>Call with <parameter>func</parameter>
+	   = <firstterm>ADDTUP_ADD_TUPLE</firstterm> to request that the
+	   constructed tuple be stored.</simpara>
+	 </listitem>
+       </itemizedlist>
+     </para>
+     <para>
+       Calling <type>addTupleFunction</type>
+       with <parameter>func</parameter> =
+       <firstterm>ADDTUP_ALLOC_TEXT</firstterm> asks it to return a
+       memory block of at least <parameter>size</parameter> bytes,
+       which need not be aligned to the word boundary.
+       <parameter>id</parameter> is zero or a positive number that
+       identifies the use of the requested memory block, namely the
+       position of the column for which the memory block is used.
+     </para>
+     <para>
+       When <parameter>func</parameter>
+       = <firstterm>ADDTUP_ALLOC_BINARY</firstterm>, this function is
+       asked to return a memory block of at
+       least <parameter>size</parameter> bytes that is aligned to the
+       word boundary.
+       <parameter>id</parameter> is the identifier that distinguishes the
+       use of the requested memory block. -1 means that it is used as an
+       array of <type>PGresAttValue</type> to store the tuple. Zero or
+       positive numbers have the same meaning as for
+       <firstterm>ADDTUP_ALLOC_TEXT</firstterm>.
+     </para>
+     <para>When <parameter>func</parameter>
+       = <firstterm>ADDTUP_ADD_TUPLE</firstterm>, this function is
+       asked to store the <type>PGresAttValue</type> structure
+       constructed by the caller into your storage. The pointer to the
+       tuple structure is not passed, so you should remember the
+       pointer to the memory block returned to the caller for
+       <parameter>func</parameter>
+       = <parameter>ADDTUP_ALLOC_BINARY</parameter>
+       with <parameter>id</parameter> = -1. This function must return
+       a non-NULL value on success. You must release the
+       memory blocks handed to the caller by this function if needed.
+     </para>
+     <variablelist>
+       <varlistentry>
+	 <term><parameter>res</parameter></term>
+	 <listitem>
+	   <para>
+	     A pointer to the <type>PGresult</type> object.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+	 <term><parameter>func</parameter></term>
+	 <listitem>
+	   <para>
+	     An <type>enum</type> value telling the function to perform.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+	 <term><parameter>id</parameter>, <parameter>size</parameter></term>
+	 <listitem>
+	   <para>
+	     The identifier and the size in bytes of the requested memory block.
+	   </para>
+	 </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgestasctring">
+    <term>
+     <function>PQgetAsCstring</function>
+     <indexterm>
+      <primary>PQgetAsCstring</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Get the value of the column pointed
+	by <parameter>attval</parameter> in the form of
+	zero-terminated C string. Returns NULL if the value is null.
+<synopsis>
+char *PQgetAsCstring(PGresAttValue *attval)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>attval</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresAttValue</type> object
+		to retrieve the value.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetaddtupleparam">
+    <term>
+     <function>PQgetAddTupleParam</function>
+     <indexterm>
+      <primary>PQgetAddTupleParam</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Get the pointer passed to <function>PQregisterTupleAdder</function>
+	as <parameter>param</parameter>.
+<synopsis>
+void *PQgetAddTupleParam(const PGresult *res)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetaddtupleerrmes">
+    <term>
+     <function>PQsetAddTupleErrMes</function>
+     <indexterm>
+      <primary>PQsetAddTupleErrMes</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Set the message for an error that occurred in <type>addTupleFunction</type>.
+	If this message is not set, the error is assumed to be out of
+	memory.
+<synopsis>
+void PQsetAddTupleErrMes(PGresult *res, char *mes)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object
+		in <type>addTupleFunction</type>.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	  <varlistentry>
+	    <term><parameter>mes</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the memory block containing the error
+		message, which must be allocated with malloc(). The
+		caller of <type>addTupleFunction</type> frees the
+		memory block with free() only if the function
+		returns NULL.
+	      </para>
+	      <para>
+		If <parameter>res</parameter> already has a message
+		set, that message is freed and the given message is
+		set. Pass NULL to cancel the custom message.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
 
#16Greg Smith
greg@2ndQuadrant.com
In reply to: Kyotaro HORIGUCHI (#14)
Re: Allow substitute allocators for PGresult.

On 12/01/2011 05:48 AM, Kyotaro HORIGUCHI wrote:

                                xfer time   Peak RSS
Original                      :     6.02s      850MB
libpq patch + Original dblink :     6.11s      850MB
full patch                    :     4.44s      643MB

These look like interesting results. Currently Tom is listed as the
reviewer on this patch, based on comments made before the CF really
started. And the patch has incorrectly been sitting in "Waiting
for author" for the last week; oops. I'm not sure what to do with this
one now except raise a general call to see if anyone wants to take a
look at it, now that it seems to be in good enough shape to deliver
measurable results.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us

#17Tom Lane
tgl@sss.pgh.pa.us
In reply to: Greg Smith (#16)
Re: Allow substitute allocators for PGresult.

Greg Smith <greg@2ndQuadrant.com> writes:

On 12/01/2011 05:48 AM, Kyotaro HORIGUCHI wrote:

                                xfer time   Peak RSS
Original                      :     6.02s      850MB
libpq patch + Original dblink :     6.11s      850MB
full patch                    :     4.44s      643MB

These look like interesting results. Currently Tom is listed as the
reviewer on this patch, based on comments made before the CF really
started. And the patch has incorrectly been sitting in "Waiting
for author" for the last week; oops. I'm not sure what to do with this
one now except raise a general call to see if anyone wants to take a
look at it, now that it seems to be in good enough shape to deliver
measurable results.

I did list myself as reviewer some time ago, but if anyone else wants to
take it I won't be offended ;-)

regards, tom lane

#18Robert Haas
robertmhaas@gmail.com
In reply to: Kyotaro HORIGUCHI (#15)
Re: Allow substitute allocators for PGresult.

On Thu, Dec 8, 2011 at 5:41 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:

 This is the patch to add the documentation of PGresult custom
 storage. It shows in section '31.19. Alternative result
 storage'.

It would be good to consolidate this into the main patch.

I find the names of the functions added here to be quite confusing and
would suggest renaming them. I expected PQgetAsCstring to do
something similar to PQgetvalue, but the code is completely different,
and even after reading the documentation I still don't understand what
that function is supposed to be used for. Why "as cstring"? What
would the other option be?

I also don't think the "add tuple" terminology is particularly good.
It's not obvious from the name that what you're doing is overriding
the way memory is allocated and results are stored.

Also, what about the problem Tom mentioned here?

http://archives.postgresql.org/message-id/1042.1321123761@sss.pgh.pa.us

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#19Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Robert Haas (#18)
Re: Allow substitute allocators for PGresult.

Hello, thank you for taking the time to comment.

At Wed, 21 Dec 2011 11:09:59 -0500, Robert Haas <robertmhaas@gmail.com> wrote

I find the names of the functions added here to be quite
confusing and would suggest renaming them. I expected
PQgetAsCstring to do something similar to PQgetvalue, but the
code is completely different,

To be honest, I have felt that kind of perplexity too. If the
problem is simply one of naming, I can propose another name,
"PQreadAttValue"... though this is not so good either...

But...

and even after reading the documentation I still don't
understand what that function is supposed to be used for. Why
"as cstring"? What would the other option be?

Is the problem the poor description? Or is it the raison
d'être of the function?

The immediate reason the function exists is that getAnotherTuple
internally stores the field values of the tuples sent from the
server in the form of PGresAttValue, and the only route I have
found to store a tuple into a TupleStore is
BuildTupleFromCStrings() followed by tuplestore_puttuple(), which
is what dblink does in materializeResult(). In addition, a C
string is of course the most natural format in a C program, I
hesitated to modify execTuples.c, and I wanted to hide the
details of PGresAttValue.

Given that the values are passed as PGresAttValue * (for reasons
of performance and the extent of the modification), the "adding
tuple" functions have to get the value from the struct. This can
be done in two ways, which differ in authority (`encapsulation',
in other words) and convenience: one requires knowledge of the
structure, and the other does not. PQgetAsCstring is the latter
approach. (And that is inconsistent with the fact that the
definition of PGresAttValue is moved into libpq-fe.h from
libpq-int.h. The details of the structure should be hidden, like
PGresult, in this approach.)

But it is not obvious that this choice is better than the other
one. If we consider PGresAttValue too simple and stable to be
worth hiding, PQgetAsCstring can be removed and the problem goes
away. PGresAttValue needs documentation in that case.

I prefer to handle PGresAttValue directly if that is not a problem.
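
To make the two options concrete, here is a minimal sketch. It is
not code from the patch: DemoAttValue, DEMO_NULL_LEN,
demo_get_as_cstring and demo_fill_cstrings are hypothetical names,
and the two-field layout with a negative length marking NULL is an
assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Stand-in for the attribute value struct discussed above. The field
 * layout and the NULL marker are assumptions made for this sketch.
 */
#define DEMO_NULL_LEN (-1)

typedef struct
{
	int		len;	/* value length in bytes, or DEMO_NULL_LEN for SQL NULL */
	char   *value;	/* zero-terminated value; unused when NULL */
} DemoAttValue;

/*
 * Approach without knowledge of the structure: the caller only sees an
 * accessor (hypothetical name), so the layout can stay private.
 */
static char *
demo_get_as_cstring(const DemoAttValue *attval)
{
	assert(attval != NULL);
	return (attval->len == DEMO_NULL_LEN) ? NULL : attval->value;
}

/*
 * Approach with knowledge of the structure: the caller reads the fields
 * itself, here to fill the char * array a tuple builder would expect.
 */
static void
demo_fill_cstrings(const DemoAttValue *attvals, int nfields, char **cstrs)
{
	int		i;

	for (i = 0; i < nfields; i++)
		cstrs[i] = (attvals[i].len == DEMO_NULL_LEN) ? NULL : attvals[i].value;
}
```

With the accessor the struct can stay private to the library, at the
cost of one more exported function; with direct access the struct
becomes part of the public interface and has to be documented.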

I also don't think the "add tuple" terminology is particularly good.
It's not obvious from the name that what you're doing is overriding
the way memory is allocated and results are stored.

This phrase was originally taken from pqAddTuple() in fe-exec.c
and has not been changed since that function was integrated with
the other functions.

I propose "tuple storage handler" as the alternative.

- typedef void *(*addTupleFunction)(...);
+ typedef void *(*tupleStorageHandler)(...);
- typedef enum { ADDTUP_*, } AddTupFunc;
+ typedef enum { TSHANDLER_*, } TSHandlerCommand;
- void *PQgetAddTupleParam(...);
+ void *PQgetTupleStrageHandlerContext(...);
- void PQregisterTupleAdder(...);
+ void PQregisterTupleStoreHandler(...);
- addTupleFunction PGresult.addTupleFunc;
+ tupleStorageHandler PGresult.tupleStorageHandlerFunc;
- void *PGresult.addTuleFuncParam;
+ void *PGresult.tupleStorageHandlerContext;
- char *PGresult.addTuleFuncErrMes;
+ void *PGresult.tupelStrageHandlerErrMes;

Also, what about the problem Tom mentioned here?

http://archives.postgresql.org/message-id/1042.1321123761@sss.pgh.pa.us

The plan to simply replace the malloc calls with something like
palloc was abandoned because its scope was too narrow.

dblink-plus copies the whole PGresult into a TupleStore in order
to avoid leaving orphaned memory on SIGINT. The resource owner
mechanism is applicable to that in principle, but hard to use in
practice because the current implementation could not accept it
without radical modification. In addition, dblink does the same
thing, perhaps for the same reason as dblink-plus and, as far as
I have heard, for another reason as well.

Whatever the reason, both dblink and dblink-plus do the same
thing, which can make performance lower than expected.

If TupleStore (TupleDesc) is preferable to PGresult for in-backend
use and ordinary (client-use) libpq users can handle only
PGresult, a mechanism like this patch would be required to
maintain compatibility, I think. On the contrary, if there is no
significant reason to use TupleStore in the backend (which
implies that existing mechanisms like the resource owner can
inexpensively save the backend from the possible inconvenience
caused by using PGresult storage in backends), PGresult should be
used as it is.

I think a TupleStore, which is designed to be used in the
backend, is preferable for this usage, and I don't want to fetch
data by making a detour via PGresult.
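
The idea of avoiding the detour can be sketched roughly as follows.
This is a simplified model, not the patch's API: DemoStore,
DemoCommand, demo_handler and demo_feed_row are hypothetical names,
the handler takes the store directly instead of a PGresult, and
demo_feed_row only plays the role that the library would play when a
row arrives. The handler lends out one reusable buffer per column and
is then told that the row is complete, so no second copy of the whole
result is kept.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for this sketch; these are not libpq definitions. */
typedef enum { DEMO_ALLOC, DEMO_ADD_TUPLE } DemoCommand;

typedef struct
{
	int		ncols;
	char  **colbuf;		/* one reusable buffer per column */
	size_t *colbufsize;	/* current size of each buffer */
	int		nrows;		/* rows accepted by the caller's store */
	size_t	nbytes;		/* payload bytes seen, stands in for real storage */
} DemoStore;

static int
demo_store_init(DemoStore *st, int ncols)
{
	st->ncols = ncols;
	st->nrows = 0;
	st->nbytes = 0;
	st->colbuf = calloc((size_t) ncols, sizeof(char *));
	st->colbufsize = calloc((size_t) ncols, sizeof(size_t));
	return st->colbuf != NULL && st->colbufsize != NULL;
}

static void
demo_store_free(DemoStore *st)
{
	int		i;

	for (i = 0; st->colbuf != NULL && i < st->ncols; i++)
		free(st->colbuf[i]);
	free(st->colbuf);
	free(st->colbufsize);
}

/* Returns NULL on failure and a non-NULL pointer on success. */
static void *
demo_handler(DemoStore *st, DemoCommand cmd, int col, size_t size)
{
	int		i;

	assert(st != NULL);
	switch (cmd)
	{
		case DEMO_ALLOC:
			if (col < 0 || col >= st->ncols)
				return NULL;
			if (st->colbufsize[col] < size)
			{
				char   *p = realloc(st->colbuf[col], size);

				if (p == NULL)
					return NULL;	/* the old buffer is still valid */
				st->colbuf[col] = p;
				st->colbufsize[col] = size;
			}
			return st->colbuf[col];
		case DEMO_ADD_TUPLE:
			/* A real handler would build and store a tuple here. */
			for (i = 0; i < st->ncols; i++)
				st->nbytes += strlen(st->colbuf[i]);
			st->nrows++;
			return st;
	}
	return NULL;
}

/* Plays the role of the library: fill the lent buffers, then hand over. */
static int
demo_feed_row(DemoStore *st, const char *const *vals)
{
	int		i;

	for (i = 0; i < st->ncols; i++)
	{
		size_t	need = strlen(vals[i]) + 1;
		char   *dst = demo_handler(st, DEMO_ALLOC, i, need);

		if (dst == NULL)
			return 0;
		memcpy(dst, vals[i], need);
	}
	return demo_handler(st, DEMO_ADD_TUPLE, -1, 0) != NULL;
}
```

Because the per-column buffers are reused from row to row, memory use
in this model is bounded by the widest value seen in each column
rather than by the size of the whole result.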

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#20Greg Smith
greg@2ndQuadrant.com
In reply to: Kyotaro HORIGUCHI (#19)
1 attachment(s)
Speed dblink using alternate libpq tuple storage

One patch that fell off the truck during a turn in the November
CommitFest was "Allow substitute allocators for PGresult". Re-reading
the whole thing again, it actually turned into a rather different
submission in the middle, and I know I didn't follow that shift
correctly then. I'm replying to its thread but have changed the subject
to reflect that change. From a procedural point of view, I don't feel
right kicking this back to its author on a Friday night when the
deadline for resubmitting it would be Sunday. Instead I've refreshed
the patch myself and am adding it to the January CommitFest. The new
patch is a single file; it's easy enough to split out the dblink changes
if someone wants to work with the pieces separately.

After my meta-review I think we should get another reviewer familiar
with using dblink to look at this next. This is fundamentally a
performance patch now. Some results and benchmarking code were
submitted along with it; the other issues are moot if those aren't
reproducible. The secondary goal for a new review here is to provide
another opinion on the naming issues and abstraction concerns raised so far.

To clear out the original line of thinking, this is not a replacement
low-level storage allocator anymore. The idea of using such a mechanism
to help catch memory leaks has also been dropped.

Instead this adds and documents a new path for libpq callers to more
directly receive tuples, for both improved speed and lower memory
usage. dblink has been modified to use this new mechanism.
Benchmarking by the author suggests no significant change in libpq speed
when only that change was made, while the modified dblink using the new
mechanism was significantly faster. It jumped from 332K tuples/sec to
450K, a 35% gain, and had a lower memory footprint too. Test
methodology and those results are at
http://archives.postgresql.org/pgsql-hackers/2011-12/msg00008.php
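
For anyone who wants to check the percentages, the arithmetic can be
written down as below. The helper names percent_gain and
percent_reduction are made up for this note; the inputs are the
figures quoted in this thread (332K to 450K tuples/sec, 6.02s to
4.44s transfer time, 850MB to 643MB peak RSS).

```c
#include <assert.h>

/* Relative increase of `after` over `before`, in percent. */
static double
percent_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}

/* Relative decrease from `before` to `after`, in percent. */
static double
percent_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) / before * 100.0;
}
```

percent_gain(332.0, 450.0) comes to roughly 35.5, which is the 35%
throughput gain mentioned above. percent_reduction(6.02, 4.44) is
roughly 26.2 for the transfer time, and percent_reduction(850.0,
643.0) is roughly 24.4 for the peak RSS.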

Robert Haas already did a quick code review of this; it, along with the
author's responses mixed in, is at
http://archives.postgresql.org/pgsql-hackers/2011-12/msg01149.php
I see two areas of contention there:

-There are several naming bits no one is happy with yet. Robert didn't
like some of them, but neither did Kyotaro. I don't have an opinion
myself. Is it the case that some changes to the existing code's
terminology are what's actually needed to make this all better? Or is
this just fundamentally warty and there's nothing to be done about it?
Dunno.

-There is an abstraction wrapper vs. coding convenience trade-off
centering around PGresAttValue. It sounded to me like it raised the
always fun question of "where's the right place for the line between
libpq-fe.h and libpq-int.h to be?"

dblink is pretty popular, and this is a big performance win for it. If
naming and API boundary issues are the worst problems here, this sounds
like something well worth pursuing as part of 9.2's still advancing
performance theme.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachments:

alt-result-storage-v1.patch (text/x-patch)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..f48fe4f 100644
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
*************** typedef struct remoteConn
*** 63,73 ****
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
- static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
--- 63,85 ----
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
+ typedef struct storeInfo
+ {
+ 	Tuplestorestate *tuplestore;
+ 	int nattrs;
+ 	AttInMetadata *attinmeta;
+ 	MemoryContext oldcontext;
+ 	char *attrvalbuf;
+ 	void **valbuf;
+ 	size_t *valbufsize;
+ 	bool error_occurred;
+ 	bool nummismatch;
+ } storeInfo;
+ 
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
*************** static char *escape_param_str(const char
*** 90,95 ****
--- 102,111 ----
  static void validate_pkattnums(Relation rel,
  				   int2vector *pkattnums_arg, int32 pknumatts_arg,
  				   int **pkattnums, int *pknumatts);
+ static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+ static void finishStoreInfo(storeInfo *sinfo);
+ static void *addTuple(PGresult *res, AddTupFunc func, int id, size_t size);
+ 
  
  /* Global */
  static remoteConn *pconn = NULL;
*************** dblink_fetch(PG_FUNCTION_ARGS)
*** 503,508 ****
--- 519,525 ----
  	char	   *curname = NULL;
  	int			howmany = 0;
  	bool		fail = true;	/* default to backward compatible */
+ 	storeInfo   storeinfo;
  
  	DBLINK_INIT;
  
*************** dblink_fetch(PG_FUNCTION_ARGS)
*** 559,573 ****
--- 576,605 ----
  	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
  
  	/*
+ 	 * Result is stored into storeinfo.tuplestore instead of
+ 	 * res->result returned by PQexec below
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQregisterTupleAdder(conn, addTuple, &storeinfo);
+ 
+ 	/*
  	 * Try to execute the query.  Note that since libpq uses malloc, the
  	 * PGresult will be long-lived even though we are still in a short-lived
  	 * memory context.
  	 */
  	res = PQexec(conn, buf.data);
+ 	finishStoreInfo(&storeinfo);
+ 
  	if (!res ||
  		(PQresultStatus(res) != PGRES_COMMAND_OK &&
  		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
+ 		/* This is only for backward compatibility */
+ 		if (storeinfo.nummismatch)
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_DATATYPE_MISMATCH),
+ 					 errmsg("remote query result rowtype does not match "
+ 							"the specified FROM clause rowtype")));
  		dblink_res_error(conname, res, "could not fetch from cursor", fail);
  		return (Datum) 0;
  	}
*************** dblink_fetch(PG_FUNCTION_ARGS)
*** 580,586 ****
  				 errmsg("cursor \"%s\" does not exist", curname)));
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
--- 612,617 ----
*************** dblink_record_internal(FunctionCallInfo
*** 640,645 ****
--- 671,677 ----
  	remoteConn *rconn = NULL;
  	bool		fail = true;	/* default to backward compatible */
  	bool		freeconn = false;
+ 	storeInfo   storeinfo;
  
  	/* check to see if caller supports us returning a tuplestore */
  	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
*************** dblink_record_internal(FunctionCallInfo
*** 715,878 ****
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
  	/* synchronous query, or async result retrieval */
  	if (!is_async)
  		res = PQexec(conn, sql);
  	else
- 	{
  		res = PQgetResult(conn);
- 		/* NULL means we're all done with the async results */
- 		if (!res)
- 			return (Datum) 0;
- 	}
  
! 	/* if needed, close the connection to the database and cleanup */
! 	if (freeconn)
! 		PQfinish(conn);
  
! 	if (!res ||
! 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
! 		dblink_res_error(conname, res, "could not execute query", fail);
! 		return (Datum) 0;
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
- /*
-  * Materialize the PGresult to return them as the function result.
-  * The res will be released in this function.
-  */
  static void
! materializeResult(FunctionCallInfo fcinfo, PGresult *res)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
  
! 	Assert(rsinfo->returnMode == SFRM_Materialize);
  
! 	PG_TRY();
  	{
! 		TupleDesc	tupdesc;
! 		bool		is_sql_cmd = false;
! 		int			ntuples;
! 		int			nfields;
! 
! 		if (PQresultStatus(res) == PGRES_COMMAND_OK)
! 		{
! 			is_sql_cmd = true;
  
! 			/*
! 			 * need a tuple descriptor representing one TEXT column to return
! 			 * the command status string as our result tuple
! 			 */
! 			tupdesc = CreateTemplateTupleDesc(1, false);
! 			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
! 							   TEXTOID, -1, 0);
! 			ntuples = 1;
! 			nfields = 1;
! 		}
! 		else
! 		{
! 			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
  
! 			is_sql_cmd = false;
  
! 			/* get a tuple descriptor for our result type */
! 			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
! 			{
! 				case TYPEFUNC_COMPOSITE:
! 					/* success */
! 					break;
! 				case TYPEFUNC_RECORD:
! 					/* failed to determine actual type of RECORD */
! 					ereport(ERROR,
! 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 						errmsg("function returning record called in context "
! 							   "that cannot accept type record")));
! 					break;
! 				default:
! 					/* result type isn't composite */
! 					elog(ERROR, "return type must be a row type");
! 					break;
! 			}
  
! 			/* make sure we have a persistent copy of the tupdesc */
! 			tupdesc = CreateTupleDescCopy(tupdesc);
! 			ntuples = PQntuples(res);
! 			nfields = PQnfields(res);
  		}
  
! 		/*
! 		 * check result and tuple descriptor have the same number of columns
! 		 */
! 		if (nfields != tupdesc->natts)
! 			ereport(ERROR,
! 					(errcode(ERRCODE_DATATYPE_MISMATCH),
! 					 errmsg("remote query result rowtype does not match "
! 							"the specified FROM clause rowtype")));
! 
! 		if (ntuples > 0)
! 		{
! 			AttInMetadata *attinmeta;
! 			Tuplestorestate *tupstore;
! 			MemoryContext oldcontext;
! 			int			row;
! 			char	  **values;
  
! 			attinmeta = TupleDescGetAttInMetadata(tupdesc);
  
! 			oldcontext = MemoryContextSwitchTo(
! 									rsinfo->econtext->ecxt_per_query_memory);
! 			tupstore = tuplestore_begin_heap(true, false, work_mem);
! 			rsinfo->setResult = tupstore;
! 			rsinfo->setDesc = tupdesc;
! 			MemoryContextSwitchTo(oldcontext);
  
! 			values = (char **) palloc(nfields * sizeof(char *));
  
! 			/* put all tuples into the tuplestore */
! 			for (row = 0; row < ntuples; row++)
  			{
! 				HeapTuple	tuple;
  
! 				if (!is_sql_cmd)
! 				{
! 					int			i;
  
! 					for (i = 0; i < nfields; i++)
! 					{
! 						if (PQgetisnull(res, row, i))
! 							values[i] = NULL;
! 						else
! 							values[i] = PQgetvalue(res, row, i);
! 					}
! 				}
! 				else
! 				{
! 					values[0] = PQcmdStatus(res);
! 				}
  
! 				/* build the tuple and put it into the tuplestore. */
! 				tuple = BuildTupleFromCStrings(attinmeta, values);
! 				tuplestore_puttuple(tupstore, tuple);
! 			}
  
! 			/* clean up and return the tuplestore */
! 			tuplestore_donestoring(tupstore);
! 		}
  
! 		PQclear(res);
  	}
  	PG_CATCH();
  	{
! 		/* be sure to release the libpq result */
! 		PQclear(res);
! 		PG_RE_THROW();
  	}
  	PG_END_TRY();
  }
  
  /*
--- 747,952 ----
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
+ 
+ 	/*
+ 	 * Result is stored into storeinfo.tuplestore instead of
+ 	 * res->result returned by PQexec/PQgetResult below
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQregisterTupleAdder(conn, addTuple, &storeinfo);
+ 
  	/* synchronous query, or async result retrieval */
  	if (!is_async)
  		res = PQexec(conn, sql);
  	else
  		res = PQgetResult(conn);
  
! 	finishStoreInfo(&storeinfo);
  
! 	/* NULL res from async get means we're all done with the results */
! 	if (res || !is_async)
  	{
! 		if (freeconn)
! 			PQfinish(conn);
! 
! 		if (!res ||
! 			(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 			 PQresultStatus(res) != PGRES_TUPLES_OK))
! 		{
! 			/* This is only for backward compatibility */
! 			if (storeinfo.nummismatch)
! 			{
! 				ereport(ERROR,
! 						(errcode(ERRCODE_DATATYPE_MISMATCH),
! 						 errmsg("remote query result rowtype does not match "
! 								"the specified FROM clause rowtype")));
! 			}
! 			dblink_res_error(conname, res, "could not execute query", fail);
! 			return (Datum) 0;
! 		}
  	}
  
  	return (Datum) 0;
  }
  
  static void
! initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	TupleDesc	tupdesc;
+ 	int i;
+ 	
+ 	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+ 	{
+ 		case TYPEFUNC_COMPOSITE:
+ 			/* success */
+ 			break;
+ 		case TYPEFUNC_RECORD:
+ 			/* failed to determine actual type of RECORD */
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("function returning record called in context "
+ 							"that cannot accept type record")));
+ 			break;
+ 		default:
+ 			/* result type isn't composite */
+ 			elog(ERROR, "return type must be a row type");
+ 			break;
+ 	}
+ 	
+ 	sinfo->oldcontext = MemoryContextSwitchTo(
+ 		rsinfo->econtext->ecxt_per_query_memory);
  
! 	/* make sure we have a persistent copy of the tupdesc */
! 	tupdesc = CreateTupleDescCopy(tupdesc);
  
! 	sinfo->error_occurred = FALSE;
! 	sinfo->nummismatch = FALSE;
! 	sinfo->nattrs = tupdesc->natts;
! 	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
! 	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 	sinfo->valbuf = (void **)malloc(sinfo->nattrs * sizeof(void *));
! 	sinfo->valbufsize = (size_t *)malloc(sinfo->nattrs * sizeof(size_t));
! 	for (i = 0 ; i < sinfo->nattrs ; i++)
  	{
! 		sinfo->valbuf[i] = NULL;
! 		sinfo->valbufsize[i] = 0;
! 	}
  
! 	/* Preallocate memory of the same size as the PGresAttValue array for values. */
! 	sinfo->attrvalbuf = (char *) malloc(sinfo->nattrs * sizeof(PGresAttValue));
  
! 	rsinfo->setResult = sinfo->tuplestore;
! 	rsinfo->setDesc = tupdesc;
! }
  
! static void
! finishStoreInfo(storeInfo *sinfo)
! {
! 	int i;
  
! 	for (i = 0 ; i < sinfo->nattrs ; i++)
! 	{
! 		if (sinfo->valbuf[i])
! 		{
! 			free(sinfo->valbuf[i]);
! 			sinfo->valbuf[i] = NULL;
  		}
+ 	}
+ 	if (sinfo->attrvalbuf)
+ 		free(sinfo->attrvalbuf);
+ 	sinfo->attrvalbuf = NULL;
+ 	MemoryContextSwitchTo(sinfo->oldcontext);
+ }
  
! static void *
! addTuple(PGresult *res, AddTupFunc  func, int id, size_t size)
! {
! 	storeInfo *sinfo = (storeInfo *)PQgetAddTupleParam(res);
! 	HeapTuple	tuple;
! 	int fields = PQnfields(res);
! 	int i;
! 	PGresAttValue *attval;
! 	char        **cstrs;
  
! 	if (sinfo->error_occurred)
! 		return NULL;
  
! 	switch (func)
! 	{
! 		case ADDTUP_ALLOC_TEXT:
! 		case ADDTUP_ALLOC_BINARY:
! 			if (id == -1)
! 				return sinfo->attrvalbuf;
  
! 			if (id < 0 || id >= sinfo->nattrs)
! 				return NULL;
  
! 			if (sinfo->valbufsize[id] < size)
  			{
! 				if (sinfo->valbuf[id] == NULL)
! 					sinfo->valbuf[id] = malloc(size);
! 				else
! 					sinfo->valbuf[id] = realloc(sinfo->valbuf[id], size);
! 				sinfo->valbufsize[id] = size;
! 			}
! 			return sinfo->valbuf[id];
  
! 		case ADDTUP_ADD_TUPLE:
! 			break;   /* Go through */
! 		default:
! 			/* Ignore */
! 			break;
! 	}
  
! 	if (sinfo->nattrs != fields)
! 	{
! 		sinfo->error_occurred = TRUE;
! 		sinfo->nummismatch = TRUE;
! 		finishStoreInfo(sinfo);
  
! 		PQsetAddTupleErrMes(res,
! 							strdup("function returning record called in "
! 								   "context that cannot accept type record"));
! 		return NULL;
! 	}
  
! 	/*
! 	 * Rewrite PGresAttValue[] to char *[] in-place.
! 	 */
! 	Assert(sizeof(char*) <= sizeof(PGresAttValue));
! 	attval = (PGresAttValue *)sinfo->attrvalbuf;
! 	cstrs   = (char **)sinfo->attrvalbuf;
! 	for(i = 0 ; i < fields ; i++)
! 		cstrs[i] = PQgetAsCstring(attval++);
  
! 	PG_TRY();
! 	{
! 		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
! 		tuplestore_puttuple(sinfo->tuplestore, tuple);
  	}
  	PG_CATCH();
  	{
! 		/*
! 		 * Return the error message in the exception to the caller and
! 		 * cancel the exception.
! 		 */
! 		ErrorData *edata;
! 
! 		sinfo->error_occurred = TRUE;
! 		sinfo->nummismatch = TRUE;
! 
! 		finishStoreInfo(sinfo);
! 
! 		edata = CopyErrorData();
! 		FlushErrorState();
! 
! 		PQsetAddTupleErrMes(res, strdup(edata->message));
! 		return NULL;
  	}
  	PG_END_TRY();
+ 
+ 	return sinfo->attrvalbuf;
  }
  
  /*
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..af90952 100644
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
*************** int PQisthreadsafe();
*** 7233,7238 ****
--- 7233,7557 ----
   </sect1>
  
  
+  <sect1 id="libpq-alterstorage">
+   <title>Alternative result storage</title>
+ 
+   <indexterm zone="libpq-alterstorage">
+    <primary>PGresult</primary>
+    <secondary>PGconn</secondary>
+   </indexterm>
+ 
+   <para>
+    In standard usage, users get the result of command execution
+    from a <structname>PGresult</structname> acquired
+    with <function>PQgetResult</function>
+    from a <structname>PGconn</structname>. The memory areas for
+    the PGresult are allocated with malloc() internally within calls of
+    command execution functions such as <function>PQexec</function>
+    and <function>PQgetResult</function>. If it is difficult to
+    handle the result records in the form of PGresult, you can instruct
+    PGconn to store them into your own storage instead of PGresult.
+   </para>
+ 
+   <variablelist>
+    <varlistentry id="libpq-registertupleadder">
+     <term>
+      <function>PQregisterTupleAdder</function>
+      <indexterm>
+       <primary>PQregisterTupleAdder</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        Sets a function to allocate memory for each tuple and column
+        values, and add the completed tuple into your storage.
+ <synopsis>
+ void PQregisterTupleAdder(PGconn *conn,
+                           addTupleFunction func,
+                           void *param);
+ </synopsis>
+      </para>
+      
+      <para>
+        <variablelist>
+ 	 <varlistentry>
+ 	   <term><parameter>conn</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       The connection object on which to set the tuple adder
+ 	       function. A PGresult created from this connection calls
+ 	       this function to store the result tuples instead of
+ 	       storing them in its internal storage.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+ 	 <varlistentry>
+ 	   <term><parameter>func</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       Tuple adder function to set. NULL means to use the
+ 	       default storage.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+ 	 <varlistentry>
+ 	   <term><parameter>param</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       A pointer to contextual parameter passed
+ 	       to <parameter>func</parameter>.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+        </variablelist>
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-addtuplefunction">
+     <term>
+      <type>addTupleFunction</type>
+      <indexterm>
+       <primary>addTupleFunction</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        The type for the callback function to serve memory blocks for
+        each tuple and its column values, and to add the constructed
+        tuple into your own storage.
+ <synopsis>
+ typedef enum 
+ {
+   ADDTUP_ALLOC_TEXT,
+   ADDTUP_ALLOC_BINARY,
+   ADDTUP_ADD_TUPLE
+ } AddTupFunc;
+ 
+ void *(*addTupleFunction)(PGresult *res,
+                           AddTupFunc func,
+                           int id,
+                           size_t size);
+ </synopsis>
+      </para>
+ 
+      <para>
+        Generally this function must return NULL for failure and should
+        set the error message
+        with <function>PQsetAddTupleErrMes</function> if the cause is
+        other than out of memory. This function must not throw any
+        exception. This function is called in the following sequence.
+ 
+        <itemizedlist spacing="compact">
+ 	 <listitem>
+ 	   <simpara>Call with <parameter>func</parameter>
+ 	   = <firstterm>ADDTUP_ALLOC_BINARY</firstterm>
+ 	   and <parameter>id</parameter> = -1 to request the memory
+ 	   for the tuple, which is used as an array
+ 	   of <type>PGresAttValue</type>.</simpara>
+ 	 </listitem>
+ 	 <listitem>
+ 	   <simpara>Call with <parameter>func</parameter>
+ 	   = <firstterm>ADDTUP_ALLOC_TEXT</firstterm>
+ 	   or <firstterm>ADDTUP_ALLOC_BINARY</firstterm>
+ 	   and <parameter>id</parameter> set to zero or a positive number
+ 	   to request the memory for each column value in the current
+ 	   tuple.</simpara>
+ 	 </listitem>
+ 	 <listitem>
+ 	   <simpara>Call with <parameter>func</parameter>
+ 	   = <firstterm>ADDTUP_ADD_TUPLE</firstterm> to request that the
+ 	   constructed tuple be stored.</simpara>
+ 	 </listitem>
+        </itemizedlist>
+      </para>
+      <para>
+        Calling <type>addTupleFunction</type>
+        with <parameter>func</parameter> =
+        <firstterm>ADDTUP_ALLOC_TEXT</firstterm> asks it to return a
+         memory block of at least <parameter>size</parameter> bytes,
+         which need not be aligned to a word boundary.
+        <parameter>id</parameter> is zero or a positive number that
+        identifies the use of the requested memory block, namely the
+        position of the column for which the memory block is used.
+      </para>
+      <para>
+        When <parameter>func</parameter>
+        = <firstterm>ADDTUP_ALLOC_BINARY</firstterm>, this function is
+        asked to return a memory block of at
+        least <parameter>size</parameter> bytes that is aligned to a
+        word boundary.
+        <parameter>id</parameter> identifies the use of the
+        requested memory block. -1 means that it is used as an
+        array of <type>PGresAttValue</type> to store the tuple. Zero or
+        positive numbers have the same meaning as for
+        <firstterm>ADDTUP_ALLOC_TEXT</firstterm>.
+      </para>
+      <para>When <parameter>func</parameter>
+        = <firstterm>ADDTUP_ADD_TUPLE</firstterm>, this function is
+        asked to store the <type>PGresAttValue</type> structure
+        constructed by the caller into your storage. The pointer to the
+        tuple structure is not passed, so you should remember the
+        pointer to the memory block returned to the caller for
+        <parameter>func</parameter>
+        = <parameter>ADDTUP_ALLOC_BINARY</parameter>
+        with <parameter>id</parameter> = -1. This function must return
+        a non-NULL value for success. You must properly release the
+        memory blocks passed to the caller by this function if needed.
+      </para>
+      <variablelist>
+        <varlistentry>
+ 	 <term><parameter>res</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     A pointer to the <type>PGresult</type> object.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+        <varlistentry>
+ 	 <term><parameter>func</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     An <type>enum</type> value telling which function to perform.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+        <varlistentry>
+ 	 <term><parameter>param</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     A pointer to contextual parameter passed to func.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+      </variablelist>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqgestasctring">
+     <term>
+      <function>PQgetAsCstring</function>
+      <indexterm>
+       <primary>PQgetAsCstring</primary>
+      </indexterm>
+     </term>
+     <listitem>
+       <para>
+ 	Get the value of the column pointed to
+ 	by <parameter>attval</parameter> in the form of a
+ 	zero-terminated C string. Returns NULL if the value is null.
+ <synopsis>
+ char *PQgetAsCstring(PGresAttValue *attval)
+ </synopsis>
+       </para>
+       <para>
+ 	<variablelist>
+ 	  <varlistentry>
+ 	    <term><parameter>attval</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		A pointer to the <type>PGresAttValue</type> object
+ 		from which to retrieve the value.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	</variablelist>
+       </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqgetaddtupleparam">
+     <term>
+      <function>PQgetAddTupleParam</function>
+      <indexterm>
+       <primary>PQgetAddTupleParam</primary>
+      </indexterm>
+     </term>
+     <listitem>
+       <para>
+ 	Get the pointer passed to <function>PQregisterTupleAdder</function>
+ 	as <parameter>param</parameter>.
+ <synopsis>
+ void *PQgetAddTupleParam(const PGresult *res)
+ </synopsis>
+       </para>
+       <para>
+ 	<variablelist>
+ 	  <varlistentry>
+ 	    <term><parameter>res</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		A pointer to the <type>PGresult</type> object.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	</variablelist>
+       </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqsetaddtupleerrmes">
+     <term>
+      <function>PQsetAddTupleErrMes</function>
+      <indexterm>
+       <primary>PQsetAddTupleErrMes</primary>
+      </indexterm>
+     </term>
+     <listitem>
+       <para>
+ 	Set the message for an error that occurred in <type>addTupleFunction</type>.
+ 	If this message is not set, the error is assumed to be out of
+ 	memory.
+ <synopsis>
+ void PQsetAddTupleErrMes(PGresult *res, char *mes)
+ </synopsis>
+       </para>
+       <para>
+ 	<variablelist>
+ 	  <varlistentry>
+ 	    <term><parameter>res</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		A pointer to the <type>PGresult</type> object
+ 		in <type>addTupleFunction</type>.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	  <varlistentry>
+ 	    <term><parameter>mes</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		A pointer to the memory block containing the error
+ 		message, which must be allocated by malloc(). The
+ 		memory block will be freed with free() by the caller
+ 		of
+ 		<type>addTupleFunction</type> only if it returns NULL.
+ 	      </para>
+ 	      <para>
+ 		If <parameter>res</parameter> already has a message
+ 		previously set, it is freed and the given message is
+ 		set. Pass NULL to cancel the custom message.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	</variablelist>
+       </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </sect1>
+ 
+ 
   <sect1 id="libpq-build">
    <title>Building <application>libpq</application> Programs</title>
  
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..a360d78 100644
*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
*************** PQconnectStartParams      157
*** 160,162 ****
--- 160,166 ----
  PQping                    158
  PQpingParams              159
  PQlibVersion              160
+ PQregisterTupleAdder	  161
+ PQgetAsCstring		  162
+ PQgetAddTupleParam	  163
+ PQsetAddTupleErrMes	  164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index d454538..15d6216 100644
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
*************** makeEmptyPGconn(void)
*** 2692,2697 ****
--- 2692,2698 ----
  	conn->allow_ssl_try = true;
  	conn->wait_ssl_try = false;
  #endif
+ 	conn->addTupleFunc = NULL;
  
  	/*
  	 * We try to send at least 8K at a time, which is the usual size of pipe
*************** PQregisterThreadLock(pgthreadlock_t newh
*** 5076,5078 ****
--- 5077,5086 ----
  
  	return prev;
  }
+ 
+ void
+ PQregisterTupleAdder(PGconn *conn, addTupleFunction func, void *param)
+ {
+ 	conn->addTupleFunc = func;
+ 	conn->addTupleFuncParam = param;
+ }
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..3f774fd 100644
*** a/src/interfaces/libpq/fe-exec.c
--- b/src/interfaces/libpq/fe-exec.c
*************** char	   *const pgresStatus[] = {
*** 48,54 ****
  static int	static_client_encoding = PG_SQL_ASCII;
  static bool static_std_strings = false;
  
- 
  static PGEvent *dupEvents(PGEvent *events, int count);
  static bool PQsendQueryStart(PGconn *conn);
  static int PQsendQueryGuts(PGconn *conn,
--- 48,53 ----
*************** static PGresult *PQexecFinish(PGconn *co
*** 66,72 ****
  static int PQsendDescribe(PGconn *conn, char desc_type,
  			   const char *desc_target);
  static int	check_field_number(const PGresult *res, int field_num);
! 
  
  /* ----------------
   * Space management for PGresult.
--- 65,73 ----
  static int PQsendDescribe(PGconn *conn, char desc_type,
  			   const char *desc_target);
  static int	check_field_number(const PGresult *res, int field_num);
! static void *pqDefaultAddTupleFunc(PGresult *res, AddTupFunc func,
! 								   int id, size_t len);
! static void *pqAddTuple(PGresult *res, PGresAttValue *tup);
  
  /* ----------------
   * Space management for PGresult.
*************** PQmakeEmptyPGresult(PGconn *conn, ExecSt
*** 160,165 ****
--- 161,169 ----
  	result->curBlock = NULL;
  	result->curOffset = 0;
  	result->spaceLeft = 0;
+ 	result->addTupleFunc = pqDefaultAddTupleFunc;
+ 	result->addTupleFuncParam = NULL;
+ 	result->addTupleFuncErrMes = NULL;
  
  	if (conn)
  	{
*************** PQmakeEmptyPGresult(PGconn *conn, ExecSt
*** 194,199 ****
--- 198,209 ----
  			}
  			result->nEvents = conn->nEvents;
  		}
+ 
+ 		if (conn->addTupleFunc)
+ 		{
+ 			result->addTupleFunc = conn->addTupleFunc;
+ 			result->addTupleFuncParam = conn->addTupleFuncParam;
+ 		}
  	}
  	else
  	{
*************** PQresultAlloc(PGresult *res, size_t nByt
*** 487,492 ****
--- 497,529 ----
  	return pqResultAlloc(res, nBytes, TRUE);
  }
  
+ void *
+ pqDefaultAddTupleFunc(PGresult *res, AddTupFunc func, int id, size_t len)
+ {
+ 	void *p;
+ 
+ 	switch (func)
+ 	{
+ 		case ADDTUP_ALLOC_TEXT:
+ 			return pqResultAlloc(res, len, TRUE);
+ 
+ 		case ADDTUP_ALLOC_BINARY:
+ 			p = pqResultAlloc(res, len, FALSE);
+ 
+ 			if (id == -1)
+ 				res->addTupleFuncParam = p;
+ 
+ 			return p;
+ 
+ 		case ADDTUP_ADD_TUPLE:
+ 			return pqAddTuple(res, res->addTupleFuncParam);
+ 
+ 		default:
+ 			/* Ignore */
+ 			break;
+ 	}
+ 	return NULL;
+ }
  /*
   * pqResultAlloc -
   *		Allocate subsidiary storage for a PGresult.
*************** pqInternalNotice(const PGNoticeHooks *ho
*** 830,838 ****
  /*
   * pqAddTuple
   *	  add a row pointer to the PGresult structure, growing it if necessary
!  *	  Returns TRUE if OK, FALSE if not enough memory to add the row
   */
! int
  pqAddTuple(PGresult *res, PGresAttValue *tup)
  {
  	if (res->ntups >= res->tupArrSize)
--- 867,875 ----
  /*
   * pqAddTuple
   *	  add a row pointer to the PGresult structure, growing it if necessary
!  *	  Returns tup if OK, NULL if not enough memory to add the row.
   */
! static void *
  pqAddTuple(PGresult *res, PGresAttValue *tup)
  {
  	if (res->ntups >= res->tupArrSize)
*************** pqAddTuple(PGresult *res, PGresAttValue
*** 858,870 ****
  			newTuples = (PGresAttValue **)
  				realloc(res->tuples, newSize * sizeof(PGresAttValue *));
  		if (!newTuples)
! 			return FALSE;		/* malloc or realloc failed */
  		res->tupArrSize = newSize;
  		res->tuples = newTuples;
  	}
  	res->tuples[res->ntups] = tup;
  	res->ntups++;
! 	return TRUE;
  }
  
  /*
--- 895,907 ----
  			newTuples = (PGresAttValue **)
  				realloc(res->tuples, newSize * sizeof(PGresAttValue *));
  		if (!newTuples)
! 			return NULL;		/* malloc or realloc failed */
  		res->tupArrSize = newSize;
  		res->tuples = newTuples;
  	}
  	res->tuples[res->ntups] = tup;
  	res->ntups++;
! 	return tup;
  }
  
  /*
*************** PQgetisnull(const PGresult *res, int tup
*** 2822,2827 ****
--- 2859,2901 ----
  		return 0;
  }
  
+ /* PQgetAsCstring
+  *	returns the field as C string.
+  */
+ char *
+ PQgetAsCstring(PGresAttValue *attval)
+ {
+ 	return attval->len == NULL_LEN ? NULL : attval->value;
+ }
+ 
+ /* PQgetAddTupleParam
+  *	Get the pointer to the contextual parameter from PGresult which is
+  *	registered to PGconn by PQregisterTupleAdder
+  */
+ void *
+ PQgetAddTupleParam(const PGresult *res)
+ {
+ 	if (!res)
+ 		return NULL;
+ 	return res->addTupleFuncParam;
+ }
+ 
+ /* PQsetAddTupleErrMes
+  *	Set the error message passed back to the caller of addTupleFunc.
+  *  mes must be a malloc'ed memory block and it is released by the
+  *  caller of addTupleFunc if set.
+  *  You can replace the previous message by alternative mes, or clear
+  *  it with NULL.
+  */
+ void
+ PQsetAddTupleErrMes(PGresult *res, char *mes)
+ {
+ 	/* Free existing message */
+ 	if (res->addTupleFuncErrMes)
+ 		free(res->addTupleFuncErrMes);
+ 	res->addTupleFuncErrMes = mes;
+ }
+ 
  /* PQnparams:
   *	returns the number of input parameters of a prepared statement.
   */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..721c812 100644
*** a/src/interfaces/libpq/fe-protocol2.c
--- b/src/interfaces/libpq/fe-protocol2.c
*************** getAnotherTuple(PGconn *conn, bool binar
*** 733,741 ****
  	if (conn->curTuple == NULL)
  	{
  		conn->curTuple = (PGresAttValue *)
! 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
  		if (conn->curTuple == NULL)
! 			goto outOfMemory;
  		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
  
  		/*
--- 733,742 ----
  	if (conn->curTuple == NULL)
  	{
  		conn->curTuple = (PGresAttValue *)
! 			result->addTupleFunc(result, ADDTUP_ALLOC_BINARY, -1,
! 								 nfields * sizeof(PGresAttValue));
  		if (conn->curTuple == NULL)
! 			goto addTupleError;
  		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
  
  		/*
*************** getAnotherTuple(PGconn *conn, bool binar
*** 757,763 ****
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 			goto outOfMemory;
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
--- 758,764 ----
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 			goto addTupleError;
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
*************** getAnotherTuple(PGconn *conn, bool binar
*** 787,795 ****
  				vlen = 0;
  			if (tup[i].value == NULL)
  			{
! 				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
  				if (tup[i].value == NULL)
! 					goto outOfMemory;
  			}
  			tup[i].len = vlen;
  			/* read in the value */
--- 788,799 ----
  				vlen = 0;
  			if (tup[i].value == NULL)
  			{
! 				AddTupFunc func =
! 					(binary ? ADDTUP_ALLOC_BINARY : ADDTUP_ALLOC_TEXT);
! 				tup[i].value =
! 					(char *) result->addTupleFunc(result, func, i, vlen + 1);
  				if (tup[i].value == NULL)
! 					goto addTupleError;
  			}
  			tup[i].len = vlen;
  			/* read in the value */
*************** getAnotherTuple(PGconn *conn, bool binar
*** 812,819 ****
  	}
  
  	/* Success!  Store the completed tuple in the result */
! 	if (!pqAddTuple(result, tup))
! 		goto outOfMemory;
  	/* and reset for a new message */
  	conn->curTuple = NULL;
  
--- 816,824 ----
  	}
  
  	/* Success!  Store the completed tuple in the result */
! 	if (!result->addTupleFunc(result, ADDTUP_ADD_TUPLE, 0, 0))
! 		goto addTupleError;
! 
  	/* and reset for a new message */
  	conn->curTuple = NULL;
  
*************** getAnotherTuple(PGconn *conn, bool binar
*** 821,827 ****
  		free(bitmap);
  	return 0;
  
! outOfMemory:
  	/* Replace partially constructed result with an error result */
  
  	/*
--- 826,832 ----
  		free(bitmap);
  	return 0;
  
! addTupleError:
  	/* Replace partially constructed result with an error result */
  
  	/*
*************** outOfMemory:
*** 829,836 ****
  	 * there's not enough memory to concatenate messages...
  	 */
  	pqClearAsyncResult(conn);
! 	printfPQExpBuffer(&conn->errorMessage,
! 					  libpq_gettext("out of memory for query result\n"));
  
  	/*
  	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
--- 834,854 ----
  	 * there's not enough memory to concatenate messages...
  	 */
  	pqClearAsyncResult(conn);
! 	resetPQExpBuffer(&conn->errorMessage);
! 
! 	/*
! 	 * If error message is passed from addTupleFunc, set it into
! 	 * PGconn, assume out of memory if not.
! 	 */
! 	appendPQExpBufferStr(&conn->errorMessage,
! 						 libpq_gettext(result->addTupleFuncErrMes ?
! 									   result->addTupleFuncErrMes :
! 									   "out of memory for query result\n"));
! 	if (result->addTupleFuncErrMes)
! 	{
! 		free(result->addTupleFuncErrMes);
! 		result->addTupleFuncErrMes = NULL;
! 	}
  
  	/*
  	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..1417d59 100644
*** a/src/interfaces/libpq/fe-protocol3.c
--- b/src/interfaces/libpq/fe-protocol3.c
*************** getAnotherTuple(PGconn *conn, int msgLen
*** 634,642 ****
  	if (conn->curTuple == NULL)
  	{
  		conn->curTuple = (PGresAttValue *)
! 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
  		if (conn->curTuple == NULL)
! 			goto outOfMemory;
  		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
  	}
  	tup = conn->curTuple;
--- 634,643 ----
  	if (conn->curTuple == NULL)
  	{
  		conn->curTuple = (PGresAttValue *)
! 			result->addTupleFunc(result, ADDTUP_ALLOC_BINARY, -1,
! 								 nfields * sizeof(PGresAttValue));
  		if (conn->curTuple == NULL)
! 			goto addTupleError;
  		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
  	}
  	tup = conn->curTuple;
*************** getAnotherTuple(PGconn *conn, int msgLen
*** 673,683 ****
  			vlen = 0;
  		if (tup[i].value == NULL)
  		{
! 			bool		isbinary = (result->attDescs[i].format != 0);
! 
! 			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
  			if (tup[i].value == NULL)
! 				goto outOfMemory;
  		}
  		tup[i].len = vlen;
  		/* read in the value */
--- 674,685 ----
  			vlen = 0;
  		if (tup[i].value == NULL)
  		{
! 			AddTupFunc func = (result->attDescs[i].format != 0 ?
! 							   ADDTUP_ALLOC_BINARY : ADDTUP_ALLOC_TEXT);
! 			tup[i].value =
! 				(char *) result->addTupleFunc(result, func, i, vlen + 1);
  			if (tup[i].value == NULL)
! 				goto addTupleError;
  		}
  		tup[i].len = vlen;
  		/* read in the value */
*************** getAnotherTuple(PGconn *conn, int msgLen
*** 689,710 ****
  	}
  
  	/* Success!  Store the completed tuple in the result */
! 	if (!pqAddTuple(result, tup))
! 		goto outOfMemory;
  	/* and reset for a new message */
  	conn->curTuple = NULL;
  
  	return 0;
  
! outOfMemory:
  
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
  	 */
  	pqClearAsyncResult(conn);
! 	printfPQExpBuffer(&conn->errorMessage,
! 					  libpq_gettext("out of memory for query result\n"));
  	pqSaveErrorResult(conn);
  
  	/* Discard the failed message by pretending we read it */
--- 691,726 ----
  	}
  
  	/* Success!  Store the completed tuple in the result */
! 	if (!result->addTupleFunc(result, ADDTUP_ADD_TUPLE, 0, 0))
! 		goto addTupleError;
! 
  	/* and reset for a new message */
  	conn->curTuple = NULL;
  
  	return 0;
  
! addTupleError:
  
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
  	 */
  	pqClearAsyncResult(conn);
! 	resetPQExpBuffer(&conn->errorMessage);
! 
! 	/*
! 	 * If error message is passed from addTupleFunc, set it into
! 	 * PGconn, assume out of memory if not.
! 	 */
! 	appendPQExpBufferStr(&conn->errorMessage,
! 						 libpq_gettext(result->addTupleFuncErrMes ?
! 									   result->addTupleFuncErrMes : 
! 									   "out of memory for query result\n"));
! 	if (result->addTupleFuncErrMes)
! 	{
! 		free(result->addTupleFuncErrMes);
! 		result->addTupleFuncErrMes = NULL;
! 	}
  	pqSaveErrorResult(conn);
  
  	/* Discard the failed message by pretending we read it */
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..bfa6556 100644
*** a/src/interfaces/libpq/libpq-fe.h
--- b/src/interfaces/libpq/libpq-fe.h
*************** typedef enum
*** 116,121 ****
--- 116,131 ----
  	PQPING_NO_ATTEMPT			/* connection not attempted (bad params) */
  } PGPing;
  
+ /* AddTupFunc is one of the parameters of addTupleFunc that decides
+  * the function of the addTupleFunction. See addTupleFunction for
+  * details */
+ typedef enum 
+ {
+ 	ADDTUP_ALLOC_TEXT,          /* Returns non-aligned memory for text value */
+ 	ADDTUP_ALLOC_BINARY,        /* Returns aligned memory for binary value */
+ 	ADDTUP_ADD_TUPLE            /* Adds tuple data into tuple storage */
+ } AddTupFunc;
+ 
  /* PGconn encapsulates a connection to the backend.
   * The contents of this struct are not supposed to be known to applications.
   */
*************** typedef struct pgresAttDesc
*** 225,230 ****
--- 235,246 ----
  	int			atttypmod;		/* type-specific modifier info */
  } PGresAttDesc;
  
+ typedef struct pgresAttValue
+ {
+ 	int			len;			/* length in bytes of the value */
+ 	char	   *value;			/* actual value, plus terminating zero byte */
+ } PGresAttValue;
+ 
  /* ----------------
   * Exported functions of libpq
   * ----------------
*************** extern PGPing PQping(const char *conninf
*** 416,421 ****
--- 432,483 ----
  extern PGPing PQpingParams(const char *const * keywords,
  			 const char *const * values, int expand_dbname);
  
+ /*
+  * Typedef for tuple storage function.
+  *
+  * This function pointer is used for tuple storage function in
+  * PGresult and PGconn.
+  *
+  * addTupleFunction is called for three types of function designated by
+  * the enum AddTupFunc.
+  *
+  * id is the identifier for allocated memory block. The caller sets -1
+  * for PGresAttValue array, and 0 to number of cols - 1 for each
+  * column.
+  *
+  * ADDTUP_ALLOC_TEXT requests the size bytes memory block for a text
+  * value which may not be aligned to the word boundary.
+  *
+  * ADDTUP_ALLOC_BINARY requests the size bytes memory block for a
+  * binary value which is aligned to the word boundary.
+  *
+  * ADDTUP_ADD_TUPLE requests to add tuple data into storage, and
+  * free the memory blocks allocated by this function if necessary.
+  * id and size are ignored.
+  *
+  * This function must return non-NULL value for success and must
+  * return NULL for failure and may set error message by
+  * PQsetAddTupleErrMes in malloc'ed memory. Assumed by caller as out
+  * of memory if the error message is NULL on failure. This function is
+  * assumed not to throw any exception.
+  */
+ 	typedef void *(*addTupleFunction)(PGresult *res, AddTupFunc func,
+ 									  int id, size_t size);
+ 
+ /*
+  * Register alternative tuple storage function to PGconn.
+  * 
+  * By registering this function, pg_result disables its own tuple
+  * storage and calls it to append rows one by one.
+  *
+  * func is tuple store function. See addTupleFunction.
+  * 
+  * addTupFuncParam is contextual storage that can be obtained with
+  * PQgetAddTupleParam in func.
+  */
+ extern void PQregisterTupleAdder(PGconn *conn, addTupleFunction func,
+ 								 void *addTupFuncParam);
+ 
  /* Force the write buffer to be written (or at least try) */
  extern int	PQflush(PGconn *conn);
  
*************** extern char *PQcmdTuples(PGresult *res);
*** 454,459 ****
--- 516,524 ----
  extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
  extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
  extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+ extern char *PQgetAsCstring(PGresAttValue *attdesc);
+ extern void *PQgetAddTupleParam(const PGresult *res);
+ extern void	PQsetAddTupleErrMes(PGresult *res, char *mes);
  extern int	PQnparams(const PGresult *res);
  extern Oid	PQparamtype(const PGresult *res, int param_num);
  
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index d967d60..01e8c3e 100644
*** a/src/interfaces/libpq/libpq-int.h
--- b/src/interfaces/libpq/libpq-int.h
*************** typedef struct pgresParamDesc
*** 134,145 ****
  
  #define NULL_LEN		(-1)	/* pg_result len for NULL value */
  
- typedef struct pgresAttValue
- {
- 	int			len;			/* length in bytes of the value */
- 	char	   *value;			/* actual value, plus terminating zero byte */
- } PGresAttValue;
- 
  /* Typedef for message-field list entries */
  typedef struct pgMessageField
  {
--- 134,139 ----
*************** struct pg_result
*** 209,214 ****
--- 203,213 ----
  	PGresult_data *curBlock;	/* most recently allocated block */
  	int			curOffset;		/* start offset of free space in block */
  	int			spaceLeft;		/* number of free bytes remaining in block */
+ 
+ 	addTupleFunction addTupleFunc; /* Tuple storage function. See
+ 									* addTupleFunction for details. */
+ 	void *addTupleFuncParam;       /* Contextual parameter for addTupleFunc */
+ 	char *addTupleFuncErrMes;      /* Error message returned from addTupFunc */
  };
  
  /* PGAsyncStatusType defines the state of the query-execution state machine */
*************** struct pg_conn
*** 443,448 ****
--- 442,454 ----
  
  	/* Buffer for receiving various parts of messages */
  	PQExpBufferData workBuffer; /* expansible string */
+ 
+     /* Tuple store function. The two fields below are copied to a newly
+ 	 * created PGresult if addTupleFunc is not NULL. Use default
+ 	 * function if addTupleFunc is NULL. */
+ 	addTupleFunction addTupleFunc; /* Tuple storage function. See
+ 									* addTupleFunction for details. */
+ 	void *addTupleFuncParam;       /* Contextual parameter for addTupFunc */
  };
  
  /* PGcancel stores all data necessary to cancel a connection. A copy of this
*************** extern void
*** 507,513 ****
  pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
  /* This lets gcc check the format string for consistency. */
  __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
- extern int	pqAddTuple(PGresult *res, PGresAttValue *tup);
  extern void pqSaveMessageField(PGresult *res, char code,
  				   const char *value);
  extern void pqSaveParameterStatus(PGconn *conn, const char *name,
--- 513,518 ----
#21Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Greg Smith (#20)
3 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

Hello, this is a revised and rebased version of the patch.

a. The old term `Add Tuple Function' is changed to `Store
Handler'. The reason it is not `storage' is simply the length of
the symbols.

b. I couldn't find a place to settle PQgetAsCstring() in. It is
removed, and storeHandler()@dblink.c touches PGresAttValue
directly in this new patch. The definition of PGresAttValue stays
in libpq-fe.h and is provided with a comment.

c. Error handling in dblink.c is refined. I think it preserves the
previous behavior for column number mismatch and type
conversion exceptions.

d. The documentation is revised.
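
To make the calling convention concrete, here is a minimal sketch of what a store handler could look like, assuming the interface described in this patch (one allocation request with id = -1 for the row array, one per column, then PQSF_ADD_TUPLE). The enum and PGresAttValue are redeclared locally under the patch's names; DemoResult, RowSink, release_row, row_sink_handler and feed_row are hypothetical names used only for illustration, and nothing here is the real libpq API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NULL_LEN (-1)   /* length marker for SQL NULL, as in libpq-int.h */

/* Local stand-ins named after the patch; NOT the real libpq declarations. */
typedef enum { PQSF_ALLOC_TEXT, PQSF_ALLOC_BINARY, PQSF_ADD_TUPLE } PQStoreFunc;
typedef struct { int len; char *value; } PGresAttValue;

/* Hypothetical context object, the kind of thing registered as "param". */
typedef struct
{
    int            nfields;
    PGresAttValue *cur;         /* row array handed out for id == -1 */
    int            nrows;       /* rows consumed so far */
    size_t         total_bytes; /* sum of non-NULL value lengths */
} RowSink;

/* Hypothetical stand-in for PGresult: it only carries the context pointer. */
typedef struct { void *storeHandlerParam; } DemoResult;

/* Free the row under construction and every column buffer attached to it. */
void
release_row(RowSink *sink)
{
    int i;

    for (i = 0; sink->cur != NULL && i < sink->nfields; i++)
        if (sink->cur[i].len != NULL_LEN)
            free(sink->cur[i].value);
    free(sink->cur);
    sink->cur = NULL;
}

/* Store handler: allocate on request, consume and release on ADD_TUPLE. */
void *
row_sink_handler(DemoResult *res, PQStoreFunc func, int id, size_t size)
{
    RowSink *sink = res->storeHandlerParam;
    void    *p;
    int      i;

    assert(sink != NULL);
    switch (func)
    {
        case PQSF_ALLOC_TEXT:
        case PQSF_ALLOC_BINARY:
            p = calloc(1, size);    /* zeroed, and aligned for any type */
            if (p != NULL && id == -1)
                sink->cur = p;      /* remember the row array */
            return p;
        case PQSF_ADD_TUPLE:
            if (sink->cur == NULL)
                return NULL;
            for (i = 0; i < sink->nfields; i++)
                if (sink->cur[i].len != NULL_LEN)
                    sink->total_bytes += (size_t) sink->cur[i].len;
            sink->nrows++;
            release_row(sink);
            return sink;            /* any non-NULL value means success */
    }
    return NULL;
}

/* Simulates the call sequence getAnotherTuple() makes for one row. */
int
feed_row(DemoResult *res, const char *const *vals, int nfields)
{
    RowSink       *sink = res->storeHandlerParam;
    PGresAttValue *tup;
    int            i;

    tup = row_sink_handler(res, PQSF_ALLOC_BINARY, -1,
                           nfields * sizeof(PGresAttValue));
    for (i = 0; tup != NULL && i < nfields; i++)
    {
        size_t vlen = vals[i] ? strlen(vals[i]) : 0;

        if (vals[i] == NULL)
        {
            tup[i].len = NULL_LEN;  /* SQL NULL: no buffer requested */
            continue;
        }
        tup[i].value = row_sink_handler(res, PQSF_ALLOC_TEXT, i, vlen + 1);
        if (tup[i].value == NULL)
        {
            release_row(sink);
            return 0;
        }
        memcpy(tup[i].value, vals[i], vlen + 1);
        tup[i].len = (int) vlen;
    }
    return tup != NULL && row_sink_handler(res, PQSF_ADD_TUPLE, 0, 0) != NULL;
}
```

In this sketch the handler consumes each row as soon as it is complete and frees the buffers right away, which is the property that lets a caller such as dblink avoid holding a second copy of the result. With the patch applied, such a handler and its context would presumably be installed through PQregisterStoreHandler(conn, func, param) and the context fetched with PQgetStoreHandlerParam(res).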

It jumped from 332K tuples/sec to 450K, a 35% gain, and had a
lower memory footprint too. Test methodology and those results
are at
http://archives.postgresql.org/pgsql-hackers/2011-12/msg00008.php

Disappointingly, re-measuring showed that the gain has become
lower than that.

On CentOS 6.2, with the other conditions the same as in the previous
testing, the overall performance became higher; the loss with the
libpq patch alone was 1.8%, and the gain with the full patch fell
to 5.6%. The reduction in memory usage was unchanged.

Original : 3.96s 100.0%
w/libpq patch : 4.03s 101.8%
w/libpq+dblink patch : 3.74s 94.4%
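
The percentages are each elapsed time divided by the 3.96s of the original build, so the 1.8% loss is 101.8 - 100 and the 5.6% gain is 100 - 94.4. A small sketch of that arithmetic follows; relative_percent and print_row are hypothetical helper names, not part of any patch.

```c
#include <assert.h>
#include <stdio.h>

/* Elapsed time of a run expressed as a percentage of the baseline run. */
double
relative_percent(double elapsed_sec, double baseline_sec)
{
    assert(baseline_sec > 0.0);
    return 100.0 * elapsed_sec / baseline_sec;
}

/* Print one row: label, elapsed seconds, percentage of the baseline. */
void
print_row(const char *label, double elapsed_sec, double baseline_sec)
{
    printf("%-22s: %.2fs %5.1f%%\n", label, elapsed_sec,
           relative_percent(elapsed_sec, baseline_sec));
}
```

Calling print_row("w/libpq patch", 4.03, 3.96) and print_row("w/libpq+dblink patch", 3.74, 3.96) recomputes the two non-baseline rows from the measured times.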

The attachments are listed below.

libpq_altstore_20120117.patch
- Allow alternative storage for libpq.

dblink_perf_20120117.patch
- Modify dblink to use alternative storage mechanism.

libpq_altstore_doc_20120117.patch
- Documentation for libpq_altstore. It appears as "31.19. Alternative result storage"

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_altstore_20120117.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..83525e1 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,6 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQregisterStoreHandler	  161
+PQgetStoreHandlerParam	  163
+PQsetStoreHandlerErrMes	  164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index d454538..5559f0b 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2692,6 +2692,7 @@ makeEmptyPGconn(void)
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
 #endif
+	conn->storeHandler = NULL;
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -5076,3 +5077,10 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
+void
+PQregisterStoreHandler(PGconn *conn, StoreHandler func, void *param)
+{
+	conn->storeHandler = func;
+	conn->storeHandlerParam = param;
+}
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..96e5974 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -67,6 +67,10 @@ static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
 
+static void *pqDefaultStoreHandler(PGresult *res, PQStoreFunc func,
+								   int id, size_t len);
+static void *pqAddTuple(PGresult *res, PGresAttValue *tup);
+
 
 /* ----------------
  * Space management for PGresult.
@@ -160,6 +164,9 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->storeHandler = pqDefaultStoreHandler;
+	result->storeHandlerParam = NULL;
+	result->storeHandlerErrMes = NULL;
 
 	if (conn)
 	{
@@ -194,6 +201,12 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 			}
 			result->nEvents = conn->nEvents;
 		}
+
+		if (conn->storeHandler)
+		{
+			result->storeHandler = conn->storeHandler;
+			result->storeHandlerParam = conn->storeHandlerParam;
+		}
 	}
 	else
 	{
@@ -487,6 +500,33 @@ PQresultAlloc(PGresult *res, size_t nBytes)
 	return pqResultAlloc(res, nBytes, TRUE);
 }
 
+void *
+pqDefaultStoreHandler(PGresult *res, PQStoreFunc func, int id, size_t len)
+{
+	void *p;
+
+	switch (func)
+	{
+		case PQSF_ALLOC_TEXT:
+			return pqResultAlloc(res, len, TRUE);
+
+		case PQSF_ALLOC_BINARY:
+			p = pqResultAlloc(res, len, FALSE);
+
+			if (id == -1)
+				res->storeHandlerParam = p;
+
+			return p;
+
+		case PQSF_ADD_TUPLE:
+			return pqAddTuple(res, res->storeHandlerParam);
+
+		default:
+			/* Ignore */
+			break;
+	}
+	return NULL;
+}
 /*
  * pqResultAlloc -
  *		Allocate subsidiary storage for a PGresult.
@@ -830,9 +870,9 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 /*
  * pqAddTuple
  *	  add a row pointer to the PGresult structure, growing it if necessary
- *	  Returns TRUE if OK, FALSE if not enough memory to add the row
+ *	  Returns tup if OK, NULL if not enough memory to add the row.
  */
-int
+static void *
 pqAddTuple(PGresult *res, PGresAttValue *tup)
 {
 	if (res->ntups >= res->tupArrSize)
@@ -858,13 +898,13 @@ pqAddTuple(PGresult *res, PGresAttValue *tup)
 			newTuples = (PGresAttValue **)
 				realloc(res->tuples, newSize * sizeof(PGresAttValue *));
 		if (!newTuples)
-			return FALSE;		/* malloc or realloc failed */
+			return NULL;		/* malloc or realloc failed */
 		res->tupArrSize = newSize;
 		res->tuples = newTuples;
 	}
 	res->tuples[res->ntups] = tup;
 	res->ntups++;
-	return TRUE;
+	return tup;
 }
 
 /*
@@ -2822,6 +2862,35 @@ PQgetisnull(const PGresult *res, int tup_num, int field_num)
 		return 0;
 }
 
+/* PQgetStoreHandlerParam
+ *	Get the pointer to the contextual parameter from PGresult which is
+ *	registered to PGconn by PQregisterStoreHandler
+ */
+void *
+PQgetStoreHandlerParam(const PGresult *res)
+{
+	if (!res)
+		return NULL;
+	return res->storeHandlerParam;
+}
+
+/* PQsetStoreHandlerErrMes
+ *	Set the error message passed back to the caller of StoreHandler.
+ *
+ *  mes must be a malloc'ed memory block and it will be released by
+ *  the caller of StoreHandler.  You can replace the previous message
+ *  by alternative mes, or clear it with NULL. The previous one will
+ *  be freed internally.
+ */
+void
+PQsetStoreHandlerErrMes(PGresult *res, char *mes)
+{
+	/* Free existing message */
+	if (res->storeHandlerErrMes)
+		free(res->storeHandlerErrMes);
+	res->storeHandlerErrMes = mes;
+}
+
 /* PQnparams:
  *	returns the number of input parameters of a prepared statement.
  */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..205502b 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -733,9 +733,10 @@ getAnotherTuple(PGconn *conn, bool binary)
 	if (conn->curTuple == NULL)
 	{
 		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
+			result->storeHandler(result, PQSF_ALLOC_BINARY, -1,
+								 nfields * sizeof(PGresAttValue));
 		if (conn->curTuple == NULL)
-			goto outOfMemory;
+			goto addTupleError;
 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
 
 		/*
@@ -757,7 +758,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 	{
 		bitmap = (char *) malloc(nbytes);
 		if (!bitmap)
-			goto outOfMemory;
+			goto addTupleError;
 	}
 
 	if (pqGetnchar(bitmap, nbytes, conn))
@@ -787,9 +788,12 @@ getAnotherTuple(PGconn *conn, bool binary)
 				vlen = 0;
 			if (tup[i].value == NULL)
 			{
-				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
+				PQStoreFunc func = 
+					(binary ? PQSF_ALLOC_BINARY : PQSF_ALLOC_TEXT);
+				tup[i].value =
+					(char *) result->storeHandler(result, func, i, vlen + 1);
 				if (tup[i].value == NULL)
-					goto outOfMemory;
+					goto addTupleError;
 			}
 			tup[i].len = vlen;
 			/* read in the value */
@@ -812,8 +816,9 @@ getAnotherTuple(PGconn *conn, bool binary)
 	}
 
 	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
+	if (!result->storeHandler(result, PQSF_ADD_TUPLE, 0, 0))
+		goto addTupleError;
+
 	/* and reset for a new message */
 	conn->curTuple = NULL;
 
@@ -821,7 +826,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 		free(bitmap);
 	return 0;
 
-outOfMemory:
+addTupleError:
 	/* Replace partially constructed result with an error result */
 
 	/*
@@ -829,8 +834,21 @@ outOfMemory:
 	 * there's not enough memory to concatenate messages...
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	resetPQExpBuffer(&conn->errorMessage);
+
+	/*
+	 * If the store handler set an error message, copy it into
+	 * PGconn; otherwise assume out of memory.
+	 */
+	appendPQExpBufferStr(&conn->errorMessage,
+						 libpq_gettext(result->storeHandlerErrMes ?
+									   result->storeHandlerErrMes :
+									   "out of memory for query result\n"));
+	if (result->storeHandlerErrMes)
+	{
+		free(result->storeHandlerErrMes);
+		result->storeHandlerErrMes = NULL;
+	}
 
 	/*
 	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..117c38a 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -634,9 +634,10 @@ getAnotherTuple(PGconn *conn, int msgLength)
 	if (conn->curTuple == NULL)
 	{
 		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
+			result->storeHandler(result, PQSF_ALLOC_BINARY, -1,
+								 nfields * sizeof(PGresAttValue));
 		if (conn->curTuple == NULL)
-			goto outOfMemory;
+			goto addTupleError;
 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
 	}
 	tup = conn->curTuple;
@@ -673,11 +674,12 @@ getAnotherTuple(PGconn *conn, int msgLength)
 			vlen = 0;
 		if (tup[i].value == NULL)
 		{
-			bool		isbinary = (result->attDescs[i].format != 0);
-
-			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
+			PQStoreFunc func = (result->attDescs[i].format != 0 ?
+								PQSF_ALLOC_BINARY : PQSF_ALLOC_TEXT);
+			tup[i].value =
+				(char *) result->storeHandler(result, func, i, vlen + 1);
 			if (tup[i].value == NULL)
-				goto outOfMemory;
+				goto addTupleError;
 		}
 		tup[i].len = vlen;
 		/* read in the value */
@@ -689,22 +691,36 @@ getAnotherTuple(PGconn *conn, int msgLength)
 	}
 
 	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
+	if (!result->storeHandler(result, PQSF_ADD_TUPLE, 0, 0))
+		goto addTupleError;
+	
 	/* and reset for a new message */
 	conn->curTuple = NULL;
 
 	return 0;
 
-outOfMemory:
+addTupleError:
 
 	/*
 	 * Replace partially constructed result with an error result. First
 	 * discard the old result to try to win back some memory.
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	resetPQExpBuffer(&conn->errorMessage);
+
+	/*
+	 * If the store handler set an error message, copy it into
+	 * PGconn; otherwise assume out of memory.
+	 */
+	appendPQExpBufferStr(&conn->errorMessage,
+						 libpq_gettext(result->storeHandlerErrMes ?
+									   result->storeHandlerErrMes : 
+									   "out of memory for query result\n"));
+	if (result->storeHandlerErrMes)
+	{
+		free(result->storeHandlerErrMes);
+		result->storeHandlerErrMes = NULL;
+	}
 	pqSaveErrorResult(conn);
 
 	/* Discard the failed message by pretending we read it */
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..6d86fa0 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -116,6 +116,16 @@ typedef enum
 	PQPING_NO_ATTEMPT			/* connection not attempted (bad params) */
 } PGPing;
 
+/* PQStoreFunc is the enum for one of the parameters of storeHandler
+ * that decides what to do. See the typedef StoreHandler for
+ * details */
+typedef enum 
+{
+	PQSF_ALLOC_TEXT,          /* Requested non-aligned memory for text value */
+	PQSF_ALLOC_BINARY,        /* Requested aligned memory for binary value */
+	PQSF_ADD_TUPLE            /* Requested to add tuple data into store */
+} PQStoreFunc;
+
 /* PGconn encapsulates a connection to the backend.
  * The contents of this struct are not supposed to be known to applications.
  */
@@ -149,6 +159,15 @@ typedef struct pgNotify
 	struct pgNotify *next;		/* list link */
 } PGnotify;
 
+/* PGresAttValue represents a value of one tuple field in string form.
+   NULL is represented as len < 0. Otherwise value points to a null
+   terminated C string with the length of len. */
+typedef struct pgresAttValue
+{
+	int			len;			/* length in bytes of the value */
+	char	   *value;			/* actual value, plus terminating zero byte */
+} PGresAttValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
@@ -416,6 +435,52 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
 			 const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative result store handler.
+ *
+ * This function pointer is used for alternative result store handler
+ * callback in PGresult and PGconn.
+ *
+ * StoreHandler is called for three functions designated by the enum
+ * PQStoreFunc.
+ *
+ * id is the identifier for allocated memory block. The caller sets -1
+ * for PGresAttValue array, and 0 to number of cols - 1 for each
+ * column.
+ *
+ * PQSF_ALLOC_TEXT requests a memory block of size bytes for a text
+ * value, which need not be aligned to a word boundary.
+ *
+ * PQSF_ALLOC_BINARY requests a memory block of size bytes for a binary
+ * value, which must be aligned to a word boundary.
+ *
+ * PQSF_ADD_TUPLE requests to add tuple data into the result store,
+ * and free the memory blocks allocated by this function if necessary.
+ * id and size are to be ignored for this function.
+ *
+ * This function must return a non-NULL value on success and NULL on
+ * failure. On failure it may set an error message with
+ * PQsetStoreHandlerErrMes; if no message is set, the caller assumes
+ * out of memory. This function is assumed not to throw any
+ * exception.
+ */
+typedef void *(*StoreHandler)(PGresult *res, PQStoreFunc func,
+							  int id, size_t size);
+
+/*
+ * Register alternative result store function to PGconn.
+ * 
+ * By registering this function, pg_result disables its own result
+ * store and calls it to append rows one by one.
+ *
+ * func is tuple store function. See the typedef StoreHandler.
+ * 
+ * storeHandlerParam is the contextual variable that can be retrieved
+ * with PQgetStoreHandlerParam in StoreHandler.
+ */
+extern void PQregisterStoreHandler(PGconn *conn, StoreHandler func,
+								   void *storeHandlerParam);
+
 /* Force the write buffer to be written (or at least try) */
 extern int	PQflush(PGconn *conn);
 
@@ -454,6 +519,8 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+extern void *PQgetStoreHandlerParam(const PGresult *res);
+extern void	PQsetStoreHandlerErrMes(PGresult *res, char *mes);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index d967d60..e28e712 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -134,12 +134,6 @@ typedef struct pgresParamDesc
 
 #define NULL_LEN		(-1)	/* pg_result len for NULL value */
 
-typedef struct pgresAttValue
-{
-	int			len;			/* length in bytes of the value */
-	char	   *value;			/* actual value, plus terminating zero byte */
-} PGresAttValue;
-
 /* Typedef for message-field list entries */
 typedef struct pgMessageField
 {
@@ -209,6 +203,11 @@ struct pg_result
 	PGresult_data *curBlock;	/* most recently allocated block */
 	int			curOffset;		/* start offset of free space in block */
 	int			spaceLeft;		/* number of free bytes remaining in block */
+
+	StoreHandler storeHandler;  /* Result store handler. See
+								 * StoreHandler for details. */
+	void *storeHandlerParam;    /* Contextual parameter for storeHandler */
+	char *storeHandlerErrMes;   /* Error message from storeHandler */
 };
 
 /* PGAsyncStatusType defines the state of the query-execution state machine */
@@ -443,6 +442,13 @@ struct pg_conn
 
 	/* Buffer for receiving various parts of messages */
 	PQExpBufferData workBuffer; /* expansible string */
+
+	/* Tuple store handler. The two fields below are copied to a newly
+	 * created PGresult if storeHandler is not NULL. The default
+	 * function is used if it is NULL. */
+	StoreHandler storeHandler;   /* Result store handler. See
+								  * StoreHandler for details. */
+	void *storeHandlerParam;  /* Contextual parameter for storeHandler */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -507,7 +513,6 @@ extern void
 pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 /* This lets gcc check the format string for consistency. */
 __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
-extern int	pqAddTuple(PGresult *res, PGresAttValue *tup);
 extern void pqSaveMessageField(PGresult *res, char code,
 				   const char *value);
 extern void pqSaveParameterStatus(PGconn *conn, const char *name,
dblink_perf_20120117.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..a8685a9 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,24 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	AttInMetadata *attinmeta;
+	MemoryContext oldcontext;
+	char *attrvalbuf;
+	void **valbuf;
+	size_t *valbufsize;
+	bool error_occurred;
+	bool nummismatch;
+	ErrorData *edata;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +103,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static void *storeHandler(PGresult *res, PQStoreFunc func, int id, size_t size);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +520,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +577,36 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * the PGresult returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterStoreHandler(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
 	res = PQexec(conn, buf.data);
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+		else if (storeinfo.edata)
+			ReThrowError(storeinfo.edata);
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +618,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +679,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +755,213 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * the PGresult returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterStoreHandler(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
 	if (!is_async)
 		res = PQexec(conn, sql);
 	else
-	{
 		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
-	}
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+	finishStoreInfo(&storeinfo);
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred to below. */
+			if (storeinfo.nummismatch)
+			{
+				/* This is only for backward compatibility */
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+			else if (storeinfo.edata)
+				ReThrowError(storeinfo.edata);
+
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
-
-	Assert(rsinfo->returnMode == SFRM_Materialize);
-
-	PG_TRY();
+	TupleDesc	tupdesc;
+	int i;
+	
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
+	
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
+
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->edata = NULL;
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->valbuf = (void **)malloc(sinfo->nattrs * sizeof(void *));
+	sinfo->valbufsize = (size_t *)malloc(sinfo->nattrs * sizeof(size_t));
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbufsize[i] = 0;
+	}
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
-
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	/* Preallocate a buffer the size of a PGresAttValue array for the values. */
+	sinfo->attrvalbuf = (char *) malloc(sinfo->nattrs * sizeof(PGresAttValue));
 
-			is_sql_cmd = false;
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		if (sinfo->valbuf[i])
+		{
+			free(sinfo->valbuf[i]);
+			sinfo->valbuf[i] = NULL;
 		}
+	}
+	if (sinfo->attrvalbuf)
+		free(sinfo->attrvalbuf);
+	sinfo->attrvalbuf = NULL;
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+static void *
+storeHandler(PGresult *res, PQStoreFunc  func, int id, size_t size)
+{
+	storeInfo *sinfo = (storeInfo *)PQgetStoreHandlerParam(res);
+	HeapTuple	tuple;
+	int fields = PQnfields(res);
+	int i;
+	PGresAttValue *attval;
+	char        **cstrs;
 
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+	if (sinfo->error_occurred)
+		return NULL;
+
+	switch (func)
+	{
+		case PQSF_ALLOC_TEXT:
+		case PQSF_ALLOC_BINARY:
+			if (id == -1)
+				return sinfo->attrvalbuf;
 
-			values = (char **) palloc(nfields * sizeof(char *));
+			if (id < 0 || id >= sinfo->nattrs)
+				return NULL;
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
+			if (sinfo->valbufsize[id] < size)
 			{
-				HeapTuple	tuple;
+				if (sinfo->valbuf[id] == NULL)
+					sinfo->valbuf[id] = malloc(size);
+				else
+					sinfo->valbuf[id] = realloc(sinfo->valbuf[id], size);
+				sinfo->valbufsize[id] = size;
+			}
+			return sinfo->valbuf[id];
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+		case PQSF_ADD_TUPLE:
+			break;   /* Go through */
+		default:
+			/* Ignore */
+			break;
+	}
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
-			}
+		/* This error will be processed in
+		 * dblink_record_internal(). So do not set error message
+		 * here. */
+		return NULL;
+	}
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
-		}
+	/*
+	 * Rewrite PGresAttValue[] to char(*)[] in-place.
+	 */
+	Assert(sizeof(char*) <= sizeof(PGresAttValue));
 
-		PQclear(res);
+	attval = (PGresAttValue *)sinfo->attrvalbuf;
+	cstrs   = (char **)sinfo->attrvalbuf;
+	for(i = 0 ; i < fields ; i++)
+	{
+		if (attval[i].len < 0)
+			cstrs[i] = NULL;
+		else
+			cstrs[i] = attval[i].value;
+	}
+
+	PG_TRY();
+	{
+		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+		tuplestore_puttuple(sinfo->tuplestore, tuple);
 	}
 	PG_CATCH();
 	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+		MemoryContext context;
+		/*
+		 * Store exception for later ReThrow and cancel the exception.
+		 */
+		sinfo->error_occurred = TRUE;
+		context = MemoryContextSwitchTo(sinfo->oldcontext);
+		sinfo->edata = CopyErrorData();
+		MemoryContextSwitchTo(context);
+		FlushErrorState();
+
+		return NULL;
 	}
 	PG_END_TRY();
+
+	return sinfo->attrvalbuf;
 }
 
 /*
libpq_altstore_doc_20120117.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..8803999 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,293 @@ int PQisthreadsafe();
  </sect1>
 
 
+ <sect1 id="libpq-alterstorage">
+  <title>Alternative result storage</title>
+
+  <indexterm zone="libpq-alterstorage">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   In standard usage, the result of command execution is obtained
+   from the <structname>PGresult</structname> acquired
+   with <function>PQgetResult</function>
+   from <structname>PGconn</structname>. The memory areas for
+   the PGresult are allocated with malloc() internally within calls of
+   command execution functions such as <function>PQexec</function>
+   and <function>PQgetResult</function>. If it is difficult to
+   handle the result records in the form of PGresult, you can instruct
+   PGconn to store them into your own storage instead of PGresult.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-registerstorehandler">
+    <term>
+     <function>PQregisterStoreHandler</function>
+     <indexterm>
+      <primary>PQregisterStoreHandler</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to allocate memory for each tuple and
+       column values, and add the complete tuple into the alternative
+       result storage.
+<synopsis>
+void PQregisterStoreHandler(PGconn *conn,
+                            StoreHandler func,
+                            void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object to set the storage handler
+	       function. PGresult created from this connection calls this
+	       function to store the result instead of storing into its
+	       internal storage.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>func</parameter></term>
+	   <listitem>
+	     <para>
+	       Storage handler function to set. NULL means to use the
+	       default storage.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+	       A pointer to contextual parameter passed
+	       to <parameter>func</parameter>. You can get this pointer
+	       in <type>StoreHandler</type>
+	       by <function>PQgetStoreHandlerParam</function>.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-storehandler">
+    <term>
+     <type>StoreHandler</type>
+     <indexterm>
+      <primary>StoreHandler</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the storage handler callback function.
+<synopsis>
+typedef enum 
+{
+  PQSF_ALLOC_TEXT,
+  PQSF_ALLOC_BINARY,
+  PQSF_ADD_TUPLE
+} PQStoreFunc;
+
+void *(*StoreHandler)(PGresult *res,
+                      PQStoreFunc func,
+                      int id,
+                      size_t size);
+</synopsis>
+     </para>
+
+     <para>
+       Generally this function must return NULL for failure and should
+       set the error message
+       with <function>PQsetStoreHandlerErrMes</function> if the cause
+       is other than out of memory. This function must not throw any
+       exception. This function is called in the following sequence.
+
+       <itemizedlist spacing="compact">
+	 <listitem>
+	   <simpara>Call with <parameter>func</parameter>
+	   = <firstterm>PQSF_ALLOC_BINARY</firstterm>
+	   and <parameter>id</parameter> = -1 to request the memory
+	   for a tuple to be used as an array
+	   of <type>PGresAttValue</type>. </simpara>
+	 </listitem>
+	 <listitem>
+	   <simpara>Call with <parameter>func</parameter>
+	   = <firstterm>PQSF_ALLOC_TEXT</firstterm>
+	   or <firstterm>PQSF_ALLOC_BINARY</firstterm>
+	   and <parameter>id</parameter> is zero to the number of columns
+	   - 1 to request the memory for each column value in current
+	   tuple.</simpara>
+	 </listitem>
+	 <listitem>
+	   <simpara>Call with <parameter>func</parameter>
+	   = <firstterm>PQSF_ADD_TUPLE</firstterm> to request the
+	   constructed tuple to be stored.</simpara>
+	 </listitem>
+       </itemizedlist>
+     </para>
+     <para>
+       Calling <type>StoreHandler</type>
+       with <parameter>func</parameter> =
+       <firstterm>PQSF_ALLOC_TEXT</firstterm> requests a
+        memory block of at least <parameter>size</parameter> bytes,
+        which need not be aligned to a word boundary.
+       <parameter>id</parameter> is a zero or positive number that
+       identifies the use of the requested memory block, namely the
+       position of the column for which the memory block is used.
+     </para>
+     <para>
+       When <parameter>func</parameter>
+       = <firstterm>PQSF_ALLOC_BINARY</firstterm>, this function is
+       told to return a memory block of at
+       least <parameter>size</parameter> bytes which is aligned to a
+       word boundary.
+       <parameter>id</parameter> is the identifier that distinguishes the
+       use of the requested memory block. -1 means that it is used as an
+       array of <type>PGresAttValue</type> to store the tuple. Zero or
+       positive numbers have the same meaning as for
+       <firstterm>PQSF_ALLOC_TEXT</firstterm>.
+     </para>
+     <para>When <parameter>func</parameter>
+       = <firstterm>PQSF_ADD_TUPLE</firstterm>, this function is
+       told to store the <type>PGresAttValue</type> array
+       constructed by the caller into your storage. The pointer to the
+       tuple structure is not passed, so you should remember the
+       pointer to the memory block returned to the caller for
+       <parameter>func</parameter>
+       = <parameter>PQSF_ALLOC_BINARY</parameter>
+       with <parameter>id</parameter> = -1. This function must return
+       a non-NULL value for success. If needed, release in this function
+       the memory blocks that were handed to the caller.
+     </para>
+     <variablelist>
+       <varlistentry>
+	 <term><parameter>res</parameter></term>
+	 <listitem>
+	   <para>
+	     A pointer to the <type>PGresult</type> object.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+	 <term><parameter>func</parameter></term>
+	 <listitem>
+	   <para>
+	     An <type>enum</type> value telling the function to perform.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+	 <term><parameter>id</parameter>, <parameter>size</parameter></term>
+	 <listitem>
+	   <para>
+	     The memory block identifier and the requested size in bytes.
+	   </para>
+	 </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetstorehandlerparam">
+    <term>
+     <function>PQgetStoreHandlerParam</function>
+     <indexterm>
+      <primary>PQgetStoreHandlerParam</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Get the pointer passed to <function>PQregisterStoreHandler</function>
+	as <parameter>param</parameter>.
+<synopsis>
+void *PQgetStoreHandlerParam(const PGresult *res)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetstorehandlererrmes">
+    <term>
+     <function>PQsetStoreHandlerErrMes</function>
+     <indexterm>
+      <primary>PQsetStoreHandlerErrMes</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Set the message for an error that occurred
+	in <type>StoreHandler</type>.  If this message is not set, the
+	caller assumes the error to be out of memory.
+<synopsis>
+void PQsetStoreHandlerErrMes(PGresult *res, char *mes)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object
+		passed to <type>StoreHandler</type>.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	  <varlistentry>
+	    <term><parameter>mes</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the memory block containing the error
+		message, which is allocated
+		by <function>malloc()</function>. The memory block
+		will be freed with <function>free()</function> in the
+		caller of
+		<type>StoreHandler</type> only if it returns NULL.
+	      </para>
+	      <para>
+		If <parameter>res</parameter> already has a message previously
+		set, it is freed and then the given message is set. Set NULL
+		to cancel the custom message.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
 
#22Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#21)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Jan 17, 2012 at 05:53:33PM +0900, Kyotaro HORIGUCHI wrote:

Hello, This is a revised and rebased version of the patch.

a. The old term `Add Tuple Function' is changed to `Store
Handler'. The reason it is not `storage' is simply the length of
the symbols.

b. I couldn't find a suitable place for PGgetAsCString(). It is
removed, and storeHandler() in dblink.c touches PGresAttValue
directly in this new patch. The definition of PGresAttValue stays
in libpq-fe.h and is provided with a comment.

c. Refine error handling of dblink.c. I think it preserves the
previous behavior for column number mismatch and type
conversion exception.

d. Document is revised.
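
For readers following item b, the in-place conversion that storeHandler() performs on the PGresAttValue array can be sketched in isolation as below. This is only an illustration under stated assumptions: AttValue is a local copy of the PGresAttValue layout from the patch, and attvalues_to_cstrings is a hypothetical helper name that exists in neither libpq nor dblink.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the PGresAttValue layout that the patch exposes. */
typedef struct
{
	int		len;	/* length in bytes; negative means SQL NULL */
	char   *value;	/* value plus terminating zero byte */
} AttValue;

/*
 * Rewrite an array of nfields AttValue entries stored in buf into an
 * array of C-string pointers occupying the start of the same buffer.
 * Hypothetical helper for illustration only.
 */
static char **
attvalues_to_cstrings(void *buf, int nfields)
{
	AttValue   *attval = (AttValue *) buf;
	char	  **cstrs = (char **) buf;
	int			i;

	/* Slot i of cstrs must never reach past the start of entry i + 1. */
	assert(sizeof(char *) <= sizeof(AttValue));

	for (i = 0; i < nfields; i++)
	{
		/* Copy entry i out before slot i, which may overlap it, is written. */
		AttValue	v = attval[i];

		cstrs[i] = (v.len < 0) ? NULL : v.value;
	}
	return cstrs;
}
```

Each entry is read by index before its slot is overwritten, which is what makes the rewrite safe as long as a pointer is no larger than one array entry.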

First, my priority is on-the-fly result processing,
not allocation optimization. And this patch seems to make
it possible: I can process results row by row, without the
need to buffer all of them in PGresult. Which is great!

But the current API seems clumsy, I guess its because the
patch grew from trying to replace the low-level allocator.

I would like to propose better one-shot API with:

void *(*RowStoreHandler)(PGresult *res, PGresAttValue *columns);

where the PGresAttValue * is allocated once, inside PGresult.
And the pointers inside point directly to the network buffer.
Of course this requires replacing the current per-column malloc+copy
pattern with a per-row parse+handle pattern, but I think the resulting
API will be better:

1) Pass-through processing does not need to care about unnecessary
per-row allocations.

2) Handlers that want a copy of the row (like regular libpq)
can optimize allocations by having a "global" view of the row.
(E.g. one allocation for row header + data.)

This also optimizes call patterns - first libpq parses the packet,
then the row handler processes the row, with no unnecessary back-and-forth.

Summary - the current API makes various assumptions about how the row is
processed, let's remove those.
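
Point 2) above can be sketched roughly as follows. This is only an illustration, not code from the patch: ColValue is a hypothetical stand-in for PGresAttValue and copy_row_one_alloc() is an invented name. It copies a whole row, whose values may point into a receive buffer and carry no terminator, into one malloc'ed block that holds the header array followed by the terminated values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for PGresAttValue; len < 0 means SQL NULL. */
typedef struct
{
	int			len;	/* length in bytes of the value */
	char	   *value;	/* not NUL-terminated in the input row */
} ColValue;

/*
 * Copy one row into a single malloc'ed block laid out as
 * [ColValue header array][value bytes, each followed by '\0'].
 * Returns NULL on out of memory; the caller releases the row with free().
 */
ColValue *
copy_row_one_alloc(const ColValue *columns, int nfields)
{
	size_t		header = (size_t) nfields * sizeof(ColValue);
	size_t		total = header;
	ColValue   *row;
	char	   *data;
	int			i;

	assert(columns != NULL && nfields >= 0);

	/* First pass: work out how much space the whole row needs. */
	for (i = 0; i < nfields; i++)
		if (columns[i].len >= 0)
			total += (size_t) columns[i].len + 1;

	row = malloc(total > 0 ? total : 1);
	if (row == NULL)
		return NULL;

	/* Second pass: copy each value right behind the header array. */
	data = (char *) row + header;
	for (i = 0; i < nfields; i++)
	{
		row[i].len = columns[i].len;
		if (columns[i].len < 0)
		{
			row[i].value = NULL;
			continue;
		}
		memcpy(data, columns[i].value, (size_t) columns[i].len);
		data[columns[i].len] = '\0';
		row[i].value = data;
		data += columns[i].len + 1;
	}
	return row;
}
```

A handler built this way pays for one allocation per row instead of one per column, at the cost of walking the columns twice.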

--
marko

#23Marc Mamin
M.Mamin@intershop.de
In reply to: Marko Kreen (#22)
Re: Speed dblink using alternate libpq tuple storage

c. Refine error handling of dblink.c. I think it preserves the
previous behavior for column number mismatch and type
conversion exception.

Hello,

I don't know if this covers the following issue.
I just mention it in case you didn't notice it and would like to
handle this rather cosmetic issue as well.

http://archives.postgresql.org/pgsql-bugs/2011-08/msg00113.php

best regards,

Marc Mamin

#24Marko Kreen
markokr@gmail.com
In reply to: Marc Mamin (#23)
Re: Speed dblink using alternate libpq tuple storage

On Sat, Jan 21, 2012 at 1:52 PM, Marc Mamin <M.Mamin@intershop.de> wrote:

c. Refine error handling of dblink.c. I think it preserves the
   previous behavior for column number mismatch and type
   conversion exception.

Hello,

I don't know if this covers the following issue.
I just mention it in case you didn't notice it and would like to
handle this rather cosmetic issue as well.

http://archives.postgresql.org/pgsql-bugs/2011-08/msg00113.php

It is not relevant to this thread, but it seems like a good idea to implement indeed.
It should be a simple matter of creating a handler that uses dblink_res_error()
to report the notice.

Perhaps you could create and submit the patch by yourself?

For reference, here is the full flow in PL/Proxy:

1) PQsetNoticeReceiver:
https://github.com/markokr/plproxy-dev/blob/master/src/execute.c#L422
2) handle_notice:
https://github.com/markokr/plproxy-dev/blob/master/src/execute.c#L370
3) plproxy_remote_error:
https://github.com/markokr/plproxy-dev/blob/master/src/main.c#L82

--
marko

#25Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#22)
Re: Speed dblink using alternate libpq tuple storage

Thank you for the comment,

First, my priority is on-the-fly result processing,
not allocation optimization. And this patch seems to make
it possible: I can process results row-by-row, without the
need to buffer all of them in PGresult. Which is great!

But the current API seems clumsy; I guess it's because the
patch grew from trying to replace the low-level allocator.

Exactly.

I would like to propose better one-shot API with:

void *(*RowStoreHandler)(PGresult *res, PGresAttValue *columns);

where the PGresAttValue * is allocated once, inside PGresult.
And the pointers inside point directly to the network buffer.

Good catch, thank you. The patch is dragging too much from the
old implementation. There is no need to copy the data inside
getAnotherTuple to do it, as you say.

Of course this requires replacing the current per-column malloc+copy
pattern with a per-row parse+handle pattern, but I think the resulting
API will be better:

1) Pass-through processing does not need to care about unnecessary
per-row allocations.

2) Handlers that want a copy of the row (like regular libpq)
can optimize allocations by having a "global" view of the row.
(E.g. one allocation for row header + data.)

This also optimizes call patterns - first libpq parses the packet,
then the row handler processes the row, with no unnecessary back-and-forth.

Summary - the current API makes various assumptions about how the row is
processed, let's remove those.

Thank you, I will rewrite the patch to realize it.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#26Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#25)
3 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

Hello, This is a new version of the patch formerly known as
'alternative storage for libpq'.

- Changed the concept to 'Alternative Row Processor' from
'Storage handler'. Symbol names are also changed.

- The callback function is modified according to the comment.

- Due to time constraints, I did only minimal checking of this
patch. The purpose of this patch is to show the new implementation.

- Performance is not measured for this patch for the same
reason. I will do that next Monday.

- The meaning of PGresAttValue is changed. The field 'value' now
contains a value withOUT terminating zero. This change seems to
have no effect on any other portion of the whole PostgreSQL source
tree from what I've seen.
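
Since 'value' no longer carries a terminating zero, a row processor that needs a C string has to add the terminator itself. The following is only a sketch of one way to do that, with a reusable buffer that grows on demand. The helper name value_as_cstring() and its parameters are invented for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy len bytes of a non-terminated value into a reusable buffer and
 * append the terminating zero.  *buf and *buflen describe a malloc'ed
 * buffer (initially NULL and 0) that is enlarged when it is too small.
 * Returns the buffer, or NULL on out of memory (the old buffer is kept).
 */
char *
value_as_cstring(const char *value, int len, char **buf, int *buflen)
{
	assert(len >= 0 && buf != NULL && buflen != NULL);

	if (*buflen < len + 1)
	{
		char	   *newbuf = realloc(*buf, (size_t) len + 1);

		if (newbuf == NULL)
			return NULL;
		*buf = newbuf;
		*buflen = len + 1;
	}

	if (len > 0)
		memcpy(*buf, value, (size_t) len);
	(*buf)[len] = '\0';
	return *buf;
}
```

The buffer is reused across columns and rows, so in the steady state no allocation happens per value; the caller frees it once after the last row.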

I would like to propose better one-shot API with:

void *(*RowStoreHandler)(PGresult *res, PGresAttValue *columns);

...

1) Pass-through processing does not need to care about unnecessary
per-row allocations.

2) Handlers that want a copy of the row (like regular libpq)
can optimize allocations by having a "global" view of the row.
(E.g. one allocation for row header + data.)

I expect the new implementation is far better than the
original.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_rowproc_20120127.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..c47af3a 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,6 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQregisterRowProcessor	  161
+PQgetRowProcessorParam	  162
+PQsetRowProcessorErrMes	  163
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index d454538..93803d5 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2692,6 +2692,7 @@ makeEmptyPGconn(void)
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
 #endif
+	conn->rowProcessor = NULL;
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -5076,3 +5077,10 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
+void
+PQregisterRowProcessor(PGconn *conn, RowProcessor func, void *param)
+{
+	conn->rowProcessor = func;
+	conn->rowProcessorParam = param;
+}
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..5d78b39 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,7 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
-
+static void *pqAddTuple(PGresult *res, PGresAttValue *columns);
 
 /* ----------------
  * Space management for PGresult.
@@ -160,6 +160,9 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->rowProcessor = pqAddTuple;
+	result->rowProcessorParam = NULL;
+	result->rowProcessorErrMes = NULL;
 
 	if (conn)
 	{
@@ -194,6 +197,12 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 			}
 			result->nEvents = conn->nEvents;
 		}
+
+		if (conn->rowProcessor)
+		{
+			result->rowProcessor = conn->rowProcessor;
+			result->rowProcessorParam = conn->rowProcessorParam;
+		}
 	}
 	else
 	{
@@ -445,7 +454,7 @@ PQsetvalue(PGresult *res, int tup_num, int field_num, char *value, int len)
 		}
 
 		/* add it to the array */
-		if (!pqAddTuple(res, tup))
+		if (pqAddTuple(res, tup) == NULL)
 			return FALSE;
 	}
 
@@ -701,7 +710,6 @@ pqClearAsyncResult(PGconn *conn)
 	if (conn->result)
 		PQclear(conn->result);
 	conn->result = NULL;
-	conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +764,6 @@ pqPrepareAsyncResult(PGconn *conn)
 	 */
 	res = conn->result;
 	conn->result = NULL;		/* handing over ownership to caller */
-	conn->curTuple = NULL;		/* just in case */
 	if (!res)
 		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 	else
@@ -829,12 +836,17 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 
 /*
  * pqAddTuple
- *	  add a row pointer to the PGresult structure, growing it if necessary
- *	  Returns TRUE if OK, FALSE if not enough memory to add the row
+ *	  add a row to the PGresult structure, growing it if necessary
+ *	  Returns the pointer to the new tuple if OK, NULL if not enough
+ *	  memory to add the row.
  */
-int
-pqAddTuple(PGresult *res, PGresAttValue *tup)
+void *
+pqAddTuple(PGresult *res, PGresAttValue *columns)
 {
+	PGresAttValue *tup;
+	int nfields = res->numAttributes;
+	int i;
+
 	if (res->ntups >= res->tupArrSize)
 	{
 		/*
@@ -858,13 +870,39 @@ pqAddTuple(PGresult *res, PGresAttValue *tup)
 			newTuples = (PGresAttValue **)
 				realloc(res->tuples, newSize * sizeof(PGresAttValue *));
 		if (!newTuples)
-			return FALSE;		/* malloc or realloc failed */
+			return NULL;		/* malloc or realloc failed */
 		res->tupArrSize = newSize;
 		res->tuples = newTuples;
 	}
+
+	tup = (PGresAttValue *)
+		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+	if (tup == NULL) return NULL;
+	memcpy(tup, columns, nfields * sizeof(PGresAttValue));
+
+	for (i = 0 ; i < nfields ; i++)
+	{
+		tup[i].len = columns[i].len;
+		if (tup[i].len == NULL_LEN)
+		{
+			tup[i].value = res->null_field;
+		}
+		else
+		{
+			bool isbinary = (res->attDescs[i].format != 0);
+			tup[i].value =
+				(char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+			if (tup[i].value == NULL)
+				return NULL;
+			memcpy(tup[i].value, columns[i].value, tup[i].len);
+			/* We have to terminate this ourselves */
+			tup[i].value[tup[i].len] = '\0';
+		}
+	}
+
 	res->tuples[res->ntups] = tup;
 	res->ntups++;
-	return TRUE;
+	return tup;
 }
 
 /*
@@ -1223,7 +1261,6 @@ PQsendQueryStart(PGconn *conn)
 
 	/* initialize async result-accumulation state */
 	conn->result = NULL;
-	conn->curTuple = NULL;
 
 	/* ready to send command message */
 	return true;
@@ -2822,6 +2859,35 @@ PQgetisnull(const PGresult *res, int tup_num, int field_num)
 		return 0;
 }
 
+/* PQgetRowProcessorParam
+ *	Get the pointer to the contextual parameter from PGresult which is
+ *	registered to PGconn by PQregisterRowProcessor
+ */
+void *
+PQgetRowProcessorParam(const PGresult *res)
+{
+	if (!res)
+		return NULL;
+	return res->rowProcessorParam;
+}
+
+/* PQsetRowProcessorErrMes
+ *	Set the error message to pass back to the caller of RowProcessor.
+ *
+ *  mes must be a malloc'ed memory block and it will be released by
+ *  the caller of RowProcessor.  You can replace the previous message
+ *  by alternative mes, or clear it with NULL. The previous one will
+ *  be freed internally.
+ */
+void
+PQsetRowProcessorErrMes(PGresult *res, char *mes)
+{
+	/* Free existing message */
+	if (res->rowProcessorErrMes)
+		free(res->rowProcessorErrMes);
+	res->rowProcessorErrMes = mes;
+}
+
 /* PQnparams:
  *	returns the number of input parameters of a prepared statement.
  */
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..546534a 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)
 }
 
 /*
+ * pqSkipnchar:
+ *	skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+	if (len > (size_t) (conn->inEnd - conn->inCursor))
+		return EOF;
+
+	conn->inCursor += len;
+
+	if (conn->Pfdebug)
+		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+				(unsigned long) len);
+
+	return 0;
+}
+
+/*
  * pqPutnchar:
  *	write exactly len bytes to the current message
  */
@@ -238,6 +257,7 @@ pqPutnchar(const char *s, size_t len, PGconn *conn)
 	return 0;
 }
 
+
 /*
  * pqGetInt
  *	read a 2 or 4 byte integer and convert from network byte order
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..9abbb29 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -715,7 +715,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGresAttValue tup[result->numAttributes];
 
 	/* the backend sends us a bitmap of which attributes are null */
 	char		std_bitmap[64]; /* used unless it doesn't fit */
@@ -729,26 +729,11 @@ getAnotherTuple(PGconn *conn, bool binary)
 
 	result->binary = binary;
 
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
+	if (binary)
 	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-		/*
-		 * If it's binary, fix the column format indicators.  We assume the
-		 * backend will consistently send either B or D, not a mix.
-		 */
-		if (binary)
-		{
-			for (i = 0; i < nfields; i++)
-				result->attDescs[i].format = 1;
-		}
+		for (i = 0; i < nfields; i++)
+			result->attDescs[i].format = 1;
 	}
-	tup = conn->curTuple;
 
 	/* Get the null-value bitmap */
 	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
@@ -757,7 +742,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 	{
 		bitmap = (char *) malloc(nbytes);
 		if (!bitmap)
-			goto outOfMemory;
+			goto rowProcessError;
 	}
 
 	if (pqGetnchar(bitmap, nbytes, conn))
@@ -785,19 +770,17 @@ getAnotherTuple(PGconn *conn, bool binary)
 				vlen = vlen - 4;
 			if (vlen < 0)
 				vlen = 0;
-			if (tup[i].value == NULL)
-			{
-				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-				if (tup[i].value == NULL)
-					goto outOfMemory;
-			}
+
+			/*
+			 * Buffer content may be shifted on reloading data. So we must
+			 * set the pointer to the value on every scan.
+			 */
+			tup[i].value = conn->inBuffer + conn->inCursor;
 			tup[i].len = vlen;
-			/* read in the value */
+			/* Skip the value */
 			if (vlen > 0)
-				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
+				if (pqSkipnchar(vlen, conn))
 					goto EOFexit;
-			/* we have to terminate this ourselves */
-			tup[i].value[vlen] = '\0';
 		}
 		/* advance the bitmap stuff */
 		bitcnt++;
@@ -812,16 +795,15 @@ getAnotherTuple(PGconn *conn, bool binary)
 	}
 
 	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
+	if (!result->rowProcessor(result, tup))
+		goto rowProcessError;
 
 	if (bitmap != std_bitmap)
 		free(bitmap);
 	return 0;
 
-outOfMemory:
+rowProcessError:
+	
 	/* Replace partially constructed result with an error result */
 
 	/*
@@ -829,8 +811,21 @@ outOfMemory:
 	 * there's not enough memory to concatenate messages...
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	resetPQExpBuffer(&conn->errorMessage);
+
+	/*
+	 * If an error message was set by the row processor, copy it into
+	 * PGconn; otherwise assume out of memory.
+	 */
+	appendPQExpBufferStr(&conn->errorMessage,
+						 libpq_gettext(result->rowProcessorErrMes ?
+									   result->rowProcessorErrMes :
+									   "out of memory for query result\n"));
+	if (result->rowProcessorErrMes)
+	{
+		free(result->rowProcessorErrMes);
+		result->rowProcessorErrMes = NULL;
+	}
 
 	/*
 	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..18342c7 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -625,22 +625,12 @@ getAnotherTuple(PGconn *conn, int msgLength)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGresAttValue tup[result->numAttributes];
 	int			tupnfields;		/* # fields from tuple */
 	int			vlen;			/* length of the current field value */
 	int			i;
 
 	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
-	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-	}
-	tup = conn->curTuple;
-
 	/* Get the field count and make sure it's what we expect */
 	if (pqGetInt(&tupnfields, 2, conn))
 		return EOF;
@@ -671,40 +661,46 @@ getAnotherTuple(PGconn *conn, int msgLength)
 		}
 		if (vlen < 0)
 			vlen = 0;
-		if (tup[i].value == NULL)
-		{
-			bool		isbinary = (result->attDescs[i].format != 0);
 
-			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-			if (tup[i].value == NULL)
-				goto outOfMemory;
-		}
-		tup[i].len = vlen;
-		/* read in the value */
+		/*
+		 * Buffer content may be shifted on reloading data. So we must
+		 * set the pointer to the value every scan.
+		 */
+		tup[i].value = conn->inBuffer + conn->inCursor;
+ 		tup[i].len = vlen;
 		if (vlen > 0)
-			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
+			if (pqSkipnchar(vlen, conn))
 				return EOF;
-		/* we have to terminate this ourselves */
-		tup[i].value[vlen] = '\0';
 	}
 
 	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
-
+	if (!result->rowProcessor(result, tup))
+		goto rowProcessError;
+	
 	return 0;
 
-outOfMemory:
+rowProcessError:
 
 	/*
 	 * Replace partially constructed result with an error result. First
 	 * discard the old result to try to win back some memory.
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	resetPQExpBuffer(&conn->errorMessage);
+
+	/*
+	 * If an error message was set by the row processor, copy it into
+	 * PGconn; otherwise assume out of memory.
+	 */
+	appendPQExpBufferStr(&conn->errorMessage,
+						 libpq_gettext(result->rowProcessorErrMes ?
+									   result->rowProcessorErrMes : 
+									   "out of memory for query result\n"));
+	if (result->rowProcessorErrMes)
+	{
+		free(result->rowProcessorErrMes);
+		result->rowProcessorErrMes = NULL;
+	}
 	pqSaveErrorResult(conn);
 
 	/* Discard the failed message by pretending we read it */
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..0931211 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,15 @@ typedef struct pgNotify
 	struct pgNotify *next;		/* list link */
 } PGnotify;
 
+/* PGresAttValue represents a value of one tuple field in string form.
+   NULL is represented as len < 0. Otherwise value points to a string
+   without null termination of the length of len. */
+typedef struct pgresAttValue
+{
+	int			len;			/* length in bytes of the value */
+	char	   *value;			/* actual value, without null termination */
+} PGresAttValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
@@ -416,6 +425,31 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
 			 const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * This function must return a non-NULL value on success and NULL on
+ * failure, in which case it may set an error message with
+ * PQsetRowProcessorErrMes. If no error message is set on failure, the
+ * caller assumes out of memory. This function is assumed not to throw
+ * any exception.
+ */
+typedef void *(*RowProcessor)(PGresult *res, PGresAttValue *columns);
+
+/*
+ * Register an alternative row processor function on a PGconn.
+ *
+ * Once registered, a PGresult created from the connection does not
+ * store rows itself and instead calls the function once per row.
+ *
+ * func is the row processor function. See the typedef RowProcessor.
+ *
+ * rowProcessorParam is the contextual pointer that can be obtained with
+ * PQgetRowProcessorParam in RowProcessor.
+ */
+extern void PQregisterRowProcessor(PGconn *conn, RowProcessor func,
+								   void *rowProcessorParam);
+
 /* Force the write buffer to be written (or at least try) */
 extern int	PQflush(PGconn *conn);
 
@@ -454,6 +488,8 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+extern void *PQgetRowProcessorParam(const PGresult *res);
+extern void	PQsetRowProcessorErrMes(PGresult *res, char *mes);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index d967d60..51ac927 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -134,12 +134,6 @@ typedef struct pgresParamDesc
 
 #define NULL_LEN		(-1)	/* pg_result len for NULL value */
 
-typedef struct pgresAttValue
-{
-	int			len;			/* length in bytes of the value */
-	char	   *value;			/* actual value, plus terminating zero byte */
-} PGresAttValue;
-
 /* Typedef for message-field list entries */
 typedef struct pgMessageField
 {
@@ -209,6 +203,11 @@ struct pg_result
 	PGresult_data *curBlock;	/* most recently allocated block */
 	int			curOffset;		/* start offset of free space in block */
 	int			spaceLeft;		/* number of free bytes remaining in block */
+
+	RowProcessor rowProcessor;  /* Result row processor handler. See
+								 * RowProcessor for details. */
+	void *rowProcessorParam;    /* Contextual parameter for rowProcessor */
+	char *rowProcessorErrMes;   /* Error message from rowProcessor */
 };
 
 /* PGAsyncStatusType defines the state of the query-execution state machine */
@@ -398,7 +397,6 @@ struct pg_conn
 
 	/* Status for asynchronous result construction */
 	PGresult   *result;			/* result being constructed */
-	PGresAttValue *curTuple;	/* tuple currently being read */
 
 #ifdef USE_SSL
 	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
@@ -443,6 +441,13 @@ struct pg_conn
 
 	/* Buffer for receiving various parts of messages */
 	PQExpBufferData workBuffer; /* expansible string */
+
+	/* Row processor. The two fields below are copied to a newly
+	 * created PGresult if rowProcessor is not NULL. The default
+	 * function is used if NULL. */
+	RowProcessor rowProcessor;   /* Result row processor. See
+								  * RowProcessor for details. */
+	void *rowProcessorParam;     /* Contextual parameter for rowProcessor */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -507,7 +512,6 @@ extern void
 pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 /* This lets gcc check the format string for consistency. */
 __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
-extern int	pqAddTuple(PGresult *res, PGresAttValue *tup);
 extern void pqSaveMessageField(PGresult *res, char code,
 				   const char *value);
 extern void pqSaveParameterStatus(PGconn *conn, const char *name,
@@ -560,6 +564,7 @@ extern int	pqGets(PQExpBuffer buf, PGconn *conn);
 extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
 extern int	pqPuts(const char *s, PGconn *conn);
 extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+extern int	pqSkipnchar(size_t len, PGconn *conn);
 extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
 extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
 extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
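
As a rough illustration of what the getAnotherTuple() changes above amount to, the sketch below parses one row from a buffer laid out like a protocol-3 DataRow body (a 2-byte field count, then for each field a 4-byte big-endian length followed by the value bytes, with -1 marking NULL). Each column pointer is set directly into the buffer and the row is then handed to a handler. It does not use libpq. ColValue, RowHandler, sum_lengths() and parse_row() are hypothetical names, and unlike the patch the handler here returns an int rather than a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PGresAttValue; len < 0 means SQL NULL. */
typedef struct
{
	int			len;
	const char *value;	/* points into the message buffer, no terminator */
} ColValue;

/* Hypothetical handler type; returns nonzero on success. */
typedef int (*RowHandler) (void *param, const ColValue *cols, int nfields);

/* Example handler: adds up the byte length of all non-NULL columns. */
int
sum_lengths(void *param, const ColValue *cols, int nfields)
{
	int		   *total = param;
	int			i;

	for (i = 0; i < nfields; i++)
		if (cols[i].len >= 0)
			*total += cols[i].len;
	return 1;
}

/*
 * Parse one row without copying any value bytes, then call the handler.
 * Returns 0 on success, -1 on a short or oversized message or when the
 * handler reports failure.
 */
int
parse_row(const unsigned char *buf, size_t buflen,
		  ColValue *cols, int maxfields,
		  RowHandler handler, void *param)
{
	size_t		pos = 2;
	int			nfields;
	int			i;

	assert(buf != NULL && cols != NULL && handler != NULL);

	if (buflen < 2)
		return -1;
	nfields = (buf[0] << 8) | buf[1];
	if (nfields > maxfields)
		return -1;

	for (i = 0; i < nfields; i++)
	{
		uint32_t	raw;

		if (buflen - pos < 4)
			return -1;
		raw = ((uint32_t) buf[pos] << 24) | ((uint32_t) buf[pos + 1] << 16) |
			((uint32_t) buf[pos + 2] << 8) | (uint32_t) buf[pos + 3];
		pos += 4;

		if (raw == 0xFFFFFFFFu)		/* length -1: SQL NULL */
		{
			cols[i].len = -1;
			cols[i].value = NULL;
			continue;
		}
		if (raw > 0x7FFFFFFFu || raw > buflen - pos)
			return -1;

		/* Point at the value in place and skip over it. */
		cols[i].len = (int) raw;
		cols[i].value = (const char *) (buf + pos);
		pos += raw;
	}

	return handler(param, cols, nfields) ? 0 : -1;
}
```

The pointers in cols are only valid while the buffer is unchanged, which is why a handler that wants to keep the data has to copy it before returning.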
libpq_rowproc_doc_20120127.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..9ad3bfd 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,215 @@ int PQisthreadsafe();
  </sect1>
 
 
+ <sect1 id="libpq-alterrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-alterrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   In standard usage, the result of a command is obtained from
+   a <structname>PGresult</structname> acquired
+   with <function>PQgetResult</function>
+   from a <structname>PGconn</structname>. The memory for
+   the PGresult is allocated with malloc() internally within calls of
+   command execution functions such as <function>PQexec</function>
+   and <function>PQgetResult</function>. If it is inconvenient to
+   handle the result rows in the form of a PGresult, you can instruct
+   PGconn to pass every row to your own row processor instead of
+   storing it into the PGresult.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-registerrowprocessor">
+    <term>
+     <function>PQregisterRowProcessor</function>
+     <indexterm>
+      <primary>PQregisterRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQregisterRowProcessor(PGconn *conn,
+                            RowProcessor func,
+                            void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object on which to set the row processor
+	       function. A PGresult created from this connection calls this
+	       function to process each row.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>func</parameter></term>
+	   <listitem>
+	     <para>
+	       Row processor function to set. NULL means to use the
+	       default processor.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+	       A pointer to contextual parameter passed
+	       to <parameter>func</parameter>. You can get this pointer
+	       in <type>RowProcessor</type>
+	       by <function>PQgetRowProcessorParam</function>.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-rowprocessor">
+    <term>
+     <type>RowProcessor</type>
+     <indexterm>
+      <primary>RowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+void *(*RowProcessor)(PGresult *res,
+                      PGresAttValue *columns);
+</synopsis>
+     </para>
+
+     <para>
+       This function must return a non-NULL value on success and NULL
+       on failure. On failure it should set the error message
+       with <function>PQsetRowProcessorErrMes</function> if the cause
+       is other than out of memory. This function must not throw any
+       exception.
+     </para>
+     <variablelist>
+       <varlistentry>
+	 <term><parameter>res</parameter></term>
+	 <listitem>
+	   <para>
+	     A pointer to the <type>PGresult</type> object.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+	 <term><parameter>columns</parameter></term>
+	 <listitem>
+	   <para>
+	     The column values of the row to process.
+	   </para>
+	 </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetrowprocessorparam">
+    <term>
+     <function>PQgetRowProcessorParam</function>
+     <indexterm>
+      <primary>PQgetRowProcessorParam</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Get the pointer passed to <function>PQregisterRowProcessor</function>
+	as <parameter>param</parameter>.
+<synopsis>
+void *PQgetRowProcessorParam(const PGresult *res)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessorerrmes">
+    <term>
+     <function>PQsetRowProcessorErrMes</function>
+     <indexterm>
+      <primary>PQsetRowProcessorErrMes</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Set the message for an error that occurred
+	in <type>RowProcessor</type>.  If this message is not set, the
+	caller assumes the error to be out of memory.
+<synopsis>
+void PQsetRowProcessorErrMes(PGresult *res, char *mes)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object
+		passed to <type>RowProcessor</type>.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	  <varlistentry>
+	    <term><parameter>mes</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the memory block containing the error message,
+		which is allocated by <function>malloc()</function>. The
+		memory block will be freed with <function>free()</function> in
+		the caller of <type>RowProcessor</type> only if it returns NULL.
+	      </para>
+	      <para>
+		If <parameter>res</parameter> already has a message previously
+		set, it is freed and then the given message is set. Set NULL
+		to cancel the custom message.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
 
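
The ownership rule documented above for PQsetRowProcessorErrMes (the message is malloc'ed by the row processor, a previously set message is freed when a new one is set, and NULL clears it) can be mimicked with a small mock. This is only a sketch under stated assumptions: MockResult, set_row_error() and fail_row() are hypothetical names that stand in for PGresult and the real function, and nothing here calls libpq.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the error-message slot inside PGresult. */
typedef struct
{
	char	   *errmes;		/* malloc'ed message, or NULL if none is set */
} MockResult;

/*
 * Mimics the documented behavior: free any previous message, then take
 * ownership of mes.  Passing NULL clears the message.
 */
void
set_row_error(MockResult *res, char *mes)
{
	assert(res != NULL);
	free(res->errmes);
	res->errmes = mes;
}

/*
 * What a row processor might do on failure: hand over a malloc'ed copy
 * of the message text.  Returns 1 if the message was set, 0 if the copy
 * could not be allocated, in which case no message is set and the caller
 * would fall back to reporting out of memory.
 */
int
fail_row(MockResult *res, const char *text)
{
	size_t		n = strlen(text) + 1;
	char	   *mes = malloc(n);

	if (mes == NULL)
		return 0;
	memcpy(mes, text, n);
	set_row_error(res, mes);
	return 1;
}
```

Because the slot takes ownership of the block, the row processor must not free or reuse the message after setting it.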
dblink_use_rowproc_20120127.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..195ad21 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char** valbuf;
+	int *valbuflen;
+	bool error_occurred;
+	bool nummismatch;
+	ErrorData *edata;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static void *storeHandler(PGresult *res, PGresAttValue *columns);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,36 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
 	res = PQexec(conn, buf.data);
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+		else if (storeinfo.edata)
+			ReThrowError(storeinfo.edata);
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +617,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +678,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +754,205 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
 	if (!is_async)
 		res = PQexec(conn, sql);
 	else
-	{
 		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
-	}
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+	finishStoreInfo(&storeinfo);
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred to below. */
+			if (storeinfo.nummismatch)
+			{
+				/* This is only for backward compatibility */
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+			else if (storeinfo.edata)
+				ReThrowError(storeinfo.edata);
+
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
-
-	Assert(rsinfo->returnMode == SFRM_Materialize);
-
-	PG_TRY();
+	TupleDesc	tupdesc;
+	int i;
+	
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
+	
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
+
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->edata = NULL;
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+
+	/* Preallocate memory of same size with c string array for values. */
+	sinfo->valbuf = (char **) malloc(sinfo->nattrs * sizeof(char*));
+	sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbuflen[i] = -1;
+	}
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
-
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			is_sql_cmd = false;
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+	if (sinfo->valbuf)
+	{
+		for (i = 0 ; i < sinfo->nattrs ; i++)
+		{
+			if (sinfo->valbuf[i])
 			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
+				free(sinfo->valbuf[i]);
+				sinfo->valbuf[i] = NULL;
 			}
-
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
 		}
+		free(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
-
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+	if (sinfo->valbuflen)
+	{
+		free(sinfo->valbuflen);
+		sinfo->valbuflen = NULL;
+	}
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-			values = (char **) palloc(nfields * sizeof(char *));
+static void *
+storeHandler(PGresult *res, PGresAttValue *columns)
+{
+	storeInfo *sinfo = (storeInfo *)PQgetRowProcessorParam(res);
+	HeapTuple  tuple;
+	int        fields = PQnfields(res);
+	int        i;
+	char      *cstrs[PQnfields(res)];
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+	if (sinfo->error_occurred)
+		return NULL;
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+		/* This error will be processed in
+		 * dblink_record_internal(). So do not set error message
+		 * here. */
+		return NULL;
+	}
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
+	/*
+	 * Value input functions assume that the value string is
+	 * terminated by zero, so we must make each value so.
+	 */
+	for(i = 0 ; i < fields ; i++)
+	{
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
+		{
+			if (sinfo->valbuf[i] == NULL)
+			{
+				sinfo->valbuf[i] = (char *)malloc(len + 1);
+				sinfo->valbuflen[i] = len + 1;
+			}
+			else if (sinfo->valbuflen[i] < len + 1)
+			{
+				sinfo->valbuf[i] = (char *)realloc(sinfo->valbuf[i], len + 1);
+				sinfo->valbuflen[i] = len + 1;
 			}
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
+			cstrs[i] = sinfo->valbuf[i];
+			memcpy(cstrs[i], columns[i].value, len);
+			cstrs[i][len] = '\0';
 		}
+	}
 
-		PQclear(res);
+	PG_TRY();
+	{
+		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+		tuplestore_puttuple(sinfo->tuplestore, tuple);
 	}
 	PG_CATCH();
 	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+		MemoryContext context;
+		/*
+		 * Store exception for later ReThrow and cancel the exception.
+		 */
+		sinfo->error_occurred = TRUE;
+		context = MemoryContextSwitchTo(sinfo->oldcontext);
+		sinfo->edata = CopyErrorData();
+		MemoryContextSwitchTo(context);
+		FlushErrorState();
+
+		return NULL;
 	}
 	PG_END_TRY();
+
+	return columns;
 }
 
 /*
#27Merlin Moncure
mmoncure@gmail.com
In reply to: Kyotaro HORIGUCHI (#26)
Re: Speed dblink using alternate libpq tuple storage

On Fri, Jan 27, 2012 at 2:57 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:

Hello, This is a new version of the patch formerly known as
'alternative storage for libpq'.

I took a quick look at the patch and the docs. Looks good, and I agree
with the rationale and implementation. I see you covered the PQsetvalue
case, which is nice. I expect libpq C API clients coded for
performance will immediately gravitate to this API.

- The meaning of PGresAttValue is changed. The field 'value' now
 contains a value withOUT terminating zero. This change seems to
 have no effect on any other portion within the whole source
 tree of postgresql from what I've seen.

This is a minor point of concern. This function was exposed to
support libpqtypes (which your stuff complements very nicely by the
way) and I quickly confirmed removal of the null terminator didn't
cause any problems there. I doubt anyone else is inspecting the
structure directly (also searched the archives and didn't find
anything).

This needs to be advertised very loudly in the docs -- I understand
why this was done but it's a pretty big change in the way the api
works.

merlin

#28Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#26)
Re: Speed dblink using alternate libpq tuple storage

On Fri, Jan 27, 2012 at 05:57:01PM +0900, Kyotaro HORIGUCHI wrote:

Hello, This is a new version of the patch formerly known as
'alternative storage for libpq'.

- Changed the concept to 'Alternative Row Processor' from
'Storage handler'. Symbol names are also changed.

- Callback function is modified following the comments.

- Due to time constraints, I did only minimal checks on this
patch. The purpose of this patch is to show the new implementation.

- Performance is not measured for this patch for the same
reason. I will do that next Monday.

- The meaning of PGresAttValue is changed. The field 'value' now
contains a value withOUT terminating zero. This change seems to
have no effect on any other portion within the whole source
tree of postgresql from what I've seen.

I think we have general structure in place. Good.

Minor notes:

= rowhandler api =

* It returns bool, so void* is wrong. Instead libpq style is to use int,
with 1=OK, 0=Failure. Seems that was also old pqAddTuple() convention.

* Drop PQgetRowProcessorParam(), instead give param as argument.

* PQsetRowProcessorErrMes() should strdup() the message. That gets rid of
allocator requirements in the API. This also makes it safe to pass static
strings there. If strdup() fails, fall back to a generic no-mem message.

* Create a new struct to replace PGresAttValue for rowhandler usage.
The RowHandler API is pretty unique and self-contained. It should have
its own struct. The main reason is that this allows it to be documented
properly. Otherwise the minor details get lost, as they are different from
libpq-internal usage. Also this allows the two structs to be
improved separately. (PGresRawValue?)

* Stop storing null_value into ->value. It's a libpq internal detail.
Instead the ->value should always point into the buffer where the value
info is located, even for NULL. This makes it safe to simply subtract
pointers to get a row size estimate. Seems pqAddTuple() already
does null_value logic, so no need to do it in the rowhandler API.
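
For illustration only, the handler shape requested in the notes above could be sketched as follows. This is not libpq code: RawValue, RowHandler, RowStats, count_row() and deliver_row() are hypothetical names made up for this sketch, and the real callback in the patch also receives the PGresult. It only shows the int return convention (1=OK, 0=failure), the parameter passed as an argument, and a dedicated struct whose value pointer is not NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed per-column struct (not libpq's). */
typedef struct RawValue
{
	int			len;		/* value length in bytes; negative means SQL NULL */
	const char *value;		/* points into the receive buffer, NOT NUL-terminated */
} RawValue;

/* Hypothetical handler type: returns 1 on success, 0 on failure. */
typedef int (*RowHandler) (void *param, const RawValue *columns, int nfields);

/* Example of caller-owned state handed to the handler through param. */
typedef struct RowStats
{
	long		rows;
	long		bytes;
	long		nulls;
} RowStats;

/* A handler that only counts rows, data bytes, and NULLs. */
static int
count_row(void *param, const RawValue *columns, int nfields)
{
	RowStats   *st = (RowStats *) param;
	int			i;

	if (st == NULL || columns == NULL)
		return 0;				/* failure: nothing to record into */

	for (i = 0; i < nfields; i++)
	{
		if (columns[i].len < 0)
			st->nulls++;
		else
			st->bytes += columns[i].len;
	}
	st->rows++;
	return 1;
}

/* Stand-in for the library side: deliver one row to the handler. */
static int
deliver_row(RowHandler handler, void *param,
			const RawValue *columns, int nfields)
{
	assert(handler != NULL);	/* the review asks that it is never NULL */
	return handler(param, columns, nfields);
}
```

Passing the state as an argument keeps the handler independent of any accessor such as PQgetRowProcessorParam().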

= libpq =

Currently it's confusing whether rowProcessor can be NULL, and what
should be done if so. I think it's better to fix usage so that
it is always set.

* PQregisterRowProcessor() should set the default handler if func==NULL.
* Never set rowProcessor directly, always via PQregisterRowProcessor()
* Drop all if(rowProcessor) checks.

= dblink =

* There are malloc failure checks missing in initStoreInfo() & storeHandler().
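
A minimal sketch of what such checks could look like for the per-column buffers, assuming a buffer pointer and capacity pair like valbuf[i] / valbuflen[i] in the patch. The helper names ensure_capacity() and copy_terminated() are hypothetical and not part of the patch. The result of realloc() goes into a temporary first, so the old block is not lost when allocation fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Make sure *buf can hold at least "need" bytes.
 * Returns 1 on success, 0 on out of memory (the old buffer stays valid).
 */
static int
ensure_capacity(char **buf, int *buflen, int need)
{
	char	   *tmp;

	assert(buf != NULL && buflen != NULL && need > 0);

	if (*buf != NULL && *buflen >= need)
		return 1;				/* already large enough */

	/* realloc(NULL, n) behaves like malloc(n) */
	tmp = (char *) realloc(*buf, (size_t) need);
	if (tmp == NULL)
		return 0;				/* caller still owns the old *buf */

	*buf = tmp;
	*buflen = need;
	return 1;
}

/*
 * Copy a value that is not NUL-terminated into the reusable buffer
 * and terminate it. Returns 1 on success, 0 on failure.
 */
static int
copy_terminated(char **buf, int *buflen, const char *src, int len)
{
	if (src == NULL || len < 0)
		return 0;
	if (!ensure_capacity(buf, buflen, len + 1))
		return 0;

	memcpy(*buf, src, (size_t) len);
	(*buf)[len] = '\0';
	return 1;
}
```

A row handler using this would return failure to libpq as soon as copy_terminated() returns 0.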

--
marko

PS. You did not hear it from me, but most raw values are actually
nul-terminated in protocol. Think big-endian. And those which
are not, you can make so, as the data is not touched anymore.
You cannot do it for last value, as next byte may not be allocated.
But you could memmove() it to a lower address so you can null-terminate.

I'm not suggesting it for official patch, but it would be fun to know
if such hack is benchmarkable, and benchmarkable on realistic load.
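
Purely as a thought experiment, the memmove() variant of this hack could look like the sketch below. It assumes a field laid out as a 4-byte big-endian length followed by the data, and it is not taken from the patch; terminate_in_place() is a hypothetical name. The length word is consumed first, then the data is shifted one byte down into it, which frees the byte needed for the terminator without touching memory past the field.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * "field" points at a 4-byte big-endian length followed by the data.
 * Shifts the data one byte toward lower addresses and NUL-terminates it.
 * Returns the terminated string, or NULL for SQL NULL (length -1).
 * The length word is destroyed, so the field cannot be re-parsed.
 */
static char *
terminate_in_place(unsigned char *field, int *len_out)
{
	uint32_t	len;
	unsigned char *dst;

	assert(field != NULL && len_out != NULL);

	/* read the length before it gets overwritten */
	len = ((uint32_t) field[0] << 24) | ((uint32_t) field[1] << 16) |
		((uint32_t) field[2] << 8) | (uint32_t) field[3];

	if (len == 0xFFFFFFFFu)		/* -1 marks SQL NULL, no data follows */
	{
		*len_out = -1;
		return NULL;
	}

	dst = field + 3;			/* reuse the last byte of the length word */
	memmove(dst, field + 4, len);
	dst[len] = '\0';			/* stays inside the original field */

	*len_out = (int) len;
	return (char *) dst;
}
```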

#29Marko Kreen
markokr@gmail.com
In reply to: Merlin Moncure (#27)
Re: Speed dblink using alternate libpq tuple storage

On Fri, Jan 27, 2012 at 09:35:04AM -0600, Merlin Moncure wrote:

On Fri, Jan 27, 2012 at 2:57 AM, Kyotaro HORIGUCHI

- The meaning of PGresAttValue is changed. The field 'value' now
 contains a value withOUT terminating zero. This change seems to
 have no effect on any other portion within the whole source
 tree of postgresql from what I've seen.

This is a minor point of concern. This function was exposed to
support libpqtypes (which your stuff complements very nicely by the
way) and I quickly confirmed removal of the null terminator didn't
cause any problems there. I doubt anyone else is inspecting the
structure directly (also searched the archives and didn't find
anything).

This needs to be advertised very loudly in the docs -- I understand
why this was done but it's a pretty big change in the way the api
works.

Note that the non-NUL-terminated PGresAttValue is only used for row
handler. So no existing usage is affected.

But I agree using same struct in different situations is confusing,
thus the request for separate struct for row handler usage.

--
marko

#30Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#29)
3 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

Thank you for the comments. This is the revised version of the patch.

The performance gain is larger than expected. The measurement script
now runs the query via dblink ten times to stabilize the measurement,
so the figures are about ten times larger than the previous ones.

                       sec    % to Original
Original           :  31.5       100.0%
RowProcessor patch :  31.3        99.4%
dblink patch       :  24.6        78.1%

The RowProcessor patch alone causes no loss, or gives a very small
gain, and the full patch gives about a 22% gain on the benchmark(*1).

The modifications are listed below.

- No more use of PGresAttValue for this mechanism, and added
PGrowValue instead. PGresAttValue has been put back to
libpq-int.h

- pqAddTuple() is restored to the original, and a new function
pqAddRow() is added for use as the RowProcessor. (The previous
pqAddTuple implementation buggily mixed the two usages of
PGresAttValue.)

- PQgetRowProcessorParam has been dropped. Contextual parameter
is passed as one of the parameters of RowProcessor().

- RowProcessor() returns int (as bool, is that the libpq convention?)
instead of void *. (Actually, void * had already become useless
as of the previous patch.)

- PQsetRowProcessorErrMes() is changed to do strdup internally.

- The callers of RowProcessor() no longer set null_field in
PGrowValue.value. In addition, the PGrowValue[] that RowProcessor()
receives has nfields + 1 elements, so that a rough size estimate
can be made as cols[nfields].value - cols[0].value -
something. The something here is 4 * nfields for protocol3 and
4 * (non-null fields) for protocol2. I fear that this applies
only to textual transfer usage...

- PQregisterRowProcessor() sets the default handler when given
NULL. (pg_conn|pg_result).rowProcessor cannot be NULL for its
lifetime.

- initStoreInfo() and storeHandler() have been provided with
malloc error handling.
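
The pointer-subtraction estimate from the list above can be sketched as below. RowValue is a hypothetical stand-in for PGrowValue, and the function names are made up for this sketch. It assumes the protocol 3 layout used in the attached patch, where each value pointer sits just after that column's 4-byte length word and element nfields marks the end of the row. Under that assumption the span contains the length words of columns 1 to nfields - 1, so the sketch subtracts 4 * (nfields - 1). The 4 * nfields in the message is a rougher figure that comes out 4 bytes lower.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PGrowValue: length plus pointer into the buffer. */
typedef struct RowValue
{
	int			len;		/* negative means SQL NULL */
	const char *value;		/* address just after this column's length word */
} RowValue;

/* Bytes between the start of the first value and the end of the row. */
static long
row_span(const RowValue *cols, int nfields)
{
	assert(cols != NULL && nfields > 0);
	return (long) (cols[nfields].value - cols[0].value);
}

/*
 * Data bytes in the row under the assumed protocol 3 layout: the span
 * minus the length words of columns 1 .. nfields - 1.
 */
static long
estimate_data_bytes_v3(const RowValue *cols, int nfields)
{
	return row_span(cols, nfields) - 4L * (nfields - 1);
}
```

A handler that stores NUL-terminated copies would still need one extra byte per non-null column on top of this figure.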

And more..

- getAnotherTuple()@fe-protocol2.c is not tested at all.

- Because every column in the test data has the same size, the
realloc path in dblink is never executed... More tests should be done.

regards,

=====
(*1) The benchmark is done as follows,

==test.sql
select dblink_connect('c', 'host=localhost dbname=test');
select * from dblink('c', 'select a,c from foo limit 2000000') as (a text, b bytea) limit 1;
...(repeat 9 times more)
select dblink_disconnect('c');
==

$ for i in $(seq 1 10); do time psql test -f t.sql; done

The environment is
CentOS 6.2 on VirtualBox on Core i7 965 3.2GHz
# of processor 1
Allocated mem 2GB

Test DB schema is
 Column | Type  | Modifiers
--------+-------+-----------
 a      | text  |
 b      | text  |
 c      | bytea |
Indexes:
    "foo_a_bt" btree (a)
    "foo_c_bt" btree (c)

test=# select count(*),
min(length(a)) as a_min, max(length(a)) as a_max,
min(length(c)) as c_min, max(length(c)) as c_max from foo;

  count  | a_min | a_max | c_min | c_max
---------+-------+-------+-------+-------
 2000000 |    29 |    29 |    29 |    29
(1 row)

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_rowproc_20120130.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..5ed083c 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,5 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQregisterRowProcessor	  161
+PQsetRowProcessorErrMes	  162
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index d454538..4fe2f41 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2692,6 +2692,8 @@ makeEmptyPGconn(void)
 	conn->allow_ssl_try = true;
 	conn->wait_ssl_try = false;
 #endif
+	conn->rowProcessor = pqAddRow;
+	conn->rowProcessorParam = NULL;
 
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
@@ -5076,3 +5078,10 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
+void
+PQregisterRowProcessor(PGconn *conn, RowProcessor func, void *param)
+{
+	conn->rowProcessor = (func ? func : pqAddRow);
+	conn->rowProcessorParam = param;
+}
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..82914fd 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
+static int	pqAddTuple(PGresult *res, PGresAttValue *tup);
 
 
 /* ----------------
@@ -160,6 +161,9 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->rowProcessor = pqAddRow;
+	result->rowProcessorParam = NULL;
+	result->rowProcessorErrMes = NULL;
 
 	if (conn)
 	{
@@ -194,6 +198,10 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 			}
 			result->nEvents = conn->nEvents;
 		}
+
+		/* copy row processor settings */
+		result->rowProcessor = conn->rowProcessor;
+		result->rowProcessorParam = conn->rowProcessorParam;
 	}
 	else
 	{
@@ -701,7 +709,6 @@ pqClearAsyncResult(PGconn *conn)
 	if (conn->result)
 		PQclear(conn->result);
 	conn->result = NULL;
-	conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +763,6 @@ pqPrepareAsyncResult(PGconn *conn)
 	 */
 	res = conn->result;
 	conn->result = NULL;		/* handing over ownership to caller */
-	conn->curTuple = NULL;		/* just in case */
 	if (!res)
 		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 	else
@@ -828,9 +834,52 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 }
 
 /*
+ * pqAddRow
+ *	  add a row to the PGresult structure, growing it if necessary
+ *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+ */
+int
+pqAddRow(PGresult *res, void *param, PGrowValue *columns)
+{
+	PGresAttValue *tup;
+	int nfields = res->numAttributes;
+	int i;
+
+	tup = (PGresAttValue *)
+		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+	if (tup == NULL) return FALSE;
+
+	memcpy(tup, columns, nfields * sizeof(PGresAttValue));
+
+	for (i = 0 ; i < nfields ; i++)
+	{
+		tup[i].len = columns[i].len;
+		if (tup[i].len == NULL_LEN)
+		{
+			tup[i].value = res->null_field;
+		}
+		else
+		{
+			bool isbinary = (res->attDescs[i].format != 0);
+			tup[i].value =
+				(char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+			if (tup[i].value == NULL)
+				return FALSE;
+
+			memcpy(tup[i].value, columns[i].value, tup[i].len);
+			/* We have to terminate this ourselves */
+			tup[i].value[tup[i].len] = '\0';
+		}
+	}
+
+	return pqAddTuple(res, tup);
+}
+
+/*
  * pqAddTuple
- *	  add a row pointer to the PGresult structure, growing it if necessary
- *	  Returns TRUE if OK, FALSE if not enough memory to add the row
+ *	  add a row POINTER to the PGresult structure, growing it if
+ *	  necessary.  Returns TRUE if OK, FALSE if not enough memory to add
+ *	  the row.
  */
 int
 pqAddTuple(PGresult *res, PGresAttValue *tup)
@@ -1223,7 +1272,6 @@ PQsendQueryStart(PGconn *conn)
 
 	/* initialize async result-accumulation state */
 	conn->result = NULL;
-	conn->curTuple = NULL;
 
 	/* ready to send command message */
 	return true;
@@ -2822,6 +2870,30 @@ PQgetisnull(const PGresult *res, int tup_num, int field_num)
 		return 0;
 }
 
+/* PQsetRowProcessorErrMes
+ *	Set the error message passed back to the caller of RowProcessor.
+ *
+ *  You can replace the previous message by alternative mes, or clear
+ *  it with NULL.
+ */
+void
+PQsetRowProcessorErrMes(PGresult *res, char *mes)
+{
+	/* Free existing message */
+	if (res->rowProcessorErrMes)
+		free(res->rowProcessorErrMes);
+
+	/*
+	 * Set the duped message if mes is not NULL. Failure of strdup
+	 * will be handled as 'Out of memory' by the caller of the
+	 * RowProcessor.
+	 */
+	if (mes)
+		res->rowProcessorErrMes = strdup(mes);
+	else
+		res->rowProcessorErrMes = NULL;
+}
+
 /* PQnparams:
  *	returns the number of input parameters of a prepared statement.
  */
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)
 }
 
 /*
+ * pqSkipnchar:
+ *	skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+	if (len > (size_t) (conn->inEnd - conn->inCursor))
+		return EOF;
+
+	conn->inCursor += len;
+
+	if (conn->Pfdebug)
+		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+				(unsigned long) len);
+
+	return 0;
+}
+
+/*
  * pqPutnchar:
  *	write exactly len bytes to the current message
  */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..496c42e 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -715,7 +715,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  rowval[result->numAttributes + 1];
 
 	/* the backend sends us a bitmap of which attributes are null */
 	char		std_bitmap[64]; /* used unless it doesn't fit */
@@ -729,26 +729,11 @@ getAnotherTuple(PGconn *conn, bool binary)
 
 	result->binary = binary;
 
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
+	if (binary)
 	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-		/*
-		 * If it's binary, fix the column format indicators.  We assume the
-		 * backend will consistently send either B or D, not a mix.
-		 */
-		if (binary)
-		{
-			for (i = 0; i < nfields; i++)
-				result->attDescs[i].format = 1;
-		}
+		for (i = 0; i < nfields; i++)
+			result->attDescs[i].format = 1;
 	}
-	tup = conn->curTuple;
 
 	/* Get the null-value bitmap */
 	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
@@ -757,7 +742,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 	{
 		bitmap = (char *) malloc(nbytes);
 		if (!bitmap)
-			goto outOfMemory;
+			goto rowProcessError;
 	}
 
 	if (pqGetnchar(bitmap, nbytes, conn))
@@ -771,34 +756,31 @@ getAnotherTuple(PGconn *conn, bool binary)
 	for (i = 0; i < nfields; i++)
 	{
 		if (!(bmap & 0200))
-		{
-			/* if the field value is absent, make it a null string */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-		}
+			vlen = NULL_LEN;
+		else if (pqGetInt(&vlen, 4, conn))
+				goto EOFexit;
 		else
 		{
-			/* get the value length (the first four bytes are for length) */
-			if (pqGetInt(&vlen, 4, conn))
-				goto EOFexit;
 			if (!binary)
 				vlen = vlen - 4;
 			if (vlen < 0)
 				vlen = 0;
-			if (tup[i].value == NULL)
-			{
-				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-				if (tup[i].value == NULL)
-					goto outOfMemory;
-			}
-			tup[i].len = vlen;
-			/* read in the value */
-			if (vlen > 0)
-				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-					goto EOFexit;
-			/* we have to terminate this ourselves */
-			tup[i].value[vlen] = '\0';
 		}
+
+		/*
+		 * Buffer content may be shifted on reloading additional
+		 * data. So we must set all pointers on every scan.
+		 *
+		 * rowval[i].value always points to the next address of the
+		 * length field even if the value length is zero or the value
+		 * is NULL for the access safety.
+		 */
+		rowval[i].value = conn->inBuffer + conn->inCursor;
+		rowval[i].len = vlen;
+		/* Skip the value */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			goto EOFexit;
+
 		/* advance the bitmap stuff */
 		bitcnt++;
 		if (bitcnt == BITS_PER_BYTE)
@@ -811,17 +793,33 @@ getAnotherTuple(PGconn *conn, bool binary)
 			bmap <<= 1;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
+	/*
+	 * Set rowval[nfields] for the access safety. We can estimate the
+	 * length of the buffer to store by
+	 *
+	 *    rowval[nfields].value - rowval[0].value - 4 * (# of non-nulls).
+	 */
+	rowval[nfields].value = conn->inBuffer + conn->inCursor;
+	rowval[nfields].len = NULL_LEN;
+
+	/* Success!  Pass the completed row values to rowProcessor */
+	if (!result->rowProcessor(result, result->rowProcessorParam, rowval))
+		goto rowProcessError;
+
+	/* Free garbage message. */
+	if (result->rowProcessorErrMes)
+	{
+		free(result->rowProcessorErrMes);
+		result->rowProcessorErrMes = NULL;
+	}
 
 	if (bitmap != std_bitmap)
 		free(bitmap);
+
 	return 0;
 
-outOfMemory:
+rowProcessError:
+	
 	/* Replace partially constructed result with an error result */
 
 	/*
@@ -829,8 +827,21 @@ outOfMemory:
 	 * there's not enough memory to concatenate messages...
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	resetPQExpBuffer(&conn->errorMessage);
+
+	/*
+	 * If error message is passed from RowProcessor, set it into
+	 * PGconn, assume out of memory if not.
+	 */
+	appendPQExpBufferStr(&conn->errorMessage,
+						 libpq_gettext(result->rowProcessorErrMes ?
+									   result->rowProcessorErrMes :
+									   "out of memory for query result\n"));
+	if (result->rowProcessorErrMes)
+	{
+		free(result->rowProcessorErrMes);
+		result->rowProcessorErrMes = NULL;
+	}
 
 	/*
 	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..b7c6118 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -625,22 +625,12 @@ getAnotherTuple(PGconn *conn, int msgLength)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  rowval[result->numAttributes + 1];
 	int			tupnfields;		/* # fields from tuple */
 	int			vlen;			/* length of the current field value */
 	int			i;
 
 	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
-	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-	}
-	tup = conn->curTuple;
-
 	/* Get the field count and make sure it's what we expect */
 	if (pqGetInt(&tupnfields, 2, conn))
 		return EOF;
@@ -663,48 +653,70 @@ getAnotherTuple(PGconn *conn, int msgLength)
 		if (pqGetInt(&vlen, 4, conn))
 			return EOF;
 		if (vlen == -1)
-		{
-			/* null field */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-			continue;
-		}
-		if (vlen < 0)
+			vlen = NULL_LEN;
+		else if (vlen < 0)
 			vlen = 0;
-		if (tup[i].value == NULL)
-		{
-			bool		isbinary = (result->attDescs[i].format != 0);
+		
+		/*
+		 * Buffer content may be shifted on reloading additional
+		 * data. So we must set all pointers on every scan.
+		 * 
+		 * rowval[i].value always points to the next address of the
+		 * length field even if the value length is zero or the value
+		 * is NULL for the access safety.
+		 */
+		rowval[i].value = conn->inBuffer + conn->inCursor;
+ 		rowval[i].len = vlen;
 
-			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-			if (tup[i].value == NULL)
-				goto outOfMemory;
-		}
-		tup[i].len = vlen;
-		/* read in the value */
-		if (vlen > 0)
-			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-				return EOF;
-		/* we have to terminate this ourselves */
-		tup[i].value[vlen] = '\0';
+		/* Skip to the next length field */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			return EOF;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
+	/*
+	 * Set rowval[nfields] for the access safety. We can estimate the
+	 * length of the buffer to store by
+	 *
+     *    rowval[nfields].value - rowval[0].value - 4 * nfields.
+	 */
+	rowval[nfields].value = conn->inBuffer + conn->inCursor;
+	rowval[nfields].len = NULL_LEN;
+
+	/* Success!  Pass the completed row values to rowProcessor */
+	if (!result->rowProcessor(result, result->rowProcessorParam, rowval))
+		goto rowProcessError;
+	
+	/* Free garbage error message. */
+	if (result->rowProcessorErrMes)
+	{
+		free(result->rowProcessorErrMes);
+		result->rowProcessorErrMes = NULL;
+	}
 
 	return 0;
 
-outOfMemory:
+rowProcessError:
 
 	/*
 	 * Replace partially constructed result with an error result. First
 	 * discard the old result to try to win back some memory.
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	resetPQExpBuffer(&conn->errorMessage);
+
+	/*
+	 * If an error message was passed from rowProcessor, set it into
+	 * PGconn; assume out of memory if not.
+	 */
+	appendPQExpBufferStr(&conn->errorMessage,
+						 libpq_gettext(result->rowProcessorErrMes ?
+									   result->rowProcessorErrMes : 
+									   "out of memory for query result\n"));
+	if (result->rowProcessorErrMes)
+	{
+		free(result->rowProcessorErrMes);
+		result->rowProcessorErrMes = NULL;
+	}
 	pqSaveErrorResult(conn);
 
 	/* Discard the failed message by pretending we read it */
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..27ef007 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,16 @@ typedef struct pgNotify
 	struct pgNotify *next;		/* list link */
 } PGnotify;
 
+/* PGrowValue represents a value of one tuple field in string form,
+   used by RowProcessor. NULL is represented as len < 0. Otherwise
+   value points to a string without null termination of the length of
+   len. */
+typedef struct pgRowValue
+{
+	int			len;			/* length in bytes of the value */
+	char	   *value;			/* actual value, without null termination */
+} PGrowValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
@@ -416,6 +426,32 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
 			 const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * This function must return 1 for success and must return 0 for
+ * failure and may set error message by PQsetRowProcessorErrMes.  It
+ * is assumed by caller as out of memory when the error message is not
+ * set on failure. This function is assumed not to throw any
+ * exception.
+ */
+	typedef int (*RowProcessor)(PGresult *res, void *param,
+								PGrowValue *columns);
+	
+/*
+ * Register alternative result store function to PGconn.
+ * 
+ * By registering this function, pg_result disables its own result
+ * store and calls it for rows one by one.
+ *
+ * func is row processor function. See the typedef RowProcessor.
+ * 
+ * rowProcessorParam is the contextual variable that passed to
+ * RowProcessor.
+ */
+extern void PQregisterRowProcessor(PGconn *conn, RowProcessor func,
+								   void *rowProcessorParam);
+
 /* Force the write buffer to be written (or at least try) */
 extern int	PQflush(PGconn *conn);
 
@@ -454,6 +490,7 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+extern void	PQsetRowProcessorErrMes(PGresult *res, char *mes);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index d967d60..06d8b26 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -209,6 +209,11 @@ struct pg_result
 	PGresult_data *curBlock;	/* most recently allocated block */
 	int			curOffset;		/* start offset of free space in block */
 	int			spaceLeft;		/* number of free bytes remaining in block */
+
+	RowProcessor rowProcessor;  /* Result row processor handler. See
+								 * RowProcessor for details. */
+	void *rowProcessorParam;    /* Contextual parameter for rowProcessor */
+	char *rowProcessorErrMes;   /* Error message from rowProcessor */
 };
 
 /* PGAsyncStatusType defines the state of the query-execution state machine */
@@ -398,7 +403,6 @@ struct pg_conn
 
 	/* Status for asynchronous result construction */
 	PGresult   *result;			/* result being constructed */
-	PGresAttValue *curTuple;	/* tuple currently being read */
 
 #ifdef USE_SSL
 	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
@@ -443,6 +447,13 @@ struct pg_conn
 
 	/* Buffer for receiving various parts of messages */
 	PQExpBufferData workBuffer; /* expansible string */
+
+    /* Tuple store handler. The two fields below are copied to newly
+	 * created PGresult if rowProcessor is not NULL. Use default
+	 * function if NULL. */
+	RowProcessor rowProcessor;   /* Result row processor. See
+								  * RowProcessor for details. */
+	void *rowProcessorParam;     /* Contextual parameter for rowProcessor */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -507,7 +518,7 @@ extern void
 pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 /* This lets gcc check the format string for consistency. */
 __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
-extern int	pqAddTuple(PGresult *res, PGresAttValue *tup);
+extern int	pqAddRow(PGresult *res, void *param, PGrowValue *columns);
 extern void pqSaveMessageField(PGresult *res, char code,
 				   const char *value);
 extern void pqSaveParameterStatus(PGconn *conn, const char *name,
@@ -560,6 +571,7 @@ extern int	pqGets(PQExpBuffer buf, PGconn *conn);
 extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
 extern int	pqPuts(const char *s, PGconn *conn);
 extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+extern int	pqSkipnchar(size_t len, PGconn *conn);
 extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
 extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
 extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
libpq_rowproc_doc_20120130.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..5417df1 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,199 @@ int PQisthreadsafe();
  </sect1>
 
 
+ <sect1 id="libpq-alterrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-alterrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   In standard usage, users get the result of command
+   execution from a <structname>PGresult</structname> acquired
+   with <function>PQgetResult</function>
+   from a <structname>PGconn</structname>. The memory areas for
+   the PGresult are allocated with malloc() internally within calls of
+   command execution functions such as <function>PQexec</function>
+   and <function>PQgetResult</function>. If it is difficult to
+   handle the result records in the form of PGresult, you can instruct
+   PGconn to pass every row to your own row processor instead of
+   storing it into PGresult.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-registerrowprocessor">
+    <term>
+     <function>PQregisterRowProcessor</function>
+     <indexterm>
+      <primary>PQregisterRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQregisterRowProcessor(PGconn *conn,
+                            RowProcessor func,
+                            void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object to set the storage handler
+	       function. PGresult created from this connection calls this
+	       function to process each row.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>func</parameter></term>
+	   <listitem>
+	     <para>
+	       Storage handler function to set. NULL means to use the
+	       default processor.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+	       A pointer to contextual parameter passed
+	       to <parameter>func</parameter>.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-rowprocessor">
+    <term>
+     <type>RowProcessor</type>
+     <indexterm>
+      <primary>RowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+bool (*RowProcessor)(PGresult   *res,
+                     void       *param,
+                     PGrowValue *columns);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+
+</synopsis>
+     </para>
+
+     <para>
+       This function must return TRUE for success, and FALSE for
+       failure. On failure this function should set the error message
+       with <function>PQsetRowProcessorErrMes</function> if the cause
+       is other than out of memory. This function must not throw any
+       exception.
+     </para>
+     <variablelist>
+       <varlistentry>
+
+	 <term><parameter>res</parameter></term>
+	 <listitem>
+	   <para>
+	     A pointer to the <type>PGresult</type> object.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>param</parameter></term>
+	 <listitem>
+	   <para>
+	     A pointer to contextual parameter which is registered
+	     by <function>PQregisterRowProcessor</function>.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>columns</parameter></term>
+	 <listitem>
+	   <para>
+	     Column values of the row to process.
+	   </para>
+	 </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessorerrmes">
+    <term>
+     <function>PQsetRowProcessorErrMes</function>
+     <indexterm>
+      <primary>PQsetRowProcessorErrMes</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Sets the message for an error that occurred
+	in <type>RowProcessor</type>.  If this message is not set, the
+	caller assumes the error to be out of memory.
+<synopsis>
+void PQsetRowProcessorErrMes(PGresult *res, char *mes)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object
+		passed to <type>RowProcessor</type>.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	  <varlistentry>
+	    <term><parameter>mes</parameter></term>
+	    <listitem>
+	      <para>
+		Error message. This will be copied internally so there is
+		no need to care about its scope.
+	      </para>
+	      <para>
+		If <parameter>res</parameter> already has a message previously
+		set, it will be overwritten. Set NULL to cancel the custom
+		message.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
 
dblink_use_rowproc_20120130.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..e6edcd5 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char** valbuf;
+	int *valbuflen;
+	bool error_occurred;
+	bool nummismatch;
+	ErrorData *edata;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,36 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
 	res = PQexec(conn, buf.data);
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+		else if (storeinfo.edata)
+			ReThrowError(storeinfo.edata);
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +617,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +678,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +754,214 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
 	if (!is_async)
 		res = PQexec(conn, sql);
 	else
-	{
 		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
-	}
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+	finishStoreInfo(&storeinfo);
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred to below. */
+			if (storeinfo.nummismatch)
+			{
+				/* This is only for backward compatibility */
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+			else if (storeinfo.edata)
+				ReThrowError(storeinfo.edata);
+
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	int i;
+	
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+	{
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
+	
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
+
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->edata = NULL;
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+
+	/* Preallocate memory of same size with c string array for values. */
+	sinfo->valbuf = (char **) malloc(sinfo->nattrs * sizeof(char*));
+	sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+	if (sinfo->valbuf == NULL || sinfo->valbuflen == NULL)
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
 
-	Assert(rsinfo->returnMode == SFRM_Materialize);
-
-	PG_TRY();
+	for (i = 0 ; i < sinfo->nattrs ; i++)
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbuflen[i] = -1;
+	}
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
-
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			is_sql_cmd = false;
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+	if (sinfo->valbuf)
+	{
+		for (i = 0 ; i < sinfo->nattrs ; i++)
+		{
+			if (sinfo->valbuf[i])
 			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
+				free(sinfo->valbuf[i]);
+				sinfo->valbuf[i] = NULL;
 			}
-
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
 		}
+		free(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
-
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
-
-			values = (char **) palloc(nfields * sizeof(char *));
+	if (sinfo->valbuflen)
+	{
+		free(sinfo->valbuflen);
+		sinfo->valbuflen = NULL;
+	}
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+static int
+storeHandler(PGresult *res, void *param, PGrowValue *columns)
+{
+	storeInfo *sinfo = (storeInfo *)param;
+	HeapTuple  tuple;
+	int        fields = PQnfields(res);
+	int        i;
+	char      *cstrs[PQnfields(res)];
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+	if (sinfo->error_occurred)
+		return FALSE;
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
+
+		/* This error will be processed in
+		 * dblink_record_internal(). So do not set error message
+		 * here. */
+		return FALSE;
+	}
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
+	/*
+	 * value input functions assumes that the input string is
+	 * terminated by zero. We should make the values to be so.
+	 */
+	for(i = 0 ; i < fields ; i++)
+	{
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
+		{
+			if (sinfo->valbuf[i] == NULL)
+			{
+				sinfo->valbuf[i] = (char *)malloc(len + 1);
+				sinfo->valbuflen[i] = len + 1;
+			}
+			else if (sinfo->valbuflen[i] < len + 1)
+			{
+				sinfo->valbuf[i] = (char *)realloc(sinfo->valbuf[i], len + 1);
+				sinfo->valbuflen[i] = len + 1;
 			}
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
+			if (sinfo->valbuf[i] == NULL)
+				ereport(ERROR,
+						(errcode(ERRCODE_OUT_OF_MEMORY),
+						 errmsg("out of memory")));
+
+			cstrs[i] = sinfo->valbuf[i];
+			memcpy(cstrs[i], columns[i].value, len);
+			cstrs[i][len] = '\0';
 		}
+	}
 
-		PQclear(res);
+	PG_TRY();
+	{
+		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+		tuplestore_puttuple(sinfo->tuplestore, tuple);
 	}
 	PG_CATCH();
 	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+		MemoryContext context;
+		/*
+		 * Store exception for later ReThrow and cancel the exception.
+		 */
+		sinfo->error_occurred = TRUE;
+		context = MemoryContextSwitchTo(sinfo->oldcontext);
+		sinfo->edata = CopyErrorData();
+		MemoryContextSwitchTo(context);
+		FlushErrorState();
+		return FALSE;
 	}
 	PG_END_TRY();
+
+	return TRUE;
 }
 
 /*
#31Noname
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#30)
Re: Speed dblink using alternate libpq tuple storage

I'm sorry.

Thank you for the comments. This is a revised version of the patch.

The malloc error handling in dblink.c of the patch is broken. It
leaks memory and breaks the control flow.

I'll re-send the properly fixed patch for dblink.c later.

# This severe back pain must have made me stupid :-p

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#32Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#30)
Re: Speed dblink using alternate libpq tuple storage

On Mon, Jan 30, 2012 at 06:06:57PM +0900, Kyotaro HORIGUCHI wrote:

The gain in performance is more than expected. The measurement
script now runs the query via dblink ten times to stabilize the
measurement, so the figures are about ten times larger than the
previous ones.

                     sec    % to Original
Original           : 31.5   100.0%
RowProcessor patch : 31.3    99.4%
dblink patch       : 24.6    78.1%

The RowProcessor patch alone causes no loss, or a very small gain,
and the full patch gives us a 22% gain on the benchmark(*1).

Excellent!

- The callers of RowProcessor() no longer set null_field in
PGrowValue.value. Also, the PGrowValue[] that RowProcessor()
receives has nfields + 1 elements, so that a rough estimate can
be made by cols[nfields].value - cols[0].value - something. The
something here is 4 * nfields for protocol3 and
4 * (non-null fields) for protocol2. I fear that this applies
only to textual transfer usage...

An exact estimate is not important here. And the (nfields + 1)
element feels like a bit too much magic, considering that most
users probably do not need it. Without it, the logic would be:

total = last.value - first.value + ((last.len > 0) ? last.len : 0)

which isn't too complex. So I think we can remove it.
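
To make the formula above concrete, here is a minimal sketch in C. It assumes a stand-in struct named RowValue that mirrors PGrowValue from the patch; the function name estimate_row_span is hypothetical and is not part of libpq. The result is only a rough figure, since it still includes the length words that sit between the first and last values.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PGrowValue from the patch: len < 0 means SQL NULL. */
typedef struct
{
    int   len;    /* length in bytes, negative for NULL */
    char *value;  /* points into the receive buffer, not NUL-terminated */
} RowValue;

/*
 * Rough size of one row as it sits in the receive buffer, computed from
 * the first and last columns only (no extra sentinel element needed).
 * The span includes the length words between values, so it slightly
 * overestimates the payload.
 */
static size_t
estimate_row_span(const RowValue *cols, int nfields)
{
    const RowValue *first;
    const RowValue *last;

    assert(cols != NULL && nfields > 0);
    first = &cols[0];
    last = &cols[nfields - 1];

    return (size_t) (last->value - first->value)
        + (last->len > 0 ? (size_t) last->len : 0);
}
```

For a protocol 3 row with the values "ab", NULL and "xyz", the span covers 5 payload bytes plus the two 4-byte length words between the first and last value.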

= Problems =

* Remove the dubious memcpy() in pqAddRow()

* I think the dynamic arrays in getAnotherTuple() are not portable enough,
please allocate the array properly. I guess in PQsetResultAttrs()?
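
One possible shape for the second point, as a sketch only: keep a reusable heap array and grow it when the column count is known, instead of declaring a variable-length array on the stack. The names RowValue, RowScratch, row_scratch_reserve and row_scratch_free are hypothetical; where such a buffer would actually live in libpq is not decided here.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for PGrowValue from the patch. */
typedef struct
{
    int   len;
    char *value;
} RowValue;

/* Hypothetical scratch area holding a reusable per-row array. */
typedef struct
{
    RowValue *rowbuf;      /* reusable array, NULL until first use */
    int       rowbuf_len;  /* number of allocated elements */
} RowScratch;

/*
 * Make sure the scratch array can hold nfields elements.
 * Returns 1 on success and 0 on out of memory. On failure the old
 * array is left untouched, so nothing leaks.
 */
static int
row_scratch_reserve(RowScratch *s, int nfields)
{
    RowValue *p;

    assert(s != NULL && nfields >= 0);
    if (nfields <= s->rowbuf_len)
        return 1;               /* already large enough */

    p = realloc(s->rowbuf, (size_t) nfields * sizeof(RowValue));
    if (p == NULL)
        return 0;

    s->rowbuf = p;
    s->rowbuf_len = nfields;
    return 1;
}

/* Release the scratch array and reset the bookkeeping. */
static void
row_scratch_free(RowScratch *s)
{
    free(s->rowbuf);
    s->rowbuf = NULL;
    s->rowbuf_len = 0;
}
```

Reserving once per result, when the attribute count is set, would keep the per-row path free of allocation.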

= Minor notes =

These can be argued either way, if you don't like some
suggestion, you can drop it.

* Move PQregisterRowProcessor() into fe-exec.c, then we can make
pqAddRow static.

* Should PQclear() free the RowProcessor error msg? It seems
it should not escape from getAnotherTuple(), but
that's not certain. Perhaps it would be clearer to free
it here too.

* Remove the part of comment in getAnotherTuple():
* Buffer content may be shifted on reloading additional
* data. So we must set all pointers on every scan.

It's confusing why it needs to clarify that, as there
is nobody expecting it.

* PGrowValue documentation should mention that ->value pointer
is always valid.

* dblink: Perhaps some of those malloc()s could be replaced
with palloc()s or even StringInfo, which already does
the realloc dance? I'm not familiar with dblink and the
various struct lifetimes there, so I don't know if that
actually makes sense or not.

It seems this patch is getting ReadyForCommitter soon...

--
marko

#33Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Noname (#31)
1 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

This is the fixed version of dblink.c for the row processor.

i'll re-send the properly fixed patch for dblink.c later.

- malloc error in initStoreInfo throws ERRCODE_OUT_OF_MEMORY. (new error)

- storeHandler() now returns FALSE on malloc failure. Garbage
cleanup is done in dblink_fetch() or dblink_record_internal().
As before, dblink reports this error in the user session as
'unknown error/could not execute query'.
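
The malloc-failure behavior described above can be illustrated with a small sketch. It is not the code from the attached patch: ValBuf and valbuf_store are hypothetical names, and the helper only shows the pattern of copying a non-terminated column value into a growable buffer while reporting failure through the return value, so that a row processor can return FALSE instead of raising an error.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-column buffer that is reused from row to row. */
typedef struct
{
    char *data;  /* NUL-terminated copy of the last stored value */
    int   cap;   /* allocated size in bytes */
} ValBuf;

/*
 * Copy len bytes from src into vb and NUL-terminate the copy.
 * Returns 1 on success and 0 on out of memory. realloc's result goes
 * into a temporary first, so the old buffer is not lost on failure
 * and the caller can still free it during cleanup.
 */
static int
valbuf_store(ValBuf *vb, const char *src, int len)
{
    assert(vb != NULL && src != NULL && len >= 0);

    if (vb->cap < len + 1)
    {
        char *p = realloc(vb->data, (size_t) len + 1);

        if (p == NULL)
            return 0;           /* caller returns FALSE to libpq */
        vb->data = p;
        vb->cap = len + 1;
    }

    memcpy(vb->data, src, (size_t) len);
    vb->data[len] = '\0';
    return 1;
}
```

A row processor built on this would call valbuf_store() for each non-NULL column and return 0 as soon as one call fails, leaving the buffers to be freed by whoever owns them.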

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

dblink_use_rowproc_20120131.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..7a82ea1 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char** valbuf;
+	int *valbuflen;
+	bool error_occurred;
+	bool nummismatch;
+	ErrorData *edata;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,36 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
 	res = PQexec(conn, buf.data);
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+		else if (storeinfo.edata)
+			ReThrowError(storeinfo.edata);
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +617,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +678,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +754,217 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
 	if (!is_async)
 		res = PQexec(conn, sql);
 	else
-	{
 		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
-	}
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+	finishStoreInfo(&storeinfo);
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred to below. */
+			if (storeinfo.nummismatch)
+			{
+				/* This is only for backward compatibility */
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+			else if (storeinfo.edata)
+				ReThrowError(storeinfo.edata);
+
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	int i;
+	
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+	{
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
+	
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
+
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->edata = NULL;
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->valbuf = NULL;
+	sinfo->valbuflen = NULL;
+
+	/* Preallocate memory of same size with c string array for values. */
+	sinfo->valbuf = (char **) malloc(sinfo->nattrs * sizeof(char*));
+	sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+	if (sinfo->valbuf == NULL || sinfo->valbuflen == NULL)
+	{
+		finishStoreInfo(sinfo);
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+	}
 
-	Assert(rsinfo->returnMode == SFRM_Materialize);
-
-	PG_TRY();
+	for (i = 0 ; i < sinfo->nattrs ; i++)
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbuflen[i] = -1;
+	}
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
-
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			is_sql_cmd = false;
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+	if (sinfo->valbuf)
+	{
+		for (i = 0 ; i < sinfo->nattrs ; i++)
+		{
+			if (sinfo->valbuf[i])
 			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
+				free(sinfo->valbuf[i]);
+				sinfo->valbuf[i] = NULL;
 			}
-
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
 		}
+		free(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
-
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
-
-			values = (char **) palloc(nfields * sizeof(char *));
+	if (sinfo->valbuflen)
+	{
+		free(sinfo->valbuflen);
+		sinfo->valbuflen = NULL;
+	}
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+static int
+storeHandler(PGresult *res, void *param, PGrowValue *columns)
+{
+	storeInfo *sinfo = (storeInfo *)param;
+	HeapTuple  tuple;
+	int        fields = PQnfields(res);
+	int        i;
+	char      *cstrs[PQnfields(res)];
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+	if (sinfo->error_occurred)
+		return FALSE;
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
+
+		/* This error will be processed in
+		 * dblink_record_internal(). So do not set error message
+		 * here. */
+		return FALSE;
+	}
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
+	/*
+	 * value input functions assumes that the input string is
+	 * terminated by zero. We should make the values to be so.
+	 */
+	for(i = 0 ; i < fields ; i++)
+	{
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
+		{
+			if (sinfo->valbuf[i] == NULL)
+			{
+				sinfo->valbuf[i] = (char *)malloc(len + 1);
+				sinfo->valbuflen[i] = len + 1;
+			}
+			else if (sinfo->valbuflen[i] < len + 1)
+			{
+				sinfo->valbuf[i] = (char *)realloc(sinfo->valbuf[i], len + 1);
+				sinfo->valbuflen[i] = len + 1;
 			}
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
+			if (sinfo->valbuf[i] == NULL)
+				return FALSE;
+
+			cstrs[i] = sinfo->valbuf[i];
+			memcpy(cstrs[i], columns[i].value, len);
+			cstrs[i][len] = '\0';
 		}
+	}
 
-		PQclear(res);
+	PG_TRY();
+	{
+		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+		tuplestore_puttuple(sinfo->tuplestore, tuple);
 	}
 	PG_CATCH();
 	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+		MemoryContext context;
+		/*
+		 * Store exception for later ReThrow and cancel the exception.
+		 */
+		sinfo->error_occurred = TRUE;
+		context = MemoryContextSwitchTo(sinfo->oldcontext);
+		sinfo->edata = CopyErrorData();
+		MemoryContextSwitchTo(context);
+		FlushErrorState();
+		return FALSE;
 	}
 	PG_END_TRY();
+
+	return TRUE;
 }
 
 /*
#34Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#33)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Jan 31, 2012 at 4:59 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:

This is the fixed version of dblink.c for the row processor.

I'll re-send the properly fixed patch for dblink.c later.

- malloc error in initStoreInfo throws ERRCODE_OUT_OF_MEMORY. (new error)

- storeHandler() now returns FALSE on malloc failure. Cleanup
 is done in dblink_fetch() or dblink_record_internal().
 As before, dblink reports this error to the user session as
 'unknown error/could not execute query'.

Another thing: if realloc() fails, the old pointer stays valid.
So it needs to be assigned to temp variable first, before
overwriting old pointer.
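
The temporary-pointer pattern described here can be sketched as follows. This is only a minimal illustration: `ensure_capacity` is a hypothetical helper name, not a function from dblink or libpq, and the buffer layout is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper, for illustration only: make sure *buf can hold
 * at least `need` bytes.  Returns 1 on success, 0 on allocation failure.
 * On failure *buf and *buflen are left unchanged, so the caller can
 * still free the old block later.
 */
static int
ensure_capacity(char **buf, size_t *buflen, size_t need)
{
	char	   *tmp;

	assert(buf != NULL && buflen != NULL);
	assert(need > 0);

	/* Already large enough: nothing to do. */
	if (*buf != NULL && *buflen >= need)
		return 1;

	/* realloc(NULL, n) behaves like malloc(n). */
	tmp = realloc(*buf, need);
	if (tmp == NULL)
		return 0;				/* old pointer is still valid and not lost */

	/* Only overwrite the old pointer once realloc has succeeded. */
	*buf = tmp;
	*buflen = need;
	return 1;
}
```

Writing `buf = realloc(buf, n)` directly would overwrite the only reference to the old block with NULL on failure, which is the leak being pointed out.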

And seems malloc() is preferable to palloc() to avoid
exceptions inside row processor. Although latter
one could be made to work, it might be unnecessary
complexity. (store current pqresult into remoteConn?)

--
marko

#35Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#34)
Re: Speed dblink using alternate libpq tuple storage

Hello,

Another thing: if realloc() fails, the old pointer stays valid.
So it needs to be assigned to temp variable first, before
overwriting old pointer.

Hmm. I had misunderstood realloc. I'll fix that in the
next patch.

And seems malloc() is preferable to palloc() to avoid
exceptions inside row processor. Although latter
one could be made to work, it might be unnecessary
complexity. (store current pqresult into remoteConn?)

Hmm. palloc may throw ERRCODE_OUT_OF_MEMORY, so I would have to
catch it and return NULL. That seems no different from using
malloc after all. However, the prohibition on throwing exceptions
in RowProcessor is not based on any established fact, so palloc here
may make sense if we can make that work.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#36Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#35)
Re: Speed dblink using alternate libpq tuple storage

On Wed, Feb 1, 2012 at 10:32 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:

Another thing: if realloc() fails, the old pointer stays valid.
So it needs to be assigned to temp variable first, before
overwriting old pointer.

Hmm. I had misunderstood realloc. I'll fix that in the
next patch.

Please wait a moment, I started doing small cleanups,
and now have some larger ones too. I'll send it soon.

OTOH, if you have already done something, you can send it,
I have various states in GIT so it should not be hard
to merge things.

And seems malloc() is preferable to palloc() to avoid
exceptions inside row processor.  Although latter
one could be made to work, it might be unnecessary
complexity.  (store current pqresult into remoteConn?)

Hmm. palloc may throw ERRCODE_OUT_OF_MEMORY, so I would have to
catch it and return NULL. That seems no different from using
malloc after all. However, the prohibition on throwing exceptions
in RowProcessor is not based on any established fact, so palloc here
may make sense if we can make that work.

No, I was thinking about storing the result in connection
struct and then letting the exception pass, as the
PGresult can be cleaned later. Thus we could get rid
of TRY/CATCH in per-row handler. (At that point
the PGresult is already under PGconn, so maybe
it's enough to clean that one later?)

But for now, as the TRY is already there, it should be
also simple to move palloc usage inside it.
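
The idea of keeping everything that can throw inside the existing TRY block, and turning a caught error into a plain failure return, can be sketched as below. This is a simplified stand-in built on standard setjmp/longjmp, not the backend's PG_TRY/PG_CATCH macros; all names (`current_handler`, `throw_error`, `store_row`, `row_handler`, `store_state`) are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Stand-in for the backend's exception stack (assumption, illustration only). */
static jmp_buf *current_handler = NULL;

/* Stand-in for an error raise: jump to the innermost handler. */
static void
throw_error(void)
{
	assert(current_handler != NULL);
	longjmp(*current_handler, 1);
}

/* A storing step that may "throw"; negative rows simulate a failure. */
static void
store_row(int row)
{
	if (row < 0)
		throw_error();
}

typedef struct
{
	int			error_occurred;	/* set once, checked on every later row */
	int			rows_stored;
} store_state;

/*
 * Per-row callback: never lets an exception escape to the caller.
 * Returns 1 on success and 0 on failure, like the row processor
 * contract in the patch.
 */
static int
row_handler(store_state *st, int row)
{
	jmp_buf		local;
	jmp_buf    *saved = current_handler;

	if (st->error_occurred)
		return 0;

	current_handler = &local;
	if (setjmp(local) == 0)
	{
		/* Everything that can throw lives inside this block. */
		store_row(row);
		st->rows_stored++;
		current_handler = saved;
		return 1;
	}

	/* Caught: remember the failure and report it by return value. */
	current_handler = saved;
	st->error_occurred = 1;
	return 0;
}
```

In the real patch the caught error is copied with CopyErrorData() and rethrown later by the caller; the sketch records only a flag.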

--
marko

#37Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#36)
5 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

On Wed, Feb 01, 2012 at 11:52:27AM +0200, Marko Kreen wrote:

Please wait a moment, I started doing small cleanups,
and now have some larger ones too. I'll send it soon.

I started doing small comment cleanups, but then the changes
spread out a bit...

Kyotaro-san, please review. If you don't find anything to fix,
then it's ready for committer, I think.

Changes:

== dblink ==

- Increase area in bigger steps

- Fix realloc leak on failure

== libpq ==

- Replace dynamic stack array with malloced rowBuf on PGconn.
It is only increased, never decreased.

- Made PGresult.rowProcessorErrMsg allocated with pqResultStrdup.
Seems more proper. It would be even better to
use ->errmsg for it, but its lifetime and usage
are unclear to me.

- Removed the rowProcessor, rowProcessorParam from PGresult.
They are unnecessary there.

- Move conditional msg outside from libpq_gettext()
- Removed the unnecessary memcpy() from pqAddRow

- Moved PQregisterRowProcessor to fe-exec.c, made pqAddRow static.

- Restored pqAddTuple export.

- Renamed a few symbols:
PQregisterRowProcessor -> PQsetRowProcessor
RowProcessor -> PQrowProcessor
Mes -> Msg (more common abbreviation)

- Updated some comments

- Updated sgml doc
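
The registration behaviour listed above (the row processor lives on the connection, and passing NULL restores the built-in one) can be modelled with a small self-contained sketch. It does not use libpq: `mini_conn`, `row_value`, `set_row_processor`, `default_add_row`, `count_bytes` and `deliver_row` are hypothetical names that only mirror the shape of PQsetRowProcessor, pqAddRow and PGrowValue in the attached patch.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors PGrowValue: len < 0 means SQL NULL, value is not NUL-terminated. */
typedef struct
{
	int			len;
	const char *value;
} row_value;

typedef struct mini_conn mini_conn;

/* Callback contract: return 1 for success, 0 for failure. */
typedef int (*row_processor) (mini_conn *conn, void *param,
							  const row_value *cols, int ncols);

struct mini_conn
{
	row_processor proc;
	void	   *param;
	int			stored_rows;	/* stand-in for the built-in result store */
};

/* Built-in processor: pretend to store the row in the result. */
static int
default_add_row(mini_conn *conn, void *param,
				const row_value *cols, int ncols)
{
	(void) param;
	(void) cols;
	(void) ncols;
	conn->stored_rows++;
	return 1;
}

/* Passing func == NULL selects the built-in processor again. */
static void
set_row_processor(mini_conn *conn, row_processor func, void *param)
{
	assert(conn != NULL);
	conn->proc = func ? func : default_add_row;
	conn->param = param;
}

typedef struct
{
	int			nulls;
	size_t		bytes;
} row_stats;

/* Custom processor: inspect lengths only, never touch the result store. */
static int
count_bytes(mini_conn *conn, void *param,
			const row_value *cols, int ncols)
{
	row_stats  *st = param;
	int			i;

	(void) conn;
	for (i = 0; i < ncols; i++)
	{
		if (cols[i].len < 0)
			st->nulls++;
		else
			st->bytes += (size_t) cols[i].len;
	}
	return 1;
}

/* What the protocol parser does once a full row has been read. */
static int
deliver_row(mini_conn *conn, const row_value *cols, int ncols)
{
	return conn->proc(conn, conn->param, cols, ncols);
}
```

While a custom processor is installed, the built-in store is bypassed entirely, which is what lets dblink write rows directly into its tuplestore.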

== Benchmark ==

I did try to benchmark whether the patch affects stock libpq usage.
But, uh, I could not draw any conclusions. It *seems* that the patch is
a bit faster with few columns and a bit slower with many, but the difference
seems smaller than the test noise. OTOH, the test noise is pretty big,
so maybe I don't have a stable enough setup to benchmark properly.

As the test app does not actually touch the result set, it seems probable
that an application that actually does something with the result set
will not see the difference.

It would be nice if anybody who has a stable pgbench setup could
run a few tests to see whether the patch moves things either way.

Just in case, I attached the minimal test files.

--
marko

Attachments:

libpq_rowproc_2012-02-01.diff (text/x-diff; charset=us-ascii)
*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
***************
*** 160,162 **** PQconnectStartParams      157
--- 160,164 ----
  PQping                    158
  PQpingParams              159
  PQlibVersion              160
+ PQsetRowProcessor	  161
+ PQsetRowProcessorErrMsg	  162
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***************
*** 2693,2698 **** makeEmptyPGconn(void)
--- 2693,2701 ----
  	conn->wait_ssl_try = false;
  #endif
  
+ 	/* set default row processor */
+ 	PQsetRowProcessor(conn, NULL, NULL);
+ 
  	/*
  	 * We try to send at least 8K at a time, which is the usual size of pipe
  	 * buffers on Unix systems.  That way, when we are sending a large amount
***************
*** 2711,2718 **** makeEmptyPGconn(void)
--- 2714,2726 ----
  	initPQExpBuffer(&conn->errorMessage);
  	initPQExpBuffer(&conn->workBuffer);
  
+ 	/* set up initial row buffer */
+ 	conn->rowBufLen = 32;
+ 	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+ 
  	if (conn->inBuffer == NULL ||
  		conn->outBuffer == NULL ||
+ 		conn->rowBuf == NULL ||
  		PQExpBufferBroken(&conn->errorMessage) ||
  		PQExpBufferBroken(&conn->workBuffer))
  	{
***************
*** 2812,2817 **** freePGconn(PGconn *conn)
--- 2820,2827 ----
  		free(conn->inBuffer);
  	if (conn->outBuffer)
  		free(conn->outBuffer);
+ 	if (conn->rowBuf)
+ 		free(conn->rowBuf);
  	termPQExpBuffer(&conn->errorMessage);
  	termPQExpBuffer(&conn->workBuffer);
  
***************
*** 5076,5078 **** PQregisterThreadLock(pgthreadlock_t newhandler)
--- 5086,5089 ----
  
  	return prev;
  }
+ 
*** a/src/interfaces/libpq/fe-exec.c
--- b/src/interfaces/libpq/fe-exec.c
***************
*** 66,71 **** static PGresult *PQexecFinish(PGconn *conn);
--- 66,72 ----
  static int PQsendDescribe(PGconn *conn, char desc_type,
  			   const char *desc_target);
  static int	check_field_number(const PGresult *res, int field_num);
+ static int	pqAddRow(PGresult *res, void *param, PGrowValue *columns);
  
  
  /* ----------------
***************
*** 160,165 **** PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
--- 161,167 ----
  	result->curBlock = NULL;
  	result->curOffset = 0;
  	result->spaceLeft = 0;
+ 	result->rowProcessorErrMsg = NULL;
  
  	if (conn)
  	{
***************
*** 701,707 **** pqClearAsyncResult(PGconn *conn)
  	if (conn->result)
  		PQclear(conn->result);
  	conn->result = NULL;
- 	conn->curTuple = NULL;
  }
  
  /*
--- 703,708 ----
***************
*** 756,762 **** pqPrepareAsyncResult(PGconn *conn)
  	 */
  	res = conn->result;
  	conn->result = NULL;		/* handing over ownership to caller */
- 	conn->curTuple = NULL;		/* just in case */
  	if (!res)
  		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
  	else
--- 757,762 ----
***************
*** 828,833 **** pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
--- 828,900 ----
  }
  
  /*
+  * PQsetRowProcessor
+  *   Set function that copies column data out from network buffer.
+  */
+ void
+ PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+ {
+ 	conn->rowProcessor = (func ? func : pqAddRow);
+ 	conn->rowProcessorParam = param;
+ }
+ 
+ /*
+  * PQsetRowProcessorErrMsg
+  *    Set the error message to pass back to the caller of the row processor.
+  *
+  *    You can replace the previous message with an alternative msg, or clear
+  *    it with NULL.
+  */
+ void
+ PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+ {
+ 	if (msg)
+ 		res->rowProcessorErrMsg = pqResultStrdup(res, msg);
+ 	else
+ 		res->rowProcessorErrMsg = NULL;
+ }
+ 
+ /*
+  * pqAddRow
+  *	  add a row to the PGresult structure, growing it if necessary
+  *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+  */
+ static int
+ pqAddRow(PGresult *res, void *param, PGrowValue *columns)
+ {
+ 	PGresAttValue *tup;
+ 	int nfields = res->numAttributes;
+ 	int i;
+ 
+ 	tup = (PGresAttValue *)
+ 		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+ 	if (tup == NULL) return FALSE;
+ 
+ 	for (i = 0 ; i < nfields ; i++)
+ 	{
+ 		tup[i].len = columns[i].len;
+ 		if (tup[i].len == NULL_LEN)
+ 		{
+ 			tup[i].value = res->null_field;
+ 		}
+ 		else
+ 		{
+ 			bool isbinary = (res->attDescs[i].format != 0);
+ 			tup[i].value =
+ 				(char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+ 			if (tup[i].value == NULL)
+ 				return FALSE;
+ 
+ 			memcpy(tup[i].value, columns[i].value, tup[i].len);
+ 			/* We have to terminate this ourselves */
+ 			tup[i].value[tup[i].len] = '\0';
+ 		}
+ 	}
+ 
+ 	return pqAddTuple(res, tup);
+ }
+ 
+ /*
   * pqAddTuple
   *	  add a row pointer to the PGresult structure, growing it if necessary
   *	  Returns TRUE if OK, FALSE if not enough memory to add the row
***************
*** 1223,1229 **** PQsendQueryStart(PGconn *conn)
  
  	/* initialize async result-accumulation state */
  	conn->result = NULL;
- 	conn->curTuple = NULL;
  
  	/* ready to send command message */
  	return true;
--- 1290,1295 ----
*** a/src/interfaces/libpq/fe-misc.c
--- b/src/interfaces/libpq/fe-misc.c
***************
*** 219,224 **** pqGetnchar(char *s, size_t len, PGconn *conn)
--- 219,243 ----
  }
  
  /*
+  * pqSkipnchar:
+  *	skip len bytes in input buffer.
+  */
+ int
+ pqSkipnchar(size_t len, PGconn *conn)
+ {
+ 	if (len > (size_t) (conn->inEnd - conn->inCursor))
+ 		return EOF;
+ 
+ 	conn->inCursor += len;
+ 
+ 	if (conn->Pfdebug)
+ 		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+ 				(unsigned long) len);
+ 
+ 	return 0;
+ }
+ 
+ /*
   * pqPutnchar:
   *	write exactly len bytes to the current message
   */
*** a/src/interfaces/libpq/fe-protocol2.c
--- b/src/interfaces/libpq/fe-protocol2.c
***************
*** 703,721 **** failure:
  
  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGresAttValue *tup;
  
  	/* the backend sends us a bitmap of which attributes are null */
  	char		std_bitmap[64]; /* used unless it doesn't fit */
--- 703,720 ----
  
  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * It fills rowbuf with column pointers and then calls row processor.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGrowValue  *rowbuf;
  
  	/* the backend sends us a bitmap of which attributes are null */
  	char		std_bitmap[64]; /* used unless it doesn't fit */
***************
*** 727,754 **** getAnotherTuple(PGconn *conn, bool binary)
  	int			bitcnt;			/* number of bits examined in current byte */
  	int			vlen;			/* length of the current field value */
  
  	result->binary = binary;
  
! 	/* Allocate tuple space if first time for this data message */
! 	if (conn->curTuple == NULL)
  	{
! 		conn->curTuple = (PGresAttValue *)
! 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
! 		if (conn->curTuple == NULL)
! 			goto outOfMemory;
! 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
! 
! 		/*
! 		 * If it's binary, fix the column format indicators.  We assume the
! 		 * backend will consistently send either B or D, not a mix.
! 		 */
! 		if (binary)
! 		{
! 			for (i = 0; i < nfields; i++)
! 				result->attDescs[i].format = 1;
! 		}
  	}
- 	tup = conn->curTuple;
  
  	/* Get the null-value bitmap */
  	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
--- 726,749 ----
  	int			bitcnt;			/* number of bits examined in current byte */
  	int			vlen;			/* length of the current field value */
  
+ 	/* resize row buffer if needed */
+ 	if (nfields > conn->rowBufLen) {
+ 		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+ 		if (!rowbuf)
+ 			goto rowProcessError;
+ 		conn->rowBuf = rowbuf;
+ 		conn->rowBufLen = nfields;
+ 	} else {
+ 		rowbuf = conn->rowBuf;
+ 	}
+ 
  	result->binary = binary;
  
! 	if (binary)
  	{
! 		for (i = 0; i < nfields; i++)
! 			result->attDescs[i].format = 1;
  	}
  
  	/* Get the null-value bitmap */
  	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
***************
*** 757,763 **** getAnotherTuple(PGconn *conn, bool binary)
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 			goto outOfMemory;
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
--- 752,758 ----
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 			goto rowProcessError;
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
***************
*** 771,804 **** getAnotherTuple(PGconn *conn, bool binary)
  	for (i = 0; i < nfields; i++)
  	{
  		if (!(bmap & 0200))
! 		{
! 			/* if the field value is absent, make it a null string */
! 			tup[i].value = result->null_field;
! 			tup[i].len = NULL_LEN;
! 		}
  		else
  		{
- 			/* get the value length (the first four bytes are for length) */
- 			if (pqGetInt(&vlen, 4, conn))
- 				goto EOFexit;
  			if (!binary)
  				vlen = vlen - 4;
  			if (vlen < 0)
  				vlen = 0;
- 			if (tup[i].value == NULL)
- 			{
- 				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
- 				if (tup[i].value == NULL)
- 					goto outOfMemory;
- 			}
- 			tup[i].len = vlen;
- 			/* read in the value */
- 			if (vlen > 0)
- 				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
- 					goto EOFexit;
- 			/* we have to terminate this ourselves */
- 			tup[i].value[vlen] = '\0';
  		}
  		/* advance the bitmap stuff */
  		bitcnt++;
  		if (bitcnt == BITS_PER_BYTE)
--- 766,794 ----
  	for (i = 0; i < nfields; i++)
  	{
  		if (!(bmap & 0200))
! 			vlen = NULL_LEN;
! 		else if (pqGetInt(&vlen, 4, conn))
! 				goto EOFexit;
  		else
  		{
  			if (!binary)
  				vlen = vlen - 4;
  			if (vlen < 0)
  				vlen = 0;
  		}
+ 
+ 		/*
+ 		 * rowbuf[i].value always points to the next address of the
+ 		 * length field even if the value is NULL, to allow safe
+ 		 * size estimates and data copy.
+ 		 */
+ 		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+ 		rowbuf[i].len = vlen;
+ 
+ 		/* Skip the value */
+ 		if (vlen > 0 && pqSkipnchar(vlen, conn))
+ 			goto EOFexit;
+ 
  		/* advance the bitmap stuff */
  		bitcnt++;
  		if (bitcnt == BITS_PER_BYTE)
***************
*** 811,827 **** getAnotherTuple(PGconn *conn, bool binary)
  			bmap <<= 1;
  	}
  
! 	/* Success!  Store the completed tuple in the result */
! 	if (!pqAddTuple(result, tup))
! 		goto outOfMemory;
! 	/* and reset for a new message */
! 	conn->curTuple = NULL;
  
  	if (bitmap != std_bitmap)
  		free(bitmap);
  	return 0;
  
! outOfMemory:
  	/* Replace partially constructed result with an error result */
  
  	/*
--- 801,817 ----
  			bmap <<= 1;
  	}
  
! 	/* Success!  Pass the completed row values to rowProcessor */
! 	if (!conn->rowProcessor(result, conn->rowProcessorParam, rowbuf))
! 		goto rowProcessError;
  
  	if (bitmap != std_bitmap)
  		free(bitmap);
+ 
  	return 0;
  
! rowProcessError:
! 
  	/* Replace partially constructed result with an error result */
  
  	/*
***************
*** 829,838 **** outOfMemory:
  	 * there's not enough memory to concatenate messages...
  	 */
  	pqClearAsyncResult(conn);
! 	printfPQExpBuffer(&conn->errorMessage,
! 					  libpq_gettext("out of memory for query result\n"));
  
  	/*
  	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
  	 * do to recover...
  	 */
--- 819,835 ----
  	 * there's not enough memory to concatenate messages...
  	 */
  	pqClearAsyncResult(conn);
! 	resetPQExpBuffer(&conn->errorMessage);
  
  	/*
+ 	 * If error message is passed from RowProcessor, set it into
+ 	 * PGconn, assume out of memory if not.
+ 	 */
+ 	appendPQExpBufferStr(&conn->errorMessage,
+ 						 result->rowProcessorErrMsg ?
+ 						 result->rowProcessorErrMsg :
+ 						 libpq_gettext("out of memory for query result\n"));
+ 	/*
  	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
  	 * do to recover...
  	 */
*** a/src/interfaces/libpq/fe-protocol3.c
--- b/src/interfaces/libpq/fe-protocol3.c
***************
*** 613,646 **** failure:
  
  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGresAttValue *tup;
  	int			tupnfields;		/* # fields from tuple */
  	int			vlen;			/* length of the current field value */
  	int			i;
  
- 	/* Allocate tuple space if first time for this data message */
- 	if (conn->curTuple == NULL)
- 	{
- 		conn->curTuple = (PGresAttValue *)
- 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
- 		if (conn->curTuple == NULL)
- 			goto outOfMemory;
- 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
- 	}
- 	tup = conn->curTuple;
- 
  	/* Get the field count and make sure it's what we expect */
  	if (pqGetInt(&tupnfields, 2, conn))
  		return EOF;
--- 613,634 ----
  
  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * It fills rowbuf with column pointers and then calls row processor.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGrowValue  *rowbuf;
  	int			tupnfields;		/* # fields from tuple */
  	int			vlen;			/* length of the current field value */
  	int			i;
  
  	/* Get the field count and make sure it's what we expect */
  	if (pqGetInt(&tupnfields, 2, conn))
  		return EOF;
***************
*** 656,661 **** getAnotherTuple(PGconn *conn, int msgLength)
--- 644,660 ----
  		return 0;
  	}
  
+ 	/* resize row buffer if needed */
+ 	if (nfields > conn->rowBufLen) {
+ 		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+ 		if (!rowbuf)
+ 			goto rowProcessError;
+ 		conn->rowBuf = rowbuf;
+ 		conn->rowBufLen = nfields;
+ 	} else {
+ 		rowbuf = conn->rowBuf;
+ 	}
+ 
  	/* Scan the fields */
  	for (i = 0; i < nfields; i++)
  	{
***************
*** 663,710 **** getAnotherTuple(PGconn *conn, int msgLength)
  		if (pqGetInt(&vlen, 4, conn))
  			return EOF;
  		if (vlen == -1)
! 		{
! 			/* null field */
! 			tup[i].value = result->null_field;
! 			tup[i].len = NULL_LEN;
! 			continue;
! 		}
! 		if (vlen < 0)
  			vlen = 0;
- 		if (tup[i].value == NULL)
- 		{
- 			bool		isbinary = (result->attDescs[i].format != 0);
  
! 			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
! 			if (tup[i].value == NULL)
! 				goto outOfMemory;
! 		}
! 		tup[i].len = vlen;
! 		/* read in the value */
! 		if (vlen > 0)
! 			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
! 				return EOF;
! 		/* we have to terminate this ourselves */
! 		tup[i].value[vlen] = '\0';
  	}
  
! 	/* Success!  Store the completed tuple in the result */
! 	if (!pqAddTuple(result, tup))
! 		goto outOfMemory;
! 	/* and reset for a new message */
! 	conn->curTuple = NULL;
  
  	return 0;
  
! outOfMemory:
  
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
  	 */
  	pqClearAsyncResult(conn);
! 	printfPQExpBuffer(&conn->errorMessage,
! 					  libpq_gettext("out of memory for query result\n"));
  	pqSaveErrorResult(conn);
  
  	/* Discard the failed message by pretending we read it */
--- 662,707 ----
  		if (pqGetInt(&vlen, 4, conn))
  			return EOF;
  		if (vlen == -1)
! 			vlen = NULL_LEN;
! 		else if (vlen < 0)
  			vlen = 0;
  
! 		/*
! 		 * rowbuf[i].value always points to the next address of the
! 		 * length field even if the value is NULL, to allow safe
! 		 * size estimates and data copy.
! 		 */
! 		rowbuf[i].value = conn->inBuffer + conn->inCursor;
! 		rowbuf[i].len = vlen;
! 
! 		/* Skip to the next length field */
! 		if (vlen > 0 && pqSkipnchar(vlen, conn))
! 			return EOF;
  	}
  
! 	/* Success!  Pass the completed row values to rowProcessor */
! 	if (!conn->rowProcessor(result, conn->rowProcessorParam, rowbuf))
! 		goto rowProcessError;
  
  	return 0;
  
! rowProcessError:
  
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
  	 */
  	pqClearAsyncResult(conn);
! 	resetPQExpBuffer(&conn->errorMessage);
! 
! 	/*
! 	 * If an error message is passed from the row processor, set it into
! 	 * PGconn; assume out of memory if not.
! 	 */
! 	appendPQExpBufferStr(&conn->errorMessage,
! 						 result->rowProcessorErrMsg ?
! 						 result->rowProcessorErrMsg :
! 						 libpq_gettext("out of memory for query result\n"));
  	pqSaveErrorResult(conn);
  
  	/* Discard the failed message by pretending we read it */
*** a/src/interfaces/libpq/libpq-fe.h
--- b/src/interfaces/libpq/libpq-fe.h
***************
*** 149,154 **** typedef struct pgNotify
--- 149,165 ----
  	struct pgNotify *next;		/* list link */
  } PGnotify;
  
+ /* PGrowValue points to a column value in the network buffer.
+  * Value is a string of length len without null termination.
+  * NULL is represented as len < 0; value then points to the place
+  * where the value would have been.
+  */
+ typedef struct pgRowValue
+ {
+ 	int			len;			/* length in bytes of the value */
+ 	char	   *value;			/* actual value, without null termination */
+ } PGrowValue;
+ 
  /* Function types for notice-handling callbacks */
  typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
  typedef void (*PQnoticeProcessor) (void *arg, const char *message);
***************
*** 416,421 **** extern PGPing PQping(const char *conninfo);
--- 427,463 ----
  extern PGPing PQpingParams(const char *const * keywords,
  			 const char *const * values, int expand_dbname);
  
+ /*
+  * Typedef for alternative row processor.
+  *
+  * The columns array contains PQnfields() entries, each one
+  * pointing to a particular column value in the network buffer.
+  * This function is expected to copy the data out from there
+  * and store it somewhere.  NULL is signified by len < 0.
+  *
+  * This function must return 1 for success and 0 for failure.
+  * On failure it may set an error message with
+  * PQsetRowProcessorErrMsg; if no message is set, the caller
+  * assumes out of memory.  This function must not throw any
+  * exception.
+  */
+ typedef int (*PQrowProcessor)(PGresult *res, void *param,
+ 								PGrowValue *columns);
+ 
+ /*
+  * Set alternative row data processor for PGconn.
+  *
+  * Once a row processor is registered, PGresult no longer stores
+  * rows itself; the row processor is called for each row instead.
+  *
+  * func is the row processor function.  See the typedef PQrowProcessor.
+  *
+  * rowProcessorParam is the contextual variable that is passed to
+  * the row processor.
+  */
+ extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+ 								   void *rowProcessorParam);
+ 
  /* Force the write buffer to be written (or at least try) */
  extern int	PQflush(PGconn *conn);
  
***************
*** 454,459 **** extern char *PQcmdTuples(PGresult *res);
--- 496,502 ----
  extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
  extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
  extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+ extern void	PQsetRowProcessorErrMsg(PGresult *res, char *msg);
  extern int	PQnparams(const PGresult *res);
  extern Oid	PQparamtype(const PGresult *res, int param_num);
  
*** a/src/interfaces/libpq/libpq-int.h
--- b/src/interfaces/libpq/libpq-int.h
***************
*** 209,214 **** struct pg_result
--- 209,216 ----
  	PGresult_data *curBlock;	/* most recently allocated block */
  	int			curOffset;		/* start offset of free space in block */
  	int			spaceLeft;		/* number of free bytes remaining in block */
+ 
+ 	char *rowProcessorErrMsg;
  };
  
  /* PGAsyncStatusType defines the state of the query-execution state machine */
***************
*** 398,404 **** struct pg_conn
  
  	/* Status for asynchronous result construction */
  	PGresult   *result;			/* result being constructed */
- 	PGresAttValue *curTuple;	/* tuple currently being read */
  
  #ifdef USE_SSL
  	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
--- 400,405 ----
***************
*** 443,448 **** struct pg_conn
--- 444,457 ----
  
  	/* Buffer for receiving various parts of messages */
  	PQExpBufferData workBuffer; /* expansible string */
+ 
+ 	/*
+ 	 * Read column data from network buffer.
+ 	 */
+ 	PQrowProcessor rowProcessor;/* Function pointer */
+ 	void *rowProcessorParam;	/* Contextual parameter for rowProcessor */
+ 	PGrowValue *rowBuf;			/* Buffer for passing values to rowProcessor */
+ 	int rowBufLen;				/* Number of columns allocated in rowBuf */
  };
  
  /* PGcancel stores all data necessary to cancel a connection. A copy of this
***************
*** 560,565 **** extern int	pqGets(PQExpBuffer buf, PGconn *conn);
--- 569,575 ----
  extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
  extern int	pqPuts(const char *s, PGconn *conn);
  extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+ extern int	pqSkipnchar(size_t len, PGconn *conn);
  extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
  extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
  extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
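The PQrowProcessor contract declared in libpq-fe.h above (values are not null-terminated, NULL is signalled by len < 0, return 1 on success and 0 on failure) can be illustrated with a small standalone sketch. It does not link against libpq: RowValue is a local stand-in that mirrors PGrowValue from the patch, and copy_row is a hypothetical helper name, not part of the proposed API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in mirroring PGrowValue from the patch (assumption: same layout). */
typedef struct RowValue
{
    int     len;    /* length in bytes; < 0 means SQL NULL */
    char   *value;  /* points into the network buffer, not null-terminated */
} RowValue;

/*
 * Copy one row out of the network buffer into freshly malloc'ed,
 * null-terminated C strings.  out[i] is set to NULL for SQL NULL.
 * Returns 1 on success and 0 on out of memory, following the return
 * convention the patch asks of a row processor.  On failure every
 * string allocated so far is freed again.
 */
static int
copy_row(const RowValue *columns, int nfields, char **out)
{
    int i;

    assert(columns != NULL && out != NULL && nfields >= 0);

    for (i = 0; i < nfields; i++)
    {
        if (columns[i].len < 0)
        {
            out[i] = NULL;
            continue;
        }

        out[i] = malloc((size_t) columns[i].len + 1);
        if (out[i] == NULL)
        {
            /* Undo the partial copy so the caller has nothing to clean up. */
            while (--i >= 0)
            {
                free(out[i]);
                out[i] = NULL;
            }
            return 0;
        }

        memcpy(out[i], columns[i].value, (size_t) columns[i].len);
        out[i][columns[i].len] = '\0';
    }
    return 1;
}
```

A real row processor would receive the column array from libpq and would typically reuse buffers across rows instead of allocating per value, as storeHandler in the dblink patch does.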
libpq_rowproc_doc_2012-02-01.diff (text/x-diff; charset=us-ascii)
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
***************
*** 7233,7238 **** int PQisthreadsafe();
--- 7233,7443 ----
   </sect1>
  
  
+  <sect1 id="libpq-altrowprocessor">
+   <title>Alternative row processor</title>
+ 
+   <indexterm zone="libpq-altrowprocessor">
+    <primary>PGresult</primary>
+    <secondary>PGconn</secondary>
+   </indexterm>
+ 
+   <para>
+    By default, rows are stored in a <type>PGresult</type>
+    until the full result set is received, and the completely filled
+    <type>PGresult</type> is then passed to the user.  This behaviour can be
+    changed by registering an alternative row processor function
+    that sees each row as soon as it is received
+    from the network.  It has the option of processing the data
+    immediately or storing it in a custom container.
+   </para>
+ 
+   <para>
+    Note that because the row processor sees rows as they arrive, it
+    cannot know whether the SQL statement will finish successfully on
+    the server.  Some care must therefore be taken to get proper
+    transactional behaviour.
+   </para>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqsetrowprocessor">
+     <term>
+      <function>PQsetRowProcessor</function>
+      <indexterm>
+       <primary>PQsetRowProcessor</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        Sets a callback function to process each row.
+ <synopsis>
+ void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+ </synopsis>
+      </para>
+      
+      <para>
+        <variablelist>
+ 	 <varlistentry>
+ 	   <term><parameter>conn</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       The connection object on which to set the row processor function.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+ 	 <varlistentry>
+ 	   <term><parameter>func</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       Storage handler function to set. NULL means to use the
+ 	       default processor.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+ 	 <varlistentry>
+ 	   <term><parameter>param</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       A pointer to contextual parameter passed
+ 	       to <parameter>func</parameter>.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+        </variablelist>
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqrowprocessor">
+     <term>
+      <type>PQrowProcessor</type>
+      <indexterm>
+       <primary>PQrowProcessor</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        The type for the row processor callback function.
+ <synopsis>
+ int (*PQrowProcessor)(PGresult *res, void *param, PGrowValue *columns);
+ 
+ typedef struct
+ {
+     int         len;            /* length in bytes of the value */
+     char       *value;          /* actual value, without null termination */
+ } PGrowValue;
+ </synopsis>
+      </para>
+ 
+      <para>
+       The <parameter>columns</parameter> array will have PQnfields()
+       elements, each one pointing to a column value in the network buffer.
+      </para>
+ 
+      <para>
+        This function must process or copy the row values away from the
+        network buffer before it returns, as the next row might overwrite them.
+      </para>
+ 
+      <para>
+        This function must return 1 for success, and 0 for failure.
+        On failure this function should set the error message
+        with <function>PQsetRowProcessorErrMsg</function> if the cause
+        is other than out of memory.  This function must not throw any
+        exception.
+      </para>
+      <variablelist>
+        <varlistentry>
+ 
+ 	 <term><parameter>res</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     A pointer to the <type>PGresult</type> object.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+        <varlistentry>
+ 
+ 	 <term><parameter>param</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     Extra parameter that was given to <function>PQsetRowProcessor</function>.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+        <varlistentry>
+ 
+ 	 <term><parameter>columns</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     Column values of the row to process.  Column values
+ 	     are located in the network buffer; the processor must
+ 	     copy them out from there.
+ 	   </para>
+ 	   <para>
+ 	     Column values are not null-terminated, so processor cannot
+ 	     use C string functions on them directly.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+      </variablelist>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqsetrowprocessorerrmsg">
+     <term>
+      <function>PQsetRowProcessorErrMsg</function>
+      <indexterm>
+       <primary>PQsetRowProcessorErrMsg</primary>
+      </indexterm>
+     </term>
+     <listitem>
+       <para>
+ 	Sets the message for an error that occurred
+ 	in <type>PQrowProcessor</type>.  If this message is not set, the
+ 	caller assumes the error to be out of memory.
+ <synopsis>
+ void PQsetRowProcessorErrMsg(PGresult *res, char *msg);
+ </synopsis>
+       </para>
+       <para>
+ 	<variablelist>
+ 	  <varlistentry>
+ 	    <term><parameter>res</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		A pointer to the <type>PGresult</type> object
+ 		passed to <type>PQrowProcessor</type>.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	  <varlistentry>
+ 	    <term><parameter>msg</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		Error message. This will be copied internally, so there is
+ 		no need to keep it in scope after the call.
+ 	      </para>
+ 	      <para>
+ 		If <parameter>res</parameter> already has a message previously
+ 		set, it will be overwritten. Set NULL to cancel the custom
+ 		message.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	</variablelist>
+       </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </sect1>
+ 
+ 
   <sect1 id="libpq-build">
    <title>Building <application>libpq</application> Programs</title>
  
dblink_rowproc_2012-02-01.diff (text/x-diff; charset=us-ascii)
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
***************
*** 63,73 **** typedef struct remoteConn
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
- static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
--- 63,85 ----
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
+ typedef struct storeInfo
+ {
+ 	Tuplestorestate *tuplestore;
+ 	int nattrs;
+ 	MemoryContext oldcontext;
+ 	AttInMetadata *attinmeta;
+ 	char** valbuf;
+ 	int *valbuflen;
+ 	bool error_occurred;
+ 	bool nummismatch;
+ 	ErrorData *edata;
+ } storeInfo;
+ 
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
***************
*** 90,95 **** static char *escape_param_str(const char *from);
--- 102,111 ----
  static void validate_pkattnums(Relation rel,
  				   int2vector *pkattnums_arg, int32 pknumatts_arg,
  				   int **pkattnums, int *pknumatts);
+ static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+ static void finishStoreInfo(storeInfo *sinfo);
+ static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+ 
  
  /* Global */
  static remoteConn *pconn = NULL;
***************
*** 503,508 **** dblink_fetch(PG_FUNCTION_ARGS)
--- 519,525 ----
  	char	   *curname = NULL;
  	int			howmany = 0;
  	bool		fail = true;	/* default to backward compatible */
+ 	storeInfo   storeinfo;
  
  	DBLINK_INIT;
  
***************
*** 559,573 **** dblink_fetch(PG_FUNCTION_ARGS)
--- 576,611 ----
  	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
  
  	/*
+ 	 * Rows are stored into storeinfo.tuplestore instead of
+ 	 * the PGresult returned by PQexec below.
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+ 
+ 	/*
  	 * Try to execute the query.  Note that since libpq uses malloc, the
  	 * PGresult will be long-lived even though we are still in a short-lived
  	 * memory context.
  	 */
  	res = PQexec(conn, buf.data);
+ 	finishStoreInfo(&storeinfo);
+ 
  	if (!res ||
  		(PQresultStatus(res) != PGRES_COMMAND_OK &&
  		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
+ 		/* finishStoreInfo saves the fields referred to below. */
+ 		if (storeinfo.nummismatch)
+ 		{
+ 			/* This is only for backward compatibility */
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_DATATYPE_MISMATCH),
+ 					 errmsg("remote query result rowtype does not match "
+ 							"the specified FROM clause rowtype")));
+ 		}
+ 		else if (storeinfo.edata)
+ 			ReThrowError(storeinfo.edata);
+ 
  		dblink_res_error(conname, res, "could not fetch from cursor", fail);
  		return (Datum) 0;
  	}
***************
*** 579,586 **** dblink_fetch(PG_FUNCTION_ARGS)
  				(errcode(ERRCODE_INVALID_CURSOR_NAME),
  				 errmsg("cursor \"%s\" does not exist", curname)));
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
--- 617,624 ----
  				(errcode(ERRCODE_INVALID_CURSOR_NAME),
  				 errmsg("cursor \"%s\" does not exist", curname)));
  	}
+ 	PQclear(res);
  
  	return (Datum) 0;
  }
  
***************
*** 640,645 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
--- 678,684 ----
  	remoteConn *rconn = NULL;
  	bool		fail = true;	/* default to backward compatible */
  	bool		freeconn = false;
+ 	storeInfo   storeinfo;
  
  	/* check to see if caller supports us returning a tuplestore */
  	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
***************
*** 715,878 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
  	/* synchronous query, or async result retrieval */
  	if (!is_async)
  		res = PQexec(conn, sql);
  	else
- 	{
  		res = PQgetResult(conn);
- 		/* NULL means we're all done with the async results */
- 		if (!res)
- 			return (Datum) 0;
- 	}
  
! 	/* if needed, close the connection to the database and cleanup */
! 	if (freeconn)
! 		PQfinish(conn);
  
! 	if (!res ||
! 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
! 		dblink_res_error(conname, res, "could not execute query", fail);
! 		return (Datum) 0;
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
- /*
-  * Materialize the PGresult to return them as the function result.
-  * The res will be released in this function.
-  */
  static void
! materializeResult(FunctionCallInfo fcinfo, PGresult *res)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
  
! 	Assert(rsinfo->returnMode == SFRM_Materialize);
! 
! 	PG_TRY();
  	{
! 		TupleDesc	tupdesc;
! 		bool		is_sql_cmd = false;
! 		int			ntuples;
! 		int			nfields;
  
! 		if (PQresultStatus(res) == PGRES_COMMAND_OK)
! 		{
! 			is_sql_cmd = true;
! 
! 			/*
! 			 * need a tuple descriptor representing one TEXT column to return
! 			 * the command status string as our result tuple
! 			 */
! 			tupdesc = CreateTemplateTupleDesc(1, false);
! 			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
! 							   TEXTOID, -1, 0);
! 			ntuples = 1;
! 			nfields = 1;
! 		}
! 		else
! 		{
! 			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
  
! 			is_sql_cmd = false;
  
! 			/* get a tuple descriptor for our result type */
! 			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
  			{
! 				case TYPEFUNC_COMPOSITE:
! 					/* success */
! 					break;
! 				case TYPEFUNC_RECORD:
! 					/* failed to determine actual type of RECORD */
! 					ereport(ERROR,
! 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 						errmsg("function returning record called in context "
! 							   "that cannot accept type record")));
! 					break;
! 				default:
! 					/* result type isn't composite */
! 					elog(ERROR, "return type must be a row type");
! 					break;
  			}
- 
- 			/* make sure we have a persistent copy of the tupdesc */
- 			tupdesc = CreateTupleDescCopy(tupdesc);
- 			ntuples = PQntuples(res);
- 			nfields = PQnfields(res);
  		}
  
! 		/*
! 		 * check result and tuple descriptor have the same number of columns
! 		 */
! 		if (nfields != tupdesc->natts)
! 			ereport(ERROR,
! 					(errcode(ERRCODE_DATATYPE_MISMATCH),
! 					 errmsg("remote query result rowtype does not match "
! 							"the specified FROM clause rowtype")));
! 
! 		if (ntuples > 0)
! 		{
! 			AttInMetadata *attinmeta;
! 			Tuplestorestate *tupstore;
! 			MemoryContext oldcontext;
! 			int			row;
! 			char	  **values;
! 
! 			attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 
! 			oldcontext = MemoryContextSwitchTo(
! 									rsinfo->econtext->ecxt_per_query_memory);
! 			tupstore = tuplestore_begin_heap(true, false, work_mem);
! 			rsinfo->setResult = tupstore;
! 			rsinfo->setDesc = tupdesc;
! 			MemoryContextSwitchTo(oldcontext);
! 
! 			values = (char **) palloc(nfields * sizeof(char *));
  
! 			/* put all tuples into the tuplestore */
! 			for (row = 0; row < ntuples; row++)
! 			{
! 				HeapTuple	tuple;
  
! 				if (!is_sql_cmd)
! 				{
! 					int			i;
  
! 					for (i = 0; i < nfields; i++)
! 					{
! 						if (PQgetisnull(res, row, i))
! 							values[i] = NULL;
! 						else
! 							values[i] = PQgetvalue(res, row, i);
! 					}
! 				}
! 				else
! 				{
! 					values[0] = PQcmdStatus(res);
! 				}
  
! 				/* build the tuple and put it into the tuplestore. */
! 				tuple = BuildTupleFromCStrings(attinmeta, values);
! 				tuplestore_puttuple(tupstore, tuple);
  			}
  
! 			/* clean up and return the tuplestore */
! 			tuplestore_donestoring(tupstore);
  		}
  
! 		PQclear(res);
  	}
  	PG_CATCH();
  	{
! 		/* be sure to release the libpq result */
! 		PQclear(res);
! 		PG_RE_THROW();
  	}
  	PG_END_TRY();
  }
  
  /*
--- 754,977 ----
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
+ 
+ 	/*
+ 	 * Rows are stored into storeinfo.tuplestore instead of
+ 	 * the PGresult returned by PQexec/PQgetResult below.
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+ 
  	/* synchronous query, or async result retrieval */
  	if (!is_async)
  		res = PQexec(conn, sql);
  	else
  		res = PQgetResult(conn);
  
! 	finishStoreInfo(&storeinfo);
  
! 	/* NULL res from async get means we're all done with the results */
! 	if (res || !is_async)
  	{
! 		if (freeconn)
! 			PQfinish(conn);
! 
! 		if (!res ||
! 			(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 			 PQresultStatus(res) != PGRES_TUPLES_OK))
! 		{
! 			/* finishStoreInfo saves the fields referred to below. */
! 			if (storeinfo.nummismatch)
! 			{
! 				/* This is only for backward compatibility */
! 				ereport(ERROR,
! 						(errcode(ERRCODE_DATATYPE_MISMATCH),
! 						 errmsg("remote query result rowtype does not match "
! 								"the specified FROM clause rowtype")));
! 			}
! 			else if (storeinfo.edata)
! 				ReThrowError(storeinfo.edata);
! 
! 			dblink_res_error(conname, res, "could not execute query", fail);
! 			return (Datum) 0;
! 		}
  	}
+ 	PQclear(res);
  
  	return (Datum) 0;
  }
  
  static void
! initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	TupleDesc	tupdesc;
+ 	int i;
+ 	
+ 	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+ 	{
+ 		case TYPEFUNC_COMPOSITE:
+ 			/* success */
+ 			break;
+ 		case TYPEFUNC_RECORD:
+ 			/* failed to determine actual type of RECORD */
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ 					 errmsg("function returning record called in context "
+ 							"that cannot accept type record")));
+ 			break;
+ 		default:
+ 			/* result type isn't composite */
+ 			elog(ERROR, "return type must be a row type");
+ 			break;
+ 	}
+ 	
+ 	sinfo->oldcontext = MemoryContextSwitchTo(
+ 		rsinfo->econtext->ecxt_per_query_memory);
+ 
+ 	/* make sure we have a persistent copy of the tupdesc */
+ 	tupdesc = CreateTupleDescCopy(tupdesc);
+ 
+ 	sinfo->error_occurred = FALSE;
+ 	sinfo->nummismatch = FALSE;
+ 	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+ 	sinfo->edata = NULL;
+ 	sinfo->nattrs = tupdesc->natts;
+ 	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+ 	sinfo->valbuf = NULL;
+ 	sinfo->valbuflen = NULL;
+ 
+ 	/* Preallocate the per-column arrays of value buffers and their lengths. */
+ 	sinfo->valbuf = (char **) malloc(sinfo->nattrs * sizeof(char*));
+ 	sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+ 	if (sinfo->valbuf == NULL || sinfo->valbuflen == NULL)
+ 	{
+ 		finishStoreInfo(sinfo);
+ 		ereport(ERROR,
+ 				(errcode(ERRCODE_OUT_OF_MEMORY),
+ 				 errmsg("out of memory")));
+ 	}
  
! 	for (i = 0 ; i < sinfo->nattrs ; i++)
  	{
! 		sinfo->valbuf[i] = NULL;
! 		sinfo->valbuflen[i] = -1;
! 	}
  
! 	rsinfo->setResult = sinfo->tuplestore;
! 	rsinfo->setDesc = tupdesc;
! }
  
! static void
! finishStoreInfo(storeInfo *sinfo)
! {
! 	int i;
  
! 	if (sinfo->valbuf)
! 	{
! 		for (i = 0 ; i < sinfo->nattrs ; i++)
! 		{
! 			if (sinfo->valbuf[i])
  			{
! 				free(sinfo->valbuf[i]);
! 				sinfo->valbuf[i] = NULL;
  			}
  		}
+ 		free(sinfo->valbuf);
+ 		sinfo->valbuf = NULL;
+ 	}
  
! 	if (sinfo->valbuflen)
! 	{
! 		free(sinfo->valbuflen);
! 		sinfo->valbuflen = NULL;
! 	}
! 	MemoryContextSwitchTo(sinfo->oldcontext);
! }
  
! static int
! storeHandler(PGresult *res, void *param, PGrowValue *columns)
! {
! 	storeInfo *sinfo = (storeInfo *)param;
! 	HeapTuple  tuple;
! 	int        fields = PQnfields(res);
! 	int        i;
! 	char      *cstrs[PQnfields(res)];
  
! 	if (sinfo->error_occurred)
! 		return FALSE;
  
! 	if (sinfo->nattrs != fields)
! 	{
! 		sinfo->error_occurred = TRUE;
! 		sinfo->nummismatch = TRUE;
! 		finishStoreInfo(sinfo);
! 
! 		/* This error will be processed in
! 		 * dblink_record_internal(). So do not set error message
! 		 * here. */
! 		return FALSE;
! 	}
  
! 	/*
! 	 * Value input functions assume that the input string is
! 	 * zero-terminated, so make a terminated copy of each value.
! 	 */
! 	for(i = 0 ; i < fields ; i++)
! 	{
! 		int len = columns[i].len;
! 		if (len < 0)
! 			cstrs[i] = NULL;
! 		else
! 		{
! 			if (sinfo->valbuf[i] == NULL)
! 			{
! 				int mlen = (len + 1 > 64) ? (len + 1) : 64;
! 				sinfo->valbuf[i] = (char *)malloc(mlen);
! 				if (sinfo->valbuf[i] == NULL)
! 					return FALSE;
! 				sinfo->valbuflen[i] = mlen;
! 			}
! 			else if (sinfo->valbuflen[i] < len + 1)
! 			{
! 				int newlen = sinfo->valbuflen[i] * 2;
! 				char *ptr = sinfo->valbuf[i];
! 				if (newlen < len + 1)
! 					newlen = len + 1;
! 				ptr = (char *)realloc(ptr, newlen);
! 				if (!ptr)
! 					return FALSE;
! 				sinfo->valbuf[i] = ptr;
! 				sinfo->valbuflen[i] = newlen;
  			}
  
! 			cstrs[i] = sinfo->valbuf[i];
! 			memcpy(cstrs[i], columns[i].value, len);
! 			cstrs[i][len] = '\0';
  		}
+ 	}
  
! 	PG_TRY();
! 	{
! 		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
! 		tuplestore_puttuple(sinfo->tuplestore, tuple);
  	}
  	PG_CATCH();
  	{
! 		MemoryContext context;
! 		/*
! 		 * Store exception for later ReThrow and cancel the exception.
! 		 */
! 		sinfo->error_occurred = TRUE;
! 		context = MemoryContextSwitchTo(sinfo->oldcontext);
! 		sinfo->edata = CopyErrorData();
! 		MemoryContextSwitchTo(context);
! 		FlushErrorState();
! 		return FALSE;
  	}
  	PG_END_TRY();
+ 
+ 	return TRUE;
  }
  
  /*
test_prog.c (text/x-csrc; charset=us-ascii)
test_schema.sql (text/plain; charset=us-ascii)
#38Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#35)
1 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

Hello, this is a new version of the dblink.c patch.

- Memory is properly freed when realloc returns NULL in storeHandler().

- Fixed the bug where free() in finishStoreInfo() was fed a
garbage pointer when malloc for sinfo->valbuflen failed.
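
To illustrate the first point: the usual way to keep memory freeable when realloc returns NULL is to assign its result to a temporary and overwrite the stored pointer only on success. The following is a minimal standalone sketch of that pattern, not the code from the patch; ColBuf and the colbuf_* functions are hypothetical names, and the starting size of 64 and the doubling policy are only modelled loosely on storeHandler().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-column scratch buffer, loosely modelled on valbuf/valbuflen. */
typedef struct ColBuf
{
    char   *data;   /* NULL until first use */
    size_t  cap;    /* allocated size in bytes */
} ColBuf;

/*
 * Make sure buf can hold len bytes plus a terminating zero.
 * Returns 1 on success, 0 on out of memory.  On failure buf is left
 * untouched, so the old block is still owned by buf and can be freed.
 */
static int
colbuf_reserve(ColBuf *buf, size_t len)
{
    size_t  need = len + 1;
    size_t  newcap;
    char   *ptr;

    if (buf->cap >= need)
        return 1;

    newcap = buf->cap ? buf->cap * 2 : 64;
    if (newcap < need)
        newcap = need;

    /* realloc(NULL, n) behaves like malloc(n), so one path covers both cases. */
    ptr = realloc(buf->data, newcap);
    if (ptr == NULL)
        return 0;               /* buf->data is still valid here */

    buf->data = ptr;
    buf->cap = newcap;
    return 1;
}

/* Copy a value that is not null-terminated into buf and terminate it. */
static int
colbuf_store(ColBuf *buf, const char *value, size_t len)
{
    assert(value != NULL || len == 0);

    if (!colbuf_reserve(buf, len))
        return 0;
    if (len > 0)
        memcpy(buf->data, value, len);
    buf->data[len] = '\0';
    return 1;
}

/* Release the buffer; safe to call on a never-used or already-freed ColBuf. */
static void
colbuf_free(ColBuf *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->cap = 0;
}
```

Resetting the pointer to NULL in colbuf_free() is what makes a second cleanup call harmless, which is the same concern as in the second point.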

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

dblink_use_rowproc_20120202.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..28c967c 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char** valbuf;
+	int *valbuflen;
+	bool error_occurred;
+	bool nummismatch;
+	ErrorData *edata;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,36 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Rows are stored into storeinfo.tuplestore instead of
+	 * the PGresult returned by PQexec below.
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
 	res = PQexec(conn, buf.data);
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+		else if (storeinfo.edata)
+			ReThrowError(storeinfo.edata);
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +617,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +678,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +754,225 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Rows are stored into storeinfo.tuplestore instead of
+	 * the PGresult returned by PQexec/PQgetResult below.
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQregisterRowProcessor(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
 	if (!is_async)
 		res = PQexec(conn, sql);
 	else
-	{
 		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
-	}
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+	finishStoreInfo(&storeinfo);
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred to below. */
+			if (storeinfo.nummismatch)
+			{
+				/* This is only for backward compatibility */
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+			else if (storeinfo.edata)
+				ReThrowError(storeinfo.edata);
+
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	int i;
+	
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+	{
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
+	
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
+
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->edata = NULL;
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->valbuf = NULL;
+	sinfo->valbuflen = NULL;
+
+	/* Preallocate the per-column arrays of value buffers and their lengths. */
+	sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+	if (sinfo->valbuf)
+		sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+	if (sinfo->valbuflen == NULL)
+	{
+		if (sinfo->valbuf)
+			free(sinfo->valbuf);
 
-	Assert(rsinfo->returnMode == SFRM_Materialize);
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+	}
 
-	PG_TRY();
+	for (i = 0 ; i < sinfo->nattrs ; i++)
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbuflen[i] = -1;
+	}
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
+
+	if (sinfo->valbuf)
+	{
+		for (i = 0 ; i < sinfo->nattrs ; i++)
 		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+			if (sinfo->valbuf[i])
+				free(sinfo->valbuf[i]);
+		}
+		free(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-			is_sql_cmd = false;
+	if (sinfo->valbuflen)
+	{
+		free(sinfo->valbuflen);
+		sinfo->valbuflen = NULL;
+	}
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+static int
+storeHandler(PGresult *res, void *param, PGrowValue *columns)
+{
+	storeInfo *sinfo = (storeInfo *)param;
+	HeapTuple  tuple;
+	int        fields = PQnfields(res);
+	int        i;
+	char      *cstrs[PQnfields(res)];
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
-		}
+	if (sinfo->error_occurred)
+		return FALSE;
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
+
+		/* This error will be processed in
+		 * dblink_record_internal(). So do not set error message
+		 * here. */
+		return FALSE;
+	}
 
-		if (ntuples > 0)
+	/*
+	 * Value input functions assume that the input string is
+	 * zero-terminated, so we must terminate the values ourselves.
+	 */
+	for(i = 0 ; i < fields ; i++)
+	{
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
 		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
-
-			values = (char **) palloc(nfields * sizeof(char *));
-
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+			char *tmp = NULL;
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+			/*
+			 * Call malloc and realloc separately so that this works
+			 * even on systems where realloc() does not accept NULL
+			 * as the old memory block.
+			 */
+			if (sinfo->valbuf[i] == NULL)
+				tmp = (char *)malloc(len + 1);
+			else
+				tmp = (char *)realloc(sinfo->valbuf[i], len + 1);
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+			/*
+			 * sinfo->valbuf[n] will be freed in finishStoreInfo()
+			 * when realloc returns NULL.
+			 */
+			if (tmp == NULL)
+				return FALSE;
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
-			}
+			sinfo->valbuf[i] = tmp;
+			sinfo->valbuflen[i] = len + 1;
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
+			cstrs[i] = sinfo->valbuf[i];
+			memcpy(cstrs[i], columns[i].value, len);
+			cstrs[i][len] = '\0';
 		}
+	}
 
-		PQclear(res);
+	PG_TRY();
+	{
+		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+		tuplestore_puttuple(sinfo->tuplestore, tuple);
 	}
 	PG_CATCH();
 	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+		MemoryContext context;
+		/*
+		 * Store exception for later ReThrow and cancel the exception.
+		 */
+		sinfo->error_occurred = TRUE;
+		context = MemoryContextSwitchTo(sinfo->oldcontext);
+		sinfo->edata = CopyErrorData();
+		MemoryContextSwitchTo(context);
+		FlushErrorState();
+		return FALSE;
 	}
 	PG_END_TRY();
+
+	return TRUE;
 }
 
 /*
#39Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#38)
3 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

On Thu, Feb 02, 2012 at 04:51:37PM +0900, Kyotaro HORIGUCHI wrote:

Hello, this is the new version of dblink.c.

- Memory is properly freed when realloc returns NULL in storeHandler().

- Fixed the bug where free() in finishStoreInfo() was passed a
garbage pointer when malloc() for sinfo->valbuflen failed.
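
As a rough illustration of the two fixes above (a sketch only, not code
taken from dblink.c; store_value is a hypothetical helper name), the
pattern is to take the result of malloc/realloc into a temporary and
overwrite the stored pointer only on success, so that cleanup code never
sees a leaked or garbage pointer:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes of src into *buf as a
 * zero-terminated string, growing *buf as needed.
 * Returns 1 on success, 0 on allocation failure.  On failure *buf still
 * points to the old block (or is still NULL), so the caller's cleanup
 * can free it safely.
 */
int
store_value(char **buf, const char *src, int len)
{
	char	   *tmp;

	assert(buf != NULL && src != NULL && len >= 0);

	/* Separate malloc and realloc calls, as the dblink patch does */
	if (*buf == NULL)
		tmp = malloc((size_t) len + 1);
	else
		tmp = realloc(*buf, (size_t) len + 1);

	if (tmp == NULL)
		return 0;				/* *buf untouched: nothing leaked */

	memcpy(tmp, src, (size_t) len);
	tmp[len] = '\0';
	*buf = tmp;
	return 1;
}
```

The caller starts with the buffer pointer set to NULL and frees it once
at the end, whether or not an intermediate call failed.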

Thanks, merged. I also did some minor coding style cleanups.

Tagging it Ready For Committer. I don't see any notable
issues with the patch anymore.

There is some potential for experimenting with more aggressive
optimizations on dblink side, but I'd like to get a nod from
a committer for libpq changes first.
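
For anyone reviewing the libpq side, the row-processor contract can be
sketched without libpq at all.  The block below is an illustration under
assumptions, not the patch's code: RowValue is a local stand-in that
mirrors PGrowValue, RowStats and count_row are hypothetical names, and
the column count is passed explicitly because there is no PGresult to
query here.  It shows the three rules a callback has to follow: len < 0
means NULL, values are not zero-terminated and must be copied out before
returning, and the return value is 1 for success and 0 for failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in mirroring the PGrowValue struct in the attached patch. */
typedef struct
{
	int			len;		/* length in bytes; < 0 means SQL NULL */
	char	   *value;		/* points into the buffer, not zero-terminated */
} RowValue;

/* Hypothetical per-query state handed to the callback through "param". */
typedef struct
{
	int			nrows;		/* rows seen so far */
	size_t		nbytes;		/* total bytes of non-NULL data */
	char	   *last;		/* zero-terminated copy of the last first column */
} RowStats;

/* Returns 1 on success, 0 on failure (here: out of memory). */
int
count_row(void *param, const RowValue *columns, int nfields)
{
	RowStats   *st = param;
	int			i;

	assert(st != NULL && columns != NULL && nfields > 0);

	for (i = 0; i < nfields; i++)
		if (columns[i].len > 0)
			st->nbytes += (size_t) columns[i].len;

	/* Copy the first column out now; the buffer may be reused later. */
	free(st->last);
	st->last = NULL;
	if (columns[0].len >= 0)
	{
		st->last = malloc((size_t) columns[0].len + 1);
		if (st->last == NULL)
			return 0;
		memcpy(st->last, columns[0].value, (size_t) columns[0].len);
		st->last[columns[0].len] = '\0';
	}

	st->nrows++;
	return 1;
}
```

With the patch applied, a function of this shape (taking PGresult *res
as its first argument instead of the explicit column count) would be
registered through PQsetRowProcessor(conn, func, param).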

--
marko

Attachments:

libpq_rowproc_2012-02-02-2.diff (text/x-diff; charset=us-ascii)
*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
***************
*** 160,162 **** PQconnectStartParams      157
--- 160,164 ----
  PQping                    158
  PQpingParams              159
  PQlibVersion              160
+ PQsetRowProcessor	  161
+ PQsetRowProcessorErrMsg	  162
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***************
*** 2693,2698 **** makeEmptyPGconn(void)
--- 2693,2701 ----
  	conn->wait_ssl_try = false;
  #endif
  
+ 	/* set default row processor */
+ 	PQsetRowProcessor(conn, NULL, NULL);
+ 
  	/*
  	 * We try to send at least 8K at a time, which is the usual size of pipe
  	 * buffers on Unix systems.  That way, when we are sending a large amount
***************
*** 2711,2718 **** makeEmptyPGconn(void)
--- 2714,2726 ----
  	initPQExpBuffer(&conn->errorMessage);
  	initPQExpBuffer(&conn->workBuffer);
  
+ 	/* set up initial row buffer */
+ 	conn->rowBufLen = 32;
+ 	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+ 
  	if (conn->inBuffer == NULL ||
  		conn->outBuffer == NULL ||
+ 		conn->rowBuf == NULL ||
  		PQExpBufferBroken(&conn->errorMessage) ||
  		PQExpBufferBroken(&conn->workBuffer))
  	{
***************
*** 2812,2817 **** freePGconn(PGconn *conn)
--- 2820,2827 ----
  		free(conn->inBuffer);
  	if (conn->outBuffer)
  		free(conn->outBuffer);
+ 	if (conn->rowBuf)
+ 		free(conn->rowBuf);
  	termPQExpBuffer(&conn->errorMessage);
  	termPQExpBuffer(&conn->workBuffer);
  
***************
*** 5076,5078 **** PQregisterThreadLock(pgthreadlock_t newhandler)
--- 5086,5089 ----
  
  	return prev;
  }
+ 
*** a/src/interfaces/libpq/fe-exec.c
--- b/src/interfaces/libpq/fe-exec.c
***************
*** 66,71 **** static PGresult *PQexecFinish(PGconn *conn);
--- 66,72 ----
  static int PQsendDescribe(PGconn *conn, char desc_type,
  			   const char *desc_target);
  static int	check_field_number(const PGresult *res, int field_num);
+ static int	pqAddRow(PGresult *res, void *param, PGrowValue *columns);
  
  
  /* ----------------
***************
*** 160,165 **** PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
--- 161,167 ----
  	result->curBlock = NULL;
  	result->curOffset = 0;
  	result->spaceLeft = 0;
+ 	result->rowProcessorErrMsg = NULL;
  
  	if (conn)
  	{
***************
*** 701,707 **** pqClearAsyncResult(PGconn *conn)
  	if (conn->result)
  		PQclear(conn->result);
  	conn->result = NULL;
- 	conn->curTuple = NULL;
  }
  
  /*
--- 703,708 ----
***************
*** 756,762 **** pqPrepareAsyncResult(PGconn *conn)
  	 */
  	res = conn->result;
  	conn->result = NULL;		/* handing over ownership to caller */
- 	conn->curTuple = NULL;		/* just in case */
  	if (!res)
  		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
  	else
--- 757,762 ----
***************
*** 828,833 **** pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
--- 828,900 ----
  }
  
  /*
+  * PQsetRowProcessor
+  *   Set function that copies column data out from network buffer.
+  */
+ void
+ PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+ {
+ 	conn->rowProcessor = (func ? func : pqAddRow);
+ 	conn->rowProcessorParam = param;
+ }
+ 
+ /*
+  * PQsetRowProcessorErrMsg
+  *    Set the error message to pass back to the caller of RowProcessor.
+  *
+  *    You can replace the previous message with an alternative one, or
+  *    clear it with NULL.
+  */
+ void
+ PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+ {
+ 	if (msg)
+ 		res->rowProcessorErrMsg = pqResultStrdup(res, msg);
+ 	else
+ 		res->rowProcessorErrMsg = NULL;
+ }
+ 
+ /*
+  * pqAddRow
+  *	  add a row to the PGresult structure, growing it if necessary
+  *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+  */
+ static int
+ pqAddRow(PGresult *res, void *param, PGrowValue *columns)
+ {
+ 	PGresAttValue *tup;
+ 	int			nfields = res->numAttributes;
+ 	int			i;
+ 
+ 	tup = (PGresAttValue *)
+ 		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+ 	if (tup == NULL)
+ 		return FALSE;
+ 
+ 	for (i = 0 ; i < nfields ; i++)
+ 	{
+ 		tup[i].len = columns[i].len;
+ 		if (tup[i].len == NULL_LEN)
+ 		{
+ 			tup[i].value = res->null_field;
+ 		}
+ 		else
+ 		{
+ 			bool isbinary = (res->attDescs[i].format != 0);
+ 			tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+ 			if (tup[i].value == NULL)
+ 				return FALSE;
+ 
+ 			memcpy(tup[i].value, columns[i].value, tup[i].len);
+ 			/* We have to terminate this ourselves */
+ 			tup[i].value[tup[i].len] = '\0';
+ 		}
+ 	}
+ 
+ 	return pqAddTuple(res, tup);
+ }
+ 
+ /*
   * pqAddTuple
   *	  add a row pointer to the PGresult structure, growing it if necessary
   *	  Returns TRUE if OK, FALSE if not enough memory to add the row
***************
*** 1223,1229 **** PQsendQueryStart(PGconn *conn)
  
  	/* initialize async result-accumulation state */
  	conn->result = NULL;
- 	conn->curTuple = NULL;
  
  	/* ready to send command message */
  	return true;
--- 1290,1295 ----
*** a/src/interfaces/libpq/fe-misc.c
--- b/src/interfaces/libpq/fe-misc.c
***************
*** 219,224 **** pqGetnchar(char *s, size_t len, PGconn *conn)
--- 219,243 ----
  }
  
  /*
+  * pqSkipnchar:
+  *	skip len bytes in input buffer.
+  */
+ int
+ pqSkipnchar(size_t len, PGconn *conn)
+ {
+ 	if (len > (size_t) (conn->inEnd - conn->inCursor))
+ 		return EOF;
+ 
+ 	conn->inCursor += len;
+ 
+ 	if (conn->Pfdebug)
+ 		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+ 				(unsigned long) len);
+ 
+ 	return 0;
+ }
+ 
+ /*
   * pqPutnchar:
   *	write exactly len bytes to the current message
   */
*** a/src/interfaces/libpq/fe-protocol2.c
--- b/src/interfaces/libpq/fe-protocol2.c
***************
*** 703,721 **** failure:
  
  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGresAttValue *tup;
  
  	/* the backend sends us a bitmap of which attributes are null */
  	char		std_bitmap[64]; /* used unless it doesn't fit */
--- 703,720 ----
  
  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * It fills rowbuf with column pointers and then calls row processor.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGrowValue  *rowbuf;
  
  	/* the backend sends us a bitmap of which attributes are null */
  	char		std_bitmap[64]; /* used unless it doesn't fit */
***************
*** 727,754 **** getAnotherTuple(PGconn *conn, bool binary)
  	int			bitcnt;			/* number of bits examined in current byte */
  	int			vlen;			/* length of the current field value */
  
  	result->binary = binary;
  
! 	/* Allocate tuple space if first time for this data message */
! 	if (conn->curTuple == NULL)
  	{
! 		conn->curTuple = (PGresAttValue *)
! 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
! 		if (conn->curTuple == NULL)
! 			goto outOfMemory;
! 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
! 
! 		/*
! 		 * If it's binary, fix the column format indicators.  We assume the
! 		 * backend will consistently send either B or D, not a mix.
! 		 */
! 		if (binary)
! 		{
! 			for (i = 0; i < nfields; i++)
! 				result->attDescs[i].format = 1;
! 		}
  	}
- 	tup = conn->curTuple;
  
  	/* Get the null-value bitmap */
  	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
--- 726,752 ----
  	int			bitcnt;			/* number of bits examined in current byte */
  	int			vlen;			/* length of the current field value */
  
+ 	/* resize row buffer if needed */
+ 	if (nfields > conn->rowBufLen)
+ 	{
+ 		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+ 		if (!rowbuf)
+ 			goto rowProcessError;
+ 		conn->rowBuf = rowbuf;
+ 		conn->rowBufLen = nfields;
+ 	}
+ 	else
+ 	{
+ 		rowbuf = conn->rowBuf;
+ 	}
+ 
  	result->binary = binary;
  
! 	if (binary)
  	{
! 		for (i = 0; i < nfields; i++)
! 			result->attDescs[i].format = 1;
  	}
  
  	/* Get the null-value bitmap */
  	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
***************
*** 757,763 **** getAnotherTuple(PGconn *conn, bool binary)
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 			goto outOfMemory;
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
--- 755,761 ----
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 			goto rowProcessError;
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
***************
*** 771,804 **** getAnotherTuple(PGconn *conn, bool binary)
  	for (i = 0; i < nfields; i++)
  	{
  		if (!(bmap & 0200))
! 		{
! 			/* if the field value is absent, make it a null string */
! 			tup[i].value = result->null_field;
! 			tup[i].len = NULL_LEN;
! 		}
  		else
  		{
- 			/* get the value length (the first four bytes are for length) */
- 			if (pqGetInt(&vlen, 4, conn))
- 				goto EOFexit;
  			if (!binary)
  				vlen = vlen - 4;
  			if (vlen < 0)
  				vlen = 0;
- 			if (tup[i].value == NULL)
- 			{
- 				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
- 				if (tup[i].value == NULL)
- 					goto outOfMemory;
- 			}
- 			tup[i].len = vlen;
- 			/* read in the value */
- 			if (vlen > 0)
- 				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
- 					goto EOFexit;
- 			/* we have to terminate this ourselves */
- 			tup[i].value[vlen] = '\0';
  		}
  		/* advance the bitmap stuff */
  		bitcnt++;
  		if (bitcnt == BITS_PER_BYTE)
--- 769,797 ----
  	for (i = 0; i < nfields; i++)
  	{
  		if (!(bmap & 0200))
! 			vlen = NULL_LEN;
! 		else if (pqGetInt(&vlen, 4, conn))
! 				goto EOFexit;
  		else
  		{
  			if (!binary)
  				vlen = vlen - 4;
  			if (vlen < 0)
  				vlen = 0;
  		}
+ 
+ 		/*
+ 		 * rowbuf[i].value always points to the next address of the
+ 		 * length field even if the value is NULL, to allow safe
+ 		 * size estimates and data copy.
+ 		 */
+ 		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+ 		rowbuf[i].len = vlen;
+ 
+ 		/* Skip the value */
+ 		if (vlen > 0 && pqSkipnchar(vlen, conn))
+ 			goto EOFexit;
+ 
  		/* advance the bitmap stuff */
  		bitcnt++;
  		if (bitcnt == BITS_PER_BYTE)
***************
*** 811,827 **** getAnotherTuple(PGconn *conn, bool binary)
  			bmap <<= 1;
  	}
  
! 	/* Success!  Store the completed tuple in the result */
! 	if (!pqAddTuple(result, tup))
! 		goto outOfMemory;
! 	/* and reset for a new message */
! 	conn->curTuple = NULL;
  
  	if (bitmap != std_bitmap)
  		free(bitmap);
  	return 0;
  
! outOfMemory:
  	/* Replace partially constructed result with an error result */
  
  	/*
--- 804,820 ----
  			bmap <<= 1;
  	}
  
! 	/* Success!  Pass the completed row values to rowProcessor */
! 	if (!conn->rowProcessor(result, conn->rowProcessorParam, rowbuf))
! 		goto rowProcessError;
  
  	if (bitmap != std_bitmap)
  		free(bitmap);
+ 
  	return 0;
  
! rowProcessError:
! 
  	/* Replace partially constructed result with an error result */
  
  	/*
***************
*** 829,838 **** outOfMemory:
  	 * there's not enough memory to concatenate messages...
  	 */
  	pqClearAsyncResult(conn);
! 	printfPQExpBuffer(&conn->errorMessage,
! 					  libpq_gettext("out of memory for query result\n"));
  
  	/*
  	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
  	 * do to recover...
  	 */
--- 822,838 ----
  	 * there's not enough memory to concatenate messages...
  	 */
  	pqClearAsyncResult(conn);
! 	resetPQExpBuffer(&conn->errorMessage);
  
  	/*
+ 	 * If error message is passed from RowProcessor, set it into
+ 	 * PGconn, assume out of memory if not.
+ 	 */
+ 	appendPQExpBufferStr(&conn->errorMessage,
+ 						 result->rowProcessorErrMsg ?
+ 						 result->rowProcessorErrMsg :
+ 						 libpq_gettext("out of memory for query result\n"));
+ 	/*
  	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
  	 * do to recover...
  	 */
*** a/src/interfaces/libpq/fe-protocol3.c
--- b/src/interfaces/libpq/fe-protocol3.c
***************
*** 613,646 **** failure:
  
  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGresAttValue *tup;
  	int			tupnfields;		/* # fields from tuple */
  	int			vlen;			/* length of the current field value */
  	int			i;
  
- 	/* Allocate tuple space if first time for this data message */
- 	if (conn->curTuple == NULL)
- 	{
- 		conn->curTuple = (PGresAttValue *)
- 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
- 		if (conn->curTuple == NULL)
- 			goto outOfMemory;
- 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
- 	}
- 	tup = conn->curTuple;
- 
  	/* Get the field count and make sure it's what we expect */
  	if (pqGetInt(&tupnfields, 2, conn))
  		return EOF;
--- 613,634 ----
  
  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * It fills rowbuf with column pointers and then calls row processor.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGrowValue  *rowbuf;
  	int			tupnfields;		/* # fields from tuple */
  	int			vlen;			/* length of the current field value */
  	int			i;
  
  	/* Get the field count and make sure it's what we expect */
  	if (pqGetInt(&tupnfields, 2, conn))
  		return EOF;
***************
*** 656,661 **** getAnotherTuple(PGconn *conn, int msgLength)
--- 644,663 ----
  		return 0;
  	}
  
+ 	/* resize row buffer if needed */
+ 	if (nfields > conn->rowBufLen)
+ 	{
+ 		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+ 		if (!rowbuf)
+ 			goto rowProcessError;
+ 		conn->rowBuf = rowbuf;
+ 		conn->rowBufLen = nfields;
+ 	}
+ 	else
+ 	{
+ 		rowbuf = conn->rowBuf;
+ 	}
+ 
  	/* Scan the fields */
  	for (i = 0; i < nfields; i++)
  	{
***************
*** 663,710 **** getAnotherTuple(PGconn *conn, int msgLength)
  		if (pqGetInt(&vlen, 4, conn))
  			return EOF;
  		if (vlen == -1)
! 		{
! 			/* null field */
! 			tup[i].value = result->null_field;
! 			tup[i].len = NULL_LEN;
! 			continue;
! 		}
! 		if (vlen < 0)
  			vlen = 0;
- 		if (tup[i].value == NULL)
- 		{
- 			bool		isbinary = (result->attDescs[i].format != 0);
  
! 			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
! 			if (tup[i].value == NULL)
! 				goto outOfMemory;
! 		}
! 		tup[i].len = vlen;
! 		/* read in the value */
! 		if (vlen > 0)
! 			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
! 				return EOF;
! 		/* we have to terminate this ourselves */
! 		tup[i].value[vlen] = '\0';
  	}
  
! 	/* Success!  Store the completed tuple in the result */
! 	if (!pqAddTuple(result, tup))
! 		goto outOfMemory;
! 	/* and reset for a new message */
! 	conn->curTuple = NULL;
  
  	return 0;
  
! outOfMemory:
  
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
  	 */
  	pqClearAsyncResult(conn);
! 	printfPQExpBuffer(&conn->errorMessage,
! 					  libpq_gettext("out of memory for query result\n"));
  	pqSaveErrorResult(conn);
  
  	/* Discard the failed message by pretending we read it */
--- 665,710 ----
  		if (pqGetInt(&vlen, 4, conn))
  			return EOF;
  		if (vlen == -1)
! 			vlen = NULL_LEN;
! 		else if (vlen < 0)
  			vlen = 0;
  
! 		/*
! 		 * rowbuf[i].value always points to the next address of the
! 		 * length field even if the value is NULL, to allow safe
! 		 * size estimates and data copy.
! 		 */
! 		rowbuf[i].value = conn->inBuffer + conn->inCursor;
! 		rowbuf[i].len = vlen;
! 
! 		/* Skip to the next length field */
! 		if (vlen > 0 && pqSkipnchar(vlen, conn))
! 			return EOF;
  	}
  
! 	/* Success!  Pass the completed row values to rowProcessor */
! 	if (!conn->rowProcessor(result, conn->rowProcessorParam, rowbuf))
! 		goto rowProcessError;
  
  	return 0;
  
! rowProcessError:
  
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
  	 */
  	pqClearAsyncResult(conn);
! 	resetPQExpBuffer(&conn->errorMessage);
! 
! 	/*
! 	 * If error message is passed from RowProcessor, set it into
! 	 * PGconn, assume out of memory if not.
! 	 */
! 	appendPQExpBufferStr(&conn->errorMessage,
! 						 result->rowProcessorErrMsg ?
! 						 result->rowProcessorErrMsg :
! 						 libpq_gettext("out of memory for query result\n"));
  	pqSaveErrorResult(conn);
  
  	/* Discard the failed message by pretending we read it */
*** a/src/interfaces/libpq/libpq-fe.h
--- b/src/interfaces/libpq/libpq-fe.h
***************
*** 149,154 **** typedef struct pgNotify
--- 149,165 ----
  	struct pgNotify *next;		/* list link */
  } PGnotify;
  
+ /* PGrowValue points to a column value in the network buffer.
+  * The value is a string of length len without null termination.
+  * NULL is represented as len < 0; value then points to the place
+  * where the value would have been.
+  */
+ typedef struct pgRowValue
+ {
+ 	int			len;			/* length in bytes of the value */
+ 	char	   *value;			/* actual value, without null termination */
+ } PGrowValue;
+ 
  /* Function types for notice-handling callbacks */
  typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
  typedef void (*PQnoticeProcessor) (void *arg, const char *message);
***************
*** 416,421 **** extern PGPing PQping(const char *conninfo);
--- 427,463 ----
  extern PGPing PQpingParams(const char *const * keywords,
  			 const char *const * values, int expand_dbname);
  
+ /*
+  * Typedef for alternative row processor.
+  *
+  * Columns array will contain PQnfields() entries, each one
+  * pointing to particular column data in network buffer.
+  * This function is supposed to copy data out from there
+  * and store somewhere.  NULL is signified with len<0.
+  *
+  * This function must return 1 for success and 0 for failure.  On
+  * failure it may set an error message with PQsetRowProcessorErrMsg;
+  * if no message is set, the caller assumes the failure was out of
+  * memory.  This function is assumed not to throw any
+  * exception.
+  */
+ typedef int (*PQrowProcessor)(PGresult *res, void *param,
+ 								PGrowValue *columns);
+ 
+ /*
+  * Set alternative row data processor for PGconn.
+  *
+  * By registering this function, libpq stops storing rows in the
+  * PGresult itself and calls the function for each row instead.
+  *
+  * func is the row processor function.  See the typedef PQrowProcessor.
+  *
+  * rowProcessorParam is the contextual variable that is passed to
+  * the row processor.
+  */
+ extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+ 								   void *rowProcessorParam);
+ 
  /* Force the write buffer to be written (or at least try) */
  extern int	PQflush(PGconn *conn);
  
***************
*** 454,459 **** extern char *PQcmdTuples(PGresult *res);
--- 496,502 ----
  extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
  extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
  extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+ extern void	PQsetRowProcessorErrMsg(PGresult *res, char *msg);
  extern int	PQnparams(const PGresult *res);
  extern Oid	PQparamtype(const PGresult *res, int param_num);
  
*** a/src/interfaces/libpq/libpq-int.h
--- b/src/interfaces/libpq/libpq-int.h
***************
*** 209,214 **** struct pg_result
--- 209,217 ----
  	PGresult_data *curBlock;	/* most recently allocated block */
  	int			curOffset;		/* start offset of free space in block */
  	int			spaceLeft;		/* number of free bytes remaining in block */
+ 
+ 	/* temp storage for message from row processor callback */
+ 	char	   *rowProcessorErrMsg;
  };
  
  /* PGAsyncStatusType defines the state of the query-execution state machine */
***************
*** 398,404 **** struct pg_conn
  
  	/* Status for asynchronous result construction */
  	PGresult   *result;			/* result being constructed */
- 	PGresAttValue *curTuple;	/* tuple currently being read */
  
  #ifdef USE_SSL
  	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
--- 401,406 ----
***************
*** 443,448 **** struct pg_conn
--- 445,458 ----
  
  	/* Buffer for receiving various parts of messages */
  	PQExpBufferData workBuffer; /* expansible string */
+ 
+ 	/*
+ 	 * Read column data from network buffer.
+ 	 */
+ 	PQrowProcessor rowProcessor;/* Function pointer */
+ 	void *rowProcessorParam;	/* Contextual parameter for rowProcessor */
+ 	PGrowValue *rowBuf;			/* Buffer for passing values to rowProcessor */
+ 	int rowBufLen;				/* Number of columns allocated in rowBuf */
  };
  
  /* PGcancel stores all data necessary to cancel a connection. A copy of this
***************
*** 560,565 **** extern int	pqGets(PQExpBuffer buf, PGconn *conn);
--- 570,576 ----
  extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
  extern int	pqPuts(const char *s, PGconn *conn);
  extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+ extern int	pqSkipnchar(size_t len, PGconn *conn);
  extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
  extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
  extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
libpq_rowproc_doc_2012-02-02-2.diff (text/x-diff; charset=us-ascii)
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
***************
*** 7233,7238 **** int PQisthreadsafe();
--- 7233,7443 ----
   </sect1>
  
  
+  <sect1 id="libpq-altrowprocessor">
+   <title>Alternative row processor</title>
+ 
+   <indexterm zone="libpq-altrowprocessor">
+    <primary>PGresult</primary>
+    <secondary>PGconn</secondary>
+   </indexterm>
+ 
+   <para>
+    By default, rows are stored into a <type>PGresult</type>
+    until the full result set is received.  Then the completely filled
+    <type>PGresult</type> is passed to the user.  This behavior can be
+    changed by registering an alternative row processor function,
+    which sees each row as soon as it is received
+    from the network.  It has the option of processing the data
+    immediately, or storing it into a custom container.
+   </para>
+ 
+   <para>
+    Note - as row processor sees rows as they arrive, it cannot know
+    whether the SQL statement actually finishes successfully on server
+    or not.  So some care must be taken to get proper
+    transactionality.
+   </para>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqsetrowprocessor">
+     <term>
+      <function>PQsetRowProcessor</function>
+      <indexterm>
+       <primary>PQsetRowProcessor</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        Sets a callback function to process each row.
+ <synopsis>
+ void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+ </synopsis>
+      </para>
+      
+      <para>
+        <variablelist>
+ 	 <varlistentry>
+ 	   <term><parameter>conn</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       The connection object to set the row processor function.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+ 	 <varlistentry>
+ 	   <term><parameter>func</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       Storage handler function to set. NULL means to use the
+ 	       default processor.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+ 	 <varlistentry>
+ 	   <term><parameter>param</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       A pointer to contextual parameter passed
+ 	       to <parameter>func</parameter>.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+        </variablelist>
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqrowprocessor">
+     <term>
+      <type>PQrowProcessor</type>
+      <indexterm>
+       <primary>PQrowProcessor</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        The type for the row processor callback function.
+ <synopsis>
+ int (*PQrowProcessor)(PGresult *res, void *param, PGrowValue *columns);
+ 
+ typedef struct
+ {
+     int         len;            /* length in bytes of the value */
+     char       *value;          /* actual value, without null termination */
+ } PGrowValue;
+ </synopsis>
+      </para>
+ 
+      <para>
+       The <parameter>columns</parameter> array will have PQnfields()
+       elements, each one pointing to column value in network buffer.
+      </para>
+ 
+      <para>
+        This function must process or copy row values away from network
+        buffer before it returns, as next row might overwrite them.
+      </para>
+ 
+      <para>
+        This function must return 1 for success, and 0 for failure.
+        On failure this function should set the error message
+        with <function>PQsetRowProcessorErrMsg</function> if the cause
+        is other than out of memory.  This funcion must not throw any
+        exception.
+      </para>
+      <variablelist>
+        <varlistentry>
+ 
+ 	 <term><parameter>res</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     A pointer to the <type>PGresult</type> object.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+        <varlistentry>
+ 
+ 	 <term><parameter>param</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     Extra parameter that was given to <function>PQsetRowProcessor</function>.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+        <varlistentry>
+ 
+ 	 <term><parameter>columns</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     Column values of the row to process.  Column values
+ 	     are located in network buffer, the processor must
+ 	     copy them out from there.
+ 	   </para>
+ 	   <para>
+ 	     Column values are not null-terminated, so processor cannot
+ 	     use C string functions on them directly.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+      </variablelist>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqsetrowprocessorerrmsg">
+     <term>
+      <function>PQsetRowProcessorErrMsg</function>
+      <indexterm>
+       <primary>PQsetRowProcessorErrMsg</primary>
+      </indexterm>
+     </term>
+     <listitem>
+       <para>
+ 	Set the message for the error occurred
+ 	in <type>PQrowProcessor</type>.  If this message is not set, the
+ 	caller assumes the error to be out of memory.
+ <synopsis>
+ void PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+ </synopsis>
+       </para>
+       <para>
+ 	<variablelist>
+ 	  <varlistentry>
+ 	    <term><parameter>res</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		A pointer to the <type>PGresult</type> object
+ 		passed to <type>PQrowProcessor</type>.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	  <varlistentry>
+ 	    <term><parameter>mes</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		Error message. This will be copied internally so there is
+ 		no need to care of the scope.
+ 	      </para>
+ 	      <para>
+ 		If <parameter>res</parameter> already has a message previously
+ 		set, it will be overritten. Set NULL to cancel the the costom
+ 		message.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	</variablelist>
+       </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </sect1>
+ 
+ 
   <sect1 id="libpq-build">
    <title>Building <application>libpq</application> Programs</title>
  
dblink_rowproc_2012-02-02-2.diff (text/x-diff; charset=us-ascii)
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
***************
*** 63,73 **** typedef struct remoteConn
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
- static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
--- 63,85 ----
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
+ typedef struct storeInfo
+ {
+ 	Tuplestorestate *tuplestore;
+ 	int nattrs;
+ 	MemoryContext oldcontext;
+ 	AttInMetadata *attinmeta;
+ 	char** valbuf;
+ 	int *valbuflen;
+ 	bool error_occurred;
+ 	bool nummismatch;
+ 	ErrorData *edata;
+ } storeInfo;
+ 
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
***************
*** 90,95 **** static char *escape_param_str(const char *from);
--- 102,111 ----
  static void validate_pkattnums(Relation rel,
  				   int2vector *pkattnums_arg, int32 pknumatts_arg,
  				   int **pkattnums, int *pknumatts);
+ static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+ static void finishStoreInfo(storeInfo *sinfo);
+ static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+ 
  
  /* Global */
  static remoteConn *pconn = NULL;
***************
*** 503,508 **** dblink_fetch(PG_FUNCTION_ARGS)
--- 519,525 ----
  	char	   *curname = NULL;
  	int			howmany = 0;
  	bool		fail = true;	/* default to backward compatible */
+ 	storeInfo   storeinfo;
  
  	DBLINK_INIT;
  
***************
*** 559,573 **** dblink_fetch(PG_FUNCTION_ARGS)
--- 576,611 ----
  	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
  
  	/*
+ 	 * Result is stored into storeinfo.tuplestore instead of
+ 	 * res->result retuned by PQexec below
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+ 
+ 	/*
  	 * Try to execute the query.  Note that since libpq uses malloc, the
  	 * PGresult will be long-lived even though we are still in a short-lived
  	 * memory context.
  	 */
  	res = PQexec(conn, buf.data);
+ 	finishStoreInfo(&storeinfo);
+ 
  	if (!res ||
  		(PQresultStatus(res) != PGRES_COMMAND_OK &&
  		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
+ 		/* finishStoreInfo saves the fields referred to below. */
+ 		if (storeinfo.nummismatch)
+ 		{
+ 			/* This is only for backward compatibility */
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_DATATYPE_MISMATCH),
+ 					 errmsg("remote query result rowtype does not match "
+ 							"the specified FROM clause rowtype")));
+ 		}
+ 		else if (storeinfo.edata)
+ 			ReThrowError(storeinfo.edata);
+ 
  		dblink_res_error(conname, res, "could not fetch from cursor", fail);
  		return (Datum) 0;
  	}
***************
*** 579,586 **** dblink_fetch(PG_FUNCTION_ARGS)
  				(errcode(ERRCODE_INVALID_CURSOR_NAME),
  				 errmsg("cursor \"%s\" does not exist", curname)));
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
--- 617,624 ----
  				(errcode(ERRCODE_INVALID_CURSOR_NAME),
  				 errmsg("cursor \"%s\" does not exist", curname)));
  	}
+ 	PQclear(res);
  
  	return (Datum) 0;
  }
  
***************
*** 640,645 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
--- 678,684 ----
  	remoteConn *rconn = NULL;
  	bool		fail = true;	/* default to backward compatible */
  	bool		freeconn = false;
+ 	storeInfo   storeinfo;
  
  	/* check to see if caller supports us returning a tuplestore */
  	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
***************
*** 715,878 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
  	/* synchronous query, or async result retrieval */
  	if (!is_async)
  		res = PQexec(conn, sql);
  	else
- 	{
  		res = PQgetResult(conn);
- 		/* NULL means we're all done with the async results */
- 		if (!res)
- 			return (Datum) 0;
- 	}
  
! 	/* if needed, close the connection to the database and cleanup */
! 	if (freeconn)
! 		PQfinish(conn);
  
! 	if (!res ||
! 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
! 		dblink_res_error(conname, res, "could not execute query", fail);
! 		return (Datum) 0;
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
- /*
-  * Materialize the PGresult to return them as the function result.
-  * The res will be released in this function.
-  */
  static void
! materializeResult(FunctionCallInfo fcinfo, PGresult *res)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
  
! 	Assert(rsinfo->returnMode == SFRM_Materialize);
  
! 	PG_TRY();
  	{
! 		TupleDesc	tupdesc;
! 		bool		is_sql_cmd = false;
! 		int			ntuples;
! 		int			nfields;
  
! 		if (PQresultStatus(res) == PGRES_COMMAND_OK)
! 		{
! 			is_sql_cmd = true;
  
! 			/*
! 			 * need a tuple descriptor representing one TEXT column to return
! 			 * the command status string as our result tuple
! 			 */
! 			tupdesc = CreateTemplateTupleDesc(1, false);
! 			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
! 							   TEXTOID, -1, 0);
! 			ntuples = 1;
! 			nfields = 1;
! 		}
! 		else
! 		{
! 			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
  
! 			is_sql_cmd = false;
  
! 			/* get a tuple descriptor for our result type */
! 			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
! 			{
! 				case TYPEFUNC_COMPOSITE:
! 					/* success */
! 					break;
! 				case TYPEFUNC_RECORD:
! 					/* failed to determine actual type of RECORD */
! 					ereport(ERROR,
! 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 						errmsg("function returning record called in context "
! 							   "that cannot accept type record")));
! 					break;
! 				default:
! 					/* result type isn't composite */
! 					elog(ERROR, "return type must be a row type");
! 					break;
! 			}
  
! 			/* make sure we have a persistent copy of the tupdesc */
! 			tupdesc = CreateTupleDescCopy(tupdesc);
! 			ntuples = PQntuples(res);
! 			nfields = PQnfields(res);
  		}
  
! 		/*
! 		 * check result and tuple descriptor have the same number of columns
! 		 */
! 		if (nfields != tupdesc->natts)
! 			ereport(ERROR,
! 					(errcode(ERRCODE_DATATYPE_MISMATCH),
! 					 errmsg("remote query result rowtype does not match "
! 							"the specified FROM clause rowtype")));
  
! 		if (ntuples > 0)
! 		{
! 			AttInMetadata *attinmeta;
! 			Tuplestorestate *tupstore;
! 			MemoryContext oldcontext;
! 			int			row;
! 			char	  **values;
! 
! 			attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 
! 			oldcontext = MemoryContextSwitchTo(
! 									rsinfo->econtext->ecxt_per_query_memory);
! 			tupstore = tuplestore_begin_heap(true, false, work_mem);
! 			rsinfo->setResult = tupstore;
! 			rsinfo->setDesc = tupdesc;
! 			MemoryContextSwitchTo(oldcontext);
  
! 			values = (char **) palloc(nfields * sizeof(char *));
  
! 			/* put all tuples into the tuplestore */
! 			for (row = 0; row < ntuples; row++)
! 			{
! 				HeapTuple	tuple;
  
! 				if (!is_sql_cmd)
! 				{
! 					int			i;
  
! 					for (i = 0; i < nfields; i++)
! 					{
! 						if (PQgetisnull(res, row, i))
! 							values[i] = NULL;
! 						else
! 							values[i] = PQgetvalue(res, row, i);
! 					}
! 				}
  				else
! 				{
! 					values[0] = PQcmdStatus(res);
! 				}
! 
! 				/* build the tuple and put it into the tuplestore. */
! 				tuple = BuildTupleFromCStrings(attinmeta, values);
! 				tuplestore_puttuple(tupstore, tuple);
  			}
  
! 			/* clean up and return the tuplestore */
! 			tuplestore_donestoring(tupstore);
  		}
  
! 		PQclear(res);
  	}
  	PG_CATCH();
  	{
! 		/* be sure to release the libpq result */
! 		PQclear(res);
! 		PG_RE_THROW();
  	}
  	PG_END_TRY();
  }
  
  /*
--- 754,993 ----
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
+ 
+ 	/*
+ 	 * Result is stored into storeinfo.tuplestore instead of
+ 	 * res->result retuned by PQexec/PQgetResult below
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+ 
  	/* synchronous query, or async result retrieval */
  	if (!is_async)
  		res = PQexec(conn, sql);
  	else
  		res = PQgetResult(conn);
  
! 	finishStoreInfo(&storeinfo);
  
! 	/* NULL res from async get means we're all done with the results */
! 	if (res || !is_async)
  	{
! 		if (freeconn)
! 			PQfinish(conn);
! 
! 		if (!res ||
! 			(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 			 PQresultStatus(res) != PGRES_TUPLES_OK))
! 		{
! 			/* finishStoreInfo saves the fields referred to below. */
! 			if (storeinfo.nummismatch)
! 			{
! 				/* This is only for backward compatibility */
! 				ereport(ERROR,
! 						(errcode(ERRCODE_DATATYPE_MISMATCH),
! 						 errmsg("remote query result rowtype does not match "
! 								"the specified FROM clause rowtype")));
! 			}
! 			else if (storeinfo.edata)
! 				ReThrowError(storeinfo.edata);
! 
! 			dblink_res_error(conname, res, "could not execute query", fail);
! 			return (Datum) 0;
! 		}
  	}
+ 	PQclear(res);
  
  	return (Datum) 0;
  }
  
  static void
! initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	TupleDesc	tupdesc;
+ 	int i;
  
! 	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
! 	{
! 		case TYPEFUNC_COMPOSITE:
! 			/* success */
! 			break;
! 		case TYPEFUNC_RECORD:
! 			/* failed to determine actual type of RECORD */
! 			ereport(ERROR,
! 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 					 errmsg("function returning record called in context "
! 							"that cannot accept type record")));
! 			break;
! 		default:
! 			/* result type isn't composite */
! 			elog(ERROR, "return type must be a row type");
! 			break;
! 	}
  
! 	sinfo->oldcontext = MemoryContextSwitchTo(
! 		rsinfo->econtext->ecxt_per_query_memory);
! 
! 	/* make sure we have a persistent copy of the tupdesc */
! 	tupdesc = CreateTupleDescCopy(tupdesc);
! 
! 	sinfo->error_occurred = FALSE;
! 	sinfo->nummismatch = FALSE;
! 	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 	sinfo->edata = NULL;
! 	sinfo->nattrs = tupdesc->natts;
! 	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
! 	sinfo->valbuf = NULL;
! 	sinfo->valbuflen = NULL;
! 
! 	/* Preallocate memory of same size with c string array for values. */
! 	sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
! 	if (sinfo->valbuf)
! 		sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
! 	if (sinfo->valbuflen == NULL)
  	{
! 		if (sinfo->valbuf)
! 			free(sinfo->valbuf);
  
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OUT_OF_MEMORY),
! 				 errmsg("out of memory")));
! 	}
  
! 	for (i = 0 ; i < sinfo->nattrs ; i++)
! 	{
! 		sinfo->valbuf[i] = NULL;
! 		sinfo->valbuflen[i] = -1;
! 	}
  
! 	rsinfo->setResult = sinfo->tuplestore;
! 	rsinfo->setDesc = tupdesc;
! }
  
! static void
! finishStoreInfo(storeInfo *sinfo)
! {
! 	int i;
  
! 	if (sinfo->valbuf)
! 	{
! 		for (i = 0 ; i < sinfo->nattrs ; i++)
! 		{
! 			if (sinfo->valbuf[i])
! 				free(sinfo->valbuf[i]);
  		}
+ 		free(sinfo->valbuf);
+ 		sinfo->valbuf = NULL;
+ 	}
  
! 	if (sinfo->valbuflen)
! 	{
! 		free(sinfo->valbuflen);
! 		sinfo->valbuflen = NULL;
! 	}
! 	MemoryContextSwitchTo(sinfo->oldcontext);
! }
  
! static int
! storeHandler(PGresult *res, void *param, PGrowValue *columns)
! {
! 	storeInfo *sinfo = (storeInfo *)param;
! 	HeapTuple  tuple;
! 	int        fields = PQnfields(res);
! 	int        i;
! 	char      *cstrs[PQnfields(res)];
  
! 	if (sinfo->error_occurred)
! 		return FALSE;
  
! 	if (sinfo->nattrs != fields)
! 	{
! 		sinfo->error_occurred = TRUE;
! 		sinfo->nummismatch = TRUE;
! 		finishStoreInfo(sinfo);
! 
! 		/* This error will be processed in
! 		 * dblink_record_internal(). So do not set error message
! 		 * here. */
! 		return FALSE;
! 	}
  
! 	/*
! 	 * value input functions assumes that the input string is
! 	 * terminated by zero. We should make the values to be so.
! 	 */
! 	for(i = 0 ; i < fields ; i++)
! 	{
! 		int len = columns[i].len;
! 		if (len < 0)
! 			cstrs[i] = NULL;
! 		else
! 		{
! 			char *tmp = sinfo->valbuf[i];
! 			int tmplen = sinfo->valbuflen[i];
  
! 			/*
! 			 * Divide calls to malloc and realloc so that things will
! 			 * go fine even on the systems of which realloc() does not
! 			 * accept NULL as old memory block.
! 			 *
! 			 * Also try to (re)allocate in bigger steps to
! 			 * avoid flood of allocations on weird data.
! 			 */
! 			if (tmp == NULL)
! 			{
! 				tmplen = len + 1;
! 				if (tmplen < 64)
! 					tmplen = 64;
! 				tmp = (char *)malloc(tmplen);
! 			}
! 			else if (tmplen < len + 1)
! 			{
! 				if (len + 1 > tmplen * 2)
! 					tmplen = len + 1;
  				else
! 					tmplen = tmplen * 2;
! 				tmp = (char *)realloc(tmp, tmplen);
  			}
  
! 			/*
! 			 * sinfo->valbuf[n] will be freed in finishStoreInfo()
! 			 * when realloc returns NULL.
! 			 */
! 			if (tmp == NULL)
! 				return FALSE;
! 
! 			sinfo->valbuf[i] = tmp;
! 			sinfo->valbuflen[i] = tmplen;
! 
! 			cstrs[i] = sinfo->valbuf[i];
! 			memcpy(cstrs[i], columns[i].value, len);
! 			cstrs[i][len] = '\0';
  		}
+ 	}
  
! 	PG_TRY();
! 	{
! 		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
! 		tuplestore_puttuple(sinfo->tuplestore, tuple);
  	}
  	PG_CATCH();
  	{
! 		MemoryContext context;
! 		/*
! 		 * Store exception for later ReThrow and cancel the exception.
! 		 */
! 		sinfo->error_occurred = TRUE;
! 		context = MemoryContextSwitchTo(sinfo->oldcontext);
! 		sinfo->edata = CopyErrorData();
! 		MemoryContextSwitchTo(context);
! 		FlushErrorState();
! 		return FALSE;
  	}
  	PG_END_TRY();
+ 
+ 	return TRUE;
  }
  
  /*
#40Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#39)
Re: Speed dblink using alternate libpq tuple storage

Hello,

Thanks, merged. I also did some minor coding style cleanups.

Thank you for editing the many comments and the pieces of code that
I had left unchanged, both out of carelessness and because I had not
fully understood your comments. I'll be more careful about that...

There is some potential for experimenting with more aggressive
optimizations on dblink side, but I'd like to get a nod from
a committer for libpq changes first.

I'm looking forward to seeing those aggressive optimizations :-)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#41Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Marko Kreen (#39)
Re: Speed dblink using alternate libpq tuple storage

(2012/02/02 23:30), Marko Kreen wrote:

On Thu, Feb 02, 2012 at 04:51:37PM +0900, Kyotaro HORIGUCHI wrote:

Hello, this is a new version of dblink.c

- Memory is properly freed when realloc returns NULL in storeHandler().

- Fixed the bug where free() in finishStoreInfo() was fed a
garbage pointer when the malloc for sinfo->valbuflen fails.
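
To make the realloc point above concrete, here is a minimal sketch in plain C of the buffer-growth policy that storeHandler() in the patch follows (at least 64 bytes to start, then doubling, or jumping straight to the needed size). The helper name grow_buffer and its signature are hypothetical and are not part of the patch; the point is only that the old pointer stays stored until realloc has succeeded, so the cleanup code can still free it.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (not in the patch): make sure *buf can hold
 * `need` bytes.  Sizing follows storeHandler(): at least 64 bytes on
 * the first allocation, afterwards double the size, or use `need`
 * directly when doubling is not enough.
 *
 * Returns 1 on success, 0 on out of memory.  On failure *buf and
 * *buflen are left untouched, so the caller can still free the old
 * block later.
 */
static int
grow_buffer(char **buf, int *buflen, int need)
{
    char *tmp;
    int   newlen;

    if (*buf != NULL && *buflen >= need)
        return 1;               /* already large enough */

    if (*buf == NULL)
    {
        /* separate malloc path, as in the patch */
        newlen = (need < 64) ? 64 : need;
        tmp = malloc((size_t) newlen);
    }
    else
    {
        newlen = (need > *buflen * 2) ? need : *buflen * 2;
        tmp = realloc(*buf, (size_t) newlen);
    }

    if (tmp == NULL)
        return 0;               /* old block, if any, is still valid */

    *buf = tmp;
    *buflen = newlen;
    return 1;
}
```

A caller would invoke grow_buffer(&valbuf[i], &valbuflen[i], len + 1) before copying a column value and return failure from the row processor when it reports 0.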

Thanks, merged. I also did some minor coding style cleanups.

Tagging it Ready For Committer. I don't see any notable
issues with the patch anymore.

There is some potential for experimenting with more aggressive
optimizations on dblink side, but I'd like to get a nod from
a committer for libpq changes first.

I tried to use this feature in the pgsql_fdw patch, and found some small issues.

- Typos
- mes -> msg
- funcion -> function
- overritten -> overwritten
- costom -> custom
- What is the right (or recommended) way to prevent exceptions from
being thrown in the row-processor callback function? When should the
author use a PG_TRY block to catch exceptions in the callback function?
- It would be better to describe clearly how to determine whether a
column result is NULL. Perhaps the result value is NULL when
PGrowValue.len is less than 0, right?

Regards,
--
Shigeru Hanada

#42Marko Kreen
markokr@gmail.com
In reply to: Shigeru Hanada (#41)
3 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Feb 07, 2012 at 07:25:14PM +0900, Shigeru Hanada wrote:

(2012/02/02 23:30), Marko Kreen wrote:

On Thu, Feb 02, 2012 at 04:51:37PM +0900, Kyotaro HORIGUCHI wrote:

Hello, this is a new version of dblink.c

- Memory is properly freed when realloc returns NULL in storeHandler().

- Fixed the bug where free() in finishStoreInfo() was fed a
garbage pointer when the malloc for sinfo->valbuflen fails.

Thanks, merged. I also did some minor coding style cleanups.

Tagging it Ready For Committer. I don't see any notable
issues with the patch anymore.

There is some potential for experimenting with more aggressive
optimizations on dblink side, but I'd like to get a nod from
a committer for libpq changes first.

I tried to use this feature in the pgsql_fdw patch, and found some small issues.

- Typos
- mes -> msg
- funcion -> function
- overritten -> overwritten
- costom -> custom

Fixed.

- What is the right (or recommended) way to prevent exceptions from
being thrown in the row-processor callback function? When should the
author use a PG_TRY block to catch exceptions in the callback function?

When it calls backend functions that can throw exceptions?
As all handlers running in backend will want to convert data
to Datums, that means "always wrap handler code in PG_TRY"?

Although it seems we could allow exceptions, at least when we are
speaking of the Postgres backend, as the connection and result are
in an internally consistent state when the handler is called, and the
partial PGresult is stored under PGconn->result. Even the
connection is in a consistent state, as the row packet is
fully processed.

So we have 3 variants. All of them work, but which one do we want to support?

1) Exceptions are not allowed.
2) Exceptions are allowed, but when one happens, the user must call
PQfinish() next, to avoid processing incoming data from a
potentially invalid state.
3) Exceptions are allowed, and further row processing can continue
with PQisBusy() / PQgetResult().
3.1) The problematic row is retried. (Current behaviour)
3.2) The problematic row is skipped.

No clue if that is OK for a handler written in C++. I have no idea
whether you can throw a C++ exception when part of the stack
contains raw C calls.
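
As a rough illustration of variant 1 (the handler catches everything and only reports failure through its return value), here is a minimal sketch in plain C. It uses setjmp/longjmp as a stand-in for the backend's PG_TRY/PG_CATCH machinery; throw_error, convert_value, safe_handler and the two static variables are hypothetical names made up for this sketch, not backend or libpq functions.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins for the backend's error machinery. */
static jmp_buf *error_jump = NULL;      /* innermost active "try" block */
static const char *saved_error = NULL;  /* message kept for a later rethrow */

/*
 * Stand-in for ereport(ERROR, ...): remember the message and jump to
 * the innermost handler.  Must only be called while error_jump is set.
 */
static void
throw_error(const char *msg)
{
    saved_error = msg;
    longjmp(*error_jump, 1);
}

/* Stand-in for a data conversion routine that may throw. */
static void
convert_value(int value)
{
    if (value < 0)
        throw_error("negative value");
}

/*
 * Row-processor-style callback: returns 1 on success and 0 on failure,
 * and never lets a jump escape into the caller's stack frames.
 */
static int
safe_handler(int value)
{
    jmp_buf  local;
    jmp_buf *prev = error_jump;     /* not modified after setjmp */
    int      ok;

    if (setjmp(local) == 0)
    {
        error_jump = &local;
        convert_value(value);
        ok = 1;
    }
    else
    {
        /* the error was caught here; saved_error holds the message */
        ok = 0;
    }

    error_jump = prev;              /* restore the outer handler */
    return ok;
}
```

The caller can then inspect saved_error after the library call returns and raise the error from its own frame, which is roughly what the dblink patch does with storeinfo.edata and ReThrowError().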

- It would be better to describe clearly how to determine whether a
column result is NULL. Perhaps the result value is NULL when
PGrowValue.len is less than 0, right?

Eh, seems it's documented everywhere except in sgml doc. Fixed.
[ Is it better to document that it is "-1" or "< 0"? ]
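
For illustration, a minimal sketch of how a row processor might treat one column, assuming the "len < 0 means NULL" convention discussed above. The PGrowValue struct is copied from the patch's documentation; copy_column is a hypothetical helper name and not a libpq function.

```c
#include <stdlib.h>
#include <string.h>

/* Same layout as the PGrowValue struct in the patch's documentation. */
typedef struct
{
    int         len;            /* length in bytes of the value, < 0 for NULL */
    char       *value;          /* actual value, without null termination */
} PGrowValue;

/*
 * Hypothetical helper (not part of libpq): copy one column out of the
 * network buffer into a freshly malloc'ed, zero-terminated string.
 *
 * Returns 1 on success and 0 on out of memory.  For a NULL column,
 * *out is set to NULL and 1 is returned.
 */
static int
copy_column(const PGrowValue *col, char **out)
{
    char *buf;

    if (col->len < 0)
    {
        /* NULL column: value must not be read */
        *out = NULL;
        return 1;
    }

    buf = malloc((size_t) col->len + 1);
    if (buf == NULL)
        return 0;

    /* the value is not zero-terminated, so copy exactly len bytes */
    memcpy(buf, col->value, (size_t) col->len);
    buf[col->len] = '\0';
    *out = buf;
    return 1;
}
```

Testing for "< 0" rather than "== -1" keeps such a helper correct under either wording of the documentation.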

Also I removed one remaining dynamic stack array in dblink.c

Current state of patch attached.

--
marko

Attachments:

libpq_rowproc_2012-02-07.diff (text/x-diff; charset=us-ascii)
*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
***************
*** 160,162 **** PQconnectStartParams      157
--- 160,164 ----
  PQping                    158
  PQpingParams              159
  PQlibVersion              160
+ PQsetRowProcessor	  161
+ PQsetRowProcessorErrMsg	  162
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***************
*** 2693,2698 **** makeEmptyPGconn(void)
--- 2693,2701 ----
  	conn->wait_ssl_try = false;
  #endif
  
+ 	/* set default row processor */
+ 	PQsetRowProcessor(conn, NULL, NULL);
+ 
  	/*
  	 * We try to send at least 8K at a time, which is the usual size of pipe
  	 * buffers on Unix systems.  That way, when we are sending a large amount
***************
*** 2711,2718 **** makeEmptyPGconn(void)
--- 2714,2726 ----
  	initPQExpBuffer(&conn->errorMessage);
  	initPQExpBuffer(&conn->workBuffer);
  
+ 	/* set up initial row buffer */
+ 	conn->rowBufLen = 32;
+ 	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+ 
  	if (conn->inBuffer == NULL ||
  		conn->outBuffer == NULL ||
+ 		conn->rowBuf == NULL ||
  		PQExpBufferBroken(&conn->errorMessage) ||
  		PQExpBufferBroken(&conn->workBuffer))
  	{
***************
*** 2814,2819 **** freePGconn(PGconn *conn)
--- 2822,2829 ----
  		free(conn->inBuffer);
  	if (conn->outBuffer)
  		free(conn->outBuffer);
+ 	if (conn->rowBuf)
+ 		free(conn->rowBuf);
  	termPQExpBuffer(&conn->errorMessage);
  	termPQExpBuffer(&conn->workBuffer);
  
***************
*** 5078,5080 **** PQregisterThreadLock(pgthreadlock_t newhandler)
--- 5088,5091 ----
  
  	return prev;
  }
+ 
*** a/src/interfaces/libpq/fe-exec.c
--- b/src/interfaces/libpq/fe-exec.c
***************
*** 66,71 **** static PGresult *PQexecFinish(PGconn *conn);
--- 66,72 ----
  static int PQsendDescribe(PGconn *conn, char desc_type,
  			   const char *desc_target);
  static int	check_field_number(const PGresult *res, int field_num);
+ static int	pqAddRow(PGresult *res, void *param, PGrowValue *columns);
  
  
  /* ----------------
***************
*** 160,165 **** PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
--- 161,167 ----
  	result->curBlock = NULL;
  	result->curOffset = 0;
  	result->spaceLeft = 0;
+ 	result->rowProcessorErrMsg = NULL;
  
  	if (conn)
  	{
***************
*** 701,707 **** pqClearAsyncResult(PGconn *conn)
  	if (conn->result)
  		PQclear(conn->result);
  	conn->result = NULL;
- 	conn->curTuple = NULL;
  }
  
  /*
--- 703,708 ----
***************
*** 756,762 **** pqPrepareAsyncResult(PGconn *conn)
  	 */
  	res = conn->result;
  	conn->result = NULL;		/* handing over ownership to caller */
- 	conn->curTuple = NULL;		/* just in case */
  	if (!res)
  		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
  	else
--- 757,762 ----
***************
*** 828,833 **** pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
--- 828,900 ----
  }
  
  /*
+  * PQsetRowProcessor
+  *   Set function that copies column data out from network buffer.
+  */
+ void
+ PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+ {
+ 	conn->rowProcessor = (func ? func : pqAddRow);
+ 	conn->rowProcessorParam = param;
+ }
+ 
+ /*
+  * PQsetRowProcessorErrMsg
+  *    Set the error message to pass back to the caller of RowProcessor.
+  *
+  *    You can replace the previous message with an alternative msg, or
+  *    clear it with NULL.
+  */
+ void
+ PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+ {
+ 	if (msg)
+ 		res->rowProcessorErrMsg = pqResultStrdup(res, msg);
+ 	else
+ 		res->rowProcessorErrMsg = NULL;
+ }
+ 
+ /*
+  * pqAddRow
+  *	  add a row to the PGresult structure, growing it if necessary
+  *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+  */
+ static int
+ pqAddRow(PGresult *res, void *param, PGrowValue *columns)
+ {
+ 	PGresAttValue *tup;
+ 	int			nfields = res->numAttributes;
+ 	int			i;
+ 
+ 	tup = (PGresAttValue *)
+ 		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+ 	if (tup == NULL)
+ 		return FALSE;
+ 
+ 	for (i = 0 ; i < nfields ; i++)
+ 	{
+ 		tup[i].len = columns[i].len;
+ 		if (tup[i].len == NULL_LEN)
+ 		{
+ 			tup[i].value = res->null_field;
+ 		}
+ 		else
+ 		{
+ 			bool isbinary = (res->attDescs[i].format != 0);
+ 			tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+ 			if (tup[i].value == NULL)
+ 				return FALSE;
+ 
+ 			memcpy(tup[i].value, columns[i].value, tup[i].len);
+ 			/* We have to terminate this ourselves */
+ 			tup[i].value[tup[i].len] = '\0';
+ 		}
+ 	}
+ 
+ 	return pqAddTuple(res, tup);
+ }
+ 
+ /*
   * pqAddTuple
   *	  add a row pointer to the PGresult structure, growing it if necessary
   *	  Returns TRUE if OK, FALSE if not enough memory to add the row
***************
*** 1223,1229 **** PQsendQueryStart(PGconn *conn)
  
  	/* initialize async result-accumulation state */
  	conn->result = NULL;
- 	conn->curTuple = NULL;
  
  	/* ready to send command message */
  	return true;
--- 1290,1295 ----
*** a/src/interfaces/libpq/fe-misc.c
--- b/src/interfaces/libpq/fe-misc.c
***************
*** 219,224 **** pqGetnchar(char *s, size_t len, PGconn *conn)
--- 219,243 ----
  }
  
  /*
+  * pqSkipnchar:
+  *	skip len bytes in input buffer.
+  */
+ int
+ pqSkipnchar(size_t len, PGconn *conn)
+ {
+ 	if (len > (size_t) (conn->inEnd - conn->inCursor))
+ 		return EOF;
+ 
+ 	conn->inCursor += len;
+ 
+ 	if (conn->Pfdebug)
+ 		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+ 				(unsigned long) len);
+ 
+ 	return 0;
+ }
+ 
+ /*
   * pqPutnchar:
   *	write exactly len bytes to the current message
   */
*** a/src/interfaces/libpq/fe-protocol2.c
--- b/src/interfaces/libpq/fe-protocol2.c
***************
*** 703,721 **** failure:
  
  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGresAttValue *tup;
  
  	/* the backend sends us a bitmap of which attributes are null */
  	char		std_bitmap[64]; /* used unless it doesn't fit */
--- 703,720 ----
  
  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * It fills rowbuf with column pointers and then calls row processor.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGrowValue  *rowbuf;
  
  	/* the backend sends us a bitmap of which attributes are null */
  	char		std_bitmap[64]; /* used unless it doesn't fit */
***************
*** 727,754 **** getAnotherTuple(PGconn *conn, bool binary)
  	int			bitcnt;			/* number of bits examined in current byte */
  	int			vlen;			/* length of the current field value */
  
  	result->binary = binary;
  
! 	/* Allocate tuple space if first time for this data message */
! 	if (conn->curTuple == NULL)
  	{
! 		conn->curTuple = (PGresAttValue *)
! 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
! 		if (conn->curTuple == NULL)
! 			goto outOfMemory;
! 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
! 
! 		/*
! 		 * If it's binary, fix the column format indicators.  We assume the
! 		 * backend will consistently send either B or D, not a mix.
! 		 */
! 		if (binary)
! 		{
! 			for (i = 0; i < nfields; i++)
! 				result->attDescs[i].format = 1;
! 		}
  	}
- 	tup = conn->curTuple;
  
  	/* Get the null-value bitmap */
  	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
--- 726,752 ----
  	int			bitcnt;			/* number of bits examined in current byte */
  	int			vlen;			/* length of the current field value */
  
+ 	/* resize row buffer if needed */
+ 	if (nfields > conn->rowBufLen)
+ 	{
+ 		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+ 		if (!rowbuf)
+ 			goto rowProcessError;
+ 		conn->rowBuf = rowbuf;
+ 		conn->rowBufLen = nfields;
+ 	}
+ 	else
+ 	{
+ 		rowbuf = conn->rowBuf;
+ 	}
+ 
  	result->binary = binary;
  
! 	if (binary)
  	{
! 		for (i = 0; i < nfields; i++)
! 			result->attDescs[i].format = 1;
  	}
  
  	/* Get the null-value bitmap */
  	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
***************
*** 757,763 **** getAnotherTuple(PGconn *conn, bool binary)
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 			goto outOfMemory;
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
--- 755,761 ----
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 			goto rowProcessError;
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
***************
*** 771,804 **** getAnotherTuple(PGconn *conn, bool binary)
  	for (i = 0; i < nfields; i++)
  	{
  		if (!(bmap & 0200))
! 		{
! 			/* if the field value is absent, make it a null string */
! 			tup[i].value = result->null_field;
! 			tup[i].len = NULL_LEN;
! 		}
  		else
  		{
- 			/* get the value length (the first four bytes are for length) */
- 			if (pqGetInt(&vlen, 4, conn))
- 				goto EOFexit;
  			if (!binary)
  				vlen = vlen - 4;
  			if (vlen < 0)
  				vlen = 0;
- 			if (tup[i].value == NULL)
- 			{
- 				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
- 				if (tup[i].value == NULL)
- 					goto outOfMemory;
- 			}
- 			tup[i].len = vlen;
- 			/* read in the value */
- 			if (vlen > 0)
- 				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
- 					goto EOFexit;
- 			/* we have to terminate this ourselves */
- 			tup[i].value[vlen] = '\0';
  		}
  		/* advance the bitmap stuff */
  		bitcnt++;
  		if (bitcnt == BITS_PER_BYTE)
--- 769,797 ----
  	for (i = 0; i < nfields; i++)
  	{
  		if (!(bmap & 0200))
! 			vlen = NULL_LEN;
! 		else if (pqGetInt(&vlen, 4, conn))
! 				goto EOFexit;
  		else
  		{
  			if (!binary)
  				vlen = vlen - 4;
  			if (vlen < 0)
  				vlen = 0;
  		}
+ 
+ 		/*
+ 		 * rowbuf[i].value always points to the next address of the
+ 		 * length field even if the value is NULL, to allow safe
+ 		 * size estimates and data copy.
+ 		 */
+ 		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+ 		rowbuf[i].len = vlen;
+ 
+ 		/* Skip the value */
+ 		if (vlen > 0 && pqSkipnchar(vlen, conn))
+ 			goto EOFexit;
+ 
  		/* advance the bitmap stuff */
  		bitcnt++;
  		if (bitcnt == BITS_PER_BYTE)
***************
*** 811,827 **** getAnotherTuple(PGconn *conn, bool binary)
  			bmap <<= 1;
  	}
  
! 	/* Success!  Store the completed tuple in the result */
! 	if (!pqAddTuple(result, tup))
! 		goto outOfMemory;
! 	/* and reset for a new message */
! 	conn->curTuple = NULL;
  
  	if (bitmap != std_bitmap)
  		free(bitmap);
  	return 0;
  
! outOfMemory:
  	/* Replace partially constructed result with an error result */
  
  	/*
--- 804,820 ----
  			bmap <<= 1;
  	}
  
! 	/* Success!  Pass the completed row values to rowProcessor */
! 	if (!conn->rowProcessor(result, conn->rowProcessorParam, rowbuf))
! 		goto rowProcessError;
  
  	if (bitmap != std_bitmap)
  		free(bitmap);
+ 
  	return 0;
  
! rowProcessError:
! 
  	/* Replace partially constructed result with an error result */
  
  	/*
***************
*** 829,838 **** outOfMemory:
  	 * there's not enough memory to concatenate messages...
  	 */
  	pqClearAsyncResult(conn);
! 	printfPQExpBuffer(&conn->errorMessage,
! 					  libpq_gettext("out of memory for query result\n"));
  
  	/*
  	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
  	 * do to recover...
  	 */
--- 822,838 ----
  	 * there's not enough memory to concatenate messages...
  	 */
  	pqClearAsyncResult(conn);
! 	resetPQExpBuffer(&conn->errorMessage);
  
  	/*
+ 	 * If error message is passed from RowProcessor, set it into
+ 	 * PGconn, assume out of memory if not.
+ 	 */
+ 	appendPQExpBufferStr(&conn->errorMessage,
+ 						 result->rowProcessorErrMsg ?
+ 						 result->rowProcessorErrMsg :
+ 						 libpq_gettext("out of memory for query result\n"));
+ 	/*
  	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
  	 * do to recover...
  	 */
*** a/src/interfaces/libpq/fe-protocol3.c
--- b/src/interfaces/libpq/fe-protocol3.c
***************
*** 613,646 **** failure:
  
  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGresAttValue *tup;
  	int			tupnfields;		/* # fields from tuple */
  	int			vlen;			/* length of the current field value */
  	int			i;
  
- 	/* Allocate tuple space if first time for this data message */
- 	if (conn->curTuple == NULL)
- 	{
- 		conn->curTuple = (PGresAttValue *)
- 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
- 		if (conn->curTuple == NULL)
- 			goto outOfMemory;
- 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
- 	}
- 	tup = conn->curTuple;
- 
  	/* Get the field count and make sure it's what we expect */
  	if (pqGetInt(&tupnfields, 2, conn))
  		return EOF;
--- 613,634 ----
  
  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * It fills rowbuf with column pointers and then calls row processor.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGrowValue  *rowbuf;
  	int			tupnfields;		/* # fields from tuple */
  	int			vlen;			/* length of the current field value */
  	int			i;
  
  	/* Get the field count and make sure it's what we expect */
  	if (pqGetInt(&tupnfields, 2, conn))
  		return EOF;
***************
*** 656,661 **** getAnotherTuple(PGconn *conn, int msgLength)
--- 644,663 ----
  		return 0;
  	}
  
+ 	/* resize row buffer if needed */
+ 	if (nfields > conn->rowBufLen)
+ 	{
+ 		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+ 		if (!rowbuf)
+ 			goto rowProcessError;
+ 		conn->rowBuf = rowbuf;
+ 		conn->rowBufLen = nfields;
+ 	}
+ 	else
+ 	{
+ 		rowbuf = conn->rowBuf;
+ 	}
+ 
  	/* Scan the fields */
  	for (i = 0; i < nfields; i++)
  	{
***************
*** 663,710 **** getAnotherTuple(PGconn *conn, int msgLength)
  		if (pqGetInt(&vlen, 4, conn))
  			return EOF;
  		if (vlen == -1)
! 		{
! 			/* null field */
! 			tup[i].value = result->null_field;
! 			tup[i].len = NULL_LEN;
! 			continue;
! 		}
! 		if (vlen < 0)
  			vlen = 0;
- 		if (tup[i].value == NULL)
- 		{
- 			bool		isbinary = (result->attDescs[i].format != 0);
  
! 			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
! 			if (tup[i].value == NULL)
! 				goto outOfMemory;
! 		}
! 		tup[i].len = vlen;
! 		/* read in the value */
! 		if (vlen > 0)
! 			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
! 				return EOF;
! 		/* we have to terminate this ourselves */
! 		tup[i].value[vlen] = '\0';
  	}
  
! 	/* Success!  Store the completed tuple in the result */
! 	if (!pqAddTuple(result, tup))
! 		goto outOfMemory;
! 	/* and reset for a new message */
! 	conn->curTuple = NULL;
  
  	return 0;
  
! outOfMemory:
  
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
  	 */
  	pqClearAsyncResult(conn);
! 	printfPQExpBuffer(&conn->errorMessage,
! 					  libpq_gettext("out of memory for query result\n"));
  	pqSaveErrorResult(conn);
  
  	/* Discard the failed message by pretending we read it */
--- 665,710 ----
  		if (pqGetInt(&vlen, 4, conn))
  			return EOF;
  		if (vlen == -1)
! 			vlen = NULL_LEN;
! 		else if (vlen < 0)
  			vlen = 0;
  
! 		/*
! 		 * rowbuf[i].value always points to the next address of the
! 		 * length field even if the value is NULL, to allow safe
! 		 * size estimates and data copy.
! 		 */
! 		rowbuf[i].value = conn->inBuffer + conn->inCursor;
! 		rowbuf[i].len = vlen;
! 
! 		/* Skip to the next length field */
! 		if (vlen > 0 && pqSkipnchar(vlen, conn))
! 			return EOF;
  	}
  
! 	/* Success!  Pass the completed row values to rowProcessor */
! 	if (!conn->rowProcessor(result, conn->rowProcessorParam, rowbuf))
! 		goto rowProcessError;
  
  	return 0;
  
! rowProcessError:
  
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
  	 */
  	pqClearAsyncResult(conn);
! 	resetPQExpBuffer(&conn->errorMessage);
! 
! 	/*
! 	 * If an error message was set by the row processor, store it in
! 	 * PGconn; otherwise assume out of memory.
! 	 */
! 	appendPQExpBufferStr(&conn->errorMessage,
! 						 result->rowProcessorErrMsg ?
! 						 result->rowProcessorErrMsg :
! 						 libpq_gettext("out of memory for query result\n"));
  	pqSaveErrorResult(conn);
  
  	/* Discard the failed message by pretending we read it */
*** a/src/interfaces/libpq/libpq-fe.h
--- b/src/interfaces/libpq/libpq-fe.h
***************
*** 149,154 **** typedef struct pgNotify
--- 149,165 ----
  	struct pgNotify *next;		/* list link */
  } PGnotify;
  
+ /* PGrowValue points to a column value in the network buffer.
+  * The value is a string of length len without null termination.
+  * NULL is represented as len < 0; value then points to the place
+  * where the value would have been.
+  */
+ typedef struct pgRowValue
+ {
+ 	int			len;			/* length in bytes of the value */
+ 	char	   *value;			/* actual value, without null termination */
+ } PGrowValue;
+ 
  /* Function types for notice-handling callbacks */
  typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
  typedef void (*PQnoticeProcessor) (void *arg, const char *message);
***************
*** 416,421 **** extern PGPing PQping(const char *conninfo);
--- 427,463 ----
  extern PGPing PQpingParams(const char *const * keywords,
  			 const char *const * values, int expand_dbname);
  
+ /*
+  * Typedef for alternative row processor.
+  *
+  * The columns array contains PQnfields() entries, each one
+  * pointing to that column's data in the network buffer.
+  * This function is supposed to copy the data out from there
+  * and store it somewhere.  NULL is signified with len < 0.
+  *
+  * This function must return 1 for success and 0 for failure.
+  * On failure it may set an error message with
+  * PQsetRowProcessorErrMsg; if no message is set, the caller
+  * assumes the failure was out of memory.  This function must
+  * not throw any exception.
+  */
+ typedef int (*PQrowProcessor)(PGresult *res, void *param,
+ 								PGrowValue *columns);
+ 
+ /*
+  * Set alternative row data processor for PGconn.
+  *
+  * When a row processor is registered, libpq stops storing rows in
+  * the PGresult and calls the processor for each row instead.
+  *
+  * func is the row processor function; see the typedef PQrowProcessor.
+  *
+  * rowProcessorParam is the context pointer that is passed to
+  * func.
+  */
+ extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+ 								   void *rowProcessorParam);
+ 
  /* Force the write buffer to be written (or at least try) */
  extern int	PQflush(PGconn *conn);
  
***************
*** 454,459 **** extern char *PQcmdTuples(PGresult *res);
--- 496,502 ----
  extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
  extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
  extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+ extern void	PQsetRowProcessorErrMsg(PGresult *res, char *msg);
  extern int	PQnparams(const PGresult *res);
  extern Oid	PQparamtype(const PGresult *res, int param_num);
  
*** a/src/interfaces/libpq/libpq-int.h
--- b/src/interfaces/libpq/libpq-int.h
***************
*** 209,214 **** struct pg_result
--- 209,217 ----
  	PGresult_data *curBlock;	/* most recently allocated block */
  	int			curOffset;		/* start offset of free space in block */
  	int			spaceLeft;		/* number of free bytes remaining in block */
+ 
+ 	/* temp storage for message from row processor callback */
+ 	char	   *rowProcessorErrMsg;
  };
  
  /* PGAsyncStatusType defines the state of the query-execution state machine */
***************
*** 398,404 **** struct pg_conn
  
  	/* Status for asynchronous result construction */
  	PGresult   *result;			/* result being constructed */
- 	PGresAttValue *curTuple;	/* tuple currently being read */
  
  #ifdef USE_SSL
  	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
--- 401,406 ----
***************
*** 443,448 **** struct pg_conn
--- 445,458 ----
  
  	/* Buffer for receiving various parts of messages */
  	PQExpBufferData workBuffer; /* expansible string */
+ 
+ 	/*
+ 	 * Read column data from network buffer.
+ 	 */
+ 	PQrowProcessor rowProcessor;/* Function pointer */
+ 	void *rowProcessorParam;	/* Contextual parameter for rowProcessor */
+ 	PGrowValue *rowBuf;			/* Buffer for passing values to rowProcessor */
+ 	int rowBufLen;				/* Number of columns allocated in rowBuf */
  };
  
  /* PGcancel stores all data necessary to cancel a connection. A copy of this
***************
*** 560,565 **** extern int	pqGets(PQExpBuffer buf, PGconn *conn);
--- 570,576 ----
  extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
  extern int	pqPuts(const char *s, PGconn *conn);
  extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+ extern int	pqSkipnchar(size_t len, PGconn *conn);
  extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
  extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
  extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
libpq_rowproc_doc_2012-02-07.diff (text/x-diff; charset=us-ascii)
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
***************
*** 7233,7238 **** int PQisthreadsafe();
--- 7233,7448 ----
   </sect1>
  
  
+  <sect1 id="libpq-altrowprocessor">
+   <title>Alternative row processor</title>
+ 
+   <indexterm zone="libpq-altrowprocessor">
+    <primary>PGresult</primary>
+    <secondary>PGconn</secondary>
+   </indexterm>
+ 
+   <para>
+    By default, rows are stored in a <type>PGresult</type>
+    until the full result set is received, and the completely filled
+    <type>PGresult</type> is then passed to the user.  This behaviour
+    can be changed by registering an alternative row processor
+    function, which sees each row as soon as it is received
+    from the network.  It has the option of processing the data
+    immediately, or storing it in a custom container.
+   </para>
+ 
+   <para>
+    Note that because the row processor sees rows as they arrive, it
+    cannot know whether the SQL statement will finish successfully on
+    the server.  Some care must therefore be taken to get proper
+    transactional behaviour.
+   </para>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqsetrowprocessor">
+     <term>
+      <function>PQsetRowProcessor</function>
+      <indexterm>
+       <primary>PQsetRowProcessor</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        Sets a callback function to process each row.
+ <synopsis>
+ void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+ </synopsis>
+      </para>
+      
+      <para>
+        <variablelist>
+ 	 <varlistentry>
+ 	   <term><parameter>conn</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       The connection object to set the row processor function.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+ 	 <varlistentry>
+ 	   <term><parameter>func</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       Storage handler function to set. NULL means to use the
+ 	       default processor.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+ 	 <varlistentry>
+ 	   <term><parameter>param</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       A pointer to contextual parameter passed
+ 	       to <parameter>func</parameter>.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+        </variablelist>
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqrowprocessor">
+     <term>
+      <type>PQrowProcessor</type>
+      <indexterm>
+       <primary>PQrowProcessor</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        The type for the row processor callback function.
+ <synopsis>
+ int (*PQrowProcessor)(PGresult *res, void *param, PGrowValue *columns);
+ 
+ typedef struct
+ {
+     int         len;            /* length in bytes of the value, -1 if NULL */
+     char       *value;          /* actual value, without null termination */
+ } PGrowValue;
+ </synopsis>
+      </para>
+ 
+      <para>
+       The <parameter>columns</parameter> array will have PQnfields()
+       elements, each one pointing to a column value in the network buffer.
+       The <parameter>len</parameter> field will contain the number of
+       bytes in the value.  If the field value is NULL then
+       <parameter>len</parameter> will be -1 and value will point
+       to the position where the value would have been in the buffer.
+       This allows estimating row size by pointer arithmetic.
+      </para>
+ 
+      <para>
+        This function must process or copy row values away from the network
+        buffer before it returns, as the next row might overwrite them.
+      </para>
+ 
+      <para>
+        This function must return 1 for success, and 0 for failure.
+        On failure this function should set the error message
+        with <function>PQsetRowProcessorErrMsg</function> if the cause
+        is other than out of memory.  This function must not throw any
+        exception.
+      </para>
+      <variablelist>
+        <varlistentry>
+ 
+ 	 <term><parameter>res</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     A pointer to the <type>PGresult</type> object.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+        <varlistentry>
+ 
+ 	 <term><parameter>param</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     Extra parameter that was given to <function>PQsetRowProcessor</function>.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+        <varlistentry>
+ 
+ 	 <term><parameter>columns</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     Column values of the row to process.  Column values
+ 	     are located in network buffer, the processor must
+ 	     copy them out from there.
+ 	   </para>
+ 	   <para>
+ 	     Column values are not null-terminated, so processor cannot
+ 	     use C string functions on them directly.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+      </variablelist>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqsetrowprocessorerrmsg">
+     <term>
+      <function>PQsetRowProcessorErrMsg</function>
+      <indexterm>
+       <primary>PQsetRowProcessorErrMsg</primary>
+      </indexterm>
+     </term>
+     <listitem>
+       <para>
+ 	Sets the message for an error that occurred
+ 	in <type>PQrowProcessor</type>.  If this message is not set, the
+ 	caller assumes the error to be out of memory.
+ <synopsis>
+ void PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+ </synopsis>
+       </para>
+       <para>
+ 	<variablelist>
+ 	  <varlistentry>
+ 	    <term><parameter>res</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		A pointer to the <type>PGresult</type> object
+ 		passed to <type>PQrowProcessor</type>.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	  <varlistentry>
+ 	    <term><parameter>msg</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		Error message. This is copied internally, so the caller
+ 		does not need to keep it valid after the call.
+ 	      </para>
+ 	      <para>
+ 		If <parameter>res</parameter> already has a message previously
+ 		set, it will be overwritten. Set NULL to cancel the custom
+ 		message.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	</variablelist>
+       </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </sect1>
+ 
+ 
   <sect1 id="libpq-build">
    <title>Building <application>libpq</application> Programs</title>
  
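To make the "not null-terminated" and "len < 0 means NULL" rules from the documentation patch above concrete, here is a minimal sketch of how a row processor might copy one column out of the network buffer and estimate the row size by pointer arithmetic. It does not use libpq itself: RowValue is a hypothetical mirror of the proposed PGrowValue, and copy_column() and row_span() are illustrative names that are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the PGrowValue struct proposed in the patch. */
typedef struct RowValue
{
	int			len;		/* length in bytes, negative for SQL NULL */
	char	   *value;		/* points into the buffer, not null-terminated */
} RowValue;

/*
 * Copy one column out of the buffer as a malloc'ed C string.
 * Returns NULL for SQL NULL and also on out of memory; *is_null
 * tells the two cases apart.
 */
char *
copy_column(const RowValue *col, int *is_null)
{
	char	   *copy;

	assert(col != NULL && is_null != NULL);

	*is_null = (col->len < 0);
	if (*is_null)
		return NULL;

	copy = malloc((size_t) col->len + 1);
	if (copy == NULL)
		return NULL;

	memcpy(copy, col->value, (size_t) col->len);
	copy[col->len] = '\0';		/* the buffer has no terminator, add one */
	return copy;
}

/*
 * Estimate the bytes a row spans in the buffer, from the start of the
 * first value to the end of the last one.  This works because value
 * is a valid position even for NULL columns.
 */
size_t
row_span(const RowValue *cols, int ncols)
{
	const RowValue *last;

	if (ncols <= 0)
		return 0;

	last = &cols[ncols - 1];
	return (size_t) ((last->value + (last->len > 0 ? last->len : 0))
					 - cols[0].value);
}
```

On a real connection the span also covers the length words that sit between the values, so it is an upper bound on the payload rather than an exact size.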
dblink_rowproc_2012-02-07.diff (text/x-diff; charset=us-ascii)
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
***************
*** 63,73 **** typedef struct remoteConn
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
- static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
--- 63,86 ----
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
+ typedef struct storeInfo
+ {
+ 	Tuplestorestate *tuplestore;
+ 	int nattrs;
+ 	MemoryContext oldcontext;
+ 	AttInMetadata *attinmeta;
+ 	char** valbuf;
+ 	int *valbuflen;
+ 	char **cstrs;
+ 	bool error_occurred;
+ 	bool nummismatch;
+ 	ErrorData *edata;
+ } storeInfo;
+ 
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
***************
*** 90,95 **** static char *escape_param_str(const char *from);
--- 103,112 ----
  static void validate_pkattnums(Relation rel,
  				   int2vector *pkattnums_arg, int32 pknumatts_arg,
  				   int **pkattnums, int *pknumatts);
+ static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+ static void finishStoreInfo(storeInfo *sinfo);
+ static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+ 
  
  /* Global */
  static remoteConn *pconn = NULL;
***************
*** 503,508 **** dblink_fetch(PG_FUNCTION_ARGS)
--- 520,526 ----
  	char	   *curname = NULL;
  	int			howmany = 0;
  	bool		fail = true;	/* default to backward compatible */
+ 	storeInfo   storeinfo;
  
  	DBLINK_INIT;
  
***************
*** 559,573 **** dblink_fetch(PG_FUNCTION_ARGS)
--- 577,612 ----
  	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
  
  	/*
+ 	 * Rows are stored into storeinfo.tuplestore instead of the
+ 	 * PGresult returned by PQexec below
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+ 
+ 	/*
  	 * Try to execute the query.  Note that since libpq uses malloc, the
  	 * PGresult will be long-lived even though we are still in a short-lived
  	 * memory context.
  	 */
  	res = PQexec(conn, buf.data);
+ 	finishStoreInfo(&storeinfo);
+ 
  	if (!res ||
  		(PQresultStatus(res) != PGRES_COMMAND_OK &&
  		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
+ 		/* finishStoreInfo saves the fields referred to below. */
+ 		if (storeinfo.nummismatch)
+ 		{
+ 			/* This is only for backward compatibility */
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_DATATYPE_MISMATCH),
+ 					 errmsg("remote query result rowtype does not match "
+ 							"the specified FROM clause rowtype")));
+ 		}
+ 		else if (storeinfo.edata)
+ 			ReThrowError(storeinfo.edata);
+ 
  		dblink_res_error(conname, res, "could not fetch from cursor", fail);
  		return (Datum) 0;
  	}
***************
*** 579,586 **** dblink_fetch(PG_FUNCTION_ARGS)
  				(errcode(ERRCODE_INVALID_CURSOR_NAME),
  				 errmsg("cursor \"%s\" does not exist", curname)));
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
--- 618,625 ----
  				(errcode(ERRCODE_INVALID_CURSOR_NAME),
  				 errmsg("cursor \"%s\" does not exist", curname)));
  	}
+ 	PQclear(res);
  
  	return (Datum) 0;
  }
  
***************
*** 640,645 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
--- 679,685 ----
  	remoteConn *rconn = NULL;
  	bool		fail = true;	/* default to backward compatible */
  	bool		freeconn = false;
+ 	storeInfo   storeinfo;
  
  	/* check to see if caller supports us returning a tuplestore */
  	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
***************
*** 715,878 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
  	/* synchronous query, or async result retrieval */
  	if (!is_async)
  		res = PQexec(conn, sql);
  	else
- 	{
  		res = PQgetResult(conn);
- 		/* NULL means we're all done with the async results */
- 		if (!res)
- 			return (Datum) 0;
- 	}
  
! 	/* if needed, close the connection to the database and cleanup */
! 	if (freeconn)
! 		PQfinish(conn);
  
! 	if (!res ||
! 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
! 		dblink_res_error(conname, res, "could not execute query", fail);
! 		return (Datum) 0;
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
- /*
-  * Materialize the PGresult to return them as the function result.
-  * The res will be released in this function.
-  */
  static void
! materializeResult(FunctionCallInfo fcinfo, PGresult *res)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
  
! 	Assert(rsinfo->returnMode == SFRM_Materialize);
  
! 	PG_TRY();
  	{
! 		TupleDesc	tupdesc;
! 		bool		is_sql_cmd = false;
! 		int			ntuples;
! 		int			nfields;
  
! 		if (PQresultStatus(res) == PGRES_COMMAND_OK)
! 		{
! 			is_sql_cmd = true;
  
! 			/*
! 			 * need a tuple descriptor representing one TEXT column to return
! 			 * the command status string as our result tuple
! 			 */
! 			tupdesc = CreateTemplateTupleDesc(1, false);
! 			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
! 							   TEXTOID, -1, 0);
! 			ntuples = 1;
! 			nfields = 1;
! 		}
! 		else
! 		{
! 			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
  
! 			is_sql_cmd = false;
  
! 			/* get a tuple descriptor for our result type */
! 			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
! 			{
! 				case TYPEFUNC_COMPOSITE:
! 					/* success */
! 					break;
! 				case TYPEFUNC_RECORD:
! 					/* failed to determine actual type of RECORD */
! 					ereport(ERROR,
! 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 						errmsg("function returning record called in context "
! 							   "that cannot accept type record")));
! 					break;
! 				default:
! 					/* result type isn't composite */
! 					elog(ERROR, "return type must be a row type");
! 					break;
! 			}
  
! 			/* make sure we have a persistent copy of the tupdesc */
! 			tupdesc = CreateTupleDescCopy(tupdesc);
! 			ntuples = PQntuples(res);
! 			nfields = PQnfields(res);
  		}
  
! 		/*
! 		 * check result and tuple descriptor have the same number of columns
! 		 */
! 		if (nfields != tupdesc->natts)
! 			ereport(ERROR,
! 					(errcode(ERRCODE_DATATYPE_MISMATCH),
! 					 errmsg("remote query result rowtype does not match "
! 							"the specified FROM clause rowtype")));
  
! 		if (ntuples > 0)
! 		{
! 			AttInMetadata *attinmeta;
! 			Tuplestorestate *tupstore;
! 			MemoryContext oldcontext;
! 			int			row;
! 			char	  **values;
! 
! 			attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 
! 			oldcontext = MemoryContextSwitchTo(
! 									rsinfo->econtext->ecxt_per_query_memory);
! 			tupstore = tuplestore_begin_heap(true, false, work_mem);
! 			rsinfo->setResult = tupstore;
! 			rsinfo->setDesc = tupdesc;
! 			MemoryContextSwitchTo(oldcontext);
  
! 			values = (char **) palloc(nfields * sizeof(char *));
  
! 			/* put all tuples into the tuplestore */
! 			for (row = 0; row < ntuples; row++)
! 			{
! 				HeapTuple	tuple;
  
! 				if (!is_sql_cmd)
! 				{
! 					int			i;
  
! 					for (i = 0; i < nfields; i++)
! 					{
! 						if (PQgetisnull(res, row, i))
! 							values[i] = NULL;
! 						else
! 							values[i] = PQgetvalue(res, row, i);
! 					}
! 				}
! 				else
! 				{
! 					values[0] = PQcmdStatus(res);
! 				}
  
! 				/* build the tuple and put it into the tuplestore. */
! 				tuple = BuildTupleFromCStrings(attinmeta, values);
! 				tuplestore_puttuple(tupstore, tuple);
  			}
  
! 			/* clean up and return the tuplestore */
! 			tuplestore_donestoring(tupstore);
! 		}
  
! 		PQclear(res);
  	}
  	PG_CATCH();
  	{
! 		/* be sure to release the libpq result */
! 		PQclear(res);
! 		PG_RE_THROW();
  	}
  	PG_END_TRY();
  }
  
  /*
--- 755,1006 ----
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
+ 
+ 	/*
+ 	 * Rows are stored into storeinfo.tuplestore instead of the
+ 	 * PGresult returned by PQexec/PQgetResult below
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+ 
  	/* synchronous query, or async result retrieval */
  	if (!is_async)
  		res = PQexec(conn, sql);
  	else
  		res = PQgetResult(conn);
  
! 	finishStoreInfo(&storeinfo);
  
! 	/* NULL res from async get means we're all done with the results */
! 	if (res || !is_async)
  	{
! 		if (freeconn)
! 			PQfinish(conn);
! 
! 		if (!res ||
! 			(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 			 PQresultStatus(res) != PGRES_TUPLES_OK))
! 		{
! 			/* finishStoreInfo saves the fields referred to below. */
! 			if (storeinfo.nummismatch)
! 			{
! 				/* This is only for backward compatibility */
! 				ereport(ERROR,
! 						(errcode(ERRCODE_DATATYPE_MISMATCH),
! 						 errmsg("remote query result rowtype does not match "
! 								"the specified FROM clause rowtype")));
! 			}
! 			else if (storeinfo.edata)
! 				ReThrowError(storeinfo.edata);
! 
! 			dblink_res_error(conname, res, "could not execute query", fail);
! 			return (Datum) 0;
! 		}
  	}
+ 	PQclear(res);
  
  	return (Datum) 0;
  }
  
  static void
! initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	TupleDesc	tupdesc;
+ 	int i;
  
! 	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
! 	{
! 		case TYPEFUNC_COMPOSITE:
! 			/* success */
! 			break;
! 		case TYPEFUNC_RECORD:
! 			/* failed to determine actual type of RECORD */
! 			ereport(ERROR,
! 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 					 errmsg("function returning record called in context "
! 							"that cannot accept type record")));
! 			break;
! 		default:
! 			/* result type isn't composite */
! 			elog(ERROR, "return type must be a row type");
! 			break;
! 	}
  
! 	sinfo->oldcontext = MemoryContextSwitchTo(
! 		rsinfo->econtext->ecxt_per_query_memory);
! 
! 	/* make sure we have a persistent copy of the tupdesc */
! 	tupdesc = CreateTupleDescCopy(tupdesc);
! 
! 	sinfo->error_occurred = FALSE;
! 	sinfo->nummismatch = FALSE;
! 	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 	sinfo->edata = NULL;
! 	sinfo->nattrs = tupdesc->natts;
! 	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
! 	sinfo->cstrs = NULL;
! 	sinfo->valbuflen = NULL;
! 
! 	/* Preallocate memory of same size with c string array for values. */
! 	sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
! 	if (sinfo->valbuf)
! 		sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
! 	if (sinfo->valbuflen)
! 		sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
! 
! 	if (sinfo->cstrs == NULL)
  	{
! 		if (sinfo->valbuf)
! 			free(sinfo->valbuf);
! 		if (sinfo->valbuflen)
! 			free(sinfo->valbuflen);
  
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OUT_OF_MEMORY),
! 				 errmsg("out of memory")));
! 	}
  
! 	for (i = 0 ; i < sinfo->nattrs ; i++)
! 	{
! 		sinfo->valbuf[i] = NULL;
! 		sinfo->valbuflen[i] = -1;
! 	}
  
! 	rsinfo->setResult = sinfo->tuplestore;
! 	rsinfo->setDesc = tupdesc;
! }
  
! static void
! finishStoreInfo(storeInfo *sinfo)
! {
! 	int i;
  
! 	if (sinfo->valbuf)
! 	{
! 		for (i = 0 ; i < sinfo->nattrs ; i++)
! 		{
! 			if (sinfo->valbuf[i])
! 				free(sinfo->valbuf[i]);
  		}
+ 		free(sinfo->valbuf);
+ 		sinfo->valbuf = NULL;
+ 	}
  
! 	if (sinfo->valbuflen)
! 	{
! 		free(sinfo->valbuflen);
! 		sinfo->valbuflen = NULL;
! 	}
  
! 	if (sinfo->cstrs)
! 	{
! 		free(sinfo->cstrs);
! 		sinfo->cstrs = NULL;
! 	}
  
! 	MemoryContextSwitchTo(sinfo->oldcontext);
! }
  
! static int
! storeHandler(PGresult *res, void *param, PGrowValue *columns)
! {
! 	storeInfo *sinfo = (storeInfo *)param;
! 	HeapTuple  tuple;
! 	int        fields = PQnfields(res);
! 	int        i;
! 	char      **cstrs = sinfo->cstrs;
  
! 	if (sinfo->error_occurred)
! 		return FALSE;
  
! 	if (sinfo->nattrs != fields)
! 	{
! 		sinfo->error_occurred = TRUE;
! 		sinfo->nummismatch = TRUE;
! 		finishStoreInfo(sinfo);
! 
! 		/* This error will be processed in
! 		 * dblink_record_internal(). So do not set error message
! 		 * here. */
! 		return FALSE;
! 	}
  
! 	/*
! 	 * Value input functions assume that the input string is
! 	 * zero-terminated, so make a terminated copy of each value.
! 	 */
! 	for(i = 0 ; i < fields ; i++)
! 	{
! 		int len = columns[i].len;
! 		if (len < 0)
! 			cstrs[i] = NULL;
! 		else
! 		{
! 			char *tmp = sinfo->valbuf[i];
! 			int tmplen = sinfo->valbuflen[i];
! 
! 			/*
! 			 * Call malloc and realloc separately so that this works
! 			 * even on systems whose realloc() does not accept NULL
! 			 * as the old memory block.
! 			 *
! 			 * Also try to (re)allocate in bigger steps to
! 			 * avoid flood of allocations on weird data.
! 			 */
! 			if (tmp == NULL)
! 			{
! 				tmplen = len + 1;
! 				if (tmplen < 64)
! 					tmplen = 64;
! 				tmp = (char *)malloc(tmplen);
! 			}
! 			else if (tmplen < len + 1)
! 			{
! 				if (len + 1 > tmplen * 2)
! 					tmplen = len + 1;
! 				else
! 					tmplen = tmplen * 2;
! 				tmp = (char *)realloc(tmp, tmplen);
  			}
  
! 			/*
! 			 * sinfo->valbuf[n] will be freed in finishStoreInfo()
! 			 * when realloc returns NULL.
! 			 */
! 			if (tmp == NULL)
! 				return FALSE;
  
! 			sinfo->valbuf[i] = tmp;
! 			sinfo->valbuflen[i] = tmplen;
! 
! 			cstrs[i] = sinfo->valbuf[i];
! 			memcpy(cstrs[i], columns[i].value, len);
! 			cstrs[i][len] = '\0';
! 		}
! 	}
! done:
! 	PG_TRY();
! 	{
! 		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
! 		tuplestore_puttuple(sinfo->tuplestore, tuple);
  	}
  	PG_CATCH();
  	{
! 		MemoryContext context;
! 		/*
! 		 * Store exception for later ReThrow and cancel the exception.
! 		 */
! 		sinfo->error_occurred = TRUE;
! 		context = MemoryContextSwitchTo(sinfo->oldcontext);
! 		sinfo->edata = CopyErrorData();
! 		MemoryContextSwitchTo(context);
! 		FlushErrorState();
! 		return FALSE;
  	}
  	PG_END_TRY();
+ 
+ 	return TRUE;
  }
  
  /*
#43Robert Haas
robertmhaas@gmail.com
In reply to: Marko Kreen (#42)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Feb 7, 2012 at 9:44 AM, Marko Kreen <markokr@gmail.com> wrote:

- What is the right (or recommended) way to prevent throwing an
exception in a row-processor callback function? When should the author use
a PG_TRY block to catch exceptions in the callback function?

When it calls backend functions that can throw exceptions?
As all handlers running in backend will want to convert data
to Datums, that means "always wrap handler code in PG_TRY"?

I would hate to have such a rule. PG_TRY isn't free, and it's prone
to subtle bugs, like failing to mark enough stuff in the same function
"volatile".
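
To illustrate the "volatile" pitfall: PG_TRY is built on
sigsetjmp()/siglongjmp(), and C leaves a non-volatile local variable
indeterminate after the jump if it was modified after the setjmp call.
The sketch below uses plain setjmp()/longjmp() rather than the backend
macros, and convert_value() and process_row() are made-up names used
only for illustration.

```c
#include <setjmp.h>

static jmp_buf recover_point;

/* Hypothetical stand-in for a backend call that may raise an error. */
static void
convert_value(int should_fail)
{
    if (should_fail)
        longjmp(recover_point, 1);
}

/*
 * Returns how far processing got: 2 if everything succeeded,
 * 1 if convert_value() bailed out.
 */
int
process_row(int should_fail)
{
    /*
     * "stage" is modified between setjmp() and a possible longjmp()
     * and is read afterwards, so it must be volatile.  Without the
     * qualifier its value after the jump is indeterminate.
     */
    volatile int stage = 0;

    if (setjmp(recover_point) == 0)
    {
        stage = 1;
        convert_value(should_fail);
        stage = 2;
    }
    /* else: we arrived here via longjmp() and stage is reliably 1 */

    return stage;
}
```

Dropping the volatile qualifier may still appear to work at low
optimization levels, which is what makes this class of bug subtle.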

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#44Shigeru Hanada
shigeru.hanada@gmail.com
In reply to: Marko Kreen (#42)
Re: Speed dblink using alternate libpq tuple storage

(2012/02/07 23:44), Marko Kreen wrote:

On Tue, Feb 07, 2012 at 07:25:14PM +0900, Shigeru Hanada wrote:

- What is the right (or recommended) way to prevent throwing an
exception in a row-processor callback function? When should the author use
a PG_TRY block to catch exceptions in the callback function?

When it calls backend functions that can throw exceptions?
As all handlers running in backend will want to convert data
to Datums, that means "always wrap handler code in PG_TRY"?

ISTM it would be necessary to document a) what happens to PGresult and
PGconn when the row processor exits without returning, and b) how to
recover them. We can't assume much about the caller because libpq is a
client library. IMHO, the most important thing is to tell row processor
authors clearly what should be done in the error case.

In such case, must we call PQfinish, or is calling PQgetResult until it
returns NULL enough to reuse the connection? IIUC calling
pqClearAsyncResult seems sufficient, but it's not exported for client...

Regards,
--
Shigeru Hanada

#45Marko Kreen
markokr@gmail.com
In reply to: Shigeru Hanada (#44)
Re: Speed dblink using alternate libpq tuple storage

On Wed, Feb 08, 2012 at 02:44:13PM +0900, Shigeru Hanada wrote:

(2012/02/07 23:44), Marko Kreen wrote:

On Tue, Feb 07, 2012 at 07:25:14PM +0900, Shigeru Hanada wrote:

- What is the right (or recommended) way to prevent throwing an
exception in a row-processor callback function? When should the author use
a PG_TRY block to catch exceptions in the callback function?

When it calls backend functions that can throw exceptions?
As all handlers running in backend will want to convert data
to Datums, that means "always wrap handler code in PG_TRY"?

ISTM it would be necessary to document a) what happens to PGresult and
PGconn when the row processor exits without returning, and b) how to
recover them. We can't assume much about the caller because libpq is a
client library. IMHO, the most important thing is to tell row processor
authors clearly what should be done in the error case.

Yes.

In such case, must we call PQfinish, or is calling PQgetResult until it
returns NULL enough to reuse the connection? IIUC calling
pqClearAsyncResult seems sufficient, but it's not exported for client...

Simply dropping ->result is not useful as there are still rows
coming in. Now you cannot process them anymore.

The rule of "after an exception it's valid to close the connection with PQfinish()
or continue processing with PQgetResult()/PQisBusy()/PQconsumeInput()" seems
quite simple and clear. Perhaps only a clarification of what's valid on a sync
connection and what's valid on an async connection would be useful.

--
marko

#46Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#42)
4 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Feb 07, 2012 at 04:44:09PM +0200, Marko Kreen wrote:

Although it seems we could allow exceptions, at least when we are
speaking of the Postgres backend, as the connection and result are in an
internally consistent state when the handler is called, and the
partial PGresult is stored under PGconn->result. Even the
connection is in a consistent state, as the row packet is
fully processed.

So we have 3 variants, all work, but which one do we want to support?

1) Exceptions are not allowed.
2) Exceptions are allowed, but when it happens, user must call
PQfinish() next, to avoid processing incoming data from
potentially invalid state.
3) Exceptions are allowed, and further row processing can continue
with PQisBusy() / PQgetResult()
3.1) The problematic row is retried. (Current behaviour)
3.2) The problematic row is skipped.

I converted the patch to support 3.2), that is - skip row on exception.
That required some cleanups of getAnotherTuple() API, plus I made some
other cleanups. Details below.
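
The "skip row on exception" behaviour can be sketched in isolation as
follows. This is only a toy model, not the libpq code: MiniConn,
parse_rows() and the other names are hypothetical, rows are plain ints,
and longjmp() stands in for a backend exception. The point it shows is
that the parser advances its position before invoking the callback, so
resuming after a non-local exit continues with the next row.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical miniature of a connection's input buffer: one int per row. */
typedef struct
{
    const int  *rows;
    size_t      nrows;
    size_t      start;      /* first row not yet consumed */
} MiniConn;

typedef void (*row_callback) (int row, void *param);

/*
 * Deliver all pending rows.  The row is marked as consumed *before* the
 * callback runs, so if the callback escapes via longjmp(), a later call
 * resumes with the following row instead of retrying the same one.
 */
void
parse_rows(MiniConn *conn, row_callback cb, void *param)
{
    while (conn->start < conn->nrows)
    {
        int         row = conn->rows[conn->start];

        conn->start++;      /* tag the row as processed */
        cb(row, param);
    }
}

static jmp_buf handler_error;

/* Callback that adds rows to a sum but "throws" on negative values. */
static void
sum_positive(int row, void *param)
{
    if (row < 0)
        longjmp(handler_error, 1);
    *(int *) param += row;
}

/* Sum all rows, skipping those the callback rejects; count the rejects. */
int
sum_skipping_bad_rows(const int *rows, size_t nrows, int *nskipped)
{
    /*
     * Static storage: state that changes between setjmp() and longjmp()
     * must not live in non-volatile automatic variables.
     */
    static MiniConn conn;
    static int  sum;
    static int  skipped;

    conn.rows = rows;
    conn.nrows = nrows;
    conn.start = 0;
    sum = 0;
    skipped = 0;

    /* each longjmp() lands here, then parsing simply resumes */
    if (setjmp(handler_error) != 0)
        skipped++;
    parse_rows(&conn, sum_positive, &sum);

    *nskipped = skipped;
    return sum;
}
```

If parse_rows() advanced conn->start only after the callback returned,
the same negative row would be delivered again on every resume, which
corresponds to variant 3.1.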

But during this I also started to think what happens if the user does
*not* throw exceptions. Eg. Python does exceptions via regular return
values, which means complications when passing them upwards.

The main case I'm thinking of is actually a resultset iterator in a high-level
language. The current callback-only API requires that rows be
stuffed into a temporary buffer until the network blocks and user code
gets control again. This feels clumsy for a performance-oriented API.
Another case is user code that wants to do complex processing. Doing
a lot of work inside a callback seems dubious. And what if some part of the
code calls PQfinish() during error recovery?

I tried imagining some sort of getFoo() style API for fetching in-flight
row data, but I always ended up with a "rewrite libpq" step, so I feel
it's not productive to go there.

Instead I added a simple feature: rowProcessor can return '2',
in which case getAnotherTuple() exits early without setting
any error state. On the user side it appears as PQisBusy() returning
TRUE. All pointers stay valid, so the callback can just
stuff them into some temp area. ATM there is no indication, though,
whether the exit was due to the callback or other reasons, so the user
must detect it based on whether new temp pointers appeared,
which means those must be cleared before calling PQisBusy() again.
This actually feels like a feature: those must not stay around
after a single call.
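
A minimal sketch of the "stuff pointers into a temp area, then return 2"
pattern might look like the following. It is self-contained and does not
use libpq: RowValue is a local stand-in for the patch's PGrowValue,
StagedRow and both functions are hypothetical, and the column count is
passed directly where a real row processor would receive a PGresult and
call PQnfields() on it.

```c
#include <stddef.h>

/* Local stand-in for the patch's PGrowValue: len < 0 means SQL NULL. */
typedef struct
{
    int         len;
    char       *value;      /* not null-terminated */
} RowValue;

/* Hypothetical temp area the application inspects after PQisBusy(). */
typedef struct
{
    RowValue   *columns;    /* NULL when no row is staged */
    int         nfields;
} StagedRow;

/*
 * Row processor sketch: remember where the row lives and request an
 * early exit by returning 2.  Nothing is copied; the pointers are only
 * meant to be used until parsing is resumed.
 */
int
stage_row(int nfields, void *param, RowValue *columns)
{
    StagedRow  *staged = (StagedRow *) param;

    staged->columns = columns;
    staged->nfields = nfields;
    return 2;
}

/*
 * Called by the application once control returns to it.  Reports whether
 * a row was staged and clears the temp area, so that the next early exit
 * can be told apart from an ordinary "no data yet" return.
 */
int
take_staged_row(StagedRow *staged, RowValue **columns, int *nfields)
{
    if (staged->columns == NULL)
        return 0;

    *columns = staged->columns;
    *nfields = staged->nfields;
    staged->columns = NULL;
    staged->nfields = 0;
    return 1;
}
```

Clearing the temp area inside take_staged_row() is what lets the
application distinguish a callback-requested exit from any other reason
for getting control back.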

It's included in main patch, but I also attached it as separate patch
so that it can be examined separately and reverted if not acceptable.

But as it's really simple, I recommend including it.

Its usage might not be obvious though; should we include
example code in the docs?

New feature:

* Row processor can return 2, then PQisBusy() does early exit.
Supported only when the connection is in non-blocking mode.
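
How the return value is interpreted can be summarized by the sketch
below, which follows the V3 getAnotherTuple() hunk in the attached
patch. The enum and function names are made up for illustration; the
patch itself branches inline rather than through a helper like this.

```c
#include <stddef.h>

typedef enum
{
    ROW_OK,                 /* row accepted, keep parsing */
    ROW_EARLY_EXIT,         /* hand control back to the caller */
    ROW_ERROR_CUSTOM,       /* processor failed and supplied a message */
    ROW_ERROR_OOM,          /* processor failed without a message */
    ROW_ERROR_BAD_RETURN    /* unrecognized return value */
} RowOutcome;

/*
 * rp          - value returned by the row processor
 * nonblocking - nonzero if the connection is in non-blocking mode
 * errmsg      - message set by the processor, or NULL if none
 */
RowOutcome
classify_row_result(int rp, int nonblocking, const char *errmsg)
{
    if (rp == 1)
        return ROW_OK;
    if (rp == 2 && nonblocking)
        return ROW_EARLY_EXIT;
    if (rp == 0)
        return errmsg ? ROW_ERROR_CUSTOM : ROW_ERROR_OOM;
    return ROW_ERROR_BAD_RETURN;
}
```

Note that returning 2 on a blocking connection falls through to the
"invalid return value" case.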

Cleanups:

* Row data is tagged as processed when rowProcessor is called,
so exceptions will skip the row. This simplifies non-exceptional
handling as well.

* Converted 'return EOF' in V3 getAnotherTuple() to set an error instead.
AFAICS those EOFs are remnants from V2 getAnotherTuple(),
not something that was coded deliberately. Note that when
V3 getAnotherTuple() is called the row packet is fully in the buffer.
An EOF on a broken packet to signify 'give me more data' does not
result in any useful behaviour; instead you can simply get
into an infinite loop.

Fix bugs in my previous changes:

* Split the OOM error handling from custom error message handling;
previously the error message in PGresult was lost because the OOM logic
freed the PGresult early.

* Drop unused goto label from experimental debug code.

--
marko

Attachments:

libpq_rowproc_2012-02-14.diff (text/x-diff; charset=us-ascii)
*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
***************
*** 160,162 **** PQconnectStartParams      157
--- 160,164 ----
  PQping                    158
  PQpingParams              159
  PQlibVersion              160
+ PQsetRowProcessor	  161
+ PQsetRowProcessorErrMsg	  162
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***************
*** 2693,2698 **** makeEmptyPGconn(void)
--- 2693,2701 ----
  	conn->wait_ssl_try = false;
  #endif
  
+ 	/* set default row processor */
+ 	PQsetRowProcessor(conn, NULL, NULL);
+ 
  	/*
  	 * We try to send at least 8K at a time, which is the usual size of pipe
  	 * buffers on Unix systems.  That way, when we are sending a large amount
***************
*** 2711,2718 **** makeEmptyPGconn(void)
--- 2714,2726 ----
  	initPQExpBuffer(&conn->errorMessage);
  	initPQExpBuffer(&conn->workBuffer);
  
+ 	/* set up initial row buffer */
+ 	conn->rowBufLen = 32;
+ 	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+ 
  	if (conn->inBuffer == NULL ||
  		conn->outBuffer == NULL ||
+ 		conn->rowBuf == NULL ||
  		PQExpBufferBroken(&conn->errorMessage) ||
  		PQExpBufferBroken(&conn->workBuffer))
  	{
***************
*** 2814,2819 **** freePGconn(PGconn *conn)
--- 2822,2829 ----
  		free(conn->inBuffer);
  	if (conn->outBuffer)
  		free(conn->outBuffer);
+ 	if (conn->rowBuf)
+ 		free(conn->rowBuf);
  	termPQExpBuffer(&conn->errorMessage);
  	termPQExpBuffer(&conn->workBuffer);
  
***************
*** 5078,5080 **** PQregisterThreadLock(pgthreadlock_t newhandler)
--- 5088,5091 ----
  
  	return prev;
  }
+ 
*** a/src/interfaces/libpq/fe-exec.c
--- b/src/interfaces/libpq/fe-exec.c
***************
*** 66,71 **** static PGresult *PQexecFinish(PGconn *conn);
--- 66,72 ----
  static int PQsendDescribe(PGconn *conn, char desc_type,
  			   const char *desc_target);
  static int	check_field_number(const PGresult *res, int field_num);
+ static int	pqAddRow(PGresult *res, void *param, PGrowValue *columns);
  
  
  /* ----------------
***************
*** 160,165 **** PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
--- 161,167 ----
  	result->curBlock = NULL;
  	result->curOffset = 0;
  	result->spaceLeft = 0;
+ 	result->rowProcessorErrMsg = NULL;
  
  	if (conn)
  	{
***************
*** 701,707 **** pqClearAsyncResult(PGconn *conn)
  	if (conn->result)
  		PQclear(conn->result);
  	conn->result = NULL;
- 	conn->curTuple = NULL;
  }
  
  /*
--- 703,708 ----
***************
*** 756,762 **** pqPrepareAsyncResult(PGconn *conn)
  	 */
  	res = conn->result;
  	conn->result = NULL;		/* handing over ownership to caller */
- 	conn->curTuple = NULL;		/* just in case */
  	if (!res)
  		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
  	else
--- 757,762 ----
***************
*** 828,833 **** pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
--- 828,900 ----
  }
  
  /*
+  * PQsetRowProcessor
+  *   Set function that copies column data out from network buffer.
+  */
+ void
+ PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+ {
+ 	conn->rowProcessor = (func ? func : pqAddRow);
+ 	conn->rowProcessorParam = param;
+ }
+ 
+ /*
+  * PQsetRowProcessorErrMsg
+  *    Set the error message passed back to the caller of the row processor.
+  *
+  *    You can replace the previous message with an alternative message, or
+  *    clear it with NULL.
+  */
+ void
+ PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+ {
+ 	if (msg)
+ 		res->rowProcessorErrMsg = pqResultStrdup(res, msg);
+ 	else
+ 		res->rowProcessorErrMsg = NULL;
+ }
+ 
+ /*
+  * pqAddRow
+  *	  add a row to the PGresult structure, growing it if necessary
+  *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+  */
+ static int
+ pqAddRow(PGresult *res, void *param, PGrowValue *columns)
+ {
+ 	PGresAttValue *tup;
+ 	int			nfields = res->numAttributes;
+ 	int			i;
+ 
+ 	tup = (PGresAttValue *)
+ 		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+ 	if (tup == NULL)
+ 		return FALSE;
+ 
+ 	for (i = 0 ; i < nfields ; i++)
+ 	{
+ 		tup[i].len = columns[i].len;
+ 		if (tup[i].len == NULL_LEN)
+ 		{
+ 			tup[i].value = res->null_field;
+ 		}
+ 		else
+ 		{
+ 			bool isbinary = (res->attDescs[i].format != 0);
+ 			tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+ 			if (tup[i].value == NULL)
+ 				return FALSE;
+ 
+ 			memcpy(tup[i].value, columns[i].value, tup[i].len);
+ 			/* We have to terminate this ourselves */
+ 			tup[i].value[tup[i].len] = '\0';
+ 		}
+ 	}
+ 
+ 	return pqAddTuple(res, tup);
+ }
+ 
+ /*
   * pqAddTuple
   *	  add a row pointer to the PGresult structure, growing it if necessary
   *	  Returns TRUE if OK, FALSE if not enough memory to add the row
***************
*** 1223,1229 **** PQsendQueryStart(PGconn *conn)
  
  	/* initialize async result-accumulation state */
  	conn->result = NULL;
- 	conn->curTuple = NULL;
  
  	/* ready to send command message */
  	return true;
--- 1290,1295 ----
*** a/src/interfaces/libpq/fe-misc.c
--- b/src/interfaces/libpq/fe-misc.c
***************
*** 219,224 **** pqGetnchar(char *s, size_t len, PGconn *conn)
--- 219,243 ----
  }
  
  /*
+  * pqSkipnchar:
+  *	skip len bytes in input buffer.
+  */
+ int
+ pqSkipnchar(size_t len, PGconn *conn)
+ {
+ 	if (len > (size_t) (conn->inEnd - conn->inCursor))
+ 		return EOF;
+ 
+ 	conn->inCursor += len;
+ 
+ 	if (conn->Pfdebug)
+ 		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+ 				(unsigned long) len);
+ 
+ 	return 0;
+ }
+ 
+ /*
   * pqPutnchar:
   *	write exactly len bytes to the current message
   */
*** a/src/interfaces/libpq/fe-protocol2.c
--- b/src/interfaces/libpq/fe-protocol2.c
***************
*** 569,574 **** pqParseInput2(PGconn *conn)
--- 569,576 ----
  						/* Read another tuple of a normal query response */
  						if (getAnotherTuple(conn, FALSE))
  							return;
+ 						/* getAnotherTuple moves inStart itself */
+ 						continue;
  					}
  					else
  					{
***************
*** 585,590 **** pqParseInput2(PGconn *conn)
--- 587,594 ----
  						/* Read another tuple of a normal query response */
  						if (getAnotherTuple(conn, TRUE))
  							return;
+ 						/* getAnotherTuple moves inStart itself */
+ 						continue;
  					}
  					else
  					{
***************
*** 703,754 **** failure:
  
  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGresAttValue *tup;
  
  	/* the backend sends us a bitmap of which attributes are null */
  	char		std_bitmap[64]; /* used unless it doesn't fit */
  	char	   *bitmap = std_bitmap;
  	int			i;
  	size_t		nbytes;			/* the number of bytes in bitmap  */
  	char		bmap;			/* One byte of the bitmap */
  	int			bitmap_index;	/* Its index */
  	int			bitcnt;			/* number of bits examined in current byte */
  	int			vlen;			/* length of the current field value */
  
  	result->binary = binary;
  
! 	/* Allocate tuple space if first time for this data message */
! 	if (conn->curTuple == NULL)
  	{
! 		conn->curTuple = (PGresAttValue *)
! 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
! 		if (conn->curTuple == NULL)
! 			goto outOfMemory;
! 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
! 
! 		/*
! 		 * If it's binary, fix the column format indicators.  We assume the
! 		 * backend will consistently send either B or D, not a mix.
! 		 */
! 		if (binary)
! 		{
! 			for (i = 0; i < nfields; i++)
! 				result->attDescs[i].format = 1;
! 		}
  	}
- 	tup = conn->curTuple;
  
  	/* Get the null-value bitmap */
  	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
--- 707,757 ----
  
  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * It fills rowbuf with column pointers and then calls row processor.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGrowValue  *rowbuf;
  
  	/* the backend sends us a bitmap of which attributes are null */
  	char		std_bitmap[64]; /* used unless it doesn't fit */
  	char	   *bitmap = std_bitmap;
  	int			i;
+ 	int			rp;
  	size_t		nbytes;			/* the number of bytes in bitmap  */
  	char		bmap;			/* One byte of the bitmap */
  	int			bitmap_index;	/* Its index */
  	int			bitcnt;			/* number of bits examined in current byte */
  	int			vlen;			/* length of the current field value */
  
+ 	/* resize row buffer if needed */
+ 	if (nfields > conn->rowBufLen)
+ 	{
+ 		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+ 		if (!rowbuf)
+ 			goto rowProcessError;
+ 		conn->rowBuf = rowbuf;
+ 		conn->rowBufLen = nfields;
+ 	}
+ 	else
+ 	{
+ 		rowbuf = conn->rowBuf;
+ 	}
+ 
  	result->binary = binary;
  
! 	if (binary)
  	{
! 		for (i = 0; i < nfields; i++)
! 			result->attDescs[i].format = 1;
  	}
  
  	/* Get the null-value bitmap */
  	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
***************
*** 757,763 **** getAnotherTuple(PGconn *conn, bool binary)
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 			goto outOfMemory;
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
--- 760,766 ----
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 			goto rowProcessError;
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
***************
*** 771,804 **** getAnotherTuple(PGconn *conn, bool binary)
  	for (i = 0; i < nfields; i++)
  	{
  		if (!(bmap & 0200))
! 		{
! 			/* if the field value is absent, make it a null string */
! 			tup[i].value = result->null_field;
! 			tup[i].len = NULL_LEN;
! 		}
  		else
  		{
- 			/* get the value length (the first four bytes are for length) */
- 			if (pqGetInt(&vlen, 4, conn))
- 				goto EOFexit;
  			if (!binary)
  				vlen = vlen - 4;
  			if (vlen < 0)
  				vlen = 0;
- 			if (tup[i].value == NULL)
- 			{
- 				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
- 				if (tup[i].value == NULL)
- 					goto outOfMemory;
- 			}
- 			tup[i].len = vlen;
- 			/* read in the value */
- 			if (vlen > 0)
- 				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
- 					goto EOFexit;
- 			/* we have to terminate this ourselves */
- 			tup[i].value[vlen] = '\0';
  		}
  		/* advance the bitmap stuff */
  		bitcnt++;
  		if (bitcnt == BITS_PER_BYTE)
--- 774,802 ----
  	for (i = 0; i < nfields; i++)
  	{
  		if (!(bmap & 0200))
! 			vlen = NULL_LEN;
! 		else if (pqGetInt(&vlen, 4, conn))
! 				goto EOFexit;
  		else
  		{
  			if (!binary)
  				vlen = vlen - 4;
  			if (vlen < 0)
  				vlen = 0;
  		}
+ 
+ 		/*
+ 		 * rowbuf[i].value always points to the next address of the
+ 		 * length field even if the value is NULL, to allow safe
+ 		 * size estimates and data copy.
+ 		 */
+ 		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+ 		rowbuf[i].len = vlen;
+ 
+ 		/* Skip the value */
+ 		if (vlen > 0 && pqSkipnchar(vlen, conn))
+ 			goto EOFexit;
+ 
  		/* advance the bitmap stuff */
  		bitcnt++;
  		if (bitcnt == BITS_PER_BYTE)
***************
*** 811,843 **** getAnotherTuple(PGconn *conn, bool binary)
  			bmap <<= 1;
  	}
  
- 	/* Success!  Store the completed tuple in the result */
- 	if (!pqAddTuple(result, tup))
- 		goto outOfMemory;
- 	/* and reset for a new message */
- 	conn->curTuple = NULL;
- 
  	if (bitmap != std_bitmap)
  		free(bitmap);
! 	return 0;
  
- outOfMemory:
  	/* Replace partially constructed result with an error result */
  
! 	/*
! 	 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
! 	 * there's not enough memory to concatenate messages...
! 	 */
! 	pqClearAsyncResult(conn);
! 	printfPQExpBuffer(&conn->errorMessage,
! 					  libpq_gettext("out of memory for query result\n"));
  
! 	/*
! 	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
! 	 * do to recover...
! 	 */
! 	conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
  	conn->asyncStatus = PGASYNC_READY;
  	/* Discard the failed message --- good idea? */
  	conn->inStart = conn->inEnd;
  
--- 809,864 ----
  			bmap <<= 1;
  	}
  
  	if (bitmap != std_bitmap)
  		free(bitmap);
! 	bitmap = NULL;
! 
! 	/* tag the row as parsed */
! 	conn->inStart = conn->inCursor;
! 
! 	/* Pass the completed row values to rowProcessor */
! 	rp= conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
! 	if (rp == 1)
! 		return 0;
! 	else if (rp == 2 && pqIsnonblocking(conn))
! 		/* processor requested early exit */
! 		return EOF;
! 	else if (rp != 0)
! 		PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value from row processor\n"));
! 
! rowProcessError:
  
  	/* Replace partially constructed result with an error result */
  
! 	if (result->rowProcessorErrMsg)
! 	{
! 		printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
! 		pqSaveErrorResult(conn);
! 	}
! 	else
! 	{
! 		/*
! 		 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
! 		 * there's not enough memory to concatenate messages...
! 		 */
! 		pqClearAsyncResult(conn);
! 		resetPQExpBuffer(&conn->errorMessage);
  
! 		/*
! 		 * If error message is passed from RowProcessor, set it into
! 		 * PGconn, assume out of memory if not.
! 		 */
! 		appendPQExpBufferStr(&conn->errorMessage,
! 							 libpq_gettext("out of memory for query result\n"));
! 
! 		/*
! 		 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
! 		 * do to recover...
! 		 */
! 		conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
! 	}
  	conn->asyncStatus = PGASYNC_READY;
+ 
  	/* Discard the failed message --- good idea? */
  	conn->inStart = conn->inEnd;
  
*** a/src/interfaces/libpq/fe-protocol3.c
--- b/src/interfaces/libpq/fe-protocol3.c
***************
*** 327,332 **** pqParseInput3(PGconn *conn)
--- 327,335 ----
  						/* Read another tuple of a normal query response */
  						if (getAnotherTuple(conn, msgLength))
  							return;
+ 
+ 						/* getAnotherTuple() moves inStart itself */
+ 						continue;
  					}
  					else if (conn->result != NULL &&
  							 conn->result->resultStatus == PGRES_FATAL_ERROR)
***************
*** 613,645 **** failure:
  
  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGresAttValue *tup;
  	int			tupnfields;		/* # fields from tuple */
  	int			vlen;			/* length of the current field value */
  	int			i;
! 
! 	/* Allocate tuple space if first time for this data message */
! 	if (conn->curTuple == NULL)
! 	{
! 		conn->curTuple = (PGresAttValue *)
! 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
! 		if (conn->curTuple == NULL)
! 			goto outOfMemory;
! 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
! 	}
! 	tup = conn->curTuple;
  
  	/* Get the field count and make sure it's what we expect */
  	if (pqGetInt(&tupnfields, 2, conn))
--- 616,637 ----
  
  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * It fills rowbuf with column pointers and then calls row processor.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGrowValue  *rowbuf;
  	int			tupnfields;		/* # fields from tuple */
  	int			vlen;			/* length of the current field value */
  	int			i;
! 	int			rp;
  
  	/* Get the field count and make sure it's what we expect */
  	if (pqGetInt(&tupnfields, 2, conn))
***************
*** 652,703 **** getAnotherTuple(PGconn *conn, int msgLength)
  				 libpq_gettext("unexpected field count in \"D\" message\n"));
  		pqSaveErrorResult(conn);
  		/* Discard the failed message by pretending we read it */
! 		conn->inCursor = conn->inStart + 5 + msgLength;
  		return 0;
  	}
  
  	/* Scan the fields */
  	for (i = 0; i < nfields; i++)
  	{
  		/* get the value length */
  		if (pqGetInt(&vlen, 4, conn))
! 			return EOF;
  		if (vlen == -1)
! 		{
! 			/* null field */
! 			tup[i].value = result->null_field;
! 			tup[i].len = NULL_LEN;
! 			continue;
! 		}
! 		if (vlen < 0)
  			vlen = 0;
- 		if (tup[i].value == NULL)
- 		{
- 			bool		isbinary = (result->attDescs[i].format != 0);
  
! 			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
! 			if (tup[i].value == NULL)
! 				goto outOfMemory;
! 		}
! 		tup[i].len = vlen;
! 		/* read in the value */
! 		if (vlen > 0)
! 			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
! 				return EOF;
! 		/* we have to terminate this ourselves */
! 		tup[i].value[vlen] = '\0';
  	}
  
! 	/* Success!  Store the completed tuple in the result */
! 	if (!pqAddTuple(result, tup))
! 		goto outOfMemory;
! 	/* and reset for a new message */
! 	conn->curTuple = NULL;
  
  	return 0;
  
! outOfMemory:
  
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
--- 644,731 ----
  				 libpq_gettext("unexpected field count in \"D\" message\n"));
  		pqSaveErrorResult(conn);
  		/* Discard the failed message by pretending we read it */
! 		conn->inStart += 5 + msgLength;
  		return 0;
  	}
  
+ 	/* resize row buffer if needed */
+ 	rowbuf = conn->rowBuf;
+ 	if (nfields > conn->rowBufLen)
+ 	{
+ 		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+ 		if (!rowbuf)
+ 		{
+ 			goto outOfMemory1;
+ 		}
+ 		conn->rowBuf = rowbuf;
+ 		conn->rowBufLen = nfields;
+ 	}
+ 
  	/* Scan the fields */
  	for (i = 0; i < nfields; i++)
  	{
  		/* get the value length */
  		if (pqGetInt(&vlen, 4, conn))
! 			goto protocolError;
  		if (vlen == -1)
! 			vlen = NULL_LEN;
! 		else if (vlen < 0)
  			vlen = 0;
  
! 		/*
! 		 * rowbuf[i].value always points to the next address of the
! 		 * length field even if the value is NULL, to allow safe
! 		 * size estimates and data copy.
! 		 */
! 		rowbuf[i].value = conn->inBuffer + conn->inCursor;
! 		rowbuf[i].len = vlen;
! 
! 		/* Skip to the next length field */
! 		if (vlen > 0 && pqSkipnchar(vlen, conn))
! 			goto protocolError;
  	}
  
! 	/* tag the row as parsed, check if correctly */
! 	conn->inStart += 5 + msgLength;
! 	if (conn->inCursor != conn->inStart)
! 		goto protocolError;
  
+ 	/* Pass the completed row values to rowProcessor */
+ 	rp = conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
+ 	if (rp == 1)
+ 	{
+ 		/* everything is good */
+ 		return 0;
+ 	}
+ 	if (rp == 2 && pqIsnonblocking(conn))
+ 	{
+ 		/* processor requested early exit */
+ 		return EOF;
+ 	}
+ 
+ 	/* there was some problem */
+ 	if (rp == 0)
+ 	{
+ 		if (result->rowProcessorErrMsg == NULL)
+ 			goto outOfMemory2;
+ 
+ 		/* use supplied error message */
+ 		printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
+ 	}
+ 	else
+ 	{
+ 		/* row processor messed up */
+ 		printfPQExpBuffer(&conn->errorMessage,
+ 						  libpq_gettext("invalid return value from row processor\n"));
+ 	}
+ 	pqSaveErrorResult(conn);
  	return 0;
  
! outOfMemory1:
! 	/* Discard the failed message by pretending we read it */
! 	conn->inStart += 5 + msgLength;
  
+ outOfMemory2:
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
***************
*** 706,714 **** outOfMemory:
  	printfPQExpBuffer(&conn->errorMessage,
  					  libpq_gettext("out of memory for query result\n"));
  	pqSaveErrorResult(conn);
  
  	/* Discard the failed message by pretending we read it */
! 	conn->inCursor = conn->inStart + 5 + msgLength;
  	return 0;
  }
  
--- 734,747 ----
  	printfPQExpBuffer(&conn->errorMessage,
  					  libpq_gettext("out of memory for query result\n"));
  	pqSaveErrorResult(conn);
+ 	return 0;
  
+ protocolError:
+ 	printfPQExpBuffer(&conn->errorMessage,
+ 					  libpq_gettext("invalid row contents\n"));
+ 	pqSaveErrorResult(conn);
  	/* Discard the failed message by pretending we read it */
! 	conn->inStart += 5 + msgLength;
  	return 0;
  }
  
*** a/src/interfaces/libpq/libpq-fe.h
--- b/src/interfaces/libpq/libpq-fe.h
***************
*** 149,154 **** typedef struct pgNotify
--- 149,165 ----
  	struct pgNotify *next;		/* list link */
  } PGnotify;
  
+ /* PGrowValue points to a column value in the network buffer.
+  * Value is a string without null termination and length len.
+  * NULL is represented as len < 0; value then points to the place
+  * where the value would have been.
+  */
+ typedef struct pgRowValue
+ {
+ 	int			len;			/* length in bytes of the value */
+ 	char	   *value;			/* actual value, without null termination */
+ } PGrowValue;
+ 
  /* Function types for notice-handling callbacks */
  typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
  typedef void (*PQnoticeProcessor) (void *arg, const char *message);
***************
*** 416,421 **** extern PGPing PQping(const char *conninfo);
--- 427,463 ----
  extern PGPing PQpingParams(const char *const * keywords,
  			 const char *const * values, int expand_dbname);
  
+ /*
+  * Typedef for alternative row processor.
+  *
+  * Columns array will contain PQnfields() entries, each one
+  * pointing to particular column data in network buffer.
+  * This function is supposed to copy data out from there
+  * and store somewhere.  NULL is signified with len<0.
+  *
+  * This function must return 1 for success and must return 0 for
+  * failure and may set error message by PQsetRowProcessorErrMsg.  It
+  * is assumed by caller as out of memory when the error message is not
+  * set on failure. This function is assumed not to throw any
+  * exception.
+  */
+ typedef int (*PQrowProcessor)(PGresult *res, void *param,
+ 								PGrowValue *columns);
+ 
+ /*
+  * Set alternative row data processor for PGconn.
+  *
+  * When a row processor is registered, libpq stops storing rows in
+  * PGresult itself and calls the processor for each row instead.
+  *
+  * func is the row processor function. See the typedef PQrowProcessor.
+  *
+  * rowProcessorParam is the contextual variable that is passed to
+  * the row processor.
+  */
+ extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+ 								   void *rowProcessorParam);
+ 
  /* Force the write buffer to be written (or at least try) */
  extern int	PQflush(PGconn *conn);
  
***************
*** 454,459 **** extern char *PQcmdTuples(PGresult *res);
--- 496,502 ----
  extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
  extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
  extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+ extern void	PQsetRowProcessorErrMsg(PGresult *res, char *msg);
  extern int	PQnparams(const PGresult *res);
  extern Oid	PQparamtype(const PGresult *res, int param_num);
  
*** a/src/interfaces/libpq/libpq-int.h
--- b/src/interfaces/libpq/libpq-int.h
***************
*** 209,214 **** struct pg_result
--- 209,217 ----
  	PGresult_data *curBlock;	/* most recently allocated block */
  	int			curOffset;		/* start offset of free space in block */
  	int			spaceLeft;		/* number of free bytes remaining in block */
+ 
+ 	/* temp storage for message from row processor callback */
+ 	char	   *rowProcessorErrMsg;
  };
  
  /* PGAsyncStatusType defines the state of the query-execution state machine */
***************
*** 398,404 **** struct pg_conn
  
  	/* Status for asynchronous result construction */
  	PGresult   *result;			/* result being constructed */
- 	PGresAttValue *curTuple;	/* tuple currently being read */
  
  #ifdef USE_SSL
  	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
--- 401,406 ----
***************
*** 443,448 **** struct pg_conn
--- 445,458 ----
  
  	/* Buffer for receiving various parts of messages */
  	PQExpBufferData workBuffer; /* expansible string */
+ 
+ 	/*
+ 	 * Read column data from network buffer.
+ 	 */
+ 	PQrowProcessor rowProcessor;/* Function pointer */
+ 	void *rowProcessorParam;	/* Contextual parameter for rowProcessor */
+ 	PGrowValue *rowBuf;			/* Buffer for passing values to rowProcessor */
+ 	int rowBufLen;				/* Number of columns allocated in rowBuf */
  };
  
  /* PGcancel stores all data necessary to cancel a connection. A copy of this
***************
*** 560,565 **** extern int	pqGets(PQExpBuffer buf, PGconn *conn);
--- 570,576 ----
  extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
  extern int	pqPuts(const char *s, PGconn *conn);
  extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+ extern int	pqSkipnchar(size_t len, PGconn *conn);
  extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
  extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
  extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
libpq_rowproc_doc_2012-02-14.diff (text/x-diff; charset=us-ascii)
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
***************
*** 7233,7238 **** int PQisthreadsafe();
--- 7233,7467 ----
   </sect1>
  
  
+  <sect1 id="libpq-altrowprocessor">
+   <title>Alternative row processor</title>
+ 
+   <indexterm zone="libpq-altrowprocessor">
+    <primary>PGresult</primary>
+    <secondary>PGconn</secondary>
+   </indexterm>
+ 
+   <para>
+    As the standard usage, rows are stored into <type>PQresult</type>
+    until full resultset is received.  Then such completely-filled
+    <type>PQresult</type> is passed to user.  This behaviour can be
+    changed by registering alternative row processor function,
+    that will see each row data as soon as it is received
+    from network.  It has the option of processing the data
+    immediately, or storing it into custom container.
+   </para>
+ 
+   <para>
+    Note - as row processor sees rows as they arrive, it cannot know
+    whether the SQL statement actually finishes successfully on server
+    or not.  So some care must be taken to get proper
+    transactionality.
+   </para>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqsetrowprocessor">
+     <term>
+      <function>PQsetRowProcessor</function>
+      <indexterm>
+       <primary>PQsetRowProcessor</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        Sets a callback function to process each row.
+ <synopsis>
+ void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+ </synopsis>
+      </para>
+      
+      <para>
+        <variablelist>
+ 	 <varlistentry>
+ 	   <term><parameter>conn</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       The connection object to set the row processor function.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+ 	 <varlistentry>
+ 	   <term><parameter>func</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       Storage handler function to set. NULL means to use the
+ 	       default processor.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+ 	 <varlistentry>
+ 	   <term><parameter>param</parameter></term>
+ 	   <listitem>
+ 	     <para>
+ 	       A pointer to contextual parameter passed
+ 	       to <parameter>func</parameter>.
+ 	     </para>
+ 	   </listitem>
+ 	 </varlistentry>
+        </variablelist>
+      </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqrowprocessor">
+     <term>
+      <type>PQrowProcessor</type>
+      <indexterm>
+       <primary>PQrowProcessor</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        The type for the row processor callback function.
+ <synopsis>
+ int (*PQrowProcessor)(PGresult *res, void *param, PGrowValue *columns);
+ 
+ typedef struct
+ {
+     int         len;            /* length in bytes of the value, -1 if NULL */
+     char       *value;          /* actual value, without null termination */
+ } PGrowValue;
+ </synopsis>
+      </para>
+ 
+      <para>
+       The <parameter>columns</parameter> array will have PQnfields()
+       elements, each one pointing to column value in network buffer.
+       The <parameter>len</parameter> field will contain number of
+       bytes in value.  If the field value is NULL then
+       <parameter>len</parameter> will be -1 and value will point
+       to position where the value would have been in buffer.
+       This allows estimating row size by pointer arithmetic.
+      </para>
+ 
+      <para>
+        This function must process or copy row values away from network
+        buffer before it returns, as next row might overwrite them.
+      </para>
+ 
+      <para>
+        This function must return 1 for success, and 0 for failure.
+        On failure this function should set the error message
+        with <function>PGsetRowProcessorErrMsg</function> if the cause
+        is other than out of memory.
+        When non-blocking API is in use, it can also return 2
+        for early exit from <function>PQisBusy</function> function.
+        The supplied <parameter>res</parameter> and <parameter>columns</parameter>
+        values will stay valid so row can be processed outside of callback.
+        Caller is resposible for tracking whether the <parameter>PQisBusy</parameter>
+        returned early from callback or for other reasons.
+        Usually this should happen via setting cached values to NULL
+        before calling <function>PQisBusy</function>.
+      </para>
+ 
+      <para>
+        The function is allowed to exit via exception (setjmp/longjmp).
+        The connection and row are guaranteed to be in valid state.
+        The connection can later be closed via <function>PQfinish</function>.
+        Processing can also be continued without closing the connection,
+        call <function>getResult</function> on syncronous mode,
+        <function>PQisBusy</function> on asynchronous connection.
+        Then processing will continue with new row, previous row
+        that got exception will be skipped.
+      </para>
+ 
+      <variablelist>
+        <varlistentry>
+ 
+ 	 <term><parameter>res</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     A pointer to the <type>PGresult</type> object.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+        <varlistentry>
+ 
+ 	 <term><parameter>param</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     Extra parameter that was given to <function>PQsetRowProcessor</function>.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+        <varlistentry>
+ 
+ 	 <term><parameter>columns</parameter></term>
+ 	 <listitem>
+ 	   <para>
+ 	     Column values of the row to process.  Column values
+ 	     are located in network buffer, the processor must
+ 	     copy them out from there.
+ 	   </para>
+ 	   <para>
+ 	     Column values are not null-terminated, so processor cannot
+ 	     use C string functions on them directly.
+ 	   </para>
+ 	 </listitem>
+        </varlistentry>
+      </variablelist>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqsetrowprocessorerrmsg">
+     <term>
+      <function>PQsetRowProcessorErrMsg</function>
+      <indexterm>
+       <primary>PQsetRowProcessorErrMsg</primary>
+      </indexterm>
+     </term>
+     <listitem>
+       <para>
+ 	Set the message for the error occurred
+ 	in <type>PQrowProcessor</type>.  If this message is not set, the
+ 	caller assumes the error to be out of memory.
+ <synopsis>
+ void PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+ </synopsis>
+       </para>
+       <para>
+ 	<variablelist>
+ 	  <varlistentry>
+ 	    <term><parameter>res</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		A pointer to the <type>PGresult</type> object
+ 		passed to <type>PQrowProcessor</type>.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	  <varlistentry>
+ 	    <term><parameter>msg</parameter></term>
+ 	    <listitem>
+ 	      <para>
+ 		Error message. This will be copied internally so there is
+ 		no need to care of the scope.
+ 	      </para>
+ 	      <para>
+ 		If <parameter>res</parameter> already has a message previously
+ 		set, it will be overwritten. Set NULL to cancel the the custom
+ 		message.
+ 	      </para>
+ 	    </listitem>
+ 	  </varlistentry>
+ 	</variablelist>
+       </para>
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </sect1>
+ 
+ 
   <sect1 id="libpq-build">
    <title>Building <application>libpq</application> Programs</title>
  
dblink_rowproc_2012-02-14.diff (text/x-diff; charset=us-ascii)
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
***************
*** 63,73 **** typedef struct remoteConn
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
- static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
--- 63,86 ----
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
+ typedef struct storeInfo
+ {
+ 	Tuplestorestate *tuplestore;
+ 	int nattrs;
+ 	MemoryContext oldcontext;
+ 	AttInMetadata *attinmeta;
+ 	char** valbuf;
+ 	int *valbuflen;
+ 	char **cstrs;
+ 	bool error_occurred;
+ 	bool nummismatch;
+ 	ErrorData *edata;
+ } storeInfo;
+ 
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
***************
*** 90,95 **** static char *escape_param_str(const char *from);
--- 103,112 ----
  static void validate_pkattnums(Relation rel,
  				   int2vector *pkattnums_arg, int32 pknumatts_arg,
  				   int **pkattnums, int *pknumatts);
+ static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+ static void finishStoreInfo(storeInfo *sinfo);
+ static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+ 
  
  /* Global */
  static remoteConn *pconn = NULL;
***************
*** 503,508 **** dblink_fetch(PG_FUNCTION_ARGS)
--- 520,526 ----
  	char	   *curname = NULL;
  	int			howmany = 0;
  	bool		fail = true;	/* default to backward compatible */
+ 	storeInfo   storeinfo;
  
  	DBLINK_INIT;
  
***************
*** 559,573 **** dblink_fetch(PG_FUNCTION_ARGS)
--- 577,612 ----
  	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
  
  	/*
+ 	 * Result is stored into storeinfo.tuplestore instead of
+ 	 * res->result returned by PQexec below
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+ 
+ 	/*
  	 * Try to execute the query.  Note that since libpq uses malloc, the
  	 * PGresult will be long-lived even though we are still in a short-lived
  	 * memory context.
  	 */
  	res = PQexec(conn, buf.data);
+ 	finishStoreInfo(&storeinfo);
+ 
  	if (!res ||
  		(PQresultStatus(res) != PGRES_COMMAND_OK &&
  		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
+ 		/* finishStoreInfo saves the fields referred to below. */
+ 		if (storeinfo.nummismatch)
+ 		{
+ 			/* This is only for backward compatibility */
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_DATATYPE_MISMATCH),
+ 					 errmsg("remote query result rowtype does not match "
+ 							"the specified FROM clause rowtype")));
+ 		}
+ 		else if (storeinfo.edata)
+ 			ReThrowError(storeinfo.edata);
+ 
  		dblink_res_error(conname, res, "could not fetch from cursor", fail);
  		return (Datum) 0;
  	}
***************
*** 579,586 **** dblink_fetch(PG_FUNCTION_ARGS)
  				(errcode(ERRCODE_INVALID_CURSOR_NAME),
  				 errmsg("cursor \"%s\" does not exist", curname)));
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
--- 618,625 ----
  				(errcode(ERRCODE_INVALID_CURSOR_NAME),
  				 errmsg("cursor \"%s\" does not exist", curname)));
  	}
+ 	PQclear(res);
  
  	return (Datum) 0;
  }
  
***************
*** 640,645 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
--- 679,685 ----
  	remoteConn *rconn = NULL;
  	bool		fail = true;	/* default to backward compatible */
  	bool		freeconn = false;
+ 	storeInfo   storeinfo;
  
  	/* check to see if caller supports us returning a tuplestore */
  	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
***************
*** 715,878 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
  	/* synchronous query, or async result retrieval */
  	if (!is_async)
  		res = PQexec(conn, sql);
  	else
- 	{
  		res = PQgetResult(conn);
- 		/* NULL means we're all done with the async results */
- 		if (!res)
- 			return (Datum) 0;
- 	}
  
! 	/* if needed, close the connection to the database and cleanup */
! 	if (freeconn)
! 		PQfinish(conn);
  
! 	if (!res ||
! 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
! 		dblink_res_error(conname, res, "could not execute query", fail);
! 		return (Datum) 0;
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
- /*
-  * Materialize the PGresult to return them as the function result.
-  * The res will be released in this function.
-  */
  static void
! materializeResult(FunctionCallInfo fcinfo, PGresult *res)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
  
! 	Assert(rsinfo->returnMode == SFRM_Materialize);
  
! 	PG_TRY();
  	{
! 		TupleDesc	tupdesc;
! 		bool		is_sql_cmd = false;
! 		int			ntuples;
! 		int			nfields;
  
! 		if (PQresultStatus(res) == PGRES_COMMAND_OK)
! 		{
! 			is_sql_cmd = true;
  
! 			/*
! 			 * need a tuple descriptor representing one TEXT column to return
! 			 * the command status string as our result tuple
! 			 */
! 			tupdesc = CreateTemplateTupleDesc(1, false);
! 			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
! 							   TEXTOID, -1, 0);
! 			ntuples = 1;
! 			nfields = 1;
! 		}
! 		else
! 		{
! 			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
  
! 			is_sql_cmd = false;
  
! 			/* get a tuple descriptor for our result type */
! 			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
! 			{
! 				case TYPEFUNC_COMPOSITE:
! 					/* success */
! 					break;
! 				case TYPEFUNC_RECORD:
! 					/* failed to determine actual type of RECORD */
! 					ereport(ERROR,
! 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 						errmsg("function returning record called in context "
! 							   "that cannot accept type record")));
! 					break;
! 				default:
! 					/* result type isn't composite */
! 					elog(ERROR, "return type must be a row type");
! 					break;
! 			}
  
! 			/* make sure we have a persistent copy of the tupdesc */
! 			tupdesc = CreateTupleDescCopy(tupdesc);
! 			ntuples = PQntuples(res);
! 			nfields = PQnfields(res);
  		}
  
! 		/*
! 		 * check result and tuple descriptor have the same number of columns
! 		 */
! 		if (nfields != tupdesc->natts)
! 			ereport(ERROR,
! 					(errcode(ERRCODE_DATATYPE_MISMATCH),
! 					 errmsg("remote query result rowtype does not match "
! 							"the specified FROM clause rowtype")));
  
! 		if (ntuples > 0)
! 		{
! 			AttInMetadata *attinmeta;
! 			Tuplestorestate *tupstore;
! 			MemoryContext oldcontext;
! 			int			row;
! 			char	  **values;
! 
! 			attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 
! 			oldcontext = MemoryContextSwitchTo(
! 									rsinfo->econtext->ecxt_per_query_memory);
! 			tupstore = tuplestore_begin_heap(true, false, work_mem);
! 			rsinfo->setResult = tupstore;
! 			rsinfo->setDesc = tupdesc;
! 			MemoryContextSwitchTo(oldcontext);
  
! 			values = (char **) palloc(nfields * sizeof(char *));
  
! 			/* put all tuples into the tuplestore */
! 			for (row = 0; row < ntuples; row++)
! 			{
! 				HeapTuple	tuple;
  
! 				if (!is_sql_cmd)
! 				{
! 					int			i;
  
! 					for (i = 0; i < nfields; i++)
! 					{
! 						if (PQgetisnull(res, row, i))
! 							values[i] = NULL;
! 						else
! 							values[i] = PQgetvalue(res, row, i);
! 					}
! 				}
! 				else
! 				{
! 					values[0] = PQcmdStatus(res);
! 				}
  
! 				/* build the tuple and put it into the tuplestore. */
! 				tuple = BuildTupleFromCStrings(attinmeta, values);
! 				tuplestore_puttuple(tupstore, tuple);
  			}
  
! 			/* clean up and return the tuplestore */
! 			tuplestore_donestoring(tupstore);
  		}
  
! 		PQclear(res);
  	}
  	PG_CATCH();
  	{
! 		/* be sure to release the libpq result */
! 		PQclear(res);
! 		PG_RE_THROW();
  	}
  	PG_END_TRY();
  }
  
  /*
--- 755,1006 ----
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
+ 
+ 	/*
+ 	 * Result is stored into storeinfo.tuplestore instead of
+ 	 * res->result retuned by PQexec/PQgetResult below
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+ 
  	/* synchronous query, or async result retrieval */
  	if (!is_async)
  		res = PQexec(conn, sql);
  	else
  		res = PQgetResult(conn);
  
! 	finishStoreInfo(&storeinfo);
  
! 	/* NULL res from async get means we're all done with the results */
! 	if (res || !is_async)
  	{
! 		if (freeconn)
! 			PQfinish(conn);
! 
! 		if (!res ||
! 			(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 			 PQresultStatus(res) != PGRES_TUPLES_OK))
! 		{
! 			/* finishStoreInfo saves the fields referred to below. */
! 			if (storeinfo.nummismatch)
! 			{
! 				/* This is only for backward compatibility */
! 				ereport(ERROR,
! 						(errcode(ERRCODE_DATATYPE_MISMATCH),
! 						 errmsg("remote query result rowtype does not match "
! 								"the specified FROM clause rowtype")));
! 			}
! 			else if (storeinfo.edata)
! 				ReThrowError(storeinfo.edata);
! 
! 			dblink_res_error(conname, res, "could not execute query", fail);
! 			return (Datum) 0;
! 		}
  	}
+ 	PQclear(res);
  
  	return (Datum) 0;
  }
  
  static void
! initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	TupleDesc	tupdesc;
+ 	int i;
  
! 	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
! 	{
! 		case TYPEFUNC_COMPOSITE:
! 			/* success */
! 			break;
! 		case TYPEFUNC_RECORD:
! 			/* failed to determine actual type of RECORD */
! 			ereport(ERROR,
! 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 					 errmsg("function returning record called in context "
! 							"that cannot accept type record")));
! 			break;
! 		default:
! 			/* result type isn't composite */
! 			elog(ERROR, "return type must be a row type");
! 			break;
! 	}
  
! 	sinfo->oldcontext = MemoryContextSwitchTo(
! 		rsinfo->econtext->ecxt_per_query_memory);
! 
! 	/* make sure we have a persistent copy of the tupdesc */
! 	tupdesc = CreateTupleDescCopy(tupdesc);
! 
! 	sinfo->error_occurred = FALSE;
! 	sinfo->nummismatch = FALSE;
! 	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 	sinfo->edata = NULL;
! 	sinfo->nattrs = tupdesc->natts;
! 	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
! 	sinfo->valbuf = NULL;
! 	sinfo->valbuflen = NULL;
! 
! 	/* Preallocate memory of same size with c string array for values. */
! 	sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
! 	if (sinfo->valbuf)
! 		sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
! 	if (sinfo->valbuflen)
! 		sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
! 
! 	if (sinfo->cstrs == NULL)
  	{
! 		if (sinfo->valbuf)
! 			free(sinfo->valbuf);
! 		if (sinfo->valbuflen)
! 			free(sinfo->valbuflen);
  
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OUT_OF_MEMORY),
! 				 errmsg("out of memory")));
! 	}
  
! 	for (i = 0 ; i < sinfo->nattrs ; i++)
! 	{
! 		sinfo->valbuf[i] = NULL;
! 		sinfo->valbuflen[i] = -1;
! 	}
  
! 	rsinfo->setResult = sinfo->tuplestore;
! 	rsinfo->setDesc = tupdesc;
! }
  
! static void
! finishStoreInfo(storeInfo *sinfo)
! {
! 	int i;
  
! 	if (sinfo->valbuf)
! 	{
! 		for (i = 0 ; i < sinfo->nattrs ; i++)
! 		{
! 			if (sinfo->valbuf[i])
! 				free(sinfo->valbuf[i]);
  		}
+ 		free(sinfo->valbuf);
+ 		sinfo->valbuf = NULL;
+ 	}
  
! 	if (sinfo->valbuflen)
! 	{
! 		free(sinfo->valbuflen);
! 		sinfo->valbuflen = NULL;
! 	}
  
! 	if (sinfo->cstrs)
! 	{
! 		free(sinfo->cstrs);
! 		sinfo->cstrs = NULL;
! 	}
  
! 	MemoryContextSwitchTo(sinfo->oldcontext);
! }
  
! static int
! storeHandler(PGresult *res, void *param, PGrowValue *columns)
! {
! 	storeInfo *sinfo = (storeInfo *)param;
! 	HeapTuple  tuple;
! 	int        fields = PQnfields(res);
! 	int        i;
! 	char      **cstrs = sinfo->cstrs;
  
! 	if (sinfo->error_occurred)
! 		return FALSE;
  
! 	if (sinfo->nattrs != fields)
! 	{
! 		sinfo->error_occurred = TRUE;
! 		sinfo->nummismatch = TRUE;
! 		finishStoreInfo(sinfo);
! 
! 		/* This error will be processed in
! 		 * dblink_record_internal(). So do not set error message
! 		 * here. */
! 		return FALSE;
! 	}
  
! 	/*
! 	 * value input functions assumes that the input string is
! 	 * terminated by zero. We should make the values to be so.
! 	 */
! 	for(i = 0 ; i < fields ; i++)
! 	{
! 		int len = columns[i].len;
! 		if (len < 0)
! 			cstrs[i] = NULL;
! 		else
! 		{
! 			char *tmp = sinfo->valbuf[i];
! 			int tmplen = sinfo->valbuflen[i];
! 
! 			/*
! 			 * Divide calls to malloc and realloc so that things will
! 			 * go fine even on the systems of which realloc() does not
! 			 * accept NULL as old memory block.
! 			 *
! 			 * Also try to (re)allocate in bigger steps to
! 			 * avoid flood of allocations on weird data.
! 			 */
! 			if (tmp == NULL)
! 			{
! 				tmplen = len + 1;
! 				if (tmplen < 64)
! 					tmplen = 64;
! 				tmp = (char *)malloc(tmplen);
! 			}
! 			else if (tmplen < len + 1)
! 			{
! 				if (len + 1 > tmplen * 2)
! 					tmplen = len + 1;
! 				else
! 					tmplen = tmplen * 2;
! 				tmp = (char *)realloc(tmp, tmplen);
  			}
  
! 			/*
! 			 * sinfo->valbuf[n] will be freed in finishStoreInfo()
! 			 * when realloc returns NULL.
! 			 */
! 			if (tmp == NULL)
! 				return FALSE;
! 
! 			sinfo->valbuf[i] = tmp;
! 			sinfo->valbuflen[i] = tmplen;
! 
! 			cstrs[i] = sinfo->valbuf[i];
! 			memcpy(cstrs[i], columns[i].value, len);
! 			cstrs[i][len] = '\0';
  		}
+ 	}
  
! 	PG_TRY();
! 	{
! 		tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
! 		tuplestore_puttuple(sinfo->tuplestore, tuple);
  	}
  	PG_CATCH();
  	{
! 		MemoryContext context;
! 		/*
! 		 * Store exception for later ReThrow and cancel the exception.
! 		 */
! 		sinfo->error_occurred = TRUE;
! 		context = MemoryContextSwitchTo(sinfo->oldcontext);
! 		sinfo->edata = CopyErrorData();
! 		MemoryContextSwitchTo(context);
! 		FlushErrorState();
! 		return FALSE;
  	}
  	PG_END_TRY();
+ 
+ 	return TRUE;
  }
  
  /*
early.exit.diff (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 2dc18e6..6829d52 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7352,6 +7352,14 @@ typedef struct
        On failure this function should set the error message
        with <function>PGsetRowProcessorErrMsg</function> if the cause
        is other than out of memory.
+       When non-blocking API is in use, it can also return 2
+       for early exit from <function>PQisBusy</function> function.
+       The supplied <parameter>res</parameter> and <parameter>columns</parameter>
+       values will stay valid so row can be processed outside of callback.
+       Caller is resposible for tracking whether the <parameter>PQisBusy</parameter>
+       returned early from callback or for other reasons.
+       Usually this should happen via setting cached values to NULL
+       before calling <function>PQisBusy</function>.
      </para>
 
      <para>
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index 7498580..ae4d7b0 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -820,6 +820,9 @@ getAnotherTuple(PGconn *conn, bool binary)
 	rp= conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
 	if (rp == 1)
 		return 0;
+	else if (rp == 2 && pqIsnonblocking(conn))
+		/* processor requested early exit */
+		return EOF;
 	else if (rp != 0)
 		PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value from row processor\n"));
 
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index a67e3ac..0260ba6 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -697,6 +697,11 @@ getAnotherTuple(PGconn *conn, int msgLength)
 		/* everything is good */
 		return 0;
 	}
+	if (rp == 2 && pqIsnonblocking(conn))
+	{
+		/* processor requested early exit */
+		return EOF;
+	}
 
 	/* there was some problem */
 	if (rp == 0)
#47Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#46)
Re: Speed dblink using alternate libpq tuple storage

Hello, sorry for the long absence.

As far as I can see, an out-of-memory condition in getAnotherTuple()
sets conn->result->resultStatus = PGRES_FATAL_ERROR, and
pqParseInput[23]() consequently skips the succeeding 'D' messages.

When an exception is raised within the row processor,
pg_conn->inCursor is always left at a consistent position and
result->resultStatus == PGRES_TUPLES_OK.

The choices for the libpq user at that point are:

- Continue to read the succeeding tuples.

Call PQgetResult() to read the 'D' messages and hand them to the
row processor one after another.

- Throw away the remaining results.

Call pqClearAsyncResult() and pqSaveErrorResult(), then call
PQgetResult() to skip over the succeeding 'D' messages. (Of
course, the user cannot do that with the current implementation.)

To let users select the second choice (which I think is a rather
common one), we only need to provide and export a new PQ*
function that does this, I think.

void
PQskipRemainingResult(PGconn *conn)
{
	pqClearAsyncResult(conn);

	/* conn->result is always NULL here */
	pqSaveErrorResult(conn);

	/* Skip over remaining 'D' messages. */
	PQgetResult(conn);
}

Users could then write code like this:

...
PG_TRY();
{
	...
	res = PQexec(conn, "....");
	...
}
PG_CATCH();
{
	PQskipRemainingResult(conn);
	goto error;
}
PG_END_TRY();

Of course, this applies to C++ clients in the same manner.

try {
	...
	res = PQexec(conn, "....");
	...
} catch (const myexcep& ex) {
	PQskipRemainingResult(conn);
	throw;	/* rethrow the original exception without copying it */
}

By the way, where should I put this function?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#48Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#47)
Re: Speed dblink using alternate libpq tuple storage

On Thu, Feb 16, 2012 at 02:24:19PM +0900, Kyotaro HORIGUCHI wrote:

As far as I can see, an out-of-memory condition in getAnotherTuple()
sets conn->result->resultStatus = PGRES_FATAL_ERROR, and
pqParseInput[23]() consequently skips the succeeding 'D' messages.

When an exception is raised within the row processor,
pg_conn->inCursor is always left at a consistent position and
result->resultStatus == PGRES_TUPLES_OK.

The choices for the libpq user at that point are:

- Continue to read the succeeding tuples.

Call PQgetResult() to read the 'D' messages and hand them to the
row processor one after another.

- Throw away the remaining results.

Call pqClearAsyncResult() and pqSaveErrorResult(), then call
PQgetResult() to skip over the succeeding 'D' messages. (Of
course, the user cannot do that with the current implementation.)

There is also a third choice, which may be even more popular than
those two: PQfinish().

To let users select the second choice (which I think is a rather
common one), we only need to provide and export a new PQ*
function that does this, I think.

I think we already have such a function - PQsetRowProcessor().
Considering that the user can use it to install a skipping callback,
or simply set some flag in its own per-connection state,
I suspect the need is not that big.
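
A minimal sketch of the flag-based approach, assuming the callback signature proposed in this patch. `PGresult` and `PGrowValue` are stand-in declarations so that the fragment compiles without libpq; `my_conn_state` and `skipping_row_processor` are hypothetical names, not part of any patch in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the libpq types, so this compiles without libpq headers. */
typedef struct pg_result PGresult;	/* opaque here */

typedef struct pgRowValue
{
	int		len;		/* length in bytes of the value, < 0 if NULL */
	char   *value;		/* actual value, without null termination */
} PGrowValue;

/* Hypothetical per-connection state owned by the application. */
typedef struct my_conn_state
{
	int		skip_rows;		/* set once the remaining rows are unwanted */
	long	rows_stored;	/* rows handled normally */
	long	rows_skipped;	/* rows dropped because skip_rows was set */
} my_conn_state;

/*
 * Row processor that drops rows while skip_rows is set.  It returns 1
 * (success) in both cases, so the library keeps consuming rows and the
 * connection stays usable afterwards.
 */
static int
skipping_row_processor(PGresult *res, void *param, PGrowValue *columns)
{
	my_conn_state *st = (my_conn_state *) param;

	assert(st != NULL);
	(void) res;
	(void) columns;

	if (st->skip_rows)
	{
		st->rows_skipped++;
		return 1;
	}

	/* Real code would copy columns[] out of the network buffer here. */
	st->rows_stored++;
	return 1;
}
```

With the API proposed in this thread, such a callback would be registered with PQsetRowProcessor(conn, skipping_row_processor, &state), and the application would set state.skip_rows after catching its own error.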

But if you want to set an error state for skipping, I would instead
generalize PQsetRowProcessorErrMsg() to support setting the error state
outside of the callback. That would also help external processing with
'return 2'. But I would keep the requirement that row processing must
be ongoing; standalone error setting does not make sense. Thus the name
can stay.

There seem to be two ways to do it:

1) Replace the PGresult under PGconn. This seems to require that
PQsetRowProcessorErrMsg takes PGconn as an argument instead of
PGresult. This also means the callback should get PGconn as an
argument. Kind of makes sense even.

2) Convert the current partial PGresult to an error state. That also
makes sense; the current use of ->rowProcessorErrMsg to transport
the message so that the caller can set the error later feels weird.

I guess I slightly prefer 2) to 1).

--
marko

#49Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#46)
4 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

Hello,

I added the function PQskipRemainingResult() and used it in
dblink. This reduces the number of try-catch blocks executed in
dblink from one per row to one per query.

- fe-exec.c : new function PQskipRemainingResult.
- dblink.c : uses PQskipRemainingResult in dblink_record_internal().
- libpq.sgml: documentation for PQskipRemainingResult and the related
change in RowProcessor.

Instead I added a simple feature: rowProcessor can return '2',
in which case getAnotherTuple() exits early without setting
any error state. On the user side it appears as PQisBusy() having
returned TRUE. All pointers stay valid, so the callback can just
stuff them into some temp area.
...

It's included in main patch, but I also attached it as separate patch
so that it can be examined separately and reverted if not acceptable.

This patch is based on the patch above and is organized in the same
manner: the three main patches include all modifications, and the
'return 2' patch is also attached separately.

This patch is not rebased onto HEAD because HEAD currently yields an
error about the symbol LEAKPROOF...

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_rowproc_20120216.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/#fe-protocol3.c# b/src/interfaces/libpq/#fe-protocol3.c#
new file mode 100644
index 0000000..8b7eed2
--- /dev/null
+++ b/src/interfaces/libpq/#fe-protocol3.c#
@@ -0,0 +1,1967 @@
+/*-------------------------------------------------------------------------
+ *
+ * fe-protocol3.c
+ *	  functions that are specific to frontend/backend protocol version 3
+ *
+ * Portions Copyright (c) 1996-2012, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/interfaces/libpq/fe-protocol3.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres_fe.h"
+
+#include <ctype.h>
+#include <fcntl.h>
+
+#include "libpq-fe.h"
+#include "libpq-int.h"
+
+#include "mb/pg_wchar.h"
+
+#ifdef WIN32
+#include "win32.h"
+#else
+#include <unistd.h>
+#include <netinet/in.h>
+#ifdef HAVE_NETINET_TCP_H
+#include <netinet/tcp.h>
+#endif
+#include <arpa/inet.h>
+#endif
+
+
+/*
+ * This macro lists the backend message types that could be "long" (more
+ * than a couple of kilobytes).
+ */
+#define VALID_LONG_MESSAGE_TYPE(id) \
+	((id) == 'T' || (id) == 'D' || (id) == 'd' || (id) == 'V' || \
+	 (id) == 'E' || (id) == 'N' || (id) == 'A')
+
+
+static void handleSyncLoss(PGconn *conn, char id, int msgLength);
+static int	getRowDescriptions(PGconn *conn);
+static int	getParamDescriptions(PGconn *conn);
+static int	getAnotherTuple(PGconn *conn, int msgLength);
+static int	getParameterStatus(PGconn *conn);
+static int	getNotify(PGconn *conn);
+static int	getCopyStart(PGconn *conn, ExecStatusType copytype);
+static int	getReadyForQuery(PGconn *conn);
+static void reportErrorPosition(PQExpBuffer msg, const char *query,
+					int loc, int encoding);
+static int build_startup_packet(const PGconn *conn, char *packet,
+					 const PQEnvironmentOption *options);
+
+
+/*
+ * parseInput: if appropriate, parse input data from backend
+ * until input is exhausted or a stopping state is reached.
+ * Note that this function will NOT attempt to read more data from the backend.
+ */
+void
+pqParseInput3(PGconn *conn)
+{
+	char		id;
+	int			msgLength;
+	int			avail;
+
+	/*
+	 * Loop to parse successive complete messages available in the buffer.
+	 */
+	for (;;)
+	{
+		/*
+		 * Try to read a message.  First get the type code and length. Return
+		 * if not enough data.
+		 */
+		conn->inCursor = conn->inStart;
+		if (pqGetc(&id, conn))
+			return;
+		if (pqGetInt(&msgLength, 4, conn))
+			return;
+
+		/*
+		 * Try to validate message type/length here.  A length less than 4 is
+		 * definitely broken.  Large lengths should only be believed for a few
+		 * message types.
+		 */
+		if (msgLength < 4)
+		{
+			handleSyncLoss(conn, id, msgLength);
+			return;
+		}
+		if (msgLength > 30000 && !VALID_LONG_MESSAGE_TYPE(id))
+		{
+			handleSyncLoss(conn, id, msgLength);
+			return;
+		}
+
+		/*
+		 * Can't process if message body isn't all here yet.
+		 */
+		msgLength -= 4;
+		avail = conn->inEnd - conn->inCursor;
+		if (avail < msgLength)
+		{
+			/*
+			 * Before returning, enlarge the input buffer if needed to hold
+			 * the whole message.  This is better than leaving it to
+			 * pqReadData because we can avoid multiple cycles of realloc()
+			 * when the message is large; also, we can implement a reasonable
+			 * recovery strategy if we are unable to make the buffer big
+			 * enough.
+			 */
+			if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength,
+									 conn))
+			{
+				/*
+				 * XXX add some better recovery code... plan is to skip over
+				 * the message using its length, then report an error. For the
+				 * moment, just treat this like loss of sync (which indeed it
+				 * might be!)
+				 */
+				handleSyncLoss(conn, id, msgLength);
+			}
+			return;
+		}
+
+		/*
+		 * NOTIFY and NOTICE messages can happen in any state; always process
+		 * them right away.
+		 *
+		 * Most other messages should only be processed while in BUSY state.
+		 * (In particular, in READY state we hold off further parsing until
+		 * the application collects the current PGresult.)
+		 *
+		 * However, if the state is IDLE then we got trouble; we need to deal
+		 * with the unexpected message somehow.
+		 *
+		 * ParameterStatus ('S') messages are a special case: in IDLE state we
+		 * must process 'em (this case could happen if a new value was adopted
+		 * from config file due to SIGHUP), but otherwise we hold off until
+		 * BUSY state.
+		 */
+		if (id == 'A')
+		{
+			if (getNotify(conn))
+				return;
+		}
+		else if (id == 'N')
+		{
+			if (pqGetErrorNotice3(conn, false))
+				return;
+		}
+		else if (conn->asyncStatus != PGASYNC_BUSY)
+		{
+			/* If not IDLE state, just wait ... */
+			if (conn->asyncStatus != PGASYNC_IDLE)
+				return;
+
+			/*
+			 * Unexpected message in IDLE state; need to recover somehow.
+			 * ERROR messages are displayed using the notice processor;
+			 * ParameterStatus is handled normally; anything else is just
+			 * dropped on the floor after displaying a suitable warning
+			 * notice.	(An ERROR is very possibly the backend telling us why
+			 * it is about to close the connection, so we don't want to just
+			 * discard it...)
+			 */
+			if (id == 'E')
+			{
+				if (pqGetErrorNotice3(conn, false /* treat as notice */ ))
+					return;
+			}
+			else if (id == 'S')
+			{
+				if (getParameterStatus(conn))
+					return;
+			}
+			else
+			{
+				pqInternalNotice(&conn->noticeHooks,
+						"message type 0x%02x arrived from server while idle",
+								 id);
+				/* Discard the unexpected message */
+				conn->inCursor += msgLength;
+			}
+		}
+		else
+		{
+			/*
+			 * In BUSY state, we can process everything.
+			 */
+			switch (id)
+			{
+				case 'C':		/* command complete */
+					if (pqGets(&conn->workBuffer, conn))
+						return;
+					if (conn->result == NULL)
+					{
+						conn->result = PQmakeEmptyPGresult(conn,
+														   PGRES_COMMAND_OK);
+						if (!conn->result)
+							return;
+					}
+					strncpy(conn->result->cmdStatus, conn->workBuffer.data,
+							CMDSTATUS_LEN);
+					conn->asyncStatus = PGASYNC_READY;
+					break;
+				case 'E':		/* error return */
+					if (pqGetErrorNotice3(conn, true))
+						return;
+					conn->asyncStatus = PGASYNC_READY;
+					break;
+				case 'Z':		/* backend is ready for new query */
+					if (getReadyForQuery(conn))
+						return;
+					conn->asyncStatus = PGASYNC_IDLE;
+					break;
+				case 'I':		/* empty query */
+					if (conn->result == NULL)
+					{
+						conn->result = PQmakeEmptyPGresult(conn,
+														   PGRES_EMPTY_QUERY);
+						if (!conn->result)
+							return;
+					}
+					conn->asyncStatus = PGASYNC_READY;
+					break;
+				case '1':		/* Parse Complete */
+					/* If we're doing PQprepare, we're done; else ignore */
+					if (conn->queryclass == PGQUERY_PREPARE)
+					{
+						if (conn->result == NULL)
+						{
+							conn->result = PQmakeEmptyPGresult(conn,
+														   PGRES_COMMAND_OK);
+							if (!conn->result)
+								return;
+						}
+						conn->asyncStatus = PGASYNC_READY;
+					}
+					break;
+				case '2':		/* Bind Complete */
+				case '3':		/* Close Complete */
+					/* Nothing to do for these message types */
+					break;
+				case 'S':		/* parameter status */
+					if (getParameterStatus(conn))
+						return;
+					break;
+				case 'K':		/* secret key data from the backend */
+
+					/*
+					 * This is expected only during backend startup, but it's
+					 * just as easy to handle it as part of the main loop.
+					 * Save the data and continue processing.
+					 */
+					if (pqGetInt(&(conn->be_pid), 4, conn))
+						return;
+					if (pqGetInt(&(conn->be_key), 4, conn))
+						return;
+					break;
+				case 'T':		/* Row Description */
+					if (conn->result == NULL ||
+						conn->queryclass == PGQUERY_DESCRIBE)
+					{
+						/* First 'T' in a query sequence */
+						if (getRowDescriptions(conn))
+							return;
+
+						/*
+						 * If we're doing a Describe, we're ready to pass the
+						 * result back to the client.
+						 */
+						if (conn->queryclass == PGQUERY_DESCRIBE)
+							conn->asyncStatus = PGASYNC_READY;
+					}
+					else
+					{
+						/*
+						 * A new 'T' message is treated as the start of
+						 * another PGresult.  (It is not clear that this is
+						 * really possible with the current backend.) We stop
+						 * parsing until the application accepts the current
+						 * result.
+						 */
+						conn->asyncStatus = PGASYNC_READY;
+						return;
+					}
+					break;
+				case 'n':		/* No Data */
+
+					/*
+					 * NoData indicates that we will not be seeing a
+					 * RowDescription message because the statement or portal
+					 * inquired about doesn't return rows.
+					 *
+					 * If we're doing a Describe, we have to pass something
+					 * back to the client, so set up a COMMAND_OK result,
+					 * instead of TUPLES_OK.  Otherwise we can just ignore
+					 * this message.
+					 */
+					if (conn->queryclass == PGQUERY_DESCRIBE)
+					{
+						if (conn->result == NULL)
+						{
+							conn->result = PQmakeEmptyPGresult(conn,
+														   PGRES_COMMAND_OK);
+							if (!conn->result)
+								return;
+						}
+						conn->asyncStatus = PGASYNC_READY;
+					}
+					break;
+				case 't':		/* Parameter Description */
+					if (getParamDescriptions(conn))
+						return;
+					break;
+				case 'D':		/* Data Row */
+					if (conn->result != NULL &&
+						conn->result->resultStatus == PGRES_TUPLES_OK)
+					{
+						/* Read another tuple of a normal query response */
+						if (getAnotherTuple(conn, msgLength))
+							return;
+					}
+					else if (conn->result != NULL &&
+							 conn->result->resultStatus == PGRES_FATAL_ERROR)
+					{
+						/*
+						 * We've already choked for some reason.  Just discard
+						 * tuples till we get to the end of the query.
+						 */
+						conn->inCursor += msgLength;
+					}
+					else
+					{
+						/* Set up to report error at end of query */
+						printfPQExpBuffer(&conn->errorMessage,
+										  libpq_gettext("server sent data (\"D\" message) without prior row description (\"T\" message)\n"));
+						pqSaveErrorResult(conn);
+						/* Discard the unexpected message */
+						conn->inCursor += msgLength;
+					}
+					break;
+				case 'G':		/* Start Copy In */
+					if (getCopyStart(conn, PGRES_COPY_IN))
+						return;
+					conn->asyncStatus = PGASYNC_COPY_IN;
+					break;
+				case 'H':		/* Start Copy Out */
+					if (getCopyStart(conn, PGRES_COPY_OUT))
+						return;
+					conn->asyncStatus = PGASYNC_COPY_OUT;
+					conn->copy_already_done = 0;
+					break;
+				case 'W':		/* Start Copy Both */
+					if (getCopyStart(conn, PGRES_COPY_BOTH))
+						return;
+					conn->asyncStatus = PGASYNC_COPY_BOTH;
+					conn->copy_already_done = 0;
+					break;
+				case 'd':		/* Copy Data */
+
+					/*
+					 * If we see Copy Data, just silently drop it.	This would
+					 * only occur if application exits COPY OUT mode too
+					 * early.
+					 */
+					conn->inCursor += msgLength;
+					break;
+				case 'c':		/* Copy Done */
+
+					/*
+					 * If we see Copy Done, just silently drop it.	This is
+					 * the normal case during PQendcopy.  We will keep
+					 * swallowing data, expecting to see command-complete for
+					 * the COPY command.
+					 */
+					break;
+				default:
+					printfPQExpBuffer(&conn->errorMessage,
+									  libpq_gettext(
+													"unexpected response from server; first received character was \"%c\"\n"),
+									  id);
+					/* build an error result holding the error message */
+					pqSaveErrorResult(conn);
+					/* not sure if we will see more, so go to ready state */
+					conn->asyncStatus = PGASYNC_READY;
+					/* Discard the unexpected message */
+					conn->inCursor += msgLength;
+					break;
+			}					/* switch on protocol character */
+		}
+		/* Successfully consumed this message */
+		if (conn->inCursor == conn->inStart + 5 + msgLength)
+		{
+			/* Normal case: parsing agrees with specified length */
+			conn->inStart = conn->inCursor;
+		}
+		else
+		{
+			/* Trouble --- report it */
+			printfPQExpBuffer(&conn->errorMessage,
+							  libpq_gettext("message contents do not agree with length in message type \"%c\"\n"),
+							  id);
+			/* build an error result holding the error message */
+			pqSaveErrorResult(conn);
+			conn->asyncStatus = PGASYNC_READY;
+			/* trust the specified message length as what to skip */
+			conn->inStart += 5 + msgLength;
+		}
+	}
+}
+
+/*
+ * handleSyncLoss: clean up after loss of message-boundary sync
+ *
+ * There isn't really a lot we can do here except abandon the connection.
+ */
+static void
+handleSyncLoss(PGconn *conn, char id, int msgLength)
+{
+	printfPQExpBuffer(&conn->errorMessage,
+					  libpq_gettext(
+	"lost synchronization with server: got message type \"%c\", length %d\n"),
+					  id, msgLength);
+	/* build an error result holding the error message */
+	pqSaveErrorResult(conn);
+	conn->asyncStatus = PGASYNC_READY;	/* drop out of GetResult wait loop */
+
+	pqsecure_close(conn);
+	closesocket(conn->sock);
+	conn->sock = -1;
+	conn->status = CONNECTION_BAD;		/* No more connection to backend */
+}
+
+/*
+ * parseInput subroutine to read a 'T' (row descriptions) message.
+ * We'll build a new PGresult structure (unless called for a Describe
+ * command for a prepared statement) containing the attribute data.
+ * Returns: 0 if completed message, EOF if not enough data yet.
+ *
+ * Note that if we run out of data, we have to release the partially
+ * constructed PGresult, and rebuild it again next time.  Fortunately,
+ * that shouldn't happen often, since 'T' messages usually fit in a packet.
+ */
+static int
+getRowDescriptions(PGconn *conn)
+{
+	PGresult   *result;
+	int			nfields;
+	int			i;
+
+	/*
+	 * When doing Describe for a prepared statement, there'll already be a
+	 * PGresult created by getParamDescriptions, and we should fill data into
+	 * that.  Otherwise, create a new, empty PGresult.
+	 */
+	if (conn->queryclass == PGQUERY_DESCRIBE)
+	{
+		if (conn->result)
+			result = conn->result;
+		else
+			result = PQmakeEmptyPGresult(conn, PGRES_COMMAND_OK);
+	}
+	else
+		result = PQmakeEmptyPGresult(conn, PGRES_TUPLES_OK);
+	if (!result)
+		goto failure;
+
+	/* parseInput already read the 'T' label and message length. */
+	/* the next two bytes are the number of fields */
+	if (pqGetInt(&(result->numAttributes), 2, conn))
+		goto failure;
+	nfields = result->numAttributes;
+
+	/* allocate space for the attribute descriptors */
+	if (nfields > 0)
+	{
+		result->attDescs = (PGresAttDesc *)
+			pqResultAlloc(result, nfields * sizeof(PGresAttDesc), TRUE);
+		if (!result->attDescs)
+			goto failure;
+		MemSet(result->attDescs, 0, nfields * sizeof(PGresAttDesc));
+	}
+
+	/* result->binary is true only if ALL columns are binary */
+	result->binary = (nfields > 0) ? 1 : 0;
+
+	/* get type info */
+	for (i = 0; i < nfields; i++)
+	{
+		int			tableid;
+		int			columnid;
+		int			typid;
+		int			typlen;
+		int			atttypmod;
+		int			format;
+
+		if (pqGets(&conn->workBuffer, conn) ||
+			pqGetInt(&tableid, 4, conn) ||
+			pqGetInt(&columnid, 2, conn) ||
+			pqGetInt(&typid, 4, conn) ||
+			pqGetInt(&typlen, 2, conn) ||
+			pqGetInt(&atttypmod, 4, conn) ||
+			pqGetInt(&format, 2, conn))
+		{
+			goto failure;
+		}
+
+		/*
+		 * Since pqGetInt treats 2-byte integers as unsigned, we need to
+		 * coerce these results to signed form.
+		 */
+		columnid = (int) ((int16) columnid);
+		typlen = (int) ((int16) typlen);
+		format = (int) ((int16) format);
+
+		result->attDescs[i].name = pqResultStrdup(result,
+												  conn->workBuffer.data);
+		if (!result->attDescs[i].name)
+			goto failure;
+		result->attDescs[i].tableid = tableid;
+		result->attDescs[i].columnid = columnid;
+		result->attDescs[i].format = format;
+		result->attDescs[i].typid = typid;
+		result->attDescs[i].typlen = typlen;
+		result->attDescs[i].atttypmod = atttypmod;
+
+		if (format != 1)
+			result->binary = 0;
+	}
+
+	/* Success! */
+	conn->result = result;
+	return 0;
+
+failure:
+
+	/*
+	 * Discard incomplete result, unless it's from getParamDescriptions.
+	 *
+	 * Note that if we hit a bufferload boundary while handling the
+	 * describe-statement case, we'll forget any PGresult space we just
+	 * allocated, and then reallocate it on next try.  This will bloat the
+	 * PGresult a little bit but the space will be freed at PQclear, so it
+	 * doesn't seem worth trying to be smarter.
+	 */
+	if (result != conn->result)
+		PQclear(result);
+	return EOF;
+}
+
+/*
+ * parseInput subroutine to read a 't' (ParameterDescription) message.
+ * We'll build a new PGresult structure containing the parameter data.
+ * Returns: 0 if completed message, EOF if not enough data yet.
+ *
+ * Note that if we run out of data, we have to release the partially
+ * constructed PGresult, and rebuild it again next time.  Fortunately,
+ * that shouldn't happen often, since 't' messages usually fit in a packet.
+ */
+static int
+getParamDescriptions(PGconn *conn)
+{
+	PGresult   *result;
+	int			nparams;
+	int			i;
+
+	result = PQmakeEmptyPGresult(conn, PGRES_COMMAND_OK);
+	if (!result)
+		goto failure;
+
+	/* parseInput already read the 't' label and message length. */
+	/* the next two bytes are the number of parameters */
+	if (pqGetInt(&(result->numParameters), 2, conn))
+		goto failure;
+	nparams = result->numParameters;
+
+	/* allocate space for the parameter descriptors */
+	if (nparams > 0)
+	{
+		result->paramDescs = (PGresParamDesc *)
+			pqResultAlloc(result, nparams * sizeof(PGresParamDesc), TRUE);
+		if (!result->paramDescs)
+			goto failure;
+		MemSet(result->paramDescs, 0, nparams * sizeof(PGresParamDesc));
+	}
+
+	/* get parameter info */
+	for (i = 0; i < nparams; i++)
+	{
+		int			typid;
+
+		if (pqGetInt(&typid, 4, conn))
+			goto failure;
+		result->paramDescs[i].typid = typid;
+	}
+
+	/* Success! */
+	conn->result = result;
+	return 0;
+
+failure:
+	PQclear(result);
+	return EOF;
+}
+
+/*
+ * parseInput subroutine to read a 'D' (row data) message.
+ * We pass the row's field values to the result's rowProcessor callback.
+ * Returns: 0 if completed message, EOF if error or not enough data yet.
+ *
+ * Note that if we run out of data, we have to suspend and reprocess
+ * the message after more data is received.  Nothing is kept between
+ * attempts; the rowval[] pointers are rebuilt on every call.
+ */
+static int
+getAnotherTuple(PGconn *conn, int msgLength)
+{
+	PGresult   *result = conn->result;
+	int			nfields = result->numAttributes;
+	PGrowValue  rowval[result->numAttributes + 1];
+	int			tupnfields;		/* # fields from tuple */
+	int			vlen;			/* length of the current field value */
+	int			i;
+
+	/* Allocate tuple space if first time for this data message */
+	/* Get the field count and make sure it's what we expect */
+	if (pqGetInt(&tupnfields, 2, conn))
+		return EOF;
+
+	if (tupnfields != nfields)
+	{
+		/* Replace partially constructed result with an error result */
+		printfPQExpBuffer(&conn->errorMessage,
+				 libpq_gettext("unexpected field count in \"D\" message\n"));
+		pqSaveErrorResult(conn);
+		/* Discard the failed message by pretending we read it */
+		conn->inCursor = conn->inStart + 5 + msgLength;
+		return 0;
+	}
+
+	/* Scan the fields */
+	for (i = 0; i < nfields; i++)
+	{
+		/* get the value length */
+		if (pqGetInt(&vlen, 4, conn))
+			return EOF;
+		if (vlen == -1)
+			vlen = NULL_LEN;
+		else if (vlen < 0)
+			vlen = 0;
+		
+		/*
+		 * The buffer contents may be moved when more data is loaded,
+		 * so all pointers must be set again on every scan.
+		 *
+		 * rowval[i].value always points just past the length field,
+		 * even if the value length is zero or the value is NULL, so
+		 * that it is always a valid address.
+		 */
+		rowval[i].value = conn->inBuffer + conn->inCursor;
+		rowval[i].len = vlen;
+
+		/* Skip to the next length field */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			return EOF;
+	}
+
+	/*
+	 * Set rowval[nfields] as an end marker so it is safe to access.
+	 * For nfields > 0, the total length of the field values is
+	 *
+	 *    rowval[nfields].value - rowval[0].value - 4 * (nfields - 1).
+	 */
+	rowval[nfields].value = conn->inBuffer + conn->inCursor;
+	rowval[nfields].len = NULL_LEN;
+
+	/* Success!  Pass the completed row values to rowProcessor */
+	if (!result->rowProcessor(result, result->rowProcessorParam, rowval))
+		goto rowProcessError;
+	
+	/* Free garbage error message. */
+	if (result->rowProcessorErrMes)
+	{
+		free(result->rowProcessorErrMes);
+		result->rowProcessorErrMes = NULL;
+	}
+
+	return 0;
+
+rowProcessError:
+
+	/*
+	 * Replace partially constructed result with an error result. First
+	 * discard the old result to try to win back some memory.
+	 */
+	pqClearAsyncResult(conn);
+	resetPQExpBuffer(&conn->errorMessage);
+
+	/*
+	 * If an error message was passed from rowProcessor, set it into
+	 * PGconn; assume out of memory if not.
+	 */
+	appendPQExpBufferStr(&conn->errorMessage,
+						 libpq_gettext(result->rowProcessorErrMes ?
+									   result->rowProcessorErrMes : 
+									   "out of memory for query result\n"));
+	if (result->rowProcessorErrMes)
+	{
+		free(result->rowProcessorErrMes);
+		result->rowProcessorErrMes = NULL;
+	}
+	pqSaveErrorResult(conn);
+
+	/* Discard the failed message by pretending we read it */
+	conn->inCursor = conn->inStart + 5 + msgLength;
+	return 0;
+}
+
+
+/*
+ * Attempt to read an Error or Notice response message.
+ * This is possible in several places, so we break it out as a subroutine.
+ * Entry: 'E' or 'N' message type and length have already been consumed.
+ * Exit: returns 0 if successfully consumed message.
+ *		 returns EOF if not enough data.
+ */
+int
+pqGetErrorNotice3(PGconn *conn, bool isError)
+{
+	PGresult   *res = NULL;
+	PQExpBufferData workBuf;
+	char		id;
+	const char *val;
+	const char *querytext = NULL;
+	int			querypos = 0;
+
+	/*
+	 * Since the fields might be pretty long, we create a temporary
+	 * PQExpBuffer rather than using conn->workBuffer.	workBuffer is intended
+	 * for stuff that is expected to be short.	We shouldn't use
+	 * conn->errorMessage either, since this might be only a notice.
+	 */
+	initPQExpBuffer(&workBuf);
+
+	/*
+	 * Make a PGresult to hold the accumulated fields.	We temporarily lie
+	 * about the result status, so that PQmakeEmptyPGresult doesn't uselessly
+	 * copy conn->errorMessage.
+	 */
+	res = PQmakeEmptyPGresult(conn, PGRES_EMPTY_QUERY);
+	if (!res)
+		goto fail;
+	res->resultStatus = isError ? PGRES_FATAL_ERROR : PGRES_NONFATAL_ERROR;
+
+	/*
+	 * Read the fields and save into res.
+	 */
+	for (;;)
+	{
+		if (pqGetc(&id, conn))
+			goto fail;
+		if (id == '\0')
+			break;				/* terminator found */
+		if (pqGets(&workBuf, conn))
+			goto fail;
+		pqSaveMessageField(res, id, workBuf.data);
+	}
+
+	/*
+	 * Now build the "overall" error message for PQresultErrorMessage.
+	 *
+	 * Also, save the SQLSTATE in conn->last_sqlstate.
+	 */
+	resetPQExpBuffer(&workBuf);
+	val = PQresultErrorField(res, PG_DIAG_SEVERITY);
+	if (val)
+		appendPQExpBuffer(&workBuf, "%s:  ", val);
+	val = PQresultErrorField(res, PG_DIAG_SQLSTATE);
+	if (val)
+	{
+		if (strlen(val) < sizeof(conn->last_sqlstate))
+			strcpy(conn->last_sqlstate, val);
+		if (conn->verbosity == PQERRORS_VERBOSE)
+			appendPQExpBuffer(&workBuf, "%s: ", val);
+	}
+	val = PQresultErrorField(res, PG_DIAG_MESSAGE_PRIMARY);
+	if (val)
+		appendPQExpBufferStr(&workBuf, val);
+	val = PQresultErrorField(res, PG_DIAG_STATEMENT_POSITION);
+	if (val)
+	{
+		if (conn->verbosity != PQERRORS_TERSE && conn->last_query != NULL)
+		{
+			/* emit position as a syntax cursor display */
+			querytext = conn->last_query;
+			querypos = atoi(val);
+		}
+		else
+		{
+			/* emit position as text addition to primary message */
+			/* translator: %s represents a digit string */
+			appendPQExpBuffer(&workBuf, libpq_gettext(" at character %s"),
+							  val);
+		}
+	}
+	else
+	{
+		val = PQresultErrorField(res, PG_DIAG_INTERNAL_POSITION);
+		if (val)
+		{
+			querytext = PQresultErrorField(res, PG_DIAG_INTERNAL_QUERY);
+			if (conn->verbosity != PQERRORS_TERSE && querytext != NULL)
+			{
+				/* emit position as a syntax cursor display */
+				querypos = atoi(val);
+			}
+			else
+			{
+				/* emit position as text addition to primary message */
+				/* translator: %s represents a digit string */
+				appendPQExpBuffer(&workBuf, libpq_gettext(" at character %s"),
+								  val);
+			}
+		}
+	}
+	appendPQExpBufferChar(&workBuf, '\n');
+	if (conn->verbosity != PQERRORS_TERSE)
+	{
+		if (querytext && querypos > 0)
+			reportErrorPosition(&workBuf, querytext, querypos,
+								conn->client_encoding);
+		val = PQresultErrorField(res, PG_DIAG_MESSAGE_DETAIL);
+		if (val)
+			appendPQExpBuffer(&workBuf, libpq_gettext("DETAIL:  %s\n"), val);
+		val = PQresultErrorField(res, PG_DIAG_MESSAGE_HINT);
+		if (val)
+			appendPQExpBuffer(&workBuf, libpq_gettext("HINT:  %s\n"), val);
+		val = PQresultErrorField(res, PG_DIAG_INTERNAL_QUERY);
+		if (val)
+			appendPQExpBuffer(&workBuf, libpq_gettext("QUERY:  %s\n"), val);
+		val = PQresultErrorField(res, PG_DIAG_CONTEXT);
+		if (val)
+			appendPQExpBuffer(&workBuf, libpq_gettext("CONTEXT:  %s\n"), val);
+	}
+	if (conn->verbosity == PQERRORS_VERBOSE)
+	{
+		const char *valf;
+		const char *vall;
+
+		valf = PQresultErrorField(res, PG_DIAG_SOURCE_FILE);
+		vall = PQresultErrorField(res, PG_DIAG_SOURCE_LINE);
+		val = PQresultErrorField(res, PG_DIAG_SOURCE_FUNCTION);
+		if (val || valf || vall)
+		{
+			appendPQExpBufferStr(&workBuf, libpq_gettext("LOCATION:  "));
+			if (val)
+				appendPQExpBuffer(&workBuf, libpq_gettext("%s, "), val);
+			if (valf && vall)	/* unlikely we'd have just one */
+				appendPQExpBuffer(&workBuf, libpq_gettext("%s:%s"),
+								  valf, vall);
+			appendPQExpBufferChar(&workBuf, '\n');
+		}
+	}
+
+	/*
+	 * Either save error as current async result, or just emit the notice.
+	 */
+	if (isError)
+	{
+		res->errMsg = pqResultStrdup(res, workBuf.data);
+		if (!res->errMsg)
+			goto fail;
+		pqClearAsyncResult(conn);
+		conn->result = res;
+		appendPQExpBufferStr(&conn->errorMessage, workBuf.data);
+	}
+	else
+	{
+		/* We can cheat a little here and not copy the message. */
+		res->errMsg = workBuf.data;
+		if (res->noticeHooks.noticeRec != NULL)
+			(*res->noticeHooks.noticeRec) (res->noticeHooks.noticeRecArg, res);
+		PQclear(res);
+	}
+
+	termPQExpBuffer(&workBuf);
+	return 0;
+
+fail:
+	PQclear(res);
+	termPQExpBuffer(&workBuf);
+	return EOF;
+}
+
+/*
+ * Add an error-location display to the error message under construction.
+ *
+ * The cursor location is measured in logical characters; the query string
+ * is presumed to be in the specified encoding.
+ */
+static void
+reportErrorPosition(PQExpBuffer msg, const char *query, int loc, int encoding)
+{
+#define DISPLAY_SIZE	60		/* screen width limit, in screen cols */
+#define MIN_RIGHT_CUT	10		/* try to keep this far away from EOL */
+
+	char	   *wquery;
+	int			slen,
+				cno,
+				i,
+			   *qidx,
+			   *scridx,
+				qoffset,
+				scroffset,
+				ibeg,
+				iend,
+				loc_line;
+	bool		mb_encoding,
+				beg_trunc,
+				end_trunc;
+
+	/* Convert loc from 1-based to 0-based; no-op if out of range */
+	loc--;
+	if (loc < 0)
+		return;
+
+	/* Need a writable copy of the query */
+	wquery = strdup(query);
+	if (wquery == NULL)
+		return;					/* fail silently if out of memory */
+
+	/*
+	 * Each character might occupy multiple physical bytes in the string, and
+	 * in some Far Eastern character sets it might take more than one screen
+	 * column as well.	We compute the starting byte offset and starting
+	 * screen column of each logical character, and store these in qidx[] and
+	 * scridx[] respectively.
+	 */
+
+	/* we need a safe allocation size... */
+	slen = strlen(wquery) + 1;
+
+	qidx = (int *) malloc(slen * sizeof(int));
+	if (qidx == NULL)
+	{
+		free(wquery);
+		return;
+	}
+	scridx = (int *) malloc(slen * sizeof(int));
+	if (scridx == NULL)
+	{
+		free(qidx);
+		free(wquery);
+		return;
+	}
+
+	/* We can optimize a bit if it's a single-byte encoding */
+	mb_encoding = (pg_encoding_max_length(encoding) != 1);
+
+	/*
+	 * Within the scanning loop, cno is the current character's logical
+	 * number, qoffset is its offset in wquery, and scroffset is its starting
+	 * logical screen column (all indexed from 0).	"loc" is the logical
+	 * character number of the error location.	We scan to determine loc_line
+	 * (the 1-based line number containing loc) and ibeg/iend (first character
+	 * number and last+1 character number of the line containing loc). Note
+	 * that qidx[] and scridx[] are filled only as far as iend.
+	 */
+	qoffset = 0;
+	scroffset = 0;
+	loc_line = 1;
+	ibeg = 0;
+	iend = -1;					/* -1 means not set yet */
+
+	for (cno = 0; wquery[qoffset] != '\0'; cno++)
+	{
+		char		ch = wquery[qoffset];
+
+		qidx[cno] = qoffset;
+		scridx[cno] = scroffset;
+
+		/*
+		 * Replace tabs with spaces in the writable copy.  (Later we might
+		 * want to think about coping with their variable screen width, but
+		 * not today.)
+		 */
+		if (ch == '\t')
+			wquery[qoffset] = ' ';
+
+		/*
+		 * If end-of-line, count lines and mark positions. Each \r or \n
+		 * counts as a line except when \r \n appear together.
+		 */
+		else if (ch == '\r' || ch == '\n')
+		{
+			if (cno < loc)
+			{
+				if (ch == '\r' ||
+					cno == 0 ||
+					wquery[qidx[cno - 1]] != '\r')
+					loc_line++;
+				/* extract beginning = last line start before loc. */
+				ibeg = cno + 1;
+			}
+			else
+			{
+				/* set extract end. */
+				iend = cno;
+				/* done scanning. */
+				break;
+			}
+		}
+
+		/* Advance */
+		if (mb_encoding)
+		{
+			int			w;
+
+			w = pg_encoding_dsplen(encoding, &wquery[qoffset]);
+			/* treat any non-tab control chars as width 1 */
+			if (w <= 0)
+				w = 1;
+			scroffset += w;
+			qoffset += pg_encoding_mblen(encoding, &wquery[qoffset]);
+		}
+		else
+		{
+			/* We assume wide chars only exist in multibyte encodings */
+			scroffset++;
+			qoffset++;
+		}
+	}
+	/* Fix up if we didn't find an end-of-line after loc */
+	if (iend < 0)
+	{
+		iend = cno;				/* query length in chars, +1 */
+		qidx[iend] = qoffset;
+		scridx[iend] = scroffset;
+	}
+
+	/* Print only if loc is within computed query length */
+	if (loc <= cno)
+	{
+		/* If the line extracted is too long, we truncate it. */
+		beg_trunc = false;
+		end_trunc = false;
+		if (scridx[iend] - scridx[ibeg] > DISPLAY_SIZE)
+		{
+			/*
+			 * We first truncate right if it is enough.  This code might be
+			 * off a space or so on enforcing MIN_RIGHT_CUT if there's a wide
+			 * character right there, but that should be okay.
+			 */
+			if (scridx[ibeg] + DISPLAY_SIZE >= scridx[loc] + MIN_RIGHT_CUT)
+			{
+				while (scridx[iend] - scridx[ibeg] > DISPLAY_SIZE)
+					iend--;
+				end_trunc = true;
+			}
+			else
+			{
+				/* Truncate right if not too close to loc. */
+				while (scridx[loc] + MIN_RIGHT_CUT < scridx[iend])
+				{
+					iend--;
+					end_trunc = true;
+				}
+
+				/* Truncate left if still too long. */
+				while (scridx[iend] - scridx[ibeg] > DISPLAY_SIZE)
+				{
+					ibeg++;
+					beg_trunc = true;
+				}
+			}
+		}
+
+		/* truncate working copy at desired endpoint */
+		wquery[qidx[iend]] = '\0';
+
+		/* Begin building the finished message. */
+		i = msg->len;
+		appendPQExpBuffer(msg, libpq_gettext("LINE %d: "), loc_line);
+		if (beg_trunc)
+			appendPQExpBufferStr(msg, "...");
+
+		/*
+		 * While we have the prefix in the msg buffer, compute its screen
+		 * width.
+		 */
+		scroffset = 0;
+		for (; i < msg->len; i += pg_encoding_mblen(encoding, &msg->data[i]))
+		{
+			int			w = pg_encoding_dsplen(encoding, &msg->data[i]);
+
+			if (w <= 0)
+				w = 1;
+			scroffset += w;
+		}
+
+		/* Finish up the LINE message line. */
+		appendPQExpBufferStr(msg, &wquery[qidx[ibeg]]);
+		if (end_trunc)
+			appendPQExpBufferStr(msg, "...");
+		appendPQExpBufferChar(msg, '\n');
+
+		/* Now emit the cursor marker line. */
+		scroffset += scridx[loc] - scridx[ibeg];
+		for (i = 0; i < scroffset; i++)
+			appendPQExpBufferChar(msg, ' ');
+		appendPQExpBufferChar(msg, '^');
+		appendPQExpBufferChar(msg, '\n');
+	}
+
+	/* Clean up. */
+	free(scridx);
+	free(qidx);
+	free(wquery);
+}
+
+
+/*
+ * Attempt to read a ParameterStatus message.
+ * This is possible in several places, so we break it out as a subroutine.
+ * Entry: 'S' message type and length have already been consumed.
+ * Exit: returns 0 if successfully consumed message.
+ *		 returns EOF if not enough data.
+ */
+static int
+getParameterStatus(PGconn *conn)
+{
+	PQExpBufferData valueBuf;
+
+	/* Get the parameter name */
+	if (pqGets(&conn->workBuffer, conn))
+		return EOF;
+	/* Get the parameter value (could be large) */
+	initPQExpBuffer(&valueBuf);
+	if (pqGets(&valueBuf, conn))
+	{
+		termPQExpBuffer(&valueBuf);
+		return EOF;
+	}
+	/* And save it */
+	pqSaveParameterStatus(conn, conn->workBuffer.data, valueBuf.data);
+	termPQExpBuffer(&valueBuf);
+	return 0;
+}
+
+
+/*
+ * Attempt to read a Notify response message.
+ * This is possible in several places, so we break it out as a subroutine.
+ * Entry: 'A' message type and length have already been consumed.
+ * Exit: returns 0 if successfully consumed Notify message.
+ *		 returns EOF if not enough data.
+ */
+static int
+getNotify(PGconn *conn)
+{
+	int			be_pid;
+	char	   *svname;
+	int			nmlen;
+	int			extralen;
+	PGnotify   *newNotify;
+
+	if (pqGetInt(&be_pid, 4, conn))
+		return EOF;
+	if (pqGets(&conn->workBuffer, conn))
+		return EOF;
+	/* must save name while getting extra string */
+	svname = strdup(conn->workBuffer.data);
+	if (!svname)
+		return EOF;
+	if (pqGets(&conn->workBuffer, conn))
+	{
+		free(svname);
+		return EOF;
+	}
+
+	/*
+	 * Store the strings right after the PQnotify structure so it can all be
+	 * freed at once.  We don't use NAMEDATALEN because we don't want to tie
+	 * this interface to a specific server name length.
+	 */
+	nmlen = strlen(svname);
+	extralen = strlen(conn->workBuffer.data);
+	newNotify = (PGnotify *) malloc(sizeof(PGnotify) + nmlen + extralen + 2);
+	if (newNotify)
+	{
+		newNotify->relname = (char *) newNotify + sizeof(PGnotify);
+		strcpy(newNotify->relname, svname);
+		newNotify->extra = newNotify->relname + nmlen + 1;
+		strcpy(newNotify->extra, conn->workBuffer.data);
+		newNotify->be_pid = be_pid;
+		newNotify->next = NULL;
+		if (conn->notifyTail)
+			conn->notifyTail->next = newNotify;
+		else
+			conn->notifyHead = newNotify;
+		conn->notifyTail = newNotify;
+	}
+
+	free(svname);
+	return 0;
+}
+
+/*
+ * getCopyStart - process CopyInResponse, CopyOutResponse or
+ * CopyBothResponse message
+ *
+ * parseInput already read the message type and length.
+ */
+static int
+getCopyStart(PGconn *conn, ExecStatusType copytype)
+{
+	PGresult   *result;
+	int			nfields;
+	int			i;
+
+	result = PQmakeEmptyPGresult(conn, copytype);
+	if (!result)
+		goto failure;
+
+	if (pqGetc(&conn->copy_is_binary, conn))
+		goto failure;
+	result->binary = conn->copy_is_binary;
+	/* the next two bytes are the number of fields	*/
+	if (pqGetInt(&(result->numAttributes), 2, conn))
+		goto failure;
+	nfields = result->numAttributes;
+
+	/* allocate space for the attribute descriptors */
+	if (nfields > 0)
+	{
+		result->attDescs = (PGresAttDesc *)
+			pqResultAlloc(result, nfields * sizeof(PGresAttDesc), TRUE);
+		if (!result->attDescs)
+			goto failure;
+		MemSet(result->attDescs, 0, nfields * sizeof(PGresAttDesc));
+	}
+
+	for (i = 0; i < nfields; i++)
+	{
+		int			format;
+
+		if (pqGetInt(&format, 2, conn))
+			goto failure;
+
+		/*
+		 * Since pqGetInt treats 2-byte integers as unsigned, we need to
+		 * coerce these results to signed form.
+		 */
+		format = (int) ((int16) format);
+		result->attDescs[i].format = format;
+	}
+
+	/* Success! */
+	conn->result = result;
+	return 0;
+
+failure:
+	PQclear(result);
+	return EOF;
+}
+
+/*
+ * getReadyForQuery - process ReadyForQuery message
+ */
+static int
+getReadyForQuery(PGconn *conn)
+{
+	char		xact_status;
+
+	if (pqGetc(&xact_status, conn))
+		return EOF;
+	switch (xact_status)
+	{
+		case 'I':
+			conn->xactStatus = PQTRANS_IDLE;
+			break;
+		case 'T':
+			conn->xactStatus = PQTRANS_INTRANS;
+			break;
+		case 'E':
+			conn->xactStatus = PQTRANS_INERROR;
+			break;
+		default:
+			conn->xactStatus = PQTRANS_UNKNOWN;
+			break;
+	}
+
+	return 0;
+}
+
+/*
+ * getCopyDataMessage - fetch next CopyData message, process async messages
+ *
+ * Returns length word of CopyData message (> 0), or 0 if no complete
+ * message available, -1 if end of copy, -2 if error.
+ */
+static int
+getCopyDataMessage(PGconn *conn)
+{
+	char		id;
+	int			msgLength;
+	int			avail;
+
+	for (;;)
+	{
+		/*
+		 * Do we have the next input message?  To make life simpler for async
+		 * callers, we keep returning 0 until the next message is fully
+		 * available, even if it is not Copy Data.
+		 */
+		conn->inCursor = conn->inStart;
+		if (pqGetc(&id, conn))
+			return 0;
+		if (pqGetInt(&msgLength, 4, conn))
+			return 0;
+		if (msgLength < 4)
+		{
+			handleSyncLoss(conn, id, msgLength);
+			return -2;
+		}
+		avail = conn->inEnd - conn->inCursor;
+		if (avail < msgLength - 4)
+		{
+			/*
+			 * Before returning, enlarge the input buffer if needed to hold
+			 * the whole message.  See notes in parseInput.
+			 */
+			if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength - 4,
+									 conn))
+			{
+				/*
+				 * XXX add some better recovery code... plan is to skip over
+				 * the message using its length, then report an error. For the
+				 * moment, just treat this like loss of sync (which indeed it
+				 * might be!)
+				 */
+				handleSyncLoss(conn, id, msgLength);
+				return -2;
+			}
+			return 0;
+		}
+
+		/*
+		 * If it's a legitimate async message type, process it.  (NOTIFY
+		 * messages are not currently possible here, but we handle them for
+		 * completeness.)  Otherwise, if it's anything except Copy Data,
+		 * report end-of-copy.
+		 */
+		switch (id)
+		{
+			case 'A':			/* NOTIFY */
+				if (getNotify(conn))
+					return 0;
+				break;
+			case 'N':			/* NOTICE */
+				if (pqGetErrorNotice3(conn, false))
+					return 0;
+				break;
+			case 'S':			/* ParameterStatus */
+				if (getParameterStatus(conn))
+					return 0;
+				break;
+			case 'd':			/* Copy Data, pass it back to caller */
+				return msgLength;
+			default:			/* treat as end of copy */
+				return -1;
+		}
+
+		/* Drop the processed message and loop around for another */
+		conn->inStart = conn->inCursor;
+	}
+}
+
+/*
+ * PQgetCopyData - read a row of data from the backend during COPY OUT
+ * or COPY BOTH
+ *
+ * If successful, sets *buffer to point to a malloc'd row of data, and
+ * returns row length (always > 0) as result.
+ * Returns 0 if no row available yet (only possible if async is true),
+ * -1 if end of copy (consult PQgetResult), or -2 if error (consult
+ * PQerrorMessage).
+ */
+int
+pqGetCopyData3(PGconn *conn, char **buffer, int async)
+{
+	int			msgLength;
+
+	for (;;)
+	{
+		/*
+		 * Collect the next input message.	To make life simpler for async
+		 * callers, we keep returning 0 until the next message is fully
+		 * available, even if it is not Copy Data.
+		 */
+		msgLength = getCopyDataMessage(conn);
+		if (msgLength < 0)
+		{
+			/*
+			 * On end-of-copy, exit COPY_OUT or COPY_BOTH mode and let caller
+			 * read status with PQgetResult().	The normal case is that it's
+			 * Copy Done, but we let parseInput read that.	If error, we
+			 * expect the state was already changed.
+			 */
+			if (msgLength == -1)
+				conn->asyncStatus = PGASYNC_BUSY;
+			return msgLength;	/* end-of-copy or error */
+		}
+		if (msgLength == 0)
+		{
+			/* Don't block if async read requested */
+			if (async)
+				return 0;
+			/* Need to load more data */
+			if (pqWait(TRUE, FALSE, conn) ||
+				pqReadData(conn) < 0)
+				return -2;
+			continue;
+		}
+
+		/*
+		 * Drop zero-length messages (shouldn't happen anyway).  Otherwise
+		 * pass the data back to the caller.
+		 */
+		msgLength -= 4;
+		if (msgLength > 0)
+		{
+			*buffer = (char *) malloc(msgLength + 1);
+			if (*buffer == NULL)
+			{
+				printfPQExpBuffer(&conn->errorMessage,
+								  libpq_gettext("out of memory\n"));
+				return -2;
+			}
+			memcpy(*buffer, &conn->inBuffer[conn->inCursor], msgLength);
+			(*buffer)[msgLength] = '\0';		/* Add terminating null */
+
+			/* Mark message consumed */
+			conn->inStart = conn->inCursor + msgLength;
+
+			return msgLength;
+		}
+
+		/* Empty, so drop it and loop around for another */
+		conn->inStart = conn->inCursor;
+	}
+}
+
+/*
+ * PQgetline - gets a newline-terminated string from the backend.
+ *
+ * See fe-exec.c for documentation.
+ */
+int
+pqGetline3(PGconn *conn, char *s, int maxlen)
+{
+	int			status;
+
+	if (conn->sock < 0 ||
+		conn->asyncStatus != PGASYNC_COPY_OUT ||
+		conn->copy_is_binary)
+	{
+		printfPQExpBuffer(&conn->errorMessage,
+					  libpq_gettext("PQgetline: not doing text COPY OUT\n"));
+		*s = '\0';
+		return EOF;
+	}
+
+	while ((status = PQgetlineAsync(conn, s, maxlen - 1)) == 0)
+	{
+		/* need to load more data */
+		if (pqWait(TRUE, FALSE, conn) ||
+			pqReadData(conn) < 0)
+		{
+			*s = '\0';
+			return EOF;
+		}
+	}
+
+	if (status < 0)
+	{
+		/* End of copy detected; gin up old-style terminator */
+		strcpy(s, "\\.");
+		return 0;
+	}
+
+	/* Add null terminator, and strip trailing \n if present */
+	if (s[status - 1] == '\n')
+	{
+		s[status - 1] = '\0';
+		return 0;
+	}
+	else
+	{
+		s[status] = '\0';
+		return 1;
+	}
+}
+
+/*
+ * PQgetlineAsync - gets a COPY data row without blocking.
+ *
+ * See fe-exec.c for documentation.
+ */
+int
+pqGetlineAsync3(PGconn *conn, char *buffer, int bufsize)
+{
+	int			msgLength;
+	int			avail;
+
+	if (conn->asyncStatus != PGASYNC_COPY_OUT)
+		return -1;				/* we are not doing a copy... */
+
+	/*
+	 * Recognize the next input message.  To make life simpler for async
+	 * callers, we keep returning 0 until the next message is fully available
+	 * even if it is not Copy Data.  This should keep PQendcopy from blocking.
+	 * (Note: unlike pqGetCopyData3, we do not change asyncStatus here.)
+	 */
+	msgLength = getCopyDataMessage(conn);
+	if (msgLength < 0)
+		return -1;				/* end-of-copy or error */
+	if (msgLength == 0)
+		return 0;				/* no data yet */
+
+	/*
+	 * Move data from libpq's buffer to the caller's.  In the case where a
+	 * prior call found the caller's buffer too small, we use
+	 * conn->copy_already_done to remember how much of the row was already
+	 * returned to the caller.
+	 */
+	conn->inCursor += conn->copy_already_done;
+	avail = msgLength - 4 - conn->copy_already_done;
+	if (avail <= bufsize)
+	{
+		/* Able to consume the whole message */
+		memcpy(buffer, &conn->inBuffer[conn->inCursor], avail);
+		/* Mark message consumed */
+		conn->inStart = conn->inCursor + avail;
+		/* Reset state for next time */
+		conn->copy_already_done = 0;
+		return avail;
+	}
+	else
+	{
+		/* We must return a partial message */
+		memcpy(buffer, &conn->inBuffer[conn->inCursor], bufsize);
+		/* The message is NOT consumed from libpq's buffer */
+		conn->copy_already_done += bufsize;
+		return bufsize;
+	}
+}
+
+/*
+ * PQendcopy
+ *
+ * See fe-exec.c for documentation.
+ */
+int
+pqEndcopy3(PGconn *conn)
+{
+	PGresult   *result;
+
+	if (conn->asyncStatus != PGASYNC_COPY_IN &&
+		conn->asyncStatus != PGASYNC_COPY_OUT)
+	{
+		printfPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("no COPY in progress\n"));
+		return 1;
+	}
+
+	/* Send the CopyDone message if needed */
+	if (conn->asyncStatus == PGASYNC_COPY_IN)
+	{
+		if (pqPutMsgStart('c', false, conn) < 0 ||
+			pqPutMsgEnd(conn) < 0)
+			return 1;
+
+		/*
+		 * If we sent the COPY command in extended-query mode, we must issue a
+		 * Sync as well.
+		 */
+		if (conn->queryclass != PGQUERY_SIMPLE)
+		{
+			if (pqPutMsgStart('S', false, conn) < 0 ||
+				pqPutMsgEnd(conn) < 0)
+				return 1;
+		}
+	}
+
+	/*
+	 * make sure no data is waiting to be sent, abort if we are non-blocking
+	 * and the flush fails
+	 */
+	if (pqFlush(conn) && pqIsnonblocking(conn))
+		return 1;
+
+	/* Return to active duty */
+	conn->asyncStatus = PGASYNC_BUSY;
+	resetPQExpBuffer(&conn->errorMessage);
+
+	/*
+	 * Non blocking connections may have to abort at this point.  If everyone
+	 * played the game there should be no problem, but in error scenarios the
+	 * expected messages may not have arrived yet.	(We are assuming that the
+	 * backend's packetizing will ensure that CommandComplete arrives along
+	 * with the CopyDone; are there corner cases where that doesn't happen?)
+	 */
+	if (pqIsnonblocking(conn) && PQisBusy(conn))
+		return 1;
+
+	/* Wait for the completion response */
+	result = PQgetResult(conn);
+
+	/* Expecting a successful result */
+	if (result && result->resultStatus == PGRES_COMMAND_OK)
+	{
+		PQclear(result);
+		return 0;
+	}
+
+	/*
+	 * Trouble. For backwards-compatibility reasons, we issue the error
+	 * message as if it were a notice (would be nice to get rid of this
+	 * silliness, but too many apps probably don't handle errors from
+	 * PQendcopy reasonably).  Note that the app can still obtain the error
+	 * status from the PGconn object.
+	 */
+	if (conn->errorMessage.len > 0)
+	{
+		/* We have to strip the trailing newline ... pain in neck... */
+		char		svLast = conn->errorMessage.data[conn->errorMessage.len - 1];
+
+		if (svLast == '\n')
+			conn->errorMessage.data[conn->errorMessage.len - 1] = '\0';
+		pqInternalNotice(&conn->noticeHooks, "%s", conn->errorMessage.data);
+		conn->errorMessage.data[conn->errorMessage.len - 1] = svLast;
+	}
+
+	PQclear(result);
+
+	return 1;
+}
+
+
+/*
+ * PQfn - Send a function call to the POSTGRES backend.
+ *
+ * See fe-exec.c for documentation.
+ */
+PGresult *
+pqFunctionCall3(PGconn *conn, Oid fnid,
+				int *result_buf, int *actual_result_len,
+				int result_is_int,
+				const PQArgBlock *args, int nargs)
+{
+	bool		needInput = false;
+	ExecStatusType status = PGRES_FATAL_ERROR;
+	char		id;
+	int			msgLength;
+	int			avail;
+	int			i;
+
+	/* PQfn already validated connection state */
+
+	if (pqPutMsgStart('F', false, conn) < 0 ||	/* function call msg */
+		pqPutInt(fnid, 4, conn) < 0 ||	/* function id */
+		pqPutInt(1, 2, conn) < 0 ||		/* # of format codes */
+		pqPutInt(1, 2, conn) < 0 ||		/* format code: BINARY */
+		pqPutInt(nargs, 2, conn) < 0)	/* # of args */
+	{
+		pqHandleSendFailure(conn);
+		return NULL;
+	}
+
+	for (i = 0; i < nargs; ++i)
+	{							/* len.int4 + contents	   */
+		if (pqPutInt(args[i].len, 4, conn))
+		{
+			pqHandleSendFailure(conn);
+			return NULL;
+		}
+		if (args[i].len == -1)
+			continue;			/* it's NULL */
+
+		if (args[i].isint)
+		{
+			if (pqPutInt(args[i].u.integer, args[i].len, conn))
+			{
+				pqHandleSendFailure(conn);
+				return NULL;
+			}
+		}
+		else
+		{
+			if (pqPutnchar((char *) args[i].u.ptr, args[i].len, conn))
+			{
+				pqHandleSendFailure(conn);
+				return NULL;
+			}
+		}
+	}
+
+	if (pqPutInt(1, 2, conn) < 0)		/* result format code: BINARY */
+	{
+		pqHandleSendFailure(conn);
+		return NULL;
+	}
+
+	if (pqPutMsgEnd(conn) < 0 ||
+		pqFlush(conn))
+	{
+		pqHandleSendFailure(conn);
+		return NULL;
+	}
+
+	for (;;)
+	{
+		if (needInput)
+		{
+			/* Wait for some data to arrive (or for the channel to close) */
+			if (pqWait(TRUE, FALSE, conn) ||
+				pqReadData(conn) < 0)
+				break;
+		}
+
+		/*
+		 * Scan the message. If we run out of data, loop around to try again.
+		 */
+		needInput = true;
+
+		conn->inCursor = conn->inStart;
+		if (pqGetc(&id, conn))
+			continue;
+		if (pqGetInt(&msgLength, 4, conn))
+			continue;
+
+		/*
+		 * Try to validate message type/length here.  A length less than 4 is
+		 * definitely broken.  Large lengths should only be believed for a few
+		 * message types.
+		 */
+		if (msgLength < 4)
+		{
+			handleSyncLoss(conn, id, msgLength);
+			break;
+		}
+		if (msgLength > 30000 && !VALID_LONG_MESSAGE_TYPE(id))
+		{
+			handleSyncLoss(conn, id, msgLength);
+			break;
+		}
+
+		/*
+		 * Can't process if message body isn't all here yet.
+		 */
+		msgLength -= 4;
+		avail = conn->inEnd - conn->inCursor;
+		if (avail < msgLength)
+		{
+			/*
+			 * Before looping, enlarge the input buffer if needed to hold the
+			 * whole message.  See notes in parseInput.
+			 */
+			if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength,
+									 conn))
+			{
+				/*
+				 * XXX add some better recovery code... plan is to skip over
+				 * the message using its length, then report an error. For the
+				 * moment, just treat this like loss of sync (which indeed it
+				 * might be!)
+				 */
+				handleSyncLoss(conn, id, msgLength);
+				break;
+			}
+			continue;
+		}
+
+		/*
+		 * We should see V or E response to the command, but might get N
+		 * and/or A notices first. We also need to swallow the final Z before
+		 * returning.
+		 */
+		switch (id)
+		{
+			case 'V':			/* function result */
+				if (pqGetInt(actual_result_len, 4, conn))
+					continue;
+				if (*actual_result_len != -1)
+				{
+					if (result_is_int)
+					{
+						if (pqGetInt(result_buf, *actual_result_len, conn))
+							continue;
+					}
+					else
+					{
+						if (pqGetnchar((char *) result_buf,
+									   *actual_result_len,
+									   conn))
+							continue;
+					}
+				}
+				/* correctly finished function result message */
+				status = PGRES_COMMAND_OK;
+				break;
+			case 'E':			/* error return */
+				if (pqGetErrorNotice3(conn, true))
+					continue;
+				status = PGRES_FATAL_ERROR;
+				break;
+			case 'A':			/* notify message */
+				/* handle notify and go back to processing return values */
+				if (getNotify(conn))
+					continue;
+				break;
+			case 'N':			/* notice */
+				/* handle notice and go back to processing return values */
+				if (pqGetErrorNotice3(conn, false))
+					continue;
+				break;
+			case 'Z':			/* backend is ready for new query */
+				if (getReadyForQuery(conn))
+					continue;
+				/* consume the message and exit */
+				conn->inStart += 5 + msgLength;
+				/* if we saved a result object (probably an error), use it */
+				if (conn->result)
+					return pqPrepareAsyncResult(conn);
+				return PQmakeEmptyPGresult(conn, status);
+			case 'S':			/* parameter status */
+				if (getParameterStatus(conn))
+					continue;
+				break;
+			default:
+				/* The backend violates the protocol. */
+				printfPQExpBuffer(&conn->errorMessage,
+								  libpq_gettext("protocol error: id=0x%x\n"),
+								  id);
+				pqSaveErrorResult(conn);
+				/* trust the specified message length as what to skip */
+				conn->inStart += 5 + msgLength;
+				return pqPrepareAsyncResult(conn);
+		}
+		/* Completed this message, keep going */
+		/* trust the specified message length as what to skip */
+		conn->inStart += 5 + msgLength;
+		needInput = false;
+	}
+
+	/*
+	 * We fall out of the loop only upon failing to read data.
+	 * conn->errorMessage has been set by pqWait or pqReadData. We want to
+	 * append it to any already-received error message.
+	 */
+	pqSaveErrorResult(conn);
+	return pqPrepareAsyncResult(conn);
+}
+
+
+/*
+ * Construct startup packet
+ *
+ * Returns a malloc'd packet buffer, or NULL if out of memory
+ */
+char *
+pqBuildStartupPacket3(PGconn *conn, int *packetlen,
+					  const PQEnvironmentOption *options)
+{
+	char	   *startpacket;
+
+	*packetlen = build_startup_packet(conn, NULL, options);
+	startpacket = (char *) malloc(*packetlen);
+	if (!startpacket)
+		return NULL;
+	*packetlen = build_startup_packet(conn, startpacket, options);
+	return startpacket;
+}
+
+/*
+ * Build a startup packet given a filled-in PGconn structure.
+ *
+ * We need to figure out how much space is needed, then fill it in.
+ * To avoid duplicate logic, this routine is called twice: the first time
+ * (with packet == NULL) just counts the space needed, the second time
+ * (with packet == allocated space) fills it in.  Return value is the number
+ * of bytes used.
+ */
+static int
+build_startup_packet(const PGconn *conn, char *packet,
+					 const PQEnvironmentOption *options)
+{
+	int			packet_len = 0;
+	const PQEnvironmentOption *next_eo;
+	const char *val;
+
+	/* Protocol version comes first. */
+	if (packet)
+	{
+		ProtocolVersion pv = htonl(conn->pversion);
+
+		memcpy(packet + packet_len, &pv, sizeof(ProtocolVersion));
+	}
+	packet_len += sizeof(ProtocolVersion);
+
+	/* Add user name, database name, options */
+
+#define ADD_STARTUP_OPTION(optname, optval) \
+	do { \
+		if (packet) \
+			strcpy(packet + packet_len, optname); \
+		packet_len += strlen(optname) + 1; \
+		if (packet) \
+			strcpy(packet + packet_len, optval); \
+		packet_len += strlen(optval) + 1; \
+	} while(0)
+
+	if (conn->pguser && conn->pguser[0])
+		ADD_STARTUP_OPTION("user", conn->pguser);
+	if (conn->dbName && conn->dbName[0])
+		ADD_STARTUP_OPTION("database", conn->dbName);
+	if (conn->replication && conn->replication[0])
+		ADD_STARTUP_OPTION("replication", conn->replication);
+	if (conn->pgoptions && conn->pgoptions[0])
+		ADD_STARTUP_OPTION("options", conn->pgoptions);
+	if (conn->send_appname)
+	{
+		/* Use appname if present, otherwise use fallback */
+		val = conn->appname ? conn->appname : conn->fbappname;
+		if (val && val[0])
+			ADD_STARTUP_OPTION("application_name", val);
+	}
+
+	if (conn->client_encoding_initial && conn->client_encoding_initial[0])
+		ADD_STARTUP_OPTION("client_encoding", conn->client_encoding_initial);
+
+	/* Add any environment-driven GUC settings needed */
+	for (next_eo = options; next_eo->envName; next_eo++)
+	{
+		if ((val = getenv(next_eo->envName)) != NULL)
+		{
+			if (pg_strcasecmp(val, "default") != 0)
+				ADD_STARTUP_OPTION(next_eo->pgName, val);
+		}
+	}
+
+	/* Add trailing terminator */
+	if (packet)
+		packet[packet_len] = '\0';
+	packet_len++;
+
+	return packet_len;
+}
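
Note on build_startup_packet() above: it relies on a two-pass idiom, one
call with packet == NULL to count bytes and a second call to fill the
allocated buffer. A minimal standalone sketch of that idiom follows. The
names fill_options() and build_options() are hypothetical and are not
part of libpq; the sketch only assumes the "name\0value\0...\0" layout
used by the startup options.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not libpq): lay out "name\0value\0" pairs followed
 * by one trailing '\0'.  With buf == NULL it only counts the bytes needed.
 */
static size_t
fill_options(char *buf, const char *const *pairs, size_t npairs)
{
	size_t		len = 0;
	size_t		i;

	for (i = 0; i < npairs * 2; i++)
	{
		size_t		n = strlen(pairs[i]) + 1;	/* include the NUL */

		if (buf)
			memcpy(buf + len, pairs[i], n);
		len += n;
	}
	if (buf)
		buf[len] = '\0';		/* trailing terminator */
	len++;
	return len;
}

/* Counting pass sizes the buffer; the second pass fills it. */
static char *
build_options(const char *const *pairs, size_t npairs, size_t *out_len)
{
	char	   *buf;
	size_t		used;

	*out_len = fill_options(NULL, pairs, npairs);
	buf = malloc(*out_len);
	if (!buf)
		return NULL;
	used = fill_options(buf, pairs, npairs);
	/* both passes must agree, or the buffer would be mis-sized */
	assert(used == *out_len);
	(void) used;
	return buf;
}
```

Keeping the counting and the filling in one routine means the two passes
cannot drift apart when an option is added.
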
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..23cf729 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,6 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQsetRowProcessor	  161
+PQsetRowProcessorErrMsg	  162
+PQskipRemainingResults	  163
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)
 	conn->wait_ssl_try = false;
 #endif
 
+	/* set default row processor */
+	PQsetRowProcessor(conn, NULL, NULL);
+
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
 	 * buffers on Unix systems.  That way, when we are sending a large amount
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)
 	initPQExpBuffer(&conn->errorMessage);
 	initPQExpBuffer(&conn->workBuffer);
 
+	/* set up initial row buffer */
+	conn->rowBufLen = 32;
+	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+
 	if (conn->inBuffer == NULL ||
 		conn->outBuffer == NULL ||
+		conn->rowBuf == NULL ||
 		PQExpBufferBroken(&conn->errorMessage) ||
 		PQExpBufferBroken(&conn->workBuffer))
 	{
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)
 		free(conn->inBuffer);
 	if (conn->outBuffer)
 		free(conn->outBuffer);
+	if (conn->rowBuf)
+		free(conn->rowBuf);
 	termPQExpBuffer(&conn->errorMessage);
 	termPQExpBuffer(&conn->workBuffer);
 
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..d7f3ae9 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
+static int	pqAddRow(PGresult *res, void *param, PGrowValue *columns);
 
 
 /* ----------------
@@ -160,6 +161,7 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->rowProcessorErrMsg = NULL;
 
 	if (conn)
 	{
@@ -701,7 +703,6 @@ pqClearAsyncResult(PGconn *conn)
 	if (conn->result)
 		PQclear(conn->result);
 	conn->result = NULL;
-	conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +757,6 @@ pqPrepareAsyncResult(PGconn *conn)
 	 */
 	res = conn->result;
 	conn->result = NULL;		/* handing over ownership to caller */
-	conn->curTuple = NULL;		/* just in case */
 	if (!res)
 		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 	else
@@ -828,6 +828,73 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 }
 
 /*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+	conn->rowProcessor = (func ? func : pqAddRow);
+	conn->rowProcessorParam = param;
+}
+
+/*
+ * PQsetRowProcessorErrMsg
+ *    Set the error message passed back to the caller of the row processor.
+ *
+ *    You can replace the previous message with an alternative message, or
+ *    clear it with NULL.
+ */
+void
+PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+{
+	if (msg)
+		res->rowProcessorErrMsg = pqResultStrdup(res, msg);
+	else
+		res->rowProcessorErrMsg = NULL;
+}
+
+/*
+ * pqAddRow
+ *	  add a row to the PGresult structure, growing it if necessary
+ *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+ */
+static int
+pqAddRow(PGresult *res, void *param, PGrowValue *columns)
+{
+	PGresAttValue *tup;
+	int			nfields = res->numAttributes;
+	int			i;
+
+	tup = (PGresAttValue *)
+		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+	if (tup == NULL)
+		return FALSE;
+
+	for (i = 0 ; i < nfields ; i++)
+	{
+		tup[i].len = columns[i].len;
+		if (tup[i].len == NULL_LEN)
+		{
+			tup[i].value = res->null_field;
+		}
+		else
+		{
+			bool isbinary = (res->attDescs[i].format != 0);
+			tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+			if (tup[i].value == NULL)
+				return FALSE;
+
+			memcpy(tup[i].value, columns[i].value, tup[i].len);
+			/* We have to terminate this ourselves */
+			tup[i].value[tup[i].len] = '\0';
+		}
+	}
+
+	return pqAddTuple(res, tup);
+}
+
+/*
  * pqAddTuple
  *	  add a row pointer to the PGresult structure, growing it if necessary
  *	  Returns TRUE if OK, FALSE if not enough memory to add the row
@@ -1223,7 +1290,6 @@ PQsendQueryStart(PGconn *conn)
 
 	/* initialize async result-accumulation state */
 	conn->result = NULL;
-	conn->curTuple = NULL;
 
 	/* ready to send command message */
 	return true;
@@ -1832,6 +1898,22 @@ PQexecFinish(PGconn *conn)
 }
 
 /*
+ * Exhaust the remaining Data Rows on the current conn.
+ */
+PGresult *
+PQskipRemainingResults(PGconn *conn)
+{
+	pqClearAsyncResult(conn);
+
+	/* conn->result is set NULL in pqClearAsyncResult() */
+	pqSaveErrorResult(conn);
+
+	/* Skip over remaining Data Row messages */
+	return PQgetResult(conn);
+}
+
+
+/*
  * PQdescribePrepared
  *	  Obtain information about a previously prepared statement
  *
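
Note on the row processor interface in fe-exec.c above: the columns
handed to the processor point into the network buffer and are not
NUL-terminated, so a processor has to copy exactly len bytes, as
pqAddRow() does. A rough standalone sketch of a custom processor follows.
DemoRowValue, DemoState and demo_row_processor() are stand-ins invented
for illustration. The real callback in this patch also receives the
PGresult, which is left out here so the sketch compiles without libpq.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for the patch's PGrowValue: a length plus a pointer into the
 * network buffer.  The bytes are NOT NUL-terminated.
 */
typedef struct
{
	int			len;			/* -1 means SQL NULL (NULL_LEN in libpq) */
	const char *value;
} DemoRowValue;

/* Hypothetical per-query state, passed to the processor as "param". */
typedef struct
{
	int			nfields;
	long		total_bytes;
	char	   *first_col;		/* malloc'd copy of column 0 of the last row */
} DemoState;

/* Same convention as pqAddRow: return 1 on success, 0 on failure. */
static int
demo_row_processor(void *param, const DemoRowValue *columns)
{
	DemoState  *st = (DemoState *) param;
	int			i;

	assert(st->nfields > 0);

	/* sum the payload sizes without copying anything */
	for (i = 0; i < st->nfields; i++)
		if (columns[i].len > 0)
			st->total_bytes += columns[i].len;

	/* keep a terminated copy of the first column, unless it is NULL */
	free(st->first_col);
	st->first_col = NULL;
	if (columns[0].len >= 0)
	{
		size_t		n = (size_t) columns[0].len;

		st->first_col = malloc(n + 1);
		if (!st->first_col)
			return 0;
		memcpy(st->first_col, columns[0].value, n);
		st->first_col[n] = '\0';	/* we have to terminate this ourselves */
	}
	return 1;
}
```

With the patch applied, a function of this kind would presumably be
registered through PQsetRowProcessor(conn, func, param), with the
PGresult argument added to its signature.
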
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)
 }
 
 /*
+ * pqSkipnchar:
+ *	skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+	if (len > (size_t) (conn->inEnd - conn->inCursor))
+		return EOF;
+
+	conn->inCursor += len;
+
+	if (conn->Pfdebug)
+		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+				(unsigned long) len);
+
+	return 0;
+}
+
+/*
  * pqPutnchar:
  *	write exactly len bytes to the current message
  */
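
Note on pqSkipnchar() above: the length check is written as a comparison
against the remaining bytes rather than as cursor + len > end, which
avoids overflow for very large len. A small standalone sketch of the
same check follows. DemoInput and demo_skip() are hypothetical names that
merely mirror conn->inCursor and conn->inEnd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the input-buffer fields of PGconn. */
typedef struct
{
	size_t		cursor;			/* like conn->inCursor */
	size_t		end;			/* like conn->inEnd */
} DemoInput;

/* Skip len bytes; return 0 on success, EOF if not enough data is buffered. */
static int
demo_skip(DemoInput *in, size_t len)
{
	assert(in->cursor <= in->end);

	/* compare against the remaining bytes so the test cannot overflow */
	if (len > in->end - in->cursor)
		return EOF;

	in->cursor += len;
	return 0;
}
```

On EOF the cursor is left untouched, so the caller can retry the whole
message once more data has arrived.
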
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..ae4d7b0 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -569,6 +569,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, FALSE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -585,6 +587,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, TRUE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -703,52 +707,51 @@ failure:
 
 /*
  * parseInput subroutine to read a 'B' or 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, bool binary)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 
 	/* the backend sends us a bitmap of which attributes are null */
 	char		std_bitmap[64]; /* used unless it doesn't fit */
 	char	   *bitmap = std_bitmap;
 	int			i;
+	int			rp;
 	size_t		nbytes;			/* the number of bytes in bitmap  */
 	char		bmap;			/* One byte of the bitmap */
 	int			bitmap_index;	/* Its index */
 	int			bitcnt;			/* number of bits examined in current byte */
 	int			vlen;			/* length of the current field value */
 
+	/* resize row buffer if needed */
+	if (nfields > conn->rowBufLen)
+	{
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
+			goto rowProcessError;
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
+	}
+	else
+	{
+		rowbuf = conn->rowBuf;
+	}
+
 	result->binary = binary;
 
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
+	if (binary)
 	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-		/*
-		 * If it's binary, fix the column format indicators.  We assume the
-		 * backend will consistently send either B or D, not a mix.
-		 */
-		if (binary)
-		{
-			for (i = 0; i < nfields; i++)
-				result->attDescs[i].format = 1;
-		}
+		for (i = 0; i < nfields; i++)
+			result->attDescs[i].format = 1;
 	}
-	tup = conn->curTuple;
 
 	/* Get the null-value bitmap */
 	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
@@ -757,7 +760,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 	{
 		bitmap = (char *) malloc(nbytes);
 		if (!bitmap)
-			goto outOfMemory;
+			goto rowProcessError;
 	}
 
 	if (pqGetnchar(bitmap, nbytes, conn))
@@ -771,34 +774,29 @@ getAnotherTuple(PGconn *conn, bool binary)
 	for (i = 0; i < nfields; i++)
 	{
 		if (!(bmap & 0200))
-		{
-			/* if the field value is absent, make it a null string */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-		}
+			vlen = NULL_LEN;
+		else if (pqGetInt(&vlen, 4, conn))
+			goto EOFexit;
 		else
 		{
-			/* get the value length (the first four bytes are for length) */
-			if (pqGetInt(&vlen, 4, conn))
-				goto EOFexit;
 			if (!binary)
 				vlen = vlen - 4;
 			if (vlen < 0)
 				vlen = 0;
-			if (tup[i].value == NULL)
-			{
-				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-				if (tup[i].value == NULL)
-					goto outOfMemory;
-			}
-			tup[i].len = vlen;
-			/* read in the value */
-			if (vlen > 0)
-				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-					goto EOFexit;
-			/* we have to terminate this ourselves */
-			tup[i].value[vlen] = '\0';
 		}
+
+		/*
+		 * rowbuf[i].value always points at the byte following the
+		 * length field, even if the value is NULL, to allow safe
+		 * size estimates and data copy.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip the value */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			goto EOFexit;
+
 		/* advance the bitmap stuff */
 		bitcnt++;
 		if (bitcnt == BITS_PER_BYTE)
@@ -811,33 +809,56 @@ getAnotherTuple(PGconn *conn, bool binary)
 			bmap <<= 1;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
-
 	if (bitmap != std_bitmap)
 		free(bitmap);
-	return 0;
+	bitmap = NULL;
+
+	/* tag the row as parsed */
+	conn->inStart = conn->inCursor;
+
+	/* Pass the completed row values to rowProcessor */
+	rp = conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
+	if (rp == 1)
+		return 0;
+	else if (rp == 2 && pqIsnonblocking(conn))
+		/* processor requested early exit */
+		return EOF;
+	else if (rp != 0)
+		PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value from row processor\n"));
+
+rowProcessError:
 
-outOfMemory:
 	/* Replace partially constructed result with an error result */
 
-	/*
-	 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
-	 * there's not enough memory to concatenate messages...
-	 */
-	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	if (result->rowProcessorErrMsg)
+	{
+		printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
+		pqSaveErrorResult(conn);
+	}
+	else
+	{
+		/*
+		 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
+		 * there's not enough memory to concatenate messages...
+		 */
+		pqClearAsyncResult(conn);
+		resetPQExpBuffer(&conn->errorMessage);
 
-	/*
-	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
-	 * do to recover...
-	 */
-	conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+		/*
+		 * No error message was supplied by the row processor, so
+		 * assume the failure was caused by running out of memory.
+		 */
+		appendPQExpBufferStr(&conn->errorMessage,
+							 libpq_gettext("out of memory for query result\n"));
+
+		/*
+		 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
+		 * do to recover...
+		 */
+		conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+	}
 	conn->asyncStatus = PGASYNC_READY;
+
 	/* Discard the failed message --- good idea? */
 	conn->inStart = conn->inEnd;
 
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..0260ba6 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -327,6 +327,9 @@ pqParseInput3(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, msgLength))
 							return;
+
+						/* getAnotherTuple() moves inStart itself */
+						continue;
 					}
 					else if (conn->result != NULL &&
 							 conn->result->resultStatus == PGRES_FATAL_ERROR)
@@ -613,33 +616,22 @@ failure:
 
 /*
  * parseInput subroutine to read a 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, int msgLength)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 	int			tupnfields;		/* # fields from tuple */
 	int			vlen;			/* length of the current field value */
 	int			i;
-
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
-	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-	}
-	tup = conn->curTuple;
+	int			rp;
 
 	/* Get the field count and make sure it's what we expect */
 	if (pqGetInt(&tupnfields, 2, conn))
@@ -652,52 +644,88 @@ getAnotherTuple(PGconn *conn, int msgLength)
 				 libpq_gettext("unexpected field count in \"D\" message\n"));
 		pqSaveErrorResult(conn);
 		/* Discard the failed message by pretending we read it */
-		conn->inCursor = conn->inStart + 5 + msgLength;
+		conn->inStart += 5 + msgLength;
 		return 0;
 	}
 
+	/* resize row buffer if needed */
+	rowbuf = conn->rowBuf;
+	if (nfields > conn->rowBufLen)
+	{
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
+		{
+			goto outOfMemory1;
+		}
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
+	}
+
 	/* Scan the fields */
 	for (i = 0; i < nfields; i++)
 	{
 		/* get the value length */
 		if (pqGetInt(&vlen, 4, conn))
-			return EOF;
+			goto protocolError;
 		if (vlen == -1)
-		{
-			/* null field */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-			continue;
-		}
-		if (vlen < 0)
+			vlen = NULL_LEN;
+		else if (vlen < 0)
 			vlen = 0;
-		if (tup[i].value == NULL)
-		{
-			bool		isbinary = (result->attDescs[i].format != 0);
 
-			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-			if (tup[i].value == NULL)
-				goto outOfMemory;
-		}
-		tup[i].len = vlen;
-		/* read in the value */
-		if (vlen > 0)
-			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-				return EOF;
-		/* we have to terminate this ourselves */
-		tup[i].value[vlen] = '\0';
+		/*
+		 * rowbuf[i].value always points to the address just past the
+		 * length field, even if the value is NULL, to allow safe
+		 * size estimates and data copy.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip to the next length field */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			goto protocolError;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
+	/* mark the row as parsed and check that the cursor ended where expected */
+	conn->inStart += 5 + msgLength;
+	if (conn->inCursor != conn->inStart)
+		goto protocolError;
 
+	/* Pass the completed row values to rowProcessor */
+	rp = conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
+	if (rp == 1)
+	{
+		/* everything is good */
+		return 0;
+	}
+	if (rp == 2 && pqIsnonblocking(conn))
+	{
+		/* processor requested early exit */
+		return EOF;
+	}
+
+	/* there was some problem */
+	if (rp == 0)
+	{
+		if (result->rowProcessorErrMsg == NULL)
+			goto outOfMemory2;
+
+		/* use supplied error message */
+		printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
+	}
+	else
+	{
+		/* row processor messed up */
+		printfPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("invalid return value from row processor\n"));
+	}
+	pqSaveErrorResult(conn);
 	return 0;
 
-outOfMemory:
+outOfMemory1:
+	/* Discard the failed message by pretending we read it */
+	conn->inStart += 5 + msgLength;
 
+outOfMemory2:
 	/*
 	 * Replace partially constructed result with an error result. First
 	 * discard the old result to try to win back some memory.
@@ -706,9 +734,14 @@ outOfMemory:
 	printfPQExpBuffer(&conn->errorMessage,
 					  libpq_gettext("out of memory for query result\n"));
 	pqSaveErrorResult(conn);
+	return 0;
 
+protocolError:
+	printfPQExpBuffer(&conn->errorMessage,
+					  libpq_gettext("invalid row contents\n"));
+	pqSaveErrorResult(conn);
 	/* Discard the failed message by pretending we read it */
-	conn->inCursor = conn->inStart + 5 + msgLength;
+	conn->inStart += 5 + msgLength;
 	return 0;
 }
 
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..2d913c4 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,17 @@ typedef struct pgNotify
 	struct pgNotify *next;		/* list link */
 } PGnotify;
 
+/* PGrowValue points to a column value in the network buffer.
+ * The value is a string of length len without null termination.
+ * NULL is represented as len < 0; value then points to the place
+ * where the value would have been.
+ */
+typedef struct pgRowValue
+{
+	int			len;			/* length in bytes of the value */
+	char	   *value;			/* actual value, without null termination */
+} PGrowValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
@@ -367,6 +378,8 @@ extern PGresult *PQexecPrepared(PGconn *conn,
 			   const int *paramFormats,
 			   int resultFormat);
 
+extern PGresult *PQskipRemainingResults(PGconn *conn);
+
 /* Interface for multiple-result or asynchronous queries */
 extern int	PQsendQuery(PGconn *conn, const char *query);
 extern int PQsendQueryParams(PGconn *conn,
@@ -416,6 +429,37 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
 			 const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * Columns array will contain PQnfields() entries, each one
+ * pointing to particular column data in network buffer.
+ * This function is supposed to copy data out from there
+ * and store somewhere.  NULL is signified with len<0.
+ *
+ * This function must return 1 for success and 0 for failure, and
+ * may set an error message with PQsetRowProcessorErrMsg.  If no
+ * error message is set on failure, the caller assumes the cause is
+ * out of memory.  This function is assumed not to throw any
+ * exception.
+ */
+typedef int (*PQrowProcessor)(PGresult *res, void *param,
+								PGrowValue *columns);
+
+/*
+ * Set alternative row data processor for PGconn.
+ *
+ * Once a row processor is registered, libpq stops storing rows in
+ * the PGresult and calls the processor for each row instead.
+ *
+ * func is the row processor function.  See the typedef PQrowProcessor.
+ *
+ * rowProcessorParam is the contextual variable that is passed to
+ * the row processor.
+ */
+extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+								   void *rowProcessorParam);
+
 /* Force the write buffer to be written (or at least try) */
 extern int	PQflush(PGconn *conn);
 
@@ -454,6 +498,7 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+extern void	PQsetRowProcessorErrMsg(PGresult *res, char *msg);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 987311e..1fc5aab 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -209,6 +209,9 @@ struct pg_result
 	PGresult_data *curBlock;	/* most recently allocated block */
 	int			curOffset;		/* start offset of free space in block */
 	int			spaceLeft;		/* number of free bytes remaining in block */
+
+	/* temp storage for message from row processor callback */
+	char	   *rowProcessorErrMsg;
 };
 
 /* PGAsyncStatusType defines the state of the query-execution state machine */
@@ -398,7 +401,6 @@ struct pg_conn
 
 	/* Status for asynchronous result construction */
 	PGresult   *result;			/* result being constructed */
-	PGresAttValue *curTuple;	/* tuple currently being read */
 
 #ifdef USE_SSL
 	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
@@ -443,6 +445,14 @@ struct pg_conn
 
 	/* Buffer for receiving various parts of messages */
 	PQExpBufferData workBuffer; /* expansible string */
+
+	/*
+	 * Read column data from network buffer.
+	 */
+	PQrowProcessor rowProcessor;/* Function pointer */
+	void *rowProcessorParam;	/* Contextual parameter for rowProcessor */
+	PGrowValue *rowBuf;			/* Buffer for passing values to rowProcessor */
+	int rowBufLen;				/* Number of columns allocated in rowBuf */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -560,6 +570,7 @@ extern int	pqGets(PQExpBuffer buf, PGconn *conn);
 extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
 extern int	pqPuts(const char *s, PGconn *conn);
 extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+extern int	pqSkipnchar(size_t len, PGconn *conn);
 extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
 extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
 extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
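The PGrowValue comment in libpq-fe.h above says that column values are not null-terminated and that a row processor must copy them out of the network buffer. A minimal sketch of that copy step follows. RowValue is a local stand-in that mirrors the proposed PGrowValue, and copy_column is a hypothetical helper name; neither is part of libpq.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in mirroring the PGrowValue struct proposed in the patch. */
typedef struct
{
	int			len;	/* length in bytes; negative means SQL NULL */
	char	   *value;	/* points into a buffer, NOT null-terminated */
} RowValue;

/*
 * Copy one column into a freshly malloc'ed, null-terminated string.
 * Returns NULL for SQL NULL.  On allocation failure it also returns
 * NULL, but sets *oom to 1 so the caller can tell the two cases apart.
 * (Hypothetical helper, for illustration only.)
 */
char *
copy_column(const RowValue *col, int *oom)
{
	char	   *out;

	assert(col != NULL && oom != NULL);
	*oom = 0;

	if (col->len < 0)
		return NULL;			/* SQL NULL: nothing to copy */

	out = malloc((size_t) col->len + 1);
	if (out == NULL)
	{
		*oom = 1;
		return NULL;
	}

	/* The source has no terminator, so copy exactly len bytes. */
	memcpy(out, col->value, (size_t) col->len);
	out[col->len] = '\0';
	return out;
}
```

A real row processor would presumably call something like this for each of the PQnfields() columns and return 0 without setting a message when the out-of-memory flag is set, which matches the "assume out of memory" convention described in the header comment.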
libpq_rowproc_doc_20120216.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..9cb6d65 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,274 @@ int PQisthreadsafe();
  </sect1>
 
 
+ <sect1 id="libpq-altrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-altrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   In standard usage, rows are stored in a <type>PGresult</type>
+   until the full result set is received.  The completely filled
+   <type>PGresult</type> is then passed to the user.  This behaviour can be
+   changed by registering an alternative row processor function,
+   which will see each row as soon as it is received
+   from the network.  It has the option of processing the data
+   immediately, or storing it in a custom container.
+  </para>
+
+  <para>
+   Note that because the row processor sees rows as they arrive, it cannot
+   know whether the SQL statement will actually finish successfully on the
+   server.  So some care must be taken to get proper
+   transactionality.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessor">
+    <term>
+     <function>PQsetRowProcessor</function>
+     <indexterm>
+      <primary>PQsetRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object on which to set the row processor function.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>func</parameter></term>
+	   <listitem>
+	     <para>
+	       Storage handler function to set. NULL means to use the
+	       default processor.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+	       A pointer to the contextual parameter passed
+	       to <parameter>func</parameter>.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqrowprocessor">
+    <term>
+     <type>PQrowProcessor</type>
+     <indexterm>
+      <primary>PQrowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+int (*PQrowProcessor)(PGresult *res, void *param, PGrowValue *columns);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value, -1 if NULL */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+</synopsis>
+     </para>
+
+     <para>
+      The <parameter>columns</parameter> array will have PQnfields()
+      elements, each one pointing to a column value in the network buffer.
+      The <parameter>len</parameter> field will contain the number of
+      bytes in the value.  If the field value is NULL then
+      <parameter>len</parameter> will be -1 and value will point
+      to the position where the value would have been in the buffer.
+      This allows estimating row size by pointer arithmetic.
+     </para>
+
+     <para>
+       This function must process or copy row values away from the network
+       buffer before it returns, as the next row might overwrite them.
+     </para>
+
+     <para>
+       This function must return 1 for success, and 0 for failure.
+       On failure this function should set the error message
+       with <function>PQsetRowProcessorErrMsg</function> if the cause
+       is other than out of memory.
+       When the non-blocking API is in use, it can also return 2
+       for early exit from the <function>PQisBusy</function> function.
+       The supplied <parameter>res</parameter> and <parameter>columns</parameter>
+       values will stay valid so the row can be processed outside of the callback.
+       The caller is responsible for tracking whether <function>PQisBusy</function>
+       returned early from the callback or for other reasons.
+       Usually this should happen via setting cached values to NULL
+       before calling <function>PQisBusy</function>.
+     </para>
+
+     <para>
+       The function is allowed to exit via exception (setjmp/longjmp).
+       The connection and row are guaranteed to be in a valid state.
+       The connection can later be closed via <function>PQfinish</function>.
+       Processing can also be continued without closing the connection:
+       call <function>PQgetResult</function> in synchronous mode, or
+       <function>PQisBusy</function> on an asynchronous connection.  Then
+       processing will continue with the next row; the row whose processing
+       raised the exception will be skipped. Or you can discard all remaining
+       rows by calling <function>PQskipRemainingResults</function> without
+       closing the connection.
+     </para>
+
+     <variablelist>
+       <varlistentry>
+
+	 <term><parameter>res</parameter></term>
+	 <listitem>
+	   <para>
+	     A pointer to the <type>PGresult</type> object.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>param</parameter></term>
+	 <listitem>
+	   <para>
+	     Extra parameter that was given to <function>PQsetRowProcessor</function>.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>columns</parameter></term>
+	 <listitem>
+	   <para>
+	     Column values of the row to process.  Column values
+	     are located in the network buffer; the processor must
+	     copy them out from there.
+	   </para>
+	   <para>
+	     Column values are not null-terminated, so the processor cannot
+	     use C string functions on them directly.
+	   </para>
+	 </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqskipremaingresults">
+    <term>
+     <function>PQskipRemainingResults</function>
+     <indexterm>
+      <primary>PQskipRemainingResults</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+		Discards all the remaining row data
+		after <function>PQexec</function>
+		or <function>PQgetResult</function> exits via an exception raised
+		in the <type>PQrowProcessor</type>, without closing the connection.
+      </para>
+      <para>
+		Do not call this function when the functions above return normally.
+<synopsis>
+PGresult *PQskipRemainingResults(PGconn *conn)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessorerrmsg">
+    <term>
+     <function>PQsetRowProcessorErrMsg</function>
+     <indexterm>
+      <primary>PQsetRowProcessorErrMsg</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Sets the message for an error that occurred
+	in <type>PQrowProcessor</type>.  If this message is not set, the
+	caller assumes the error to be out of memory.
+<synopsis>
+void PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object
+		passed to <type>PQrowProcessor</type>.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	  <varlistentry>
+	    <term><parameter>msg</parameter></term>
+	    <listitem>
+	      <para>
+		Error message. This will be copied internally, so the caller
+		does not need to keep the original string alive.
+	      </para>
+	      <para>
+		If <parameter>res</parameter> already has a message previously
+		set, it will be overwritten. Pass NULL to cancel the custom
+		message.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
 
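The documentation patch above notes that, because value points into the buffer even for NULL columns, a row's size can be estimated by pointer arithmetic. A minimal sketch of one such estimate follows, assuming all columns of a row lie in one contiguous buffer as in the patch. RowValue is again a local stand-in for the proposed PGrowValue and estimate_row_size is a hypothetical helper name, not a libpq function.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring the PGrowValue struct proposed in the patch. */
typedef struct
{
	int			len;	/* length in bytes; negative means SQL NULL */
	char	   *value;	/* points into the buffer, even when NULL */
} RowValue;

/*
 * Estimate the buffer span covered by a row: from the start of the first
 * column's value to the end of the last column's value.  The span includes
 * the length fields between columns, so it is an upper-bound style
 * estimate rather than the exact payload size.
 * (Hypothetical helper, for illustration only.)
 */
size_t
estimate_row_size(const RowValue *columns, int nfields)
{
	const RowValue *last;
	size_t		last_len;

	assert(nfields >= 0);
	if (nfields == 0)
		return 0;
	assert(columns != NULL);

	last = &columns[nfields - 1];
	/* A NULL column contributes no value bytes. */
	last_len = (last->len > 0) ? (size_t) last->len : 0;

	return (size_t) (last->value - columns[0].value) + last_len;
}
```

A row processor could use such an estimate to size a single allocation for the whole row before copying the individual columns.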
dblink_use_rowproc_20120216.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..b1c171a 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,24 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char** valbuf;
+	int *valbuflen;
+	char **cstrs;
+	bool error_occurred;
+	bool nummismatch;
+	ErrorData *edata;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +103,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +520,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +577,36 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * The result is stored into storeinfo.tuplestore instead of
+	 * the PGresult returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
 	res = PQexec(conn, buf.data);
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+		else if (storeinfo.edata)
+			ReThrowError(storeinfo.edata);
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +618,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +679,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +755,249 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * The result is stored into storeinfo.tuplestore instead of
+	 * the PGresult returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
-	if (!is_async)
-		res = PQexec(conn, sql);
-	else
+	PG_TRY();
 	{
-		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
+		if (!is_async)
+			res = PQexec(conn, sql);
+		else
+			res = PQgetResult(conn);
+	}
+	PG_CATCH();
+	{
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipRemainingResults(conn);
+		storeinfo.error_occurred = TRUE;
 	}
+	PG_END_TRY();
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+	finishStoreInfo(&storeinfo);
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred to below. */
+			if (storeinfo.nummismatch)
+			{
+				/* This is only for backward compatibility */
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+			else if (storeinfo.edata)
+				ReThrowError(storeinfo.edata);
+
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	int i;
 
-	Assert(rsinfo->returnMode == SFRM_Materialize);
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+	{
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
 
-	PG_TRY();
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
+
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->edata = NULL;
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->valbuflen = NULL;
+	sinfo->cstrs = NULL;	/* checked below even if an earlier malloc fails */
+
+	/* Preallocate value buffers with one slot per C string array entry. */
+	sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+	if (sinfo->valbuf)
+		sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+	if (sinfo->valbuflen)
+		sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
+
+	if (sinfo->cstrs == NULL)
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		if (sinfo->valbuf)
+			free(sinfo->valbuf);
+		if (sinfo->valbuflen)
+			free(sinfo->valbuflen);
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+	}
 
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbuflen[i] = -1;
+	}
 
-			is_sql_cmd = false;
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
+	if (sinfo->valbuf)
+	{
+		for (i = 0 ; i < sinfo->nattrs ; i++)
+		{
+			if (sinfo->valbuf[i])
+				free(sinfo->valbuf[i]);
 		}
+		free(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+	if (sinfo->valbuflen)
+	{
+		free(sinfo->valbuflen);
+		sinfo->valbuflen = NULL;
+	}
 
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+	if (sinfo->cstrs)
+	{
+		free(sinfo->cstrs);
+		sinfo->cstrs = NULL;
+	}
 
-			values = (char **) palloc(nfields * sizeof(char *));
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+static int
+storeHandler(PGresult *res, void *param, PGrowValue *columns)
+{
+	storeInfo *sinfo = (storeInfo *)param;
+	HeapTuple  tuple;
+	int        fields = PQnfields(res);
+	int        i;
+	char      **cstrs = sinfo->cstrs;
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+	if (sinfo->error_occurred)
+		return FALSE;
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
+
+		/* This error will be processed in
+		 * dblink_record_internal(). So do not set error message
+		 * here. */
+		return FALSE;
+	}
+
+	/*
+	 * value input functions assumes that the input string is
+	 * terminated by zero. We should make the values to be so.
+	 */
+	for(i = 0 ; i < fields ; i++)
+	{
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
+		{
+			char *tmp = sinfo->valbuf[i];
+			int tmplen = sinfo->valbuflen[i];
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
+			/*
+			 * Divide calls to malloc and realloc so that things will
+			 * go fine even on the systems of which realloc() does not
+			 * accept NULL as old memory block.
+			 *
+			 * Also try to (re)allocate in bigger steps to
+			 * avoid flood of allocations on weird data.
+			 */
+			if (tmp == NULL)
+			{
+				tmplen = len + 1;
+				if (tmplen < 64)
+					tmplen = 64;
+				tmp = (char *)malloc(tmplen);
+			}
+			else if (tmplen < len + 1)
+			{
+				if (len + 1 > tmplen * 2)
+					tmplen = len + 1;
+				else
+					tmplen = tmplen * 2;
+				tmp = (char *)realloc(tmp, tmplen);
 			}
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
-		}
+			/*
+			 * sinfo->valbuf[n] will be freed in finishStoreInfo()
+			 * when realloc returns NULL.
+			 */
+			if (tmp == NULL)
+				return FALSE;
 
-		PQclear(res);
-	}
-	PG_CATCH();
-	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+			sinfo->valbuf[i] = tmp;
+			sinfo->valbuflen[i] = tmplen;
+
+			cstrs[i] = sinfo->valbuf[i];
+			memcpy(cstrs[i], columns[i].value, len);
+			cstrs[i][len] = '\0';
+		}
 	}
-	PG_END_TRY();
+
+	/*
+	 * These functions may throw exception. It will be caught in
+	 * dblink_record_internal()
+	 */
+	tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+	tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+	return TRUE;
 }
 
 /*
early_exit_20120216.diff (text/x-patch; charset=us-ascii)
diff --git b/doc/src/sgml/libpq.sgml a/doc/src/sgml/libpq.sgml
index de95ea8..9cb6d65 100644
--- b/doc/src/sgml/libpq.sgml
+++ a/doc/src/sgml/libpq.sgml
@@ -7352,6 +7352,14 @@ typedef struct
        On failure this function should set the error message
        with <function>PGsetRowProcessorErrMsg</function> if the cause
        is other than out of memory.
+       When non-blocking API is in use, it can also return 2
+       for early exit from <function>PQisBusy</function> function.
+       The supplied <parameter>res</parameter> and <parameter>columns</parameter>
+       values will stay valid so row can be processed outside of callback.
+       Caller is responsible for tracking whether the <function>PQisBusy</function>
+       returned early from callback or for other reasons.
+       Usually this should happen via setting cached values to NULL
+       before calling <function>PQisBusy</function>.
      </para>
 
      <para>
diff --git b/src/interfaces/libpq/fe-protocol2.c a/src/interfaces/libpq/fe-protocol2.c
index 7498580..ae4d7b0 100644
--- b/src/interfaces/libpq/fe-protocol2.c
+++ a/src/interfaces/libpq/fe-protocol2.c
@@ -820,6 +820,9 @@ getAnotherTuple(PGconn *conn, bool binary)
 	rp= conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
 	if (rp == 1)
 		return 0;
+	else if (rp == 2 && pqIsnonblocking(conn))
+		/* processor requested early exit */
+		return EOF;
 	else if (rp != 0)
 		PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value from row processor\n"));
 
diff --git b/src/interfaces/libpq/fe-protocol3.c a/src/interfaces/libpq/fe-protocol3.c
index a67e3ac..0260ba6 100644
--- b/src/interfaces/libpq/fe-protocol3.c
+++ a/src/interfaces/libpq/fe-protocol3.c
@@ -697,6 +697,11 @@ getAnotherTuple(PGconn *conn, int msgLength)
 		/* everything is good */
 		return 0;
 	}
+	if (rp == 2 && pqIsnonblocking(conn))
+	{
+		/* processor requested early exit */
+		return EOF;
+	}
 
 	/* there was some problem */
 	if (rp == 0)
#50Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#49)
Re: Speed dblink using alternate libpq tuple storage

On Thu, Feb 16, 2012 at 05:49:34PM +0900, Kyotaro HORIGUCHI wrote:

I added the function PQskipRemainingResult() and use it in
dblink. This reduces the number of executing try-catch block from
the number of rows to one per query in dblink.

This implementation is wrong - you must not simply call PQgetResult()
when connection is in async mode. And the resulting PGresult must
be freed.

Please just make PQsetRowProcessorErrMsg() callable from outside the
callback. That also makes it clear to code that sees the final PGresult
what happened. As a bonus, this will simply make the function
more robust and less special.

Although it's easy to create some PQsetRowSkipFlag() function
that will silently skip remaining rows, I think such logic
is best left to user to handle. It creates unnecessary special
case when handling final PGresult, so in the big picture
it creates confusion.

This patch is based on the patch above and composed in the same
manner - main three patches include all modifications and the '2'
patch separately.

I think there is no need to carry the early-exit patch separately.
It is visible in archives and nobody screamed about it yet,
so I guess it's acceptable. Also it's so simple, there is no
point in spending time rebasing it.

diff --git a/src/interfaces/libpq/#fe-protocol3.c# b/src/interfaces/libpq/#fe-protocol3.c#

There is some backup file in your git repo.

--
marko

#51Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#50)
Re: Speed dblink using alternate libpq tuple storage

Demos/tests of the new API:

https://github.com/markokr/libpq-rowproc-demos

Comments resulting from that:

- PQsetRowProcessorErrMsg() should take const char*

- callback API should be (void *, PGresult *, PGrowValue *)
or void * at the end, but not in the middle

I have not looked yet what needs to be done to get
ErrMsg callable outside of callback, if it requires PGconn,
then we should add PGconn also to callback args.

On Thu, Feb 16, 2012 at 05:49:34PM +0900, Kyotaro HORIGUCHI wrote:

 I added the function PQskipRemainingResult() and use it in
dblink. This reduces the number of executing try-catch block from
the number of rows to one per query in dblink.

I still think we don't need extra skipping function.

Yes, the callback function needs to have a flag to know that
rows need to be skipped, but for such a low-level API that does
not seem to be a hard requirement.

If this really needs to be made easier, then a getRowProcessor
function might be a better approach, to allow easy implementation
of a generic skipping function by the user.
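
To illustrate the flag-based approach described above, here is a minimal
sketch of a row callback that consults a skip flag kept in the
application's own per-connection state. It does not link against libpq:
DemoRowValue, DemoRowState and demo_row_processor are hypothetical
stand-ins, and the first argument is typed void * where the proposed API
would pass a PGresult *. The return value 1 follows the patch's convention
for "row handled".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the PGrowValue struct proposed in the patch. */
typedef struct
{
	int			len;		/* negative length marks SQL NULL */
	const char *value;		/* column bytes in the network buffer */
} DemoRowValue;

/* Hypothetical per-connection state owned by the application. */
typedef struct
{
	int			skip_rows;		/* set by the caller to drop further rows */
	int			rows_stored;
	int			rows_skipped;
} DemoRowState;

/*
 * Row processor with a skip flag.  It returns 1 ("row handled") in both
 * branches, so the caller keeps consuming rows and nothing is reported
 * as an error.
 */
static int
demo_row_processor(void *res, void *param, const DemoRowValue *columns)
{
	DemoRowState *state = (DemoRowState *) param;

	(void) res;				/* a PGresult * in the proposed API */
	assert(state != NULL);

	if (state->skip_rows)
	{
		state->rows_skipped++;
		return 1;
	}

	/* Real code would copy the column data out of the buffer here. */
	(void) columns;
	state->rows_stored++;
	return 1;
}
```

With this shape no extra flag is needed inside libpq: the application sets
skip_rows once it has given up on a result and keeps reading until the
result is complete.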

--
marko

#52Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#48)
Re: Speed dblink using alternate libpq tuple storage

Hello,

I don't have any attachment to PQskipRemainingResults(), but I
think that some formal method to skip until Command-Complete('C')
without breaking session is necessary because current libpq does
so.

====

On Thu, Feb 16, 2012 at 02:24:19PM +0900, Kyotaro HORIGUCHI wrote:

The choices of the libpq user on that point are,

- Continue to read succeeding tuples.
- Throw away the remaining results.

There is also third choice, which may be even more popular than
those ones - PQfinish().

That's it. I had implicitly assumed that the current session
would not be torn down.

I think we already have such function - PQsetRowProcessor().
Considering the user can use that to install skipping callback
or simply set some flag in it's own per-connection state,

PQsetRowProcessor() sets the row processor not on the PGresult but on
the PGconn. So using PQsetRowProcessor() makes no sense for the
PGresult on which the user is currently working. Another interface
function is needed to do that on a PGresult.

But of course the users can do that by signalling using flags
within their code without PQsetRowProcessor or
PQskipRemainingResults.

Or going back to the initial implementation, which used PG_TRY() to
prevent a longjmp out of the row processor itself, makes that possible too.

Although it is possible in those ways, it needs (currently)
undocumented (in human-readable langs :-) knowledge about the
client protocol and the way libpq handles it, which
might change with no mention in the release notes.

I suspect the need is not that big.

I think so, too. But the current implementation of libpq does so for
out-of-memory while receiving result rows. So I think some formal
(documented, in other words) way to do that is indispensable.

But if you want to set error state for skipping, I would instead
generalize PQsetRowProcessorErrMsg() to support setting error state
outside of callback. That would also help the external processing with
'return 2'. But I would keep the requirement that row processing must
be ongoing, standalone error setting does not make sense. Thus the name
can stay.

Hmm. I consider the cause of the problem discussed here to be
the exceptions raised by certain server-side functions called in
the row processor, especially in C/C++ code. And we shouldn't use
PG_TRY() to catch them there, since that code is executed too
frequently. I think 'return 2' is not applicable to this case. Some aid
for non-local exit from row processors (out of PQexec and the like, from
the user's point of view) is needed. And I think it should be formal.

There seems to be 2 ways to do it:

1) Replace the PGresult under PGconn. This seems to require that
PQsetRowProcessorErrMsg takes PGconn as argument instead of
PGresult. This also means callback should get PGconn as
argument. Kind of makes sense even.

2) Convert current partial PGresult to error state. That also
makes sense, current use ->rowProcessorErrMsg to transport
the message to later set the error in caller feels weird.

I guess I slightly prefer 2) to 1).

The former might be inappropriate from the point of view of the
`undocumented knowledge' above. The latter seems to miss the
point about exceptions.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#53Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#52)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Feb 21, 2012 at 02:11:35PM +0900, Kyotaro HORIGUCHI wrote:

I don't have any attachment to PQskipRemainingResults(), but I
think that some formal method to skip until Command-Complete('C')
without breaking session is necessary because current libpq does
so.

We have such function: PQgetResult(). Only question is how
to flag that the rows should be dropped.

I think we already have such function - PQsetRowProcessor().
Considering the user can use that to install skipping callback
or simply set some flag in it's own per-connection state,

PQsetRowProcessor() sets row processor not to PGresult but
PGconn. So using PGsetRowProcessor() makes no sense for the
PGresult on which the user currently working. Another interface
function is needed to do that on PGresult.

If we are talking about skipping incoming result rows,
it's a PGconn feature, not a PGresult one. Because you want to do
network traffic for that, yes?

Also, as the row handler is on the connection, changing its
behavior belongs to the connection context, not the result.

But of course the users can do that by signalling using flags
within their code without PQsetRowProcessor or
PQskipRemainingResults.

Or returning to the beginning implement using PG_TRY() to inhibit
longjmp out of the row processor itself makes that possible too.

Altough it is possible in that ways, it needs (currently)
undocumented (in human-readable langs :-) knowledge about the
client protocol and the handling manner of that in libpq which
might be changed with no description in the release notes.

You might be right that how to do it may not be obvious.

OK, let's see how it looks. But please do it like this:

- PQgetRowProcessor() that returns existing row processor.

- PQskipResult() that just replaces row processor, then calls
PQgetResult() to eat the result. Its behaviour then fully
matches PQgetResult().

I guess the main thing that annoys me with skipping is that
it would require an additional magic flag inside libpq.
With PQgetRowProcessor() that is no longer needed; it's
just a helper function that the user can implement as well.

The only question here is what should happen if there are
several incoming resultsets (several queries in PQexec).
Should we leave it to the user to loop over them?

There seem to be 2 approaches here:

1) int PQskipResult(PGconn)
2) int PQskipResult(PGconn, int skipAll)

Both cases returning:
1 - got resultset, there might be more
0 - PQgetResult() returned NULL, connection is empty
-1 - error

Although 1) mirrors regular PQgetResult() better, most users
might not know that function as they are using the sync API.
They will have an easier time with 2). So 2) then?

--
marko

#54Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#53)
Re: Speed dblink using alternate libpq tuple storage

Hello,

I don't have any attachment to PQskipRemainingResults(), but I
think that some formal method to skip until Command-Complete('C')
without breaking session is necessary because current libpq does
so.

We have such function: PQgetResult(). Only question is how
to flag that the rows should be dropped.

I agree with it. I did this by setting conn->result->resultStatus to
PGRES_FATAL_ERROR, which instructs pqParseInput[23]() to skip
calling getAnotherTuple(), but another means to do the same thing
without marking an error is needed.

Also, as row handler is on connection, then changing it's
behavior is connection context, not result.

OK. The current implementation copies the pointer to the row processor
from PGconn to PGresult, and getAnotherTuple() uses the copy in
PGresult, to avoid unintended replacement of the row processor by
PQsetRowProcessor(). I understand that the row processor setting
should live in the PGconn context and that a change by PQsetRowProcessor()
should take effect immediately. Is that right?

Ok, lets see how it looks. But please do it like this:

- PQgetRowProcessor() that returns existing row processor.

This could also be done through the return value of
PQsetRowProcessor() (currently void). Anyway, I will provide this
function and also change the return value of PQsetRowProcessor().

- PQskipResult() that just replaces row processor, then calls
PQgetResult() to eat the result. It's behaviour fully
matches PQgetResult() then.

There seem to be two choices. One is that PQskipResult()
replaces the row processor with a NULL pointer and
getAnotherTuple() calls the row processor only if it is not NULL. The
other is that PQskipResult() sets an empty function as the row
processor. I do the latter for the present.
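
The two choices can be compared with a small sketch. It is self-contained
and does not use libpq: DemoRowProcessor is a simplified, hypothetical
callback type (the proposed one also receives a PGresult * and the column
array), and the demo_* functions are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified row-processor type. */
typedef int (*DemoRowProcessor) (void *param);

/* Choice 1: NULL means "skip", so every row dispatch tests the pointer. */
static int
demo_dispatch_null_check(DemoRowProcessor proc, void *param)
{
	if (proc == NULL)
		return 1;			/* row dropped, report success */
	return proc(param);
}

/* Choice 2: skipping installs a do-nothing processor instead of NULL. */
static int
demo_noop_row(void *param)
{
	(void) param;
	return 1;
}

static int
demo_dispatch_always_call(DemoRowProcessor proc, void *param)
{
	assert(proc != NULL);	/* invariant: a processor is always installed */
	return proc(param);
}

/* Example application processor: counts the rows it sees. */
static int
demo_count_row(void *param)
{
	(*(int *) param)++;
	return 1;
}
```

In the second variant the dispatch path carries no extra branch; the price
is one indirect call per skipped row.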

This approach makes needless calls to getAnotherTuple(). Checking whether
the pointer to the row processor is NULL in pqParseInput[23]() could
avoid these extra calls and might reduce the time for skipping, but I
think there is no need to do so for such rare cases.

I guess the main thing that annoys me with skipping is that
it would require additional magic flag inside libpq.
With PQgetRowProcessor() it does not need anymore, it's
just a helper function that user can implement as well.

Ok.

Only question here is what should happen if there are
several incoming resultsets (several queries in PQexec).
Should we leave to user to loop over them?

Seems there is 2 approaches here:

1) int PQskipResult(PGconn)
2) int PQskipResult(PGconn, int skipAll)

Both cases returning:
1 - got resultset, there might be more
0 - PQgetResult() returned NULL, connection is empty
-1 - error

Although 1) mirrors regular PGgetResult() better, most users
might not know that function as they are using sync API.
They have simpler time with 2). So 2) then?

Let me confirm the effects of this function. Is the below
description right?

- PQskipResult(conn, false) makes just following PQgetResult() to
skip current bunch of rows and the consequent PQgetResult()'s
gathers rows as usual.

- PQskipResult(conn, true) makes all consequent PQgetResult()'s
to skip all the rows.

If this is right, row processor should stay also in PGresult
context. PQskipResult() replaces the row processor in PGconn when
the second parameter is true, and in PGresult for false.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#55Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#54)
Re: Speed dblink using alternate libpq tuple storage

Sorry, I need to correct a poor word choice.

Let me confirm the effects of this function. Is the below
description right?

- PQskipResult(conn, false) makes just following PQgetResult() to
skip current bunch of rows and the consequent PQgetResult()'s
gathers rows as usual.

- PQskipResult(conn, false) makes just following PQgetResult() to
skip current bunch of rows and the succeeding PQgetResult()'s
gathers rows as usual. ~~~~~~~~~~

- PQskipResult(conn, true) makes all consequent PQgetResult()'s
to skip all the rows.

- PQskipResult(conn, true) makes all succeeding PQgetResult()'s
to skip all the rows. ~~~~~~~~~~

--
Kyotaro Horiguchi
NTT Open Source Software Center

#56Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#54)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Feb 21, 2012 at 11:42 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:

Also, as row handler is on connection, then changing it's
behavior is connection context, not result.

OK, current implement copying the pointer to the row processor
from PGconn to PGresult and getAnotherTuple() using it on
PGresult to avoid unintended replacement of row processor by
PQsetRowProcessor(), and I understand that row processor setting
should be in PGconn context and the change by PGsetRowProcessor()
should immediately become effective. That's right?

Note I dropped the row processor from under PGresult.
Please don't put it back there.

Ok, lets see how it looks.  But please do it like this:

- PQgetRowProcessor() that returns existing row processor.

This seems also can be done by the return value of
PQsetRowProcessor() (currently void). Anyway, I provide this
function and also change the return value of PQsetRowProcessor().

Note you need processorParam as well.
I think it's simpler to rely on PQgetRowProcessor().

- PQskipResult() that just replaces row processor, then calls
  PQgetResult() to eat the result.  It's behaviour fully
  matches PQgetResult() then.

There seems to be two choices, one is that PQskipResult()
replaces the row processor with NULL pointer and
getAnotherTuple() calls row processor if not NULL. And the
another is PQskipResult() sets the empty function as row
processor. I do the latter for the present.

Yes, let's avoid NULLs.

This approach does needless call of getAnotherTuple(). Seeing if
the pointer to row processor is NULL in pqParseInput[23]() could
avoid this extra calls and may reduce the time for skip, but I
think it is no need to do so for such rare cases.

We should optimize for common case, which is non-skipping
row processor.

Only question here is what should happen if there are
several incoming resultsets (several queries in PQexec).
Should we leave to user to loop over them?

Seems there is 2 approaches here:

1) int PQskipResult(PGconn)
2) int PQskipResult(PGconn, int skipAll)

Both cases returning:
    1 - got resultset, there might be more
    0 - PQgetResult() returned NULL, connection is empty
   -1 - error

Although 1) mirrors regular PGgetResult() better, most users
might not know that function as they are using sync API.
They have simpler time with 2).  So 2) then?

Let me confirm the effects of this function. Is the below
description right?

- PQskipResult(conn, false) makes just following PQgetResult() to
 skip current bunch of rows and the consequent PQgetResult()'s
 gathers rows as usual.

Yes.

- PQskipResult(conn, true) makes all consequent PQgetResult()'s
 to skip all the rows.

If this is right, row processor should stay also in PGresult
context. PQskipResult() replaces the row processor in PGconn when
the second parameter is true, and in PGresult for false.

No, let's keep row processor only under PGconn.

--
marko

#57Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#56)
Re: Speed dblink using alternate libpq tuple storage

Hello,

Note I dropped the row processor from under PGresult.
Please don't put it back there.

I overlooked that. I understand it.

This seems also can be done by the return value of
PQsetRowProcessor() (currently void). Anyway, I provide this
function and also change the return value of PQsetRowProcessor().

Note you need processorParam as well.
I think it's simpler to rely on PQgetProcessor()

Hmm. Ok.

Let me confirm the effects of this function. Is the below
description right?

- PQskipResult(conn, false) makes just following PQgetResult() to
 skip current bunch of rows and the consequent PQgetResult()'s
 gathers rows as usual.

Yes.

- PQskipResult(conn, true) makes all consequent PQgetResult()'s
 to skip all the rows.

Well, Is this right?

If this is right, row processor should stay also in PGresult
context. PQskipResult() replaces the row processor in PGconn when
the second parameter is true, and in PGresult for false.

No, let's keep row processor only under PGconn.

Then should I add a stash in PGconn for the row processor (and, needless
to say, its param) so that it can be restored afterwards?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#58Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#57)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Feb 21, 2012 at 12:13 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:

- PQskipResult(conn, true) makes all consequent PQgetResult()'s
 to skip all the rows.

Well, Is this right?

Yes, call PQgetResult() until it returns NULL.

If this is right, row processor should stay also in PGresult
context. PQskipResult() replaces the row processor in PGconn when
the second parameter is true, and in PGresult for false.

No, let's keep row processor only under PGconn.

Then, Should I add the stash for the row processor (and needless
for param) to recall after in PGconn?

PQskipResult:
- store old callback and param in local vars
- set do-nothing row callback
- call PQgetResult() once, or until it returns NULL
- restore old callback
- return 1 if last result was non-NULL, 0 otherwise
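
The steps above can be sketched against a small mock connection. This is
only an illustration under stated assumptions, not the libpq code: DemoConn,
DemoResult, demo_get_result and the other demo_* names are hypothetical
stand-ins for PGconn, PGresult and PQgetResult(), and the row callback type
is simplified to a single parameter. Each fetched result is freed before
the next one is requested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef int (*DemoRowProcessor) (void *param);

typedef struct
{
	DemoRowProcessor rowProcessor;
	void	   *rowProcessorParam;
	int			pending_results;	/* result sets still queued */
	int			rows_per_result;
} DemoConn;

typedef struct
{
	int			nrows;
} DemoResult;

/* Mock result getter: feeds rows to the processor, NULL when drained. */
static DemoResult *
demo_get_result(DemoConn *conn)
{
	DemoResult *res;
	int			i;

	if (conn->pending_results == 0)
		return NULL;
	res = malloc(sizeof(DemoResult));
	if (res == NULL)
		abort();			/* keep the mock simple */
	for (i = 0; i < conn->rows_per_result; i++)
		conn->rowProcessor(conn->rowProcessorParam);
	res->nrows = conn->rows_per_result;
	conn->pending_results--;
	return res;
}

/* Do-nothing row callback used while skipping. */
static int
demo_skip_row(void *param)
{
	(void) param;
	return 1;
}

/* Example application callback: counts the rows it sees. */
static int
demo_count_row(void *param)
{
	(*(int *) param)++;
	return 1;
}

static int
demo_skip_result(DemoConn *conn, int skipAll)
{
	/* store old callback and param in local vars */
	DemoRowProcessor savedProc = conn->rowProcessor;
	void	   *savedParam = conn->rowProcessorParam;
	int			got_result;

	/* set do-nothing row callback */
	conn->rowProcessor = demo_skip_row;
	conn->rowProcessorParam = NULL;

	/* fetch once, or until the getter returns NULL */
	do
	{
		DemoResult *res = demo_get_result(conn);

		got_result = (res != NULL);
		free(res);			/* every fetched result is released */
	} while (skipAll && got_result);

	/* restore old callback */
	conn->rowProcessor = savedProc;
	conn->rowProcessorParam = savedParam;

	/* 1 if the last result was non-NULL, 0 otherwise */
	return got_result;
}
```

With skipAll set, the loop only ends on a NULL result, so the function
returns 0 in that case; with skipAll unset, a return of 1 tells the caller
that more result sets may follow.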

--
marko

#59Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#58)
Re: Speed dblink using alternate libpq tuple storage

Thank you. Everything seems clear.
Please wait for a while.

On Tue, Feb 21, 2012 at 12:13 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@oss.ntt.co.jp> wrote:

- PQskipResult(conn, true) makes all consequent PQgetResult()'s
to skip all the rows.

Well, Is this right?

Yes, call getResult() until it returns NULL.

If this is right, row processor should stay also in PGresult
context. PQskipResult() replaces the row processor in PGconn when
the second parameter is true, and in PGresult for false.

No, let's keep row processor only under PGconn.

Then, Should I add the stash for the row processor (and needless
for param) to recall after in PGconn?

PQskipResult:
- store old callback and param in local vars
- set do-nothing row callback
- call PQgetresult() once, or until it returns NULL
- restore old callback
- return 1 if last result was non-NULL, 0 otherwise

--
marko

#60Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#59)
4 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

Hello, this is ... nth version of the patch.

Thank you. Everything seems clear.
Please wait for a while.

libpq-fe.h
- PQskipResult() is added instead of PQskipRemainingResults().

fe-exec.c
- PQskipResult() is added instead of PQskipRemainingResults().
- PQgetRowProcessor() is added.

dblink.c
- Use PQskipResult() to skip remaining rows.
- Shorten the path from catching an exception to rethrowing it.
storeInfo.edata has been removed as a result.
- dblink_fetch() now catches exceptions properly.

libpq.sgml
- PQskipResult() is added instead of PQskipRemainingResults().
- Some misspellings have been corrected.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_rowproc_20120222.patch (application/octet-stream; name=libpq_rowproc_20120222.patch)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..7e02497 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,7 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQsetRowProcessor	  161
+PQgetRowProcessor	  162
+PQsetRowProcessorErrMsg	  163
+PQskipResult		  164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)
 	conn->wait_ssl_try = false;
 #endif
 
+	/* set default row processor */
+	PQsetRowProcessor(conn, NULL, NULL);
+
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
 	 * buffers on Unix systems.  That way, when we are sending a large amount
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)
 	initPQExpBuffer(&conn->errorMessage);
 	initPQExpBuffer(&conn->workBuffer);
 
+	/* set up initial row buffer */
+	conn->rowBufLen = 32;
+	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+
 	if (conn->inBuffer == NULL ||
 		conn->outBuffer == NULL ||
+		conn->rowBuf == NULL ||
 		PQExpBufferBroken(&conn->errorMessage) ||
 		PQExpBufferBroken(&conn->workBuffer))
 	{
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)
 		free(conn->inBuffer);
 	if (conn->outBuffer)
 		free(conn->outBuffer);
+	if (conn->rowBuf)
+		free(conn->rowBuf);
 	termPQExpBuffer(&conn->errorMessage);
 	termPQExpBuffer(&conn->workBuffer);
 
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..53da93c 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
+static int	pqAddRow(PGresult *res, void *param, PGrowValue *columns);
 
 
 /* ----------------
@@ -160,6 +161,7 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->rowProcessorErrMsg = NULL;
 
 	if (conn)
 	{
@@ -701,7 +703,6 @@ pqClearAsyncResult(PGconn *conn)
 	if (conn->result)
 		PQclear(conn->result);
 	conn->result = NULL;
-	conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +757,6 @@ pqPrepareAsyncResult(PGconn *conn)
 	 */
 	res = conn->result;
 	conn->result = NULL;		/* handing over ownership to caller */
-	conn->curTuple = NULL;		/* just in case */
 	if (!res)
 		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 	else
@@ -828,6 +828,87 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 }
 
 /*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+	conn->rowProcessor = (func ? func : pqAddRow);
+	conn->rowProcessorParam = param;
+}
+
+/*
+ * PQgetRowProcessor
+ *   Get current row processor of conn. set pointer to current parameter for
+ *   row processor to param if not NULL.
+ */
+PQrowProcessor
+PQgetRowProcessor(PGconn *conn, void **param)
+{
+	if (param)
+		*param = conn->rowProcessorParam;
+
+	return conn->rowProcessor;
+}
+
+/*
+ * PQsetRowProcessorErrMsg
+ *    Set the error message to pass back to the caller of RowProcessor.
+ *
+ *    You can replace the previous message with an alternative message,
+ *    or clear it with NULL.
+ */
+void
+PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+{
+	if (msg)
+		res->rowProcessorErrMsg = pqResultStrdup(res, msg);
+	else
+		res->rowProcessorErrMsg = NULL;
+}
+
+/*
+ * pqAddRow
+ *	  add a row to the PGresult structure, growing it if necessary
+ *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+ */
+static int
+pqAddRow(PGresult *res, void *param, PGrowValue *columns)
+{
+	PGresAttValue *tup;
+	int			nfields = res->numAttributes;
+	int			i;
+
+	tup = (PGresAttValue *)
+		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+	if (tup == NULL)
+		return FALSE;
+
+	for (i = 0 ; i < nfields ; i++)
+	{
+		tup[i].len = columns[i].len;
+		if (tup[i].len == NULL_LEN)
+		{
+			tup[i].value = res->null_field;
+		}
+		else
+		{
+			bool isbinary = (res->attDescs[i].format != 0);
+			tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+			if (tup[i].value == NULL)
+				return FALSE;
+
+			memcpy(tup[i].value, columns[i].value, tup[i].len);
+			/* We have to terminate this ourselves */
+			tup[i].value[tup[i].len] = '\0';
+		}
+	}
+
+	return pqAddTuple(res, tup);
+}
+
+/*
  * pqAddTuple
  *	  add a row pointer to the PGresult structure, growing it if necessary
  *	  Returns TRUE if OK, FALSE if not enough memory to add the row
@@ -1223,7 +1304,6 @@ PQsendQueryStart(PGconn *conn)
 
 	/* initialize async result-accumulation state */
 	conn->result = NULL;
-	conn->curTuple = NULL;
 
 	/* ready to send command message */
 	return true;
@@ -1831,6 +1911,48 @@ PQexecFinish(PGconn *conn)
 	return lastResult;
 }
 
+
+/*
+ * Do-nothing row processor for PQskipResult
+ */
+static int
+dummyRowProcessor(PGresult *res, void *param, PGrowValue *columns)
+{
+	return 1;
+}
+
+/*
+ * Exhaust remaining Data Rows in current conn.
+ *
+ * Exhaust current result if skipAll is false and all succeeding results if
+ * true.
+ */
+int
+PQskipResult(PGconn *conn, int skipAll)
+{
+	PQrowProcessor savedRowProcessor;
+	void * savedRowProcParam;
+	int ret = 0;
+
+	/* save the current row processor settings and set dummy processor */
+	savedRowProcessor = PQgetRowProcessor(conn, &savedRowProcParam);
+	PQsetRowProcessor(conn, dummyRowProcessor, NULL);
+	
+	/*
+	 * Throw away the remaining rows in current result, or all succeeding
+	 * results if skipAll is not FALSE.
+	 */
+	if (skipAll)
+		while (PQgetResult(conn));
+	else if (PQgetResult(conn))
+		ret = 1;
+	
+	PQsetRowProcessor(conn, savedRowProcessor, savedRowProcParam);
+
+	return ret;
+}
+
+
 /*
  * PQdescribePrepared
  *	  Obtain information about a previously prepared statement
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)
 }
 
 /*
+ * pqSkipnchar:
+ *	skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+	if (len > (size_t) (conn->inEnd - conn->inCursor))
+		return EOF;
+
+	conn->inCursor += len;
+
+	if (conn->Pfdebug)
+		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+				(unsigned long) len);
+
+	return 0;
+}
+
+/*
  * pqPutnchar:
  *	write exactly len bytes to the current message
  */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..ae4d7b0 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -569,6 +569,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, FALSE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -585,6 +587,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, TRUE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -703,52 +707,51 @@ failure:
 
 /*
  * parseInput subroutine to read a 'B' or 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, bool binary)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 
 	/* the backend sends us a bitmap of which attributes are null */
 	char		std_bitmap[64]; /* used unless it doesn't fit */
 	char	   *bitmap = std_bitmap;
 	int			i;
+	int			rp;
 	size_t		nbytes;			/* the number of bytes in bitmap  */
 	char		bmap;			/* One byte of the bitmap */
 	int			bitmap_index;	/* Its index */
 	int			bitcnt;			/* number of bits examined in current byte */
 	int			vlen;			/* length of the current field value */
 
+	/* resize row buffer if needed */
+	if (nfields > conn->rowBufLen)
+	{
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
+			goto rowProcessError;
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
+	}
+	else
+	{
+		rowbuf = conn->rowBuf;
+	}
+
 	result->binary = binary;
 
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
+	if (binary)
 	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-		/*
-		 * If it's binary, fix the column format indicators.  We assume the
-		 * backend will consistently send either B or D, not a mix.
-		 */
-		if (binary)
-		{
-			for (i = 0; i < nfields; i++)
-				result->attDescs[i].format = 1;
-		}
+		for (i = 0; i < nfields; i++)
+			result->attDescs[i].format = 1;
 	}
-	tup = conn->curTuple;
 
 	/* Get the null-value bitmap */
 	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
@@ -757,7 +760,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 	{
 		bitmap = (char *) malloc(nbytes);
 		if (!bitmap)
-			goto outOfMemory;
+			goto rowProcessError;
 	}
 
 	if (pqGetnchar(bitmap, nbytes, conn))
@@ -771,34 +774,29 @@ getAnotherTuple(PGconn *conn, bool binary)
 	for (i = 0; i < nfields; i++)
 	{
 		if (!(bmap & 0200))
-		{
-			/* if the field value is absent, make it a null string */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-		}
+			vlen = NULL_LEN;
+		else if (pqGetInt(&vlen, 4, conn))
+				goto EOFexit;
 		else
 		{
-			/* get the value length (the first four bytes are for length) */
-			if (pqGetInt(&vlen, 4, conn))
-				goto EOFexit;
 			if (!binary)
 				vlen = vlen - 4;
 			if (vlen < 0)
 				vlen = 0;
-			if (tup[i].value == NULL)
-			{
-				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-				if (tup[i].value == NULL)
-					goto outOfMemory;
-			}
-			tup[i].len = vlen;
-			/* read in the value */
-			if (vlen > 0)
-				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-					goto EOFexit;
-			/* we have to terminate this ourselves */
-			tup[i].value[vlen] = '\0';
 		}
+
+		/*
+		 * rowbuf[i].value always points to where the value starts (or
+		 * would start, if the value is NULL), to allow safe size
+		 * estimates and data copies.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip the value */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			goto EOFexit;
+
 		/* advance the bitmap stuff */
 		bitcnt++;
 		if (bitcnt == BITS_PER_BYTE)
@@ -811,33 +809,56 @@ getAnotherTuple(PGconn *conn, bool binary)
 			bmap <<= 1;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
-
 	if (bitmap != std_bitmap)
 		free(bitmap);
-	return 0;
+	bitmap = NULL;
+
+	/* tag the row as parsed */
+	conn->inStart = conn->inCursor;
+
+	/* Pass the completed row values to rowProcessor */
+	rp = conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
+	if (rp == 1)
+		return 0;
+	else if (rp == 2 && pqIsnonblocking(conn))
+		/* processor requested early exit */
+		return EOF;
+	else if (rp != 0)
+		PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value from row processor\n"));
+
+rowProcessError:
 
-outOfMemory:
 	/* Replace partially constructed result with an error result */
 
-	/*
-	 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
-	 * there's not enough memory to concatenate messages...
-	 */
-	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	if (result->rowProcessorErrMsg)
+	{
+		printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
+		pqSaveErrorResult(conn);
+	}
+	else
+	{
+		/*
+		 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
+		 * there's not enough memory to concatenate messages...
+		 */
+		pqClearAsyncResult(conn);
+		resetPQExpBuffer(&conn->errorMessage);
 
-	/*
-	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
-	 * do to recover...
-	 */
-	conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+		/*
+		 * No error message was passed from the row processor, so
+		 * assume out of memory.
+		 */
+		appendPQExpBufferStr(&conn->errorMessage,
+							 libpq_gettext("out of memory for query result\n"));
+
+		/*
+		 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
+		 * do to recover...
+		 */
+		conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+	}
 	conn->asyncStatus = PGASYNC_READY;
+
 	/* Discard the failed message --- good idea? */
 	conn->inStart = conn->inEnd;
 
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..0260ba6 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -327,6 +327,9 @@ pqParseInput3(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, msgLength))
 							return;
+
+						/* getAnotherTuple() moves inStart itself */
+						continue;
 					}
 					else if (conn->result != NULL &&
 							 conn->result->resultStatus == PGRES_FATAL_ERROR)
@@ -613,33 +616,22 @@ failure:
 
 /*
  * parseInput subroutine to read a 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, int msgLength)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 	int			tupnfields;		/* # fields from tuple */
 	int			vlen;			/* length of the current field value */
 	int			i;
-
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
-	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-	}
-	tup = conn->curTuple;
+	int			rp;
 
 	/* Get the field count and make sure it's what we expect */
 	if (pqGetInt(&tupnfields, 2, conn))
@@ -652,52 +644,88 @@ getAnotherTuple(PGconn *conn, int msgLength)
 				 libpq_gettext("unexpected field count in \"D\" message\n"));
 		pqSaveErrorResult(conn);
 		/* Discard the failed message by pretending we read it */
-		conn->inCursor = conn->inStart + 5 + msgLength;
+		conn->inStart += 5 + msgLength;
 		return 0;
 	}
 
+	/* resize row buffer if needed */
+	rowbuf = conn->rowBuf;
+	if (nfields > conn->rowBufLen)
+	{
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
+		{
+			goto outOfMemory1;
+		}
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
+	}
+
 	/* Scan the fields */
 	for (i = 0; i < nfields; i++)
 	{
 		/* get the value length */
 		if (pqGetInt(&vlen, 4, conn))
-			return EOF;
+			goto protocolError;
 		if (vlen == -1)
-		{
-			/* null field */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-			continue;
-		}
-		if (vlen < 0)
+			vlen = NULL_LEN;
+		else if (vlen < 0)
 			vlen = 0;
-		if (tup[i].value == NULL)
-		{
-			bool		isbinary = (result->attDescs[i].format != 0);
 
-			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-			if (tup[i].value == NULL)
-				goto outOfMemory;
-		}
-		tup[i].len = vlen;
-		/* read in the value */
-		if (vlen > 0)
-			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-				return EOF;
-		/* we have to terminate this ourselves */
-		tup[i].value[vlen] = '\0';
+		/*
+		 * rowbuf[i].value always points to where the value starts (or
+		 * would start, if the value is NULL), to allow safe size
+		 * estimates and data copies.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip to the next length field */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			goto protocolError;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
+	/* tag the row as parsed, and check that it was read completely */
+	conn->inStart += 5 + msgLength;
+	if (conn->inCursor != conn->inStart)
+		goto protocolError;
 
+	/* Pass the completed row values to rowProcessor */
+	rp = conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
+	if (rp == 1)
+	{
+		/* everything is good */
+		return 0;
+	}
+	if (rp == 2 && pqIsnonblocking(conn))
+	{
+		/* processor requested early exit */
+		return EOF;
+	}
+
+	/* there was some problem */
+	if (rp == 0)
+	{
+		if (result->rowProcessorErrMsg == NULL)
+			goto outOfMemory2;
+
+		/* use supplied error message */
+		printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
+	}
+	else
+	{
+		/* row processor messed up */
+		printfPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("invalid return value from row processor\n"));
+	}
+	pqSaveErrorResult(conn);
 	return 0;
 
-outOfMemory:
+outOfMemory1:
+	/* Discard the failed message by pretending we read it */
+	conn->inStart += 5 + msgLength;
 
+outOfMemory2:
 	/*
 	 * Replace partially constructed result with an error result. First
 	 * discard the old result to try to win back some memory.
@@ -706,9 +734,14 @@ outOfMemory:
 	printfPQExpBuffer(&conn->errorMessage,
 					  libpq_gettext("out of memory for query result\n"));
 	pqSaveErrorResult(conn);
+	return 0;
 
+protocolError:
+	printfPQExpBuffer(&conn->errorMessage,
+					  libpq_gettext("invalid row contents\n"));
+	pqSaveErrorResult(conn);
 	/* Discard the failed message by pretending we read it */
-	conn->inCursor = conn->inStart + 5 + msgLength;
+	conn->inStart += 5 + msgLength;
 	return 0;
 }
 
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..ffe88e2 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,17 @@ typedef struct pgNotify
 	struct pgNotify *next;		/* list link */
 } PGnotify;
 
+/* PGrowValue points to a column value in the network buffer.
+ * The value is a string of length len without null termination.
+ * NULL is represented as len < 0; value then points to the place
+ * where the value would have been.
+ */
+typedef struct pgRowValue
+{
+	int			len;			/* length in bytes of the value */
+	char	   *value;			/* actual value, without null termination */
+} PGrowValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
@@ -416,6 +427,39 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
 			 const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * The columns array will contain PQnfields() entries, each one
+ * pointing to a particular column's data in the network buffer.
+ * This function is supposed to copy the data out from there
+ * and store it somewhere.  NULL is signified with len<0.
+ *
+ * This function must return 1 for success and 0 for failure, and
+ * may set an error message with PQsetRowProcessorErrMsg.  If no
+ * error message is set on failure, the caller assumes the cause
+ * is out of memory.  This function is assumed not to throw any
+ * exception.
+ */
+typedef int (*PQrowProcessor)(PGresult *res, void *param,
+								PGrowValue *columns);
+
+/*
+ * Set alternative row data processor for PGconn.
+ *
+ * When a row processor is registered, libpq no longer stores rows in
+ * the PGresult itself but calls the processor for each row in turn.
+ *
+ * func is the row processor function. See the typedef PQrowProcessor.
+ *
+ * rowProcessorParam is the contextual variable that is passed to
+ * the row processor.
+ */
+extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+								   void *rowProcessorParam);
+extern PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+extern int  PQskipResult(PGconn *conn, int skipAll);
+
 /* Force the write buffer to be written (or at least try) */
 extern int	PQflush(PGconn *conn);
 
@@ -454,6 +498,7 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+extern void	PQsetRowProcessorErrMsg(PGresult *res, char *msg);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 987311e..1fc5aab 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -209,6 +209,9 @@ struct pg_result
 	PGresult_data *curBlock;	/* most recently allocated block */
 	int			curOffset;		/* start offset of free space in block */
 	int			spaceLeft;		/* number of free bytes remaining in block */
+
+	/* temp storage for message from row processor callback */
+	char	   *rowProcessorErrMsg;
 };
 
 /* PGAsyncStatusType defines the state of the query-execution state machine */
@@ -398,7 +401,6 @@ struct pg_conn
 
 	/* Status for asynchronous result construction */
 	PGresult   *result;			/* result being constructed */
-	PGresAttValue *curTuple;	/* tuple currently being read */
 
 #ifdef USE_SSL
 	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
@@ -443,6 +445,14 @@ struct pg_conn
 
 	/* Buffer for receiving various parts of messages */
 	PQExpBufferData workBuffer; /* expansible string */
+
+	/*
+	 * Read column data from network buffer.
+	 */
+	PQrowProcessor rowProcessor;/* Function pointer */
+	void *rowProcessorParam;	/* Contextual parameter for rowProcessor */
+	PGrowValue *rowBuf;			/* Buffer for passing values to rowProcessor */
+	int rowBufLen;				/* Number of columns allocated in rowBuf */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -560,6 +570,7 @@ extern int	pqGets(PQExpBuffer buf, PGconn *conn);
 extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
 extern int	pqPuts(const char *s, PGconn *conn);
 extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+extern int	pqSkipnchar(size_t len, PGconn *conn);
 extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
 extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
 extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
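
As a reading aid for the header changes above: the row processor receives column values that are not null-terminated and that live in the network buffer, so it has to copy them out before returning. The following is only a minimal standalone sketch of that copy step, not part of the patch. The PGrowValue struct is re-declared locally to mirror the proposed one, and copy_column is a hypothetical helper name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in mirroring the PGrowValue struct proposed in the patch. */
typedef struct
{
	int			len;	/* length in bytes, negative if the value is NULL */
	char	   *value;	/* points into the buffer, not null-terminated */
} PGrowValue;

/*
 * Hypothetical helper: return a malloc'd, null-terminated copy of one
 * column.  Returns NULL if the column is SQL NULL or if malloc fails;
 * *is_null tells the two cases apart.
 */
char *
copy_column(const PGrowValue *col, int *is_null)
{
	char	   *copy;

	assert(col != NULL && is_null != NULL);

	*is_null = (col->len < 0);
	if (*is_null)
		return NULL;

	copy = malloc((size_t) col->len + 1);
	if (copy == NULL)
		return NULL;	/* a row processor would return 0 here */

	/* The source is not null-terminated, so copy exactly len bytes. */
	memcpy(copy, col->value, (size_t) col->len);
	copy[col->len] = '\0';
	return copy;
}
```

A row processor built on this would call the helper once per column and return 0 without setting a message when the copy fails for lack of memory, which the caller then reports as out of memory.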
Attachment: libpq_rowproc_doc_20120222.patch (application/octet-stream)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..6b95be8 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,330 @@ int PQisthreadsafe();
  </sect1>
 
 
+ <sect1 id="libpq-altrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-altrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   In standard usage, rows are stored in a <type>PGresult</type>
+   until the full result set is received.  Then the completely filled
+   <type>PGresult</type> is passed to the user.  This behavior can be
+   changed by registering an alternative row processor function
+   that sees each row as soon as it is received
+   from the network.  It has the option of processing the data
+   immediately or storing it in a custom container.
+  </para>
+
+  <para>
+   Note that because the row processor sees rows as they arrive, it cannot
+   know whether the SQL statement will actually finish successfully on the
+   server.  So some care must be taken to get proper
+   transactional behavior.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessor">
+    <term>
+     <function>PQsetRowProcessor</function>
+     <indexterm>
+      <primary>PQsetRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object on which to set the row processor function.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>func</parameter></term>
+	   <listitem>
+	     <para>
+	       Row processor function to set. NULL means to use the
+	       default processor.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+	       A pointer to a contextual parameter passed
+	       to <parameter>func</parameter>.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqrowprocessor">
+    <term>
+     <type>PQrowProcessor</type>
+     <indexterm>
+      <primary>PQrowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+int (*PQrowProcessor)(PGresult *res, void *param, PGrowValue *columns);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value, -1 if NULL */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+</synopsis>
+     </para>
+
+     <para>
+      The <parameter>columns</parameter> array will have PQnfields()
+      elements, each one pointing to a column value in the network buffer.
+      The <parameter>len</parameter> field will contain the number of
+      bytes in the value.  If the field value is NULL then
+      <parameter>len</parameter> will be -1 and value will point
+      to the position where the value would have been in the buffer.
+      This allows estimating row size by pointer arithmetic.
+     </para>
+
+     <para>
+       This function must process or copy row values away from the network
+       buffer before it returns, as the next row might overwrite them.
+     </para>
+
+     <para>
+       This function must return 1 for success, and 0 for failure.  On
+       failure this function should set the error message
+       with <function>PQsetRowProcessorErrMsg</function> if the cause
+       is other than out of memory.  When the non-blocking API is in use,
+       it can also return 2 for early exit
+       from the <function>PQisBusy</function> function.  The
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values will stay valid so
+       the row can be processed outside of the callback.  The caller is
+       responsible for tracking whether
+       <function>PQisBusy</function> returned early from the
+       callback or for other reasons.  Usually this should happen via
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
+     </para>
+
+     <para>
+       The function is allowed to exit via exception (setjmp/longjmp).
+       The connection and row are guaranteed to be in a valid state.
+       The connection can later be closed via <function>PQfinish</function>.
+       Processing can also be continued without closing the connection:
+       call <function>PQgetResult</function> in synchronous mode, or
+       <function>PQisBusy</function> on an asynchronous connection.  Then
+       processing will continue with a new row; the previous row that raised
+       the exception will be skipped.  Or you can discard all remaining
+       rows by calling <function>PQskipResult</function> without
+       closing the connection.
+     </para>
+
+     <variablelist>
+       <varlistentry>
+
+	 <term><parameter>res</parameter></term>
+	 <listitem>
+	   <para>
+	     A pointer to the <type>PGresult</type> object.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>param</parameter></term>
+	 <listitem>
+	   <para>
+	     Extra parameter that was given to <function>PQsetRowProcessor</function>.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>columns</parameter></term>
+	 <listitem>
+	   <para>
+	     Column values of the row to process.  Column values
+	     are located in network buffer, the processor must
+	     copy them out from there.
+	   </para>
+	   <para>
+	     Column values are not null-terminated, so processor cannot
+	     use C string functions on them directly.
+	   </para>
+	 </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqskipresult">
+    <term>
+     <function>PQskipResult</function>
+     <indexterm>
+      <primary>PQskipResult</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+		Discard all the remaining row data
+		after <function>PQexec</function>
+		or <function>PQgetResult</function> exits via an exception raised
+		in the <type>PQrowProcessor</type>, without closing the connection.
+<synopsis>
+int PQskipResult(PGconn *conn, int skipAll)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	   <term><parameter>skipAll</parameter></term>
+	   <listitem>
+	     <para>
+	       Skip the remaining rows in the current result
+	       if <parameter>skipAll</parameter> is false (0). Skip the
+	       remaining rows in the current result and all rows in
+	       succeeding results if true (non-zero).
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessorerrmsg">
+    <term>
+     <function>PQsetRowProcessorErrMsg</function>
+     <indexterm>
+      <primary>PQsetRowProcessorErrMsg</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Set the message for an error that occurred
+	in the <type>PQrowProcessor</type>.  If this message is not set, the
+	caller assumes the error to be out of memory.
+<synopsis>
+void PQsetRowProcessorErrMsg(PGresult *res, char *msg)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object
+		passed to <type>PQrowProcessor</type>.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	  <varlistentry>
+	    <term><parameter>msg</parameter></term>
+	    <listitem>
+	      <para>
+		Error message. This will be copied internally, so the caller
+		need not keep it alive after the call.
+	      </para>
+	      <para>
+		If <parameter>res</parameter> already has a message previously
+		set, it will be overwritten. Set NULL to cancel the custom
+		message.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetrowprcessor">
+    <term>
+     <function>PQgetRowProcessor</function>
+     <indexterm>
+      <primary>PQgetRowProcessor</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+       Get the row processor and its context parameter currently set on
+       the connection.
+<synopsis>
+PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+              If not NULL, the current row processor parameter of the
+              connection is stored here.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
 
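
The documentation above says that pointing value at the would-be position of a NULL column "allows estimating row size by pointer arithmetic". A rough standalone sketch of what such an estimate could look like follows. It is an illustration under assumptions, not code from the patch: PGrowValue is re-declared locally, estimate_row_bytes is a hypothetical name, and the result counts the bytes from the start of the first value to the end of the last one, so it includes any length words in between.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in mirroring the PGrowValue struct proposed in the patch. */
typedef struct
{
	int			len;	/* length in bytes, negative if the value is NULL */
	char	   *value;	/* points into the buffer, not null-terminated */
} PGrowValue;

/*
 * Hypothetical helper: estimate the buffer span covered by one row,
 * from the start of the first column value to the end of the last.
 * This works only because value is set even for NULL columns.
 */
size_t
estimate_row_bytes(const PGrowValue *columns, int nfields)
{
	const PGrowValue *last;
	const char *end;

	assert(columns != NULL && nfields >= 0);

	if (nfields == 0)
		return 0;

	last = &columns[nfields - 1];
	/* A NULL column (len < 0) contributes no value bytes. */
	end = last->value + (last->len > 0 ? last->len : 0);

	return (size_t) (end - columns[0].value);
}
```

The figure is an upper bound on the payload rather than the exact sum of the value lengths, which is usually good enough for sizing a destination buffer in one allocation.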
Attachment: dblink_use_rowproc_20120222.patch (application/octet-stream)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..a0c9bd8 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char** valbuf;
+	int *valbuflen;
+	char **cstrs;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, void *param, PGrowValue *columns);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,51 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
-	res = PQexec(conn, buf.data);
+	PG_TRY();
+	{
+		res = PQexec(conn, buf.data);
+	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
+
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, FALSE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +632,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +693,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +769,252 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
-	if (!is_async)
-		res = PQexec(conn, sql);
-	else
+	PG_TRY();
 	{
-		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
+		if (!is_async)
+			res = PQexec(conn, sql);
+		else
+			res = PQgetResult(conn);
 	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, FALSE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred to below. */
+			if (storeinfo.nummismatch)
+			{
+				/* This is only for backward compatibility */
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	int i;
+
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+	{
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
 
-	Assert(rsinfo->returnMode == SFRM_Materialize);
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
 
-	PG_TRY();
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->valbuf = NULL;
+	sinfo->valbuflen = NULL;
+	sinfo->cstrs = NULL;	/* tested below even if an earlier malloc fails */
+	/* Preallocate per-attribute buffers and the C string array for values. */
+	sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+	if (sinfo->valbuf)
+		sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+	if (sinfo->valbuflen)
+		sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
+
+	if (sinfo->cstrs == NULL)
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		if (sinfo->valbuf)
+			free(sinfo->valbuf);
+		if (sinfo->valbuflen)
+			free(sinfo->valbuflen);
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+	}
 
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbuflen[i] = -1;
+	}
 
-			is_sql_cmd = false;
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
+	if (sinfo->valbuf)
+	{
+		for (i = 0 ; i < sinfo->nattrs ; i++)
+		{
+			if (sinfo->valbuf[i])
+				free(sinfo->valbuf[i]);
 		}
+		free(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+	if (sinfo->valbuflen)
+	{
+		free(sinfo->valbuflen);
+		sinfo->valbuflen = NULL;
+	}
 
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+	if (sinfo->cstrs)
+	{
+		free(sinfo->cstrs);
+		sinfo->cstrs = NULL;
+	}
 
-			values = (char **) palloc(nfields * sizeof(char *));
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+static int
+storeHandler(PGresult *res, void *param, PGrowValue *columns)
+{
+	storeInfo *sinfo = (storeInfo *)param;
+	HeapTuple  tuple;
+	int        fields = PQnfields(res);
+	int        i;
+	char      **cstrs = sinfo->cstrs;
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+	if (sinfo->error_occurred)
+		return FALSE;
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
+
+		/* This error will be processed in
+		 * dblink_record_internal(). So do not set error message
+		 * here. */
+		return FALSE;
+	}
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
+	/*
+	 * Value input functions assume that the input string is
+	 * null-terminated, so copy each value into a terminated buffer.
+	 */
+	for(i = 0 ; i < fields ; i++)
+	{
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
+		{
+			char *tmp = sinfo->valbuf[i];
+			int tmplen = sinfo->valbuflen[i];
+
+			/*
+			 * Call malloc and realloc separately so that this works
+			 * even on systems where realloc() does not accept NULL
+			 * as the old memory block.
+			 *
+			 * Also (re)allocate in bigger steps to avoid a flood
+			 * of allocations on unusual data.
+			 */
+			if (tmp == NULL)
+			{
+				tmplen = len + 1;
+				if (tmplen < 64)
+					tmplen = 64;
+				tmp = (char *)malloc(tmplen);
+			}
+			else if (tmplen < len + 1)
+			{
+				if (len + 1 > tmplen * 2)
+					tmplen = len + 1;
+				else
+					tmplen = tmplen * 2;
+				tmp = (char *)realloc(tmp, tmplen);
 			}
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
-		}
+			/*
+			 * sinfo->valbuf[n] will be freed in finishStoreInfo()
+			 * when realloc returns NULL.
+			 */
+			if (tmp == NULL)
+				return FALSE;
 
-		PQclear(res);
-	}
-	PG_CATCH();
-	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+			sinfo->valbuf[i] = tmp;
+			sinfo->valbuflen[i] = tmplen;
+
+			cstrs[i] = sinfo->valbuf[i];
+			memcpy(cstrs[i], columns[i].value, len);
+			cstrs[i][len] = '\0';
+		}
 	}
-	PG_END_TRY();
+
+	/*
+	 * These functions may throw an exception, which will be caught in
+	 * dblink_record_internal().
+	 */
+	tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+	tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+	return TRUE;
 }
 
 /*
early_exit_20120222.diff (application/octet-stream)
diff --git b/doc/src/sgml/libpq.sgml a/doc/src/sgml/libpq.sgml
index 7b14366..6b95be8 100644
--- b/doc/src/sgml/libpq.sgml
+++ a/doc/src/sgml/libpq.sgml
@@ -7351,7 +7351,17 @@ typedef struct
        This function must return 1 for success, and 0 for failure.  On
        failure this function should set the error message
        with <function>PGsetRowProcessorErrMsg</function> if the cause
-       is other than out of memory.
+       is other than out of memory.  When the non-blocking API is in use,
+       it can also return 2 to request an early exit
+       from the <function>PQisBusy</function> function.  The
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values will stay valid so
+       the row can be processed outside of the callback.  The caller is
+       responsible for tracking whether
+       <function>PQisBusy</function> returned early from the
+       callback or for other reasons.  Usually this should be done by
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
      </para>
 
      <para>
diff --git b/src/interfaces/libpq/fe-protocol2.c a/src/interfaces/libpq/fe-protocol2.c
index 7498580..ae4d7b0 100644
--- b/src/interfaces/libpq/fe-protocol2.c
+++ a/src/interfaces/libpq/fe-protocol2.c
@@ -820,6 +820,9 @@ getAnotherTuple(PGconn *conn, bool binary)
 	rp= conn->rowProcessor(result, conn->rowProcessorParam, rowbuf);
 	if (rp == 1)
 		return 0;
+	else if (rp == 2 && pqIsnonblocking(conn))
+		/* processor requested early exit */
+		return EOF;
 	else if (rp != 0)
 		PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value from row processor\n"));
 
diff --git b/src/interfaces/libpq/fe-protocol3.c a/src/interfaces/libpq/fe-protocol3.c
index a67e3ac..0260ba6 100644
--- b/src/interfaces/libpq/fe-protocol3.c
+++ a/src/interfaces/libpq/fe-protocol3.c
@@ -697,6 +697,11 @@ getAnotherTuple(PGconn *conn, int msgLength)
 		/* everything is good */
 		return 0;
 	}
+	if (rp == 2 && pqIsnonblocking(conn))
+	{
+		/* processor requested early exit */
+		return EOF;
+	}
 
 	/* there was some problem */
 	if (rp == 0)
#61Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#60)
Re: Speed dblink using alternate libpq tuple storage

On Wed, Feb 22, 2012 at 12:27:57AM +0900, Kyotaro HORIGUCHI wrote:

> fe-exec.c
> - PQskipResult() is added instead of PGskipRemainigResults().

It must free the PGresults that PQgetResult() returns.

Also, please fix the 2 issues mentioned here:

http://archives.postgresql.org/message-id/CACMqXCLvpkjb9+c6sqJXitMHvrRCo+yu4q4bQ--0d7L=vw62Yg@mail.gmail.com

--
marko

#62Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#61)
4 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

Hello, this is a new version of the patch.

# This patch is based on the commit
# 2bbd88f8f841b01efb073972b60d4dc1ff1f6fd0 @ Feb 13 to avoid the
# compile error caused by undeclared LEAKPROOF in kwlist.h.

> It must free the PGresults that PQgetResult() returns.

I'm sorry, it slipped my mind. I added PQclear() for the
return value.

> Also, please fix the 2 issues mentioned here:

- PQsetRowProcessorErrMsg() now handles msg as a const string.

- Changed the order of the parameters of the type PQrowProcessor.
The new order is (PGresult *res, PGrowValue *columns, void *params).

# PQsetRowProcessorErrMsg outside of callback is not implemented.

- Documentation and dblink are modified according to the changes
above.
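
To illustrate the new parameter order, here is a minimal, self-contained
sketch of a row processor. It is not code from the patch: DemoResult,
DemoRowValue, DemoSink and demo_row_processor are hypothetical stand-ins
(DemoRowValue mirrors the PGrowValue struct, DemoResult plays the role of
PGresult) so that the example compiles without libpq. It only shows the two
things a processor has to take care of: column values are not
null-terminated, and len < 0 means NULL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the libpq types, for illustration only. */
typedef struct { int nfields; } DemoResult;            /* role of PGresult */
typedef struct { int len; char *value; } DemoRowValue; /* mirrors PGrowValue */

/* Caller-owned context, passed through the void *param argument. */
typedef struct
{
	int		nrows;	/* rows accepted so far */
	char  **last;	/* copy of the latest row; NULL entry means SQL NULL */
	int		ncols;	/* number of slots in last[] */
} DemoSink;

/*
 * Row processor in the new order (result, columns, param).
 * Returns 1 on success and 0 on failure, as described in the patch.
 */
static int
demo_row_processor(DemoResult *res, DemoRowValue *columns, void *param)
{
	DemoSink   *sink = (DemoSink *) param;
	int			i;

	if (sink->ncols != res->nfields)
		return 0;				/* column count mismatch */

	for (i = 0; i < res->nfields; i++)
	{
		free(sink->last[i]);	/* drop the previous row's copy */
		sink->last[i] = NULL;

		if (columns[i].len < 0)
			continue;			/* SQL NULL: leave the slot as NULL */

		/* The source bytes are not terminated, so add the '\0' here. */
		sink->last[i] = malloc((size_t) columns[i].len + 1);
		if (sink->last[i] == NULL)
			return 0;			/* treated as out of memory by the caller */
		memcpy(sink->last[i], columns[i].value, (size_t) columns[i].len);
		sink->last[i][columns[i].len] = '\0';
	}

	sink->nrows++;
	return 1;
}
```

With the patch applied, a real processor would take PGresult and PGrowValue
instead of the stand-ins, use PQnfields(res) for the field count, and be
registered with PQsetRowProcessor(conn, func, param).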

By the way, I would like to ask you one question. Why should the
void* be at the head or tail of the parameter list?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_rowproc_20120223.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..7e02497 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,7 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQsetRowProcessor	  161
+PQgetRowProcessor	  162
+PQsetRowProcessorErrMsg	  163
+PQskipResult		  164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)
 	conn->wait_ssl_try = false;
 #endif
 
+	/* set default row processor */
+	PQsetRowProcessor(conn, NULL, NULL);
+
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
 	 * buffers on Unix systems.  That way, when we are sending a large amount
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)
 	initPQExpBuffer(&conn->errorMessage);
 	initPQExpBuffer(&conn->workBuffer);
 
+	/* set up initial row buffer */
+	conn->rowBufLen = 32;
+	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+
 	if (conn->inBuffer == NULL ||
 		conn->outBuffer == NULL ||
+		conn->rowBuf == NULL ||
 		PQExpBufferBroken(&conn->errorMessage) ||
 		PQExpBufferBroken(&conn->workBuffer))
 	{
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)
 		free(conn->inBuffer);
 	if (conn->outBuffer)
 		free(conn->outBuffer);
+	if (conn->rowBuf)
+		free(conn->rowBuf);
 	termPQExpBuffer(&conn->errorMessage);
 	termPQExpBuffer(&conn->workBuffer);
 
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..cd287cd 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
+static int	pqAddRow(PGresult *res, PGrowValue *columns, void *param);
 
 
 /* ----------------
@@ -160,6 +161,7 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->rowProcessorErrMsg = NULL;
 
 	if (conn)
 	{
@@ -701,7 +703,6 @@ pqClearAsyncResult(PGconn *conn)
 	if (conn->result)
 		PQclear(conn->result);
 	conn->result = NULL;
-	conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +757,6 @@ pqPrepareAsyncResult(PGconn *conn)
 	 */
 	res = conn->result;
 	conn->result = NULL;		/* handing over ownership to caller */
-	conn->curTuple = NULL;		/* just in case */
 	if (!res)
 		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 	else
@@ -828,6 +828,87 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 }
 
 /*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+	conn->rowProcessor = (func ? func : pqAddRow);
+	conn->rowProcessorParam = param;
+}
+
+/*
+ * PQgetRowProcessor
+ *   Get the current row processor of conn.  If param is not NULL, store
+ *   the current row processor parameter there.
+ */
+PQrowProcessor
+PQgetRowProcessor(PGconn *conn, void **param)
+{
+	if (param)
+		*param = conn->rowProcessorParam;
+
+	return conn->rowProcessor;
+}
+
+/*
+ * PQsetRowProcessorErrMsg
+ *    Set the error message passed back to the caller of the row processor.
+ *
+ *    You can replace the previous message with an alternative msg, or
+ *    clear it with NULL.
+ */
+void
+PQsetRowProcessorErrMsg(PGresult *res, const char *msg)
+{
+	if (msg)
+		res->rowProcessorErrMsg = pqResultStrdup(res, msg);
+	else
+		res->rowProcessorErrMsg = NULL;
+}
+
+/*
+ * pqAddRow
+ *	  add a row to the PGresult structure, growing it if necessary
+ *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+ */
+static int
+pqAddRow(PGresult *res, PGrowValue *columns, void *param)
+{
+	PGresAttValue *tup;
+	int			nfields = res->numAttributes;
+	int			i;
+
+	tup = (PGresAttValue *)
+		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+	if (tup == NULL)
+		return FALSE;
+
+	for (i = 0 ; i < nfields ; i++)
+	{
+		tup[i].len = columns[i].len;
+		if (tup[i].len == NULL_LEN)
+		{
+			tup[i].value = res->null_field;
+		}
+		else
+		{
+			bool isbinary = (res->attDescs[i].format != 0);
+			tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+			if (tup[i].value == NULL)
+				return FALSE;
+
+			memcpy(tup[i].value, columns[i].value, tup[i].len);
+			/* We have to terminate this ourselves */
+			tup[i].value[tup[i].len] = '\0';
+		}
+	}
+
+	return pqAddTuple(res, tup);
+}
+
+/*
  * pqAddTuple
  *	  add a row pointer to the PGresult structure, growing it if necessary
  *	  Returns TRUE if OK, FALSE if not enough memory to add the row
@@ -1223,7 +1304,6 @@ PQsendQueryStart(PGconn *conn)
 
 	/* initialize async result-accumulation state */
 	conn->result = NULL;
-	conn->curTuple = NULL;
 
 	/* ready to send command message */
 	return true;
@@ -1831,6 +1911,55 @@ PQexecFinish(PGconn *conn)
 	return lastResult;
 }
 
+
+/*
+ * Do-nothing row processor for PQskipResult
+ */
+static int
+dummyRowProcessor(PGresult *res, PGrowValue *columns, void *param)
+{
+	return 1;
+}
+
+/*
+ * Exhaust remaining data rows in the current conn.
+ *
+ * Exhaust the current result if skipAll is false, and all succeeding
+ * results if true.
+ */
+int
+PQskipResult(PGconn *conn, int skipAll)
+{
+	PQrowProcessor savedRowProcessor;
+	void * savedRowProcParam;
+	PGresult *res;
+	int ret = 0;
+
+	/* save the current row processor settings and set dummy processor */
+	savedRowProcessor = PQgetRowProcessor(conn, &savedRowProcParam);
+	PQsetRowProcessor(conn, dummyRowProcessor, NULL);
+	
+	/*
+	 * Throw away the remaining rows in current result, or all succeeding
+	 * results if skipAll is not FALSE.
+	 */
+	if (skipAll)
+	{
+		while ((res = PQgetResult(conn)) != NULL)
+			PQclear(res);
+	}
+	else if ((res = PQgetResult(conn)) != NULL)
+	{
+		PQclear(res);
+		ret = 1;
+	}
+	
+	PQsetRowProcessor(conn, savedRowProcessor, savedRowProcParam);
+
+	return ret;
+}
+
+
 /*
  * PQdescribePrepared
  *	  Obtain information about a previously prepared statement
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)
 }
 
 /*
+ * pqSkipnchar:
+ *	skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+	if (len > (size_t) (conn->inEnd - conn->inCursor))
+		return EOF;
+
+	conn->inCursor += len;
+
+	if (conn->Pfdebug)
+		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+				(unsigned long) len);
+
+	return 0;
+}
+
+/*
  * pqPutnchar:
  *	write exactly len bytes to the current message
  */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..6578019 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -569,6 +569,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, FALSE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -585,6 +587,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, TRUE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -703,52 +707,51 @@ failure:
 
 /*
  * parseInput subroutine to read a 'B' or 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, bool binary)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 
 	/* the backend sends us a bitmap of which attributes are null */
 	char		std_bitmap[64]; /* used unless it doesn't fit */
 	char	   *bitmap = std_bitmap;
 	int			i;
+	int			rp;
 	size_t		nbytes;			/* the number of bytes in bitmap  */
 	char		bmap;			/* One byte of the bitmap */
 	int			bitmap_index;	/* Its index */
 	int			bitcnt;			/* number of bits examined in current byte */
 	int			vlen;			/* length of the current field value */
 
+	/* resize row buffer if needed */
+	if (nfields > conn->rowBufLen)
+	{
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
+			goto rowProcessError;
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
+	}
+	else
+	{
+		rowbuf = conn->rowBuf;
+	}
+
 	result->binary = binary;
 
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
+	if (binary)
 	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-		/*
-		 * If it's binary, fix the column format indicators.  We assume the
-		 * backend will consistently send either B or D, not a mix.
-		 */
-		if (binary)
-		{
-			for (i = 0; i < nfields; i++)
-				result->attDescs[i].format = 1;
-		}
+		for (i = 0; i < nfields; i++)
+			result->attDescs[i].format = 1;
 	}
-	tup = conn->curTuple;
 
 	/* Get the null-value bitmap */
 	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
@@ -757,7 +760,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 	{
 		bitmap = (char *) malloc(nbytes);
 		if (!bitmap)
-			goto outOfMemory;
+			goto rowProcessError;
 	}
 
 	if (pqGetnchar(bitmap, nbytes, conn))
@@ -771,34 +774,29 @@ getAnotherTuple(PGconn *conn, bool binary)
 	for (i = 0; i < nfields; i++)
 	{
 		if (!(bmap & 0200))
-		{
-			/* if the field value is absent, make it a null string */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-		}
+			vlen = NULL_LEN;
+		else if (pqGetInt(&vlen, 4, conn))
+				goto EOFexit;
 		else
 		{
-			/* get the value length (the first four bytes are for length) */
-			if (pqGetInt(&vlen, 4, conn))
-				goto EOFexit;
 			if (!binary)
 				vlen = vlen - 4;
 			if (vlen < 0)
 				vlen = 0;
-			if (tup[i].value == NULL)
-			{
-				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-				if (tup[i].value == NULL)
-					goto outOfMemory;
-			}
-			tup[i].len = vlen;
-			/* read in the value */
-			if (vlen > 0)
-				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-					goto EOFexit;
-			/* we have to terminate this ourselves */
-			tup[i].value[vlen] = '\0';
 		}
+
+		/*
+		 * rowbuf[i].value always points to where the value's bytes
+		 * start (or would start, if the value is NULL), to allow safe
+		 * size estimates and data copy.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip the value */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			goto EOFexit;
+
 		/* advance the bitmap stuff */
 		bitcnt++;
 		if (bitcnt == BITS_PER_BYTE)
@@ -811,33 +809,56 @@ getAnotherTuple(PGconn *conn, bool binary)
 			bmap <<= 1;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
-
 	if (bitmap != std_bitmap)
 		free(bitmap);
-	return 0;
+	bitmap = NULL;
+
+	/* tag the row as parsed */
+	conn->inStart = conn->inCursor;
+
+	/* Pass the completed row values to rowProcessor */
+	rp= conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+	if (rp == 1)
+		return 0;
+	else if (rp == 2 && pqIsnonblocking(conn))
+		/* processor requested early exit */
+		return EOF;
+	else if (rp != 0)
+		PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value from row processor\n"));
+
+rowProcessError:
 
-outOfMemory:
 	/* Replace partially constructed result with an error result */
 
-	/*
-	 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
-	 * there's not enough memory to concatenate messages...
-	 */
-	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	if (result->rowProcessorErrMsg)
+	{
+		printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
+		pqSaveErrorResult(conn);
+	}
+	else
+	{
+		/*
+		 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
+		 * there's not enough memory to concatenate messages...
+		 */
+		pqClearAsyncResult(conn);
+		resetPQExpBuffer(&conn->errorMessage);
 
-	/*
-	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
-	 * do to recover...
-	 */
-	conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+		/*
+		 * If error message is passed from RowProcessor, set it into
+		 * PGconn, assume out of memory if not.
+		 */
+		appendPQExpBufferStr(&conn->errorMessage,
+							 libpq_gettext("out of memory for query result\n"));
+
+		/*
+		 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
+		 * do to recover...
+		 */
+		conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+	}
 	conn->asyncStatus = PGASYNC_READY;
+
 	/* Discard the failed message --- good idea? */
 	conn->inStart = conn->inEnd;
 
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..a19ee88 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -327,6 +327,9 @@ pqParseInput3(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, msgLength))
 							return;
+
+						/* getAnotherTuple() moves inStart itself */
+						continue;
 					}
 					else if (conn->result != NULL &&
 							 conn->result->resultStatus == PGRES_FATAL_ERROR)
@@ -613,33 +616,22 @@ failure:
 
 /*
  * parseInput subroutine to read a 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, int msgLength)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 	int			tupnfields;		/* # fields from tuple */
 	int			vlen;			/* length of the current field value */
 	int			i;
-
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
-	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-	}
-	tup = conn->curTuple;
+	int			rp;
 
 	/* Get the field count and make sure it's what we expect */
 	if (pqGetInt(&tupnfields, 2, conn))
@@ -652,52 +644,88 @@ getAnotherTuple(PGconn *conn, int msgLength)
 				 libpq_gettext("unexpected field count in \"D\" message\n"));
 		pqSaveErrorResult(conn);
 		/* Discard the failed message by pretending we read it */
-		conn->inCursor = conn->inStart + 5 + msgLength;
+		conn->inStart += 5 + msgLength;
 		return 0;
 	}
 
+	/* resize row buffer if needed */
+	rowbuf = conn->rowBuf;
+	if (nfields > conn->rowBufLen)
+	{
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
+		{
+			goto outOfMemory1;
+		}
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
+	}
+
 	/* Scan the fields */
 	for (i = 0; i < nfields; i++)
 	{
 		/* get the value length */
 		if (pqGetInt(&vlen, 4, conn))
-			return EOF;
+			goto protocolError;
 		if (vlen == -1)
-		{
-			/* null field */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-			continue;
-		}
-		if (vlen < 0)
+			vlen = NULL_LEN;
+		else if (vlen < 0)
 			vlen = 0;
-		if (tup[i].value == NULL)
-		{
-			bool		isbinary = (result->attDescs[i].format != 0);
 
-			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-			if (tup[i].value == NULL)
-				goto outOfMemory;
-		}
-		tup[i].len = vlen;
-		/* read in the value */
-		if (vlen > 0)
-			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-				return EOF;
-		/* we have to terminate this ourselves */
-		tup[i].value[vlen] = '\0';
+		/*
+		 * rowbuf[i].value always points to where the value's bytes
+		 * start (or would start, if the value is NULL), to allow safe
+		 * size estimates and data copy.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip to the next length field */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			goto protocolError;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
+	/* mark the row as parsed, then check that the cursor ended up there */
+	conn->inStart += 5 + msgLength;
+	if (conn->inCursor != conn->inStart)
+		goto protocolError;
 
+	/* Pass the completed row values to rowProcessor */
+	rp = conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+	if (rp == 1)
+	{
+		/* everything is good */
+		return 0;
+	}
+	if (rp == 2 && pqIsnonblocking(conn))
+	{
+		/* processor requested early exit */
+		return EOF;
+	}
+
+	/* there was some problem */
+	if (rp == 0)
+	{
+		if (result->rowProcessorErrMsg == NULL)
+			goto outOfMemory2;
+
+		/* use supplied error message */
+		printfPQExpBuffer(&conn->errorMessage, "%s", result->rowProcessorErrMsg);
+	}
+	else
+	{
+		/* row processor messed up */
+		printfPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("invalid return value from row processor\n"));
+	}
+	pqSaveErrorResult(conn);
 	return 0;
 
-outOfMemory:
+outOfMemory1:
+	/* Discard the failed message by pretending we read it */
+	conn->inStart += 5 + msgLength;
 
+outOfMemory2:
 	/*
 	 * Replace partially constructed result with an error result. First
 	 * discard the old result to try to win back some memory.
@@ -706,9 +734,14 @@ outOfMemory:
 	printfPQExpBuffer(&conn->errorMessage,
 					  libpq_gettext("out of memory for query result\n"));
 	pqSaveErrorResult(conn);
+	return 0;
 
+protocolError:
+	printfPQExpBuffer(&conn->errorMessage,
+					  libpq_gettext("invalid row contents\n"));
+	pqSaveErrorResult(conn);
 	/* Discard the failed message by pretending we read it */
-	conn->inCursor = conn->inStart + 5 + msgLength;
+	conn->inStart += 5 + msgLength;
 	return 0;
 }
 
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..b7370e2 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,17 @@ typedef struct pgNotify
 	struct pgNotify *next;		/* list link */
 } PGnotify;
 
+/* PGrowValue points to a column value in the network buffer.
+ * The value is a string of length len without null termination.
+ * NULL is represented as len < 0; value then points to the place
+ * where the value would have been.
+ */
+typedef struct pgRowValue
+{
+	int			len;			/* length in bytes of the value */
+	char	   *value;			/* actual value, without null termination */
+} PGrowValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
@@ -416,6 +427,39 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
 			 const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * Columns array will contain PQnfields() entries, each one
+ * pointing to particular column data in network buffer.
+ * This function is supposed to copy data out from there
+ * and store somewhere.  NULL is signified with len<0.
+ *
+ * This function must return 1 for success and 0 for failure.  On
+ * failure it may set an error message with PQsetRowProcessorErrMsg;
+ * if no message is set, the caller assumes the failure was out of
+ * memory.  This function must not throw any
+ * exception.
+ */
+typedef int (*PQrowProcessor)(PGresult *res, PGrowValue *columns,
+                              void *param);
+
+/*
+ * Set alternative row data processor for PGconn.
+ *
+ * When a row processor is registered, libpq does not store rows in
+ * the PGresult itself and instead calls the processor for each row.
+ *
+ * func is the row processor function. See the typedef PQrowProcessor.
+ *
+ * rowProcessorParam is the context pointer that is passed to the
+ * row processor.
+ */
+extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+								   void *rowProcessorParam);
+extern PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+extern int  PQskipResult(PGconn *conn, int skipAll);
+
 /* Force the write buffer to be written (or at least try) */
 extern int	PQflush(PGconn *conn);
 
@@ -454,6 +498,7 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+extern void	PQsetRowProcessorErrMsg(PGresult *res, const char *msg);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 987311e..1fc5aab 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -209,6 +209,9 @@ struct pg_result
 	PGresult_data *curBlock;	/* most recently allocated block */
 	int			curOffset;		/* start offset of free space in block */
 	int			spaceLeft;		/* number of free bytes remaining in block */
+
+	/* temp storage for message from row processor callback */
+	char	   *rowProcessorErrMsg;
 };
 
 /* PGAsyncStatusType defines the state of the query-execution state machine */
@@ -398,7 +401,6 @@ struct pg_conn
 
 	/* Status for asynchronous result construction */
 	PGresult   *result;			/* result being constructed */
-	PGresAttValue *curTuple;	/* tuple currently being read */
 
 #ifdef USE_SSL
 	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
@@ -443,6 +445,14 @@ struct pg_conn
 
 	/* Buffer for receiving various parts of messages */
 	PQExpBufferData workBuffer; /* expansible string */
+
+	/*
+	 * Read column data from network buffer.
+	 */
+	PQrowProcessor rowProcessor;/* Function pointer */
+	void *rowProcessorParam;	/* Contextual parameter for rowProcessor */
+	PGrowValue *rowBuf;			/* Buffer for passing values to rowProcessor */
+	int rowBufLen;				/* Number of columns allocated in rowBuf */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -560,6 +570,7 @@ extern int	pqGets(PQExpBuffer buf, PGconn *conn);
 extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
 extern int	pqPuts(const char *s, PGconn *conn);
 extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+extern int	pqSkipnchar(size_t len, PGconn *conn);
 extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
 extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
 extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
libpq_rowproc_doc_20120223.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..0087b43 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,330 @@ int PQisthreadsafe();
  </sect1>
 
 
+ <sect1 id="libpq-altrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-altrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   By default, rows are stored in a <type>PGresult</type> until the
+   full result set is received, and the completely filled
+   <type>PGresult</type> is then passed to the user.  This behavior
+   can be changed by registering an alternative row processor
+   function, which sees each row as soon as it is received from the
+   network.  It can either process the data immediately or store it
+   in a custom container.
+  </para>
+
+  <para>
+   Note - as row processor sees rows as they arrive, it cannot know
+   whether the SQL statement actually finishes successfully on server
+   or not.  So some care must be taken to get proper
+   transactionality.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessor">
+    <term>
+     <function>PQsetRowProcessor</function>
+     <indexterm>
+      <primary>PQsetRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object to set the row processor function.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>func</parameter></term>
+	   <listitem>
+	     <para>
+	       Storage handler function to set. NULL means to use the
+	       default processor.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+	       A pointer to contextual parameter passed
+	       to <parameter>func</parameter>.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqrowprocessor">
+    <term>
+     <type>PQrowProcessor</type>
+     <indexterm>
+      <primary>PQrowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+int (*PQrowProcessor)(PGresult *res, PGrowValue *columns, void *param);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value, -1 if NULL */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+</synopsis>
+     </para>
+
+     <para>
+      The <parameter>columns</parameter> array will have PQnfields()
+      elements, each one pointing to column value in network buffer.
+      The <parameter>len</parameter> field will contain number of
+      bytes in value.  If the field value is NULL then
+      <parameter>len</parameter> will be -1 and value will point
+      to position where the value would have been in buffer.
+      This allows estimating row size by pointer arithmetic.
+     </para>
+
+     <para>
+       This function must process or copy row values away from network
+       buffer before it returns, as next row might overwrite them.
+     </para>
+
+     <para>
+       This function must return 1 for success, and 0 for failure.  On
+       failure this function should set the error message
+       with <function>PQsetRowProcessorErrMsg</function> if the cause
+       is other than out of memory.  When non-blocking API is in use,
+       it can also return 2 for early exit
+       from <function>PQisBusy</function> function.  The
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values will stay valid so
+       row can be processed outside of callback.  Caller is
+       responsible for tracking whether
+       the <parameter>PQisBusy</parameter> returned early from
+       callback or for other reasons.  Usually this should happen via
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
+     </para>
+
+     <para>
+       The function is allowed to exit via exception (setjmp/longjmp).
+       The connection and row are guaranteed to be in a valid state.
+       The connection can later be closed via <function>PQfinish</function>.
+       Processing can also be continued without closing the connection:
+       call <function>PQgetResult</function> in synchronous mode, or
+       <function>PQisBusy</function> on an asynchronous connection.
+       Processing then continues with the next row; the row that raised
+       the exception is skipped.  Alternatively, discard all remaining
+       rows by calling <function>PQskipResult</function> without
+       closing the connection.
+     </para>
+
+     <variablelist>
+       <varlistentry>
+
+	 <term><parameter>res</parameter></term>
+	 <listitem>
+	   <para>
+	     A pointer to the <type>PGresult</type> object.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>columns</parameter></term>
+	 <listitem>
+	   <para>
+	     Column values of the row to process.  Column values
+	     are located in network buffer, the processor must
+	     copy them out from there.
+	   </para>
+	   <para>
+	     Column values are not null-terminated, so processor cannot
+	     use C string functions on them directly.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>param</parameter></term>
+	 <listitem>
+	   <para>
+	     Extra parameter that was given to <function>PQsetRowProcessor</function>.
+	   </para>
+	 </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqskipresult">
+    <term>
+     <function>PQskipResult</function>
+     <indexterm>
+      <primary>PQskipResult</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+		Discards all remaining row data, without closing the
+		connection, after <function>PQexec</function>
+		or <function>PQgetResult</function> exits via an exception
+		raised in the <type>PQrowProcessor</type>.
+<synopsis>
+int PQskipResult(PGconn *conn, int skipAll)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	   <term><parameter>skipAll</parameter></term>
+	   <listitem>
+	     <para>
+	       Skip remaining rows in current result
+	       if <parameter>skipAll</parameter> is false(0). Skip
+	       remaining rows in current result and all rows in
+	       succeeding results if true(non-zero).
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessorerrmsg">
+    <term>
+     <function>PQsetRowProcessorErrMsg</function>
+     <indexterm>
+      <primary>PQsetRowProcessorErrMsg</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Set the message for the error occurred
+	in <type>PQrowProcessor</type>.  If this message is not set, the
+	caller assumes the error to be out of memory.
+<synopsis>
+void PQsetRowProcessorErrMsg(PGresult *res, const char *msg)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object
+		passed to <type>PQrowProcessor</type>.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	  <varlistentry>
+	    <term><parameter>msg</parameter></term>
+	    <listitem>
+	      <para>
+		Error message. It is copied internally, so the caller
+		does not need to keep it alive.
+	      </para>
+	      <para>
+		If <parameter>res</parameter> already has a message set,
+		it will be overwritten. Pass NULL to cancel the custom
+		message.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetrowprcessor">
+    <term>
+     <function>PQgetRowProcessor</function>
+     <indexterm>
+      <primary>PQgetRowProcessor</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+       Gets the row processor and its context parameter currently
+       set on the connection.
+<synopsis>
+PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+              If not NULL, receives the current row processor
+              parameter of the connection.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
 
dblink_use_rowproc_20120223.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..d9f1b3a 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char** valbuf;
+	int *valbuflen;
+	char **cstrs;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,51 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
-	res = PQexec(conn, buf.data);
+	PG_TRY();
+	{
+		res = PQexec(conn, buf.data);
+	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
+
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, FALSE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +632,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +693,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +769,252 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
-	if (!is_async)
-		res = PQexec(conn, sql);
-	else
+	PG_TRY();
 	{
-		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
+		if (!is_async)
+			res = PQexec(conn, sql);
+		else
+			res = PQgetResult(conn);
 	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, FALSE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred to below. */
+			if (storeinfo.nummismatch)
+			{
+				/* This is only for backward compatibility */
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	int i;
+
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+	{
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
 
-	Assert(rsinfo->returnMode == SFRM_Materialize);
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
 
-	PG_TRY();
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->valbuf = sinfo->cstrs = NULL;	/* cstrs is tested below even if malloc fails */
+	sinfo->valbuflen = NULL;
+
+	/* Preallocate memory of same size with c string array for values. */
+	sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+	if (sinfo->valbuf)
+		sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+	if (sinfo->valbuflen)
+		sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
+
+	if (sinfo->cstrs == NULL)
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		if (sinfo->valbuf)
+			free(sinfo->valbuf);
+		if (sinfo->valbuflen)
+			free(sinfo->valbuflen);
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+	}
 
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbuflen[i] = -1;
+	}
 
-			is_sql_cmd = false;
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
+	if (sinfo->valbuf)
+	{
+		for (i = 0 ; i < sinfo->nattrs ; i++)
+		{
+			if (sinfo->valbuf[i])
+				free(sinfo->valbuf[i]);
 		}
+		free(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+	if (sinfo->valbuflen)
+	{
+		free(sinfo->valbuflen);
+		sinfo->valbuflen = NULL;
+	}
 
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+	if (sinfo->cstrs)
+	{
+		free(sinfo->cstrs);
+		sinfo->cstrs = NULL;
+	}
 
-			values = (char **) palloc(nfields * sizeof(char *));
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+	storeInfo *sinfo = (storeInfo *)param;
+	HeapTuple  tuple;
+	int        fields = PQnfields(res);
+	int        i;
+	char      **cstrs = sinfo->cstrs;
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+	if (sinfo->error_occurred)
+		return FALSE;
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
+
+		/* This error will be processed in
+		 * dblink_record_internal(). So do not set error message
+		 * here. */
+		return FALSE;
+	}
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
+	/*
+	 * Value input functions assume that the input string is
+	 * null-terminated, so copy each value into a terminated buffer.
+	 */
+	for(i = 0 ; i < fields ; i++)
+	{
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
+		{
+			char *tmp = sinfo->valbuf[i];
+			int tmplen = sinfo->valbuflen[i];
+
+			/*
+			 * Divide calls to malloc and realloc so that things will
+			 * go fine even on the systems of which realloc() does not
+			 * accept NULL as old memory block.
+			 *
+			 * Also try to (re)allocate in bigger steps to
+			 * avoid flood of allocations on weird data.
+			 */
+			if (tmp == NULL)
+			{
+				tmplen = len + 1;
+				if (tmplen < 64)
+					tmplen = 64;
+				tmp = (char *)malloc(tmplen);
+			}
+			else if (tmplen < len + 1)
+			{
+				if (len + 1 > tmplen * 2)
+					tmplen = len + 1;
+				else
+					tmplen = tmplen * 2;
+				tmp = (char *)realloc(tmp, tmplen);
 			}
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
-		}
+			/*
+			 * sinfo->valbuf[n] will be freed in finishStoreInfo()
+			 * when realloc returns NULL.
+			 */
+			if (tmp == NULL)
+				return FALSE;
 
-		PQclear(res);
-	}
-	PG_CATCH();
-	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+			sinfo->valbuf[i] = tmp;
+			sinfo->valbuflen[i] = tmplen;
+
+			cstrs[i] = sinfo->valbuf[i];
+			memcpy(cstrs[i], columns[i].value, len);
+			cstrs[i][len] = '\0';
+		}
 	}
-	PG_END_TRY();
+
+	/*
+	 * These functions may throw exception. It will be caught in
+	 * dblink_record_internal()
+	 */
+	tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+	tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+	return TRUE;
 }
 
 /*
early_exit_20120223.diff (text/x-patch; charset=us-ascii)
diff --git b/doc/src/sgml/libpq.sgml a/doc/src/sgml/libpq.sgml
index 9c4c810..0087b43 100644
--- b/doc/src/sgml/libpq.sgml
+++ a/doc/src/sgml/libpq.sgml
@@ -7351,7 +7351,17 @@ typedef struct
        This function must return 1 for success, and 0 for failure.  On
        failure this function should set the error message
        with <function>PQsetRowProcessorErrMsg</function> if the cause
-       is other than out of memory.
+       is other than out of memory.  When non-blocking API is in use,
+       it can also return 2 for early exit
+       from <function>PQisBusy</function> function.  The
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values will stay valid so
+       row can be processed outside of callback.  Caller is
+       responsible for tracking whether
+       the <parameter>PQisBusy</parameter> returned early from
+       callback or for other reasons.  Usually this should happen via
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
      </para>
 
      <para>
diff --git b/src/interfaces/libpq/fe-protocol2.c a/src/interfaces/libpq/fe-protocol2.c
index c3220ed..6578019 100644
--- b/src/interfaces/libpq/fe-protocol2.c
+++ a/src/interfaces/libpq/fe-protocol2.c
@@ -820,6 +820,9 @@ getAnotherTuple(PGconn *conn, bool binary)
 	rp= conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
 	if (rp == 1)
 		return 0;
+	else if (rp == 2 && pqIsnonblocking(conn))
+		/* processor requested early exit */
+		return EOF;
 	else if (rp != 0)
 		PQsetRowProcessorErrMsg(result, libpq_gettext("invalid return value from row processor\n"));
 
diff --git b/src/interfaces/libpq/fe-protocol3.c a/src/interfaces/libpq/fe-protocol3.c
index 44d1ec4..a19ee88 100644
--- b/src/interfaces/libpq/fe-protocol3.c
+++ a/src/interfaces/libpq/fe-protocol3.c
@@ -697,6 +697,11 @@ getAnotherTuple(PGconn *conn, int msgLength)
 		/* everything is good */
 		return 0;
 	}
+	if (rp == 2 && pqIsnonblocking(conn))
+	{
+		/* processor requested early exit */
+		return EOF;
+	}
 
 	/* there was some problem */
 	if (rp == 0)
#63 Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#62)
Re: Speed dblink using alternate libpq tuple storage

On Thu, Feb 23, 2012 at 07:14:03PM +0900, Kyotaro HORIGUCHI wrote:

Hello, this is new version of the patch.

Looks good.

By the way, I would like to ask you one question. What is the
reason why void* should be head or tail of the parameter list?

Aesthetical reasons:

1) PGresult and PGrowValue belong together.

2) void* is probably the context object for handler. When doing
object-oriented programming in C the main object is usually first.
Like libpq does - PGconn is always first argument.

But as libpq does not know the actual meaning of void* for the handler,
it can be the last parameter as well.

When I wrote the demo code, I noticed that it is unnatural to have
void* in the middle.
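
A minimal sketch of a handler with the context pointer last, under stated assumptions: DemoRowValue only mirrors the PGrowValue struct from the patch, and DemoCollector and demo_collect_first_column are hypothetical names, not libpq API. The PGresult argument is replaced by a plain field count so that the example stands alone. It copies the first column of each row into a null-terminated string, since the values in the network buffer are not terminated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in mirroring PGrowValue from the patch; not the real libpq header. */
typedef struct
{
	int			len;			/* length in bytes, negative for SQL NULL */
	char	   *value;			/* points into the buffer, not terminated */
} DemoRowValue;

/* Hypothetical handler context: collects the first column of every row. */
typedef struct
{
	char	  **items;
	int			count;
	int			capacity;
} DemoCollector;

/*
 * Same shape as the proposed callback, with the context pointer last.
 * Returns 1 on success and 0 on failure (here: out of memory).
 */
int
demo_collect_first_column(DemoRowValue *columns, int nfields, void *param)
{
	DemoCollector *c = (DemoCollector *) param;
	char	   *copy = NULL;

	assert(nfields > 0);

	/* Grow the item array in doubling steps. */
	if (c->count == c->capacity)
	{
		int			newcap = c->capacity ? c->capacity * 2 : 8;
		char	  **tmp = realloc(c->items, (size_t) newcap * sizeof(char *));

		if (tmp == NULL)
			return 0;
		c->items = tmp;
		c->capacity = newcap;
	}

	/* A negative length means SQL NULL; store a NULL pointer for it. */
	if (columns[0].len >= 0)
	{
		copy = malloc((size_t) columns[0].len + 1);
		if (copy == NULL)
			return 0;
		memcpy(copy, columns[0].value, (size_t) columns[0].len);
		copy[columns[0].len] = '\0';
	}

	c->items[c->count++] = copy;
	return 1;
}
```

The handler casts param once at the top and then works only with its own type, which is why the position of void* matters for readability but not for libpq.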

Last comment - if we drop the plan to make PQsetRowProcessorErrMsg()
usable outside of handler, we can simplify internal usage as well:
the PGresult->rowProcessorErrMsg can be dropped and let's use
->errMsg to transport the error message.

The PGresult is a long-lived structure and adding fields for such
temporary usage feels wrong. There is no other libpq code between
row processor and getAnotherTuple, so the use is safe.

--
marko

#64Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#63)
4 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

Hello, this is new version of the patch.

By the way, I would like to ask you one question. What is the
reason why void* should be head or tail of the parameter list?

Aesthetical reasons:

I got it. Thank you.

Last comment - if we drop the plan to make PQsetRowProcessorErrMsg()
usable outside of handler, we can simplify internal usage as well:
the PGresult->rowProcessorErrMsg can be dropped and let's use
->errMsg to transport the error message.

The PGresult is a long-lived structure and adding fields for such
temporary usage feels wrong. There is no other libpq code between
row processor and getAnotherTuple, so the use is safe.

I mostly agree. In addition, I see no reason to treat out of
memory as a special case now that the row processor is generic.
The previous patch handled OOM differently from other errors,
but I could not see an obvious reason to do so.

- PGresult.rowProcessorErrMsg is removed; PGconn.errMsg is used
instead, with the new interface function PQresultSetErrMsg().

- Row processors should now set a message for any error that
occurs within them. pqAddRow and dblink.c are modified to do so.

- getAnotherTuple() sets the error message `unknown error' for
the condition rp == 0 && ->errMsg == NULL.

- In fe-protocol3.c, getAnotherTuple() now always advances the
input cursor and calls pqClearAsyncResult() and pqSaveErrorResult()
when rp == 0, regardless of whether ->errMsg is NULL.

- conn->inStart is already moved to the start point of the next
message when row processor is called. So forwarding inStart in
outOfMemory1 seems redundant. I removed it.

- printfPQExpBuffer() complains about a non-literal message, so
resetPQExpBuffer() and appendPQExpBufferStr() are used instead.
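
The return-value handling listed above can be sketched roughly as follows. This is not the patch code: DemoBuffer, demo_reset, demo_append_str and demo_handle_result are hypothetical stand-ins for PQExpBuffer and the logic in getAnotherTuple(). The message is appended verbatim rather than passed as a printf format, which is the reason for switching away from printfPQExpBuffer().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for PQExpBuffer. */
typedef struct
{
	char		data[128];
	size_t		len;
} DemoBuffer;

void
demo_reset(DemoBuffer *b)
{
	b->len = 0;
	b->data[0] = '\0';
}

/* Appends str verbatim; '%' has no special meaning, unlike in a format. */
void
demo_append_str(DemoBuffer *b, const char *str)
{
	size_t		room = sizeof(b->data) - 1 - b->len;
	size_t		n = strlen(str);

	if (n > room)
		n = room;				/* truncate instead of overflowing */
	memcpy(b->data + b->len, str, n);
	b->len += n;
	b->data[b->len] = '\0';
}

/*
 * rp is the row processor's return value: 1 is success, 0 is failure.
 * On failure the handler's message is used, or "unknown error" when the
 * handler set none.  Any other value is reported as invalid.
 * Returns 0 on success and -1 on failure.
 */
int
demo_handle_result(int rp, const char *handler_msg, DemoBuffer *errorMessage)
{
	assert(errorMessage != NULL);

	if (rp == 1)
		return 0;

	demo_reset(errorMessage);
	if (rp != 0)
		demo_append_str(errorMessage,
						"invalid return value from row processor\n");
	else
		demo_append_str(errorMessage,
						handler_msg ? handler_msg : "unknown error\n");
	return -1;
}
```

A message such as "100% broken" passes through unchanged here, whereas handing it to a printf-style function as the format string would be undefined behavior.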

=====
- dblink did not use conn->errorMessage before and still does not;
all errors are displayed as `Error occurred on dblink connection...'.

- TODO: No NLS messages for error messages.

- make check somehow fails on the base revision, so I have not
run it against the patch.

- I have no idea how to do test for protocol 2...

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_rowproc_20120224.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..239edb8 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,7 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQsetRowProcessor	  161
+PQgetRowProcessor	  162
+PQresultSetErrMsg	  163
+PQskipResult		  164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)
 	conn->wait_ssl_try = false;
 #endif
 
+	/* set default row processor */
+	PQsetRowProcessor(conn, NULL, NULL);
+
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
 	 * buffers on Unix systems.  That way, when we are sending a large amount
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)
 	initPQExpBuffer(&conn->errorMessage);
 	initPQExpBuffer(&conn->workBuffer);
 
+	/* set up initial row buffer */
+	conn->rowBufLen = 32;
+	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+
 	if (conn->inBuffer == NULL ||
 		conn->outBuffer == NULL ||
+		conn->rowBuf == NULL ||
 		PQExpBufferBroken(&conn->errorMessage) ||
 		PQExpBufferBroken(&conn->workBuffer))
 	{
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)
 		free(conn->inBuffer);
 	if (conn->outBuffer)
 		free(conn->outBuffer);
+	if (conn->rowBuf)
+		free(conn->rowBuf);
 	termPQExpBuffer(&conn->errorMessage);
 	termPQExpBuffer(&conn->workBuffer);
 
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..7fd3c9c 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
+static int	pqAddRow(PGresult *res, PGrowValue *columns, void *param);
 
 
 /* ----------------
@@ -701,7 +702,6 @@ pqClearAsyncResult(PGconn *conn)
 	if (conn->result)
 		PQclear(conn->result);
 	conn->result = NULL;
-	conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +756,6 @@ pqPrepareAsyncResult(PGconn *conn)
 	 */
 	res = conn->result;
 	conn->result = NULL;		/* handing over ownership to caller */
-	conn->curTuple = NULL;		/* just in case */
 	if (!res)
 		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 	else
@@ -828,6 +827,93 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 }
 
 /*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+	conn->rowProcessor = (func ? func : pqAddRow);
+	conn->rowProcessorParam = param;
+}
+
+/*
+ * PQgetRowProcessor
+ *   Get the current row processor of conn.  If param is not NULL, the
+ *   current row processor parameter is stored in *param.
+ */
+PQrowProcessor
+PQgetRowProcessor(PGconn *conn, void **param)
+{
+	if (param)
+		*param = conn->rowProcessorParam;
+
+	return conn->rowProcessor;
+}
+
+/*
+ * PQresultSetErrMsg
+ *    Set the error message to PGresult.
+ *
+ *    You can replace the previous message with an alternative msg, or clear
+ *    it with NULL.
+ */
+void
+PQresultSetErrMsg(PGresult *res, const char *msg)
+{
+	if (msg)
+		res->errMsg = pqResultStrdup(res, msg);
+	else
+		res->errMsg = NULL;
+}
+
+/*
+ * pqAddRow
+ *	  add a row to the PGresult structure, growing it if necessary
+ *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+ */
+static int
+pqAddRow(PGresult *res, PGrowValue *columns, void *param)
+{
+	PGresAttValue *tup;
+	int			nfields = res->numAttributes;
+	int			i;
+
+	tup = (PGresAttValue *)
+		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+	if (tup == NULL)
+	{
+		PQresultSetErrMsg(res, "out of memory for query result\n");
+		return FALSE;
+	}
+
+	for (i = 0 ; i < nfields ; i++)
+	{
+		tup[i].len = columns[i].len;
+		if (tup[i].len == NULL_LEN)
+		{
+			tup[i].value = res->null_field;
+		}
+		else
+		{
+			bool isbinary = (res->attDescs[i].format != 0);
+			tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+			if (tup[i].value == NULL)
+			{
+				PQresultSetErrMsg(res, "out of memory for query result\n");
+				return FALSE;
+			}
+
+			memcpy(tup[i].value, columns[i].value, tup[i].len);
+			/* We have to terminate this ourselves */
+			tup[i].value[tup[i].len] = '\0';
+		}
+	}
+
+	return pqAddTuple(res, tup);
+}
+
+/*
  * pqAddTuple
  *	  add a row pointer to the PGresult structure, growing it if necessary
  *	  Returns TRUE if OK, FALSE if not enough memory to add the row
@@ -1223,7 +1309,6 @@ PQsendQueryStart(PGconn *conn)
 
 	/* initialize async result-accumulation state */
 	conn->result = NULL;
-	conn->curTuple = NULL;
 
 	/* ready to send command message */
 	return true;
@@ -1831,6 +1916,55 @@ PQexecFinish(PGconn *conn)
 	return lastResult;
 }
 
+
+/*
+ * Do-nothing row processor for PQskipResult
+ */
+static int
+dummyRowProcessor(PGresult *res, PGrowValue *columns, void *param)
+{
+	return 1;
+}
+
+/*
+ * Exhaust remaining Data Rows in current conn.
+ *
+ * Exhaust current result if skipAll is false and all succeeding results if
+ * true.
+ */
+int
+PQskipResult(PGconn *conn, int skipAll)
+{
+	PQrowProcessor savedRowProcessor;
+	void * savedRowProcParam;
+	PGresult *res;
+	int ret = 0;
+
+	/* save the current row processor settings and set dummy processor */
+	savedRowProcessor = PQgetRowProcessor(conn, &savedRowProcParam);
+	PQsetRowProcessor(conn, dummyRowProcessor, NULL);
+	
+	/*
+	 * Throw away the remaining rows in current result, or all succeeding
+	 * results if skipAll is not FALSE.
+	 */
+	if (skipAll)
+	{
+		while ((res = PQgetResult(conn)) != NULL)
+			PQclear(res);
+	}
+	else if ((res = PQgetResult(conn)) != NULL)
+	{
+		PQclear(res);
+		ret = 1;
+	}
+	
+	PQsetRowProcessor(conn, savedRowProcessor, savedRowProcParam);
+
+	return ret;
+}
+
+
 /*
  * PQdescribePrepared
  *	  Obtain information about a previously prepared statement
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)
 }
 
 /*
+ * pqSkipnchar:
+ *	skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+	if (len > (size_t) (conn->inEnd - conn->inCursor))
+		return EOF;
+
+	conn->inCursor += len;
+
+	if (conn->Pfdebug)
+		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+				(unsigned long) len);
+
+	return 0;
+}
+
+/*
  * pqPutnchar:
  *	write exactly len bytes to the current message
  */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..3b0520d 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -569,6 +569,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, FALSE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -585,6 +587,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, TRUE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -703,52 +707,55 @@ failure:
 
 /*
  * parseInput subroutine to read a 'B' or 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, bool binary)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 
 	/* the backend sends us a bitmap of which attributes are null */
 	char		std_bitmap[64]; /* used unless it doesn't fit */
 	char	   *bitmap = std_bitmap;
 	int			i;
+	int			rp;
 	size_t		nbytes;			/* the number of bytes in bitmap  */
 	char		bmap;			/* One byte of the bitmap */
 	int			bitmap_index;	/* Its index */
 	int			bitcnt;			/* number of bits examined in current byte */
 	int			vlen;			/* length of the current field value */
+	char        *errmsg = libpq_gettext("unknown error\n");
 
-	result->binary = binary;
-
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
+	/* resize row buffer if needed */
+	if (nfields > conn->rowBufLen)
 	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-		/*
-		 * If it's binary, fix the column format indicators.  We assume the
-		 * backend will consistently send either B or D, not a mix.
-		 */
-		if (binary)
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
 		{
-			for (i = 0; i < nfields; i++)
-				result->attDescs[i].format = 1;
+			errmsg = libpq_gettext("out of memory for query result\n");
+			goto error;
 		}
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
+	}
+	else
+	{
+		rowbuf = conn->rowBuf;
+	}
+
+	result->binary = binary;
+
+	if (binary)
+	{
+		for (i = 0; i < nfields; i++)
+			result->attDescs[i].format = 1;
 	}
-	tup = conn->curTuple;
 
 	/* Get the null-value bitmap */
 	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
@@ -757,7 +764,10 @@ getAnotherTuple(PGconn *conn, bool binary)
 	{
 		bitmap = (char *) malloc(nbytes);
 		if (!bitmap)
-			goto outOfMemory;
+		{
+			errmsg = libpq_gettext("out of memory for query result\n");
+			goto error;
+		}
 	}
 
 	if (pqGetnchar(bitmap, nbytes, conn))
@@ -771,34 +781,29 @@ getAnotherTuple(PGconn *conn, bool binary)
 	for (i = 0; i < nfields; i++)
 	{
 		if (!(bmap & 0200))
-		{
-			/* if the field value is absent, make it a null string */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-		}
+			vlen = NULL_LEN;
+		else if (pqGetInt(&vlen, 4, conn))
+				goto EOFexit;
 		else
 		{
-			/* get the value length (the first four bytes are for length) */
-			if (pqGetInt(&vlen, 4, conn))
-				goto EOFexit;
 			if (!binary)
 				vlen = vlen - 4;
 			if (vlen < 0)
 				vlen = 0;
-			if (tup[i].value == NULL)
-			{
-				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-				if (tup[i].value == NULL)
-					goto outOfMemory;
-			}
-			tup[i].len = vlen;
-			/* read in the value */
-			if (vlen > 0)
-				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-					goto EOFexit;
-			/* we have to terminate this ourselves */
-			tup[i].value[vlen] = '\0';
 		}
+
+		/*
+		 * rowbuf[i].value always points to the next address of the
+		 * length field even if the value is NULL, to allow safe
+		 * size estimates and data copy.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip the value */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			goto EOFexit;
+
 		/* advance the bitmap stuff */
 		bitcnt++;
 		if (bitcnt == BITS_PER_BYTE)
@@ -811,33 +816,53 @@ getAnotherTuple(PGconn *conn, bool binary)
 			bmap <<= 1;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
-
 	if (bitmap != std_bitmap)
 		free(bitmap);
-	return 0;
+	bitmap = NULL;
 
-outOfMemory:
-	/* Replace partially constructed result with an error result */
+	/* tag the row as parsed */
+	conn->inStart = conn->inCursor;
 
+	/* Pass the completed row values to rowProcessor */
+	rp = conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+	if (rp == 1)
+		return 0;
+	else if (rp == 2 && pqIsnonblocking(conn))
+		/* processor requested early exit */
+		return EOF;
+	else if (rp == 0)
+	{
+		errmsg = result->errMsg;
+		result->errMsg = NULL;
+		if (errmsg == NULL)
+			errmsg = libpq_gettext("unknown error in row processor\n");
+		goto error;
+	}
+
+	errmsg = libpq_gettext("invalid return value from row processor\n");
+
+error:
 	/*
 	 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
 	 * there's not enough memory to concatenate messages...
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
-
+	resetPQExpBuffer(&conn->errorMessage);
+	
+	/*
+	 * Set the error message into PGconn: either the one passed from
+	 * the row processor or the default chosen above.
+	 */
+	appendPQExpBufferStr(&conn->errorMessage, errmsg);
+	
 	/*
 	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
 	 * do to recover...
 	 */
 	conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+
 	conn->asyncStatus = PGASYNC_READY;
+
 	/* Discard the failed message --- good idea? */
 	conn->inStart = conn->inEnd;
 
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..c8202c2 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -327,6 +327,9 @@ pqParseInput3(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, msgLength))
 							return;
+
+						/* getAnotherTuple() moves inStart itself */
+						continue;
 					}
 					else if (conn->result != NULL &&
 							 conn->result->resultStatus == PGRES_FATAL_ERROR)
@@ -613,33 +616,23 @@ failure:
 
 /*
  * parseInput subroutine to read a 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, int msgLength)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 	int			tupnfields;		/* # fields from tuple */
 	int			vlen;			/* length of the current field value */
 	int			i;
-
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
-	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-	}
-	tup = conn->curTuple;
+	int			rp;
+	char        *errmsg = libpq_gettext("unknown error\n");
 
 	/* Get the field count and make sure it's what we expect */
 	if (pqGetInt(&tupnfields, 2, conn))
@@ -647,13 +640,22 @@ getAnotherTuple(PGconn *conn, int msgLength)
 
 	if (tupnfields != nfields)
 	{
-		/* Replace partially constructed result with an error result */
-		printfPQExpBuffer(&conn->errorMessage,
-				 libpq_gettext("unexpected field count in \"D\" message\n"));
-		pqSaveErrorResult(conn);
-		/* Discard the failed message by pretending we read it */
-		conn->inCursor = conn->inStart + 5 + msgLength;
-		return 0;
+		errmsg = libpq_gettext("unexpected field count in \"D\" message\n");
+		goto error_and_forward;
+	}
+
+	/* resize row buffer if needed */
+	rowbuf = conn->rowBuf;
+	if (nfields > conn->rowBufLen)
+	{
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
+		{
+			errmsg = libpq_gettext("out of memory for query result\n");
+			goto error_and_forward;
+		}
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
 	}
 
 	/* Scan the fields */
@@ -661,54 +663,78 @@ getAnotherTuple(PGconn *conn, int msgLength)
 	{
 		/* get the value length */
 		if (pqGetInt(&vlen, 4, conn))
-			return EOF;
-		if (vlen == -1)
 		{
-			/* null field */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-			continue;
+			/*
+			 * Forwarding inCursor makes no sense on a protocol error
+			 */
+			errmsg = libpq_gettext("invalid row contents\n");
+			goto error;
 		}
-		if (vlen < 0)
+		if (vlen == -1)
+			vlen = NULL_LEN;
+		else if (vlen < 0)
 			vlen = 0;
-		if (tup[i].value == NULL)
-		{
-			bool		isbinary = (result->attDescs[i].format != 0);
 
-			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-			if (tup[i].value == NULL)
-				goto outOfMemory;
-		}
-		tup[i].len = vlen;
-		/* read in the value */
-		if (vlen > 0)
-			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-				return EOF;
-		/* we have to terminate this ourselves */
-		tup[i].value[vlen] = '\0';
+		/*
+		 * rowbuf[i].value always points to the next address of the
+		 * length field even if the value is NULL, to allow safe
+		 * size estimates and data copy.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip to the next length field */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			return EOF;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
+	/* tag the row as parsed, check if correctly */
+	conn->inStart += 5 + msgLength;
+	if (conn->inCursor != conn->inStart)
+	{
+		errmsg = libpq_gettext("invalid row contents\n");
+		goto error;
+	}
 
-	return 0;
+	/* Pass the completed row values to rowProcessor */
+	rp = conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+	if (rp == 1)
+	{
+		/* everything is good */
+		return 0;
+	}
+	if (rp == 2 && pqIsnonblocking(conn))
+	{
+		/* processor requested early exit */
+		return EOF;
+	}
 
-outOfMemory:
+	/* there was some problem */
+	if (rp == 0)
+	{
+		errmsg = result->errMsg;
+		result->errMsg = NULL;
+		if (errmsg == NULL)
+			errmsg = libpq_gettext("unknown error in row processor\n");
+		goto error;
+	}
+
+	errmsg = libpq_gettext("invalid return value from row processor\n");
+	goto error;
+
+error_and_forward:
+	/* Discard the failed message by pretending we read it */
+	conn->inCursor = conn->inStart + 5 + msgLength;
 
+error:
 	/*
 	 * Replace partially constructed result with an error result. First
 	 * discard the old result to try to win back some memory.
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	resetPQExpBuffer(&conn->errorMessage);
+	appendPQExpBufferStr(&conn->errorMessage, errmsg);
 	pqSaveErrorResult(conn);
-
-	/* Discard the failed message by pretending we read it */
-	conn->inCursor = conn->inStart + 5 + msgLength;
 	return 0;
 }
 
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..810b04e 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,17 @@ typedef struct pgNotify
 	struct pgNotify *next;		/* list link */
 } PGnotify;
 
+/* PGrowValue points to a column value in the network buffer.
+ * The value is a string of length len without null termination.
+ * NULL is represented as len < 0; value then points to the place
+ * where the value would have been.
+ */
+typedef struct pgRowValue
+{
+	int			len;			/* length in bytes of the value */
+	char	   *value;			/* actual value, without null termination */
+} PGrowValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
@@ -416,6 +427,38 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
 			 const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * Columns array will contain PQnfields() entries, each one
+ * pointing to particular column data in network buffer.
+ * This function is supposed to copy data out from there
+ * and store somewhere.  NULL is signified with len<0.
+ *
+ * This function must return 1 for success and must return 0 for
+ * failure, and should set an error message with PQresultSetErrMsg on
+ * failure.  If no message is set, the caller reports an unknown
+ * error.  This function is assumed not to throw any exception.
+ */
+typedef int (*PQrowProcessor)(PGresult *res, PGrowValue *columns,
+                              void *param);
+
+/*
+ * Set alternative row data processor for PGconn.
+ *
+ * Once a row processor is registered, libpq no longer stores rows in
+ * the PGresult itself and instead calls the processor for each row.
+ *
+ * func is the row processor function. See the typedef PQrowProcessor.
+ *
+ * rowProcessorParam is the contextual variable that is passed to
+ * the row processor.
+ */
+extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+								   void *rowProcessorParam);
+extern PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+extern int  PQskipResult(PGconn *conn, int skipAll);
+
 /* Force the write buffer to be written (or at least try) */
 extern int	PQflush(PGconn *conn);
 
@@ -454,6 +497,7 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+extern void	PQresultSetErrMsg(PGresult *res, const char *msg);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 987311e..9cabd20 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -398,7 +398,6 @@ struct pg_conn
 
 	/* Status for asynchronous result construction */
 	PGresult   *result;			/* result being constructed */
-	PGresAttValue *curTuple;	/* tuple currently being read */
 
 #ifdef USE_SSL
 	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
@@ -443,6 +442,14 @@ struct pg_conn
 
 	/* Buffer for receiving various parts of messages */
 	PQExpBufferData workBuffer; /* expansible string */
+
+	/*
+	 * Read column data from network buffer.
+	 */
+	PQrowProcessor rowProcessor;/* Function pointer */
+	void *rowProcessorParam;	/* Contextual parameter for rowProcessor */
+	PGrowValue *rowBuf;			/* Buffer for passing values to rowProcessor */
+	int rowBufLen;				/* Number of columns allocated in rowBuf */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -560,6 +567,7 @@ extern int	pqGets(PQExpBuffer buf, PGconn *conn);
 extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
 extern int	pqPuts(const char *s, PGconn *conn);
 extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+extern int	pqSkipnchar(size_t len, PGconn *conn);
 extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
 extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
 extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
libpq_rowproc_doc_20120224.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..1f91f98 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,329 @@ int PQisthreadsafe();
  </sect1>
 
 
+ <sect1 id="libpq-altrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-altrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   In standard usage, rows are stored in a <type>PGresult</type>
+   until the full result set is received.  The completely filled
+   <type>PGresult</type> is then passed to the user.  This behavior can be
+   changed by registering an alternative row processor function
+   that sees each row as soon as it is received
+   from the network.  It can either process the data
+   immediately or store it in a custom container.
+  </para>
+
+  <para>
+   Note that because the row processor sees rows as they arrive, it cannot
+   know whether the SQL statement will finish successfully on the server.
+   Some care must therefore be taken to get proper
+   transactional behavior.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessor">
+    <term>
+     <function>PQsetRowProcessor</function>
+     <indexterm>
+      <primary>PQsetRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object to set the row processor function.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>func</parameter></term>
+	   <listitem>
+	     <para>
+	       Storage handler function to set. NULL means to use the
+	       default processor.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+	       A pointer to contextual parameter passed
+	       to <parameter>func</parameter>.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqrowprocessor">
+    <term>
+     <type>PQrowProcessor</type>
+     <indexterm>
+      <primary>PQrowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+int (*PQrowProcessor)(PGresult *res, PGrowValue *columns, void *param);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value, -1 if NULL */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+</synopsis>
+     </para>
+
+     <para>
+      The <parameter>columns</parameter> array will have PQnfields()
+      elements, each one pointing to column value in network buffer.
+      The <parameter>len</parameter> field will contain number of
+      bytes in value.  If the field value is NULL then
+      <parameter>len</parameter> will be -1 and value will point
+      to position where the value would have been in buffer.
+      This allows estimating row size by pointer arithmetic.
+     </para>
+
+     <para>
+       This function must process or copy row values away from network
+       buffer before it returns, as next row might overwrite them.
+     </para>
+
+     <para>
+       This function must return 1 for success, and 0 for failure.  On
+       failure this function should set the error message
+       with <function>PQresultSetErrMsg</function>.  When non-blocking
+       API is in use, it can also return 2 for early exit
+       from <function>PQisBusy</function> function.  The
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values will stay valid so
+       row can be processed outside of callback.  Caller is
+       responsible for tracking whether
+       <function>PQisBusy</function> returned early from
+       callback or for other reasons.  Usually this should happen via
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
+     </para>
+
+     <para>
+       The function is allowed to exit via exception (setjmp/longjmp).
+       The connection and row are guaranteed to be in valid state.
+       The connection can later be closed via <function>PQfinish</function>.
+       Processing can also be continued without closing the connection,
+       call <function>PQgetResult</function> in synchronous mode,
+       <function>PQisBusy</function> on asynchronous connection.  Then
+       processing will continue with new row, previous row that got
+       exception will be skipped. Or you can discard all remaining
+       rows by calling <function>PQskipResult</function> without
+       closing connection.
+     </para>
+
+     <variablelist>
+       <varlistentry>
+
+	 <term><parameter>res</parameter></term>
+	 <listitem>
+	   <para>
+	     A pointer to the <type>PGresult</type> object.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>columns</parameter></term>
+	 <listitem>
+	   <para>
+	     Column values of the row to process.  Column values
+	     are located in network buffer, the processor must
+	     copy them out from there.
+	   </para>
+	   <para>
+	     Column values are not null-terminated, so processor cannot
+	     use C string functions on them directly.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>param</parameter></term>
+	 <listitem>
+	   <para>
+	     Extra parameter that was given to <function>PQsetRowProcessor</function>.
+	   </para>
+	 </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqskipresult">
+    <term>
+     <function>PQskipResult</function>
+     <indexterm>
+      <primary>PQskipResult</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+		Discard all the remaining row data
+		after <function>PQexec</function>
+		or <function>PQgetResult</function> exits because of an exception raised
+		in the <type>PQrowProcessor</type>, without closing the connection.
+<synopsis>
+int PQskipResult(PGconn *conn, int skipAll);
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	   <term><parameter>skipAll</parameter></term>
+	   <listitem>
+	     <para>
+	       Skip remaining rows in current result
+	       if <parameter>skipAll</parameter> is false(0). Skip
+	       remaining rows in current result and all rows in
+	       succeeding results if true(non-zero).
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqresultseterrmsg">
+    <term>
+     <function>PQresultSetErrMsg</function>
+     <indexterm>
+      <primary>PQresultSetErrMsg</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Set the message for an error that occurred
+	in <type>PQrowProcessor</type>.  If this message is not set, the
+	caller assumes the error to be an `unknown' error.
+<synopsis>
+void PQresultSetErrMsg(PGresult *res, const char *msg)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object
+		passed to <type>PQrowProcessor</type>.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	  <varlistentry>
+	    <term><parameter>msg</parameter></term>
+	    <listitem>
+	      <para>
+		Error message. This will be copied internally, so there is
+		no need to keep the string alive after the call.
+	      </para>
+	      <para>
+		If <parameter>res</parameter> already has a message previously
+		set, it will be overwritten. Set NULL to cancel the custom
+		message.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetrowprcessor">
+    <term>
+     <function>PQgetRowProcessor</function>
+     <indexterm>
+      <primary>PQgetRowProcessor</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+       Get row processor and its context parameter currently set to
+       the connection.
+<synopsis>
+PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+              Set the current row processor parameter of the
+              connection here if not NULL.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
 
dblink_use_rowproc_20120224.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..9ce1466 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char** valbuf;
+	int *valbuflen;
+	char **cstrs;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,51 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
-	res = PQexec(conn, buf.data);
+	PG_TRY();
+	{
+		res = PQexec(conn, buf.data);
+	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
+
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, FALSE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +632,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +693,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -715,164 +769,259 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
-	if (!is_async)
-		res = PQexec(conn, sql);
-	else
+	PG_TRY();
 	{
-		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
+		if (!is_async)
+			res = PQexec(conn, sql);
+		else
+			res = PQgetResult(conn);
 	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, FALSE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred to below. */
+			if (storeinfo.nummismatch)
+			{
+				/* This is only for backward compatibility */
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	int i;
+
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+	{
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
 
-	Assert(rsinfo->returnMode == SFRM_Materialize);
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
 
-	PG_TRY();
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->valbuf = NULL;
+	sinfo->valbuflen = NULL;
+
+	/* Preallocate memory of same size with c string array for values. */
+	sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+	if (sinfo->valbuf)
+		sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+	if (sinfo->valbuflen)
+		sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
+
+	if (sinfo->cstrs == NULL)
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		if (sinfo->valbuf)
+			free(sinfo->valbuf);
+		if (sinfo->valbuflen)
+			free(sinfo->valbuflen);
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+	}
 
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbuflen[i] = -1;
+	}
 
-			is_sql_cmd = false;
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
+	if (sinfo->valbuf)
+	{
+		for (i = 0 ; i < sinfo->nattrs ; i++)
+		{
+			if (sinfo->valbuf[i])
+				free(sinfo->valbuf[i]);
 		}
+		free(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+	if (sinfo->valbuflen)
+	{
+		free(sinfo->valbuflen);
+		sinfo->valbuflen = NULL;
+	}
 
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+	if (sinfo->cstrs)
+	{
+		free(sinfo->cstrs);
+		sinfo->cstrs = NULL;
+	}
 
-			values = (char **) palloc(nfields * sizeof(char *));
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+	storeInfo *sinfo = (storeInfo *)param;
+	HeapTuple  tuple;
+	int        fields = PQnfields(res);
+	int        i;
+	char      **cstrs = sinfo->cstrs;
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+	if (sinfo->error_occurred)
+	{
+		PQresultSetErrMsg(res, "storeHandler is called after error\n");
+		return FALSE;
+	}
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
+
+		/* This error will be processed in
+		 * dblink_record_internal(). So do not set error message
+		 * here. */
+		
+		PQresultSetErrMsg(res, "unexpected field count in \"D\" message\n");
+		return FALSE;
+	}
+
+	/*
+	 * value input functions assumes that the input string is
+	 * terminated by zero. We should make the values to be so.
+	 */
+	for(i = 0 ; i < fields ; i++)
+	{
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
+		{
+			char *tmp = sinfo->valbuf[i];
+			int tmplen = sinfo->valbuflen[i];
+
+			/*
+			 * Divide calls to malloc and realloc so that things will
+			 * go fine even on the systems of which realloc() does not
+			 * accept NULL as old memory block.
+			 *
+			 * Also try to (re)allocate in bigger steps to
+			 * avoid flood of allocations on weird data.
+			 */
+			if (tmp == NULL)
+			{
+				tmplen = len + 1;
+				if (tmplen < 64)
+					tmplen = 64;
+				tmp = (char *)malloc(tmplen);
+			}
+			else if (tmplen < len + 1)
+			{
+				if (len + 1 > tmplen * 2)
+					tmplen = len + 1;
 				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+					tmplen = tmplen * 2;
+				tmp = (char *)realloc(tmp, tmplen);
+			}
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
+			/*
+			 * sinfo->valbuf[n] will be freed in finishStoreInfo()
+			 * when realloc returns NULL.
+			 */
+			if (tmp == NULL)
+			{
+				PQresultSetErrMsg(res, "out of memory for query result\n");						return FALSE;
 			}
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
-		}
+			sinfo->valbuf[i] = tmp;
+			sinfo->valbuflen[i] = tmplen;
 
-		PQclear(res);
-	}
-	PG_CATCH();
-	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+			cstrs[i] = sinfo->valbuf[i];
+			memcpy(cstrs[i], columns[i].value, len);
+			cstrs[i][len] = '\0';
+		}
 	}
-	PG_END_TRY();
+
+	/*
+	 * These functions may throw exception. It will be caught in
+	 * dblink_record_internal()
+	 */
+	tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+	tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+	return TRUE;
 }
 
 /*
early_exit_20120224.diff (text/x-patch; charset=us-ascii)
diff --git b/doc/src/sgml/libpq.sgml a/doc/src/sgml/libpq.sgml
index 2deb432..1f91f98 100644
--- b/doc/src/sgml/libpq.sgml
+++ a/doc/src/sgml/libpq.sgml
@@ -7350,7 +7350,17 @@ typedef struct
      <para>
        This function must return 1 for success, and 0 for failure.  On
        failure this function should set the error message
-       with <function>PQresultSetErrMsg</function>.
+       with <function>PQresultSetErrMsg</function>.  When non-blocking
+       API is in use, it can also return 2 for early exit
+       from <function>PQisBusy</function> function.  The
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values will stay valid so
+       row can be processed outside of callback.  Caller is
+       responsible for tracking whether
+       the <parameter>PQisBusy</parameter> returned early from
+       callback or for other reasons.  Usually this should happen via
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
      </para>
 
      <para>
diff --git b/src/interfaces/libpq/fe-protocol2.c a/src/interfaces/libpq/fe-protocol2.c
index da36bfa..3b0520d 100644
--- b/src/interfaces/libpq/fe-protocol2.c
+++ a/src/interfaces/libpq/fe-protocol2.c
@@ -827,6 +827,9 @@ getAnotherTuple(PGconn *conn, bool binary)
 	rp= conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
 	if (rp == 1)
 		return 0;
+	else if (rp == 2 && pqIsnonblocking(conn))
+		/* processor requested early exit */
+		return EOF;
 	else if (rp == 0)
 	{
 		errmsg = result->errMsg;
diff --git b/src/interfaces/libpq/fe-protocol3.c a/src/interfaces/libpq/fe-protocol3.c
index 12fef30..c8202c2 100644
--- b/src/interfaces/libpq/fe-protocol3.c
+++ a/src/interfaces/libpq/fe-protocol3.c
@@ -703,6 +703,11 @@ getAnotherTuple(PGconn *conn, int msgLength)
 		/* everything is good */
 		return 0;
 	}
+	if (rp == 2 && pqIsnonblocking(conn))
+	{
+		/* processor requested early exit */
+		return EOF;
+	}
 
 	/* there was some problem */
 	if (rp == 0)
#65Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#64)
Re: Speed dblink using alternate libpq tuple storage

On Fri, Feb 24, 2012 at 07:53:14PM +0900, Kyotaro HORIGUCHI wrote:

Hello, this is new version of the patch.

By the way, I would like to ask you one question. What is the
reason why void* should be head or tail of the parameter list?

Aesthetical reasons:

I got it. Thank you.

Last comment - if we drop the plan to make PQsetRowProcessorErrMsg()
usable outside of handler, we can simplify internal usage as well:
the PGresult->rowProcessorErrMsg can be dropped and let's use
->errMsg to transport the error message.

The PGresult is long-lived structure and adding fields for such
temporary usage feels wrong. There is no other libpq code between
row processor and getAnotherTuple, so the use is safe.

I mostly agree with it. Also, I think there is no reason to treat
out of memory as a special case now that the row processor is
generic. The previous patch handled OOM differently from other
errors, but I couldn't see an obvious reason to do so.

On OOM, the old result is freed to have higher chance that
constructing new result succeeds. But if we want to transport
error message, we need to keep old PGresult around. Thus
two separate paths.

This also means your new code is broken, the errmsg becomes
invalid after pqClearAsyncResult().

It's better to restore old two-path error handling.
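
The lifetime problem described above can be shown with a small standalone sketch. The types and functions here (fake_result, fake_conn, fake_clear_result, fail_copy_then_clear, fake_make_result) are hypothetical stand-ins, not libpq internals; the only point is that a message owned by the result has to be copied out before the result is freed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; these are NOT the real libpq structures. */
typedef struct fake_result
{
	char	   *errMsg;			/* owned by the result, freed with it */
} fake_result;

typedef struct fake_conn
{
	fake_result *result;		/* in-progress result, may be cleared */
	char		errorMessage[256];	/* connection-level message buffer */
} fake_conn;

/* Plays the role of "clear the async result": frees everything it owns. */
static void
fake_clear_result(fake_conn *conn)
{
	if (conn->result)
	{
		free(conn->result->errMsg);
		free(conn->result);
		conn->result = NULL;
	}
}

/*
 * Safe ordering: copy the message into connection-owned storage first,
 * then clear the result.  Reading result->errMsg after the clear would
 * be a use-after-free.
 */
static void
fail_copy_then_clear(fake_conn *conn)
{
	const char *msg;

	assert(conn != NULL);
	msg = (conn->result && conn->result->errMsg)
		? conn->result->errMsg
		: "unknown error";
	snprintf(conn->errorMessage, sizeof(conn->errorMessage), "%s", msg);
	fake_clear_result(conn);
}

/* Build a result carrying a copy of msg (or no message if msg is NULL). */
static fake_result *
fake_make_result(const char *msg)
{
	fake_result *res = malloc(sizeof(fake_result));

	if (res == NULL)
		return NULL;
	res->errMsg = NULL;
	if (msg != NULL)
	{
		res->errMsg = malloc(strlen(msg) + 1);
		if (res->errMsg == NULL)
		{
			free(res);
			return NULL;
		}
		strcpy(res->errMsg, msg);
	}
	return res;
}
```

Whether the copy goes into the connection's buffer or somewhere else is a design choice; what matters is that it happens before the clearing call.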

- Row processors should now set a message for any error that
occurs within them. pqAddRow and dblink.c are modified to do so.

I don't think that should be required. Just use a dummy msg.

- getAnotherTuple() sets the error message `unknown error' for
the condition rp == 0 && ->errMsg == NULL.

Ok. I think most clients will want to drop the connection
on error from rowproc, so the exact message does not matter.

- Always forward input cursor and do pqClearAsyncResult() and
pqSaveErrorResult() when rp == 0 in getAnotherTuple()
regardless whether ->errMsg is NULL or not in fe-protocol3.c.

Ok. Although skipping packets on OOM is dubious,
we will skip all further packets anyway, so let's be
consistent on problems.

There is still one EOF in v3 getAnotherTuple() - pqGetInt(tupnfields),
please turn that one also to protocolerror.

- conn->inStart is already moved to the start point of the next
message when row processor is called. So forwarding inStart in
outOfMemory1 seems redundant. I removed it.

- printfPQExpBuffer() complains about a variable message. So use
resetPQExpBuffer() and appendPQExpBufferStr() instead.

Instead use ("%s", errmsg) as argument there. libpq code
is noisy enough, no need to add more.
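
The suggestion above applies to any printf-style function. A minimal sketch, using plain snprintf() rather than libpq's printfPQExpBuffer(), with a made-up helper name (set_error_text) chosen only for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: store an arbitrary message into a fixed buffer.
 * The message is passed as an argument to a constant "%s" format, so any
 * '%' characters inside it are copied literally instead of being parsed
 * as conversion specifications.
 */
static void
set_error_text(char *dst, size_t dstlen, const char *errmsg)
{
	assert(dst != NULL && dstlen > 0);
	snprintf(dst, dstlen, "%s", errmsg ? errmsg : "unknown error");
}
```

Keeping the format string constant also silences the compiler warning about a non-literal format without needing a separate reset-and-append sequence.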

- I have no idea how to do test for protocol 2...

I have an urge to test with "rm fe-protocol2.c"...

--
marko

#66Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#65)
Let's drop V2 protocol

On Fri, Feb 24, 2012 at 02:11:45PM +0200, Marko Kreen wrote:

On Fri, Feb 24, 2012 at 07:53:14PM +0900, Kyotaro HORIGUCHI wrote:

- I have no idea how to do test for protocol 2...

I have an urge to test with "rm fe-protocol2.c"...

Now I tested with 7.3.21 and the non-error case works fine. Error state
does not - and not because patch is buggy, but because it has never
worked - V2 protocol has no working concept of skipping packets because
of a pending error state.

On OOM, V2 code does:

conn->inStart = conn->inEnd;

and hopes for the best, but it does not work, because on short results
it moves past ReadyForQuery, on long results it moves into middle of
some packet.
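
To make the framing problem concrete, here is a small sketch of why skipping is straightforward when every message carries its own length, as in the V3 protocol (one type byte followed by a four-byte big-endian length that counts itself but not the type byte). The function name skip_one_message and the buffer handling are illustrative assumptions, not libpq code. Without such a length word the parser has to understand each message to find its end, and jumping straight to the end of the buffer lands wherever the last network read happened to stop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Skip one length-prefixed message in buf[start..end).
 * Returns the offset of the next message, or -1 if the message is not yet
 * completely buffered (the caller should read more data and retry) or the
 * length word is invalid.
 */
static long
skip_one_message(const unsigned char *buf, size_t start, size_t end)
{
	uint32_t	len;

	assert(buf != NULL && start <= end);
	if (end - start < 5)
		return -1;				/* type byte + length word not fully here */

	len = ((uint32_t) buf[start + 1] << 24) |
		((uint32_t) buf[start + 2] << 16) |
		((uint32_t) buf[start + 3] << 8) |
		(uint32_t) buf[start + 4];
	if (len < 4)
		return -1;				/* length must count its own 4 bytes */
	if (end - start - 1 < len)
		return -1;				/* body not fully buffered yet */

	return (long) (start + 1 + len);
}
```

Calling this in a loop keeps the read position on a message boundary even while the contents are being ignored, which is the property the OOM path needs.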

With user-specified row processor, we need to have a working
error state handling. With some surgery, it's possible to
introduce something like

if (conn->result->resultStatus != PGRES_TUPLES_OK)

into various places in the code, to ignore but still
parse the packets. But it will be rather non-trivial patch.

So could we like, uh, not do it and simply drop the V2 code?

Of course, the row-processor patch does not make the situation worse,
so we could just say "don't use custom row processor with V2 servers",
but it still raises the question: "Does anyone have pre-7.4
servers around and if yes, then why does he need to use 9.2 libpq
to access those?"

--
marko

#67Merlin Moncure
mmoncure@gmail.com
In reply to: Marko Kreen (#66)
Re: Let's drop V2 protocol

On Fri, Feb 24, 2012 at 7:52 AM, Marko Kreen <markokr@gmail.com> wrote:

On Fri, Feb 24, 2012 at 02:11:45PM +0200, Marko Kreen wrote:

On Fri, Feb 24, 2012 at 07:53:14PM +0900, Kyotaro HORIGUCHI wrote:

- I have no idea how to do test for protocol 2...

I have an urge to test with "rm fe-protocol2.c"...

Now I tested with 7.3.21 and the non-error case works fine.  Error state
does not - and not because patch is buggy, but because it has never
worked - V2 protocol has no working concept of skipping packets because
of a pending error state.

On OOM, V2 code does:

  conn->inStart = conn->inEnd;

and hopes for the best, but it does not work, because on short results
it moves past ReadyForQuery, on long results it moves into middle of
some packet.

With user-specified row processor, we need to have a working
error state handling.  With some surgery, it's possible to
introduce something like

  if (conn->result->resultStatus != PGRES_TUPLES_OK)

into various places in the code, to ignore but still
parse the packets.  But it will be rather non-trivial patch.

So could we like, uh, not do it and simply drop the V2 code?

Of course, the row-processor patch does not make the situation worse,
so we could just say "don't use custom row processor with V2 servers",
but it still raises the question: "Does anyone have pre-7.4
servers around and if yes, then why does he need to use 9.2 libpq
to access those?"

I think it's plausible that very old client libraries could connect to
a modern server. But it's pretty unlikely to have a 9.2 app contact
an ancient server IMO.

merlin

#68Marko Kreen
markokr@gmail.com
In reply to: Merlin Moncure (#67)
Re: Let's drop V2 protocol

On Fri, Feb 24, 2012 at 4:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

On Fri, Feb 24, 2012 at 7:52 AM, Marko Kreen <markokr@gmail.com> wrote:

So could we like, uh, not do it and simply drop the V2 code?

I think it's plausible that very old client libraries could connect to
a modern server.  But it's pretty unlikely to have a 9.2 app contact
an ancient server IMO.

We can drop it from libpq but keep the server-side support,
the codebase is different.

--
marko

#69Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#46)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Feb 14, 2012 at 01:39:06AM +0200, Marko Kreen wrote:

I tried imagining some sort of getFoo() style API for fetching in-flight
row data, but I always ended up with a "rewrite libpq" step, so I feel
it's not productive to go there.

Instead I added a simple feature: rowProcessor can return '2',
in which case getAnotherTuple() does an early exit without setting
any error state. On the user side it appears as PQisBusy() returning
TRUE. All pointers stay valid, so the callback can just
stuff them into some temp area. ATM there is no indication, though,
whether the exit was due to the callback or other reasons, so the user
must detect it based on whether new temp pointers appear,
which means those must be cleared before calling PQisBusy() again.
This actually feels like a feature: those must not stay around
after a single call.
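
The clear-then-check pattern can be sketched without libpq at all. Everything below (row_stash, row_cb, stash_row_cb, fake_parse_rows, next_row) is a hypothetical stand-in written for illustration; fake_parse_rows only imitates the "returns busy either way" behaviour described above and is not the real PQisBusy().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the libpq API. */
typedef struct row_stash
{
	const char *row;			/* non-NULL only if the callback fired */
} row_stash;

typedef int (*row_cb) (const char *row, void *param);

/* Callback: remember the row and request an early exit (return value 2). */
static int
stash_row_cb(const char *row, void *param)
{
	row_stash  *stash = (row_stash *) param;

	stash->row = row;
	return 2;
}

/*
 * Toy parser standing in for the "busy" check: feeds buffered rows to the
 * callback starting at *pos.  It returns 1 ("still busy") both on early
 * exit and when it runs out of buffered rows, so the return value alone
 * does not say which of the two happened.
 */
static int
fake_parse_rows(const char *const *rows, size_t nrows, size_t *pos,
				row_cb cb, void *param)
{
	assert(rows != NULL && pos != NULL && cb != NULL);
	while (*pos < nrows)
	{
		int			rc = cb(rows[*pos], param);

		(*pos)++;
		if (rc == 2)
			return 1;			/* early exit requested by callback */
	}
	return 1;					/* no more buffered data, need more input */
}

/*
 * Caller side: clear the stash before every call, then inspect it to tell
 * "a row arrived" from "returned for some other reason".
 * Returns the stashed row, or NULL if no row was delivered.
 */
static const char *
next_row(const char *const *rows, size_t nrows, size_t *pos, row_stash *stash)
{
	stash->row = NULL;
	(void) fake_parse_rows(rows, nrows, pos, stash_row_cb, stash);
	return stash->row;
}
```

Because the stash is reset on every call, a stale pointer from a previous row can never be mistaken for a new one, which is why the clearing step reads as a feature rather than a burden.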

To see what iterating a resultset would look like, I implemented a PQgetRow()
function using the currently available public API:

/*
 * Wait and return next row in resultset.
 *
 * returns:
 *   1 - row data available, the pointers are owned by PGconn
 *   0 - result done, use PQgetResult() to get final result
 *  -1 - some problem, check connection error
 */
int PQgetRow(PGconn *db, PGresult **hdr_p, PGrowValue **row_p);

code at:

https://github.com/markokr/libpq-rowproc-demos/blob/master/getrow.c

usage:

    /* send query */
    if (!PQsendQuery(db, q))
        die(db, "PQsendQuery");

    /* fetch rows one-by-one */
    while (1) {
        rc = PQgetRow(db, &hdr, &row);
        if (rc > 0)
            proc_row(hdr, row);
        else if (rc == 0)
            break;
        else
            die(db, "streamResult");
    }
    /* final PGresult, either PGRES_TUPLES_OK or error */
    final = PQgetResult(db);

It does not look like it can replace the public callback API,
because it does not work well with fully-async connections.
But it *does* continue the line of synchronous APIs:

- PQexec(): last result only
- PQsendQuery() + PQgetResult(): each result separately
- PQsendQuery() + PQgetRow() + PQgetResult(): each row separately

Also the generic implementation is slightly messy, because
it cannot assume anything about surrounding usage patterns,
while same code living in some user framework can. But
for simple users who just want to synchronously iterate
over resultset, it might be good enough API?

It does have an inconsistency problem - the row data does
not live in PGresult but in a custom container. The proper
API pattern would be to have PQgetRow() that gives a
functional PGresult, but that is not interesting for
high-performance users. Solutions:

- rename to PQrecvRow()
- rename to PQrecvRow() and additionally provide PQgetRow()
- Don't bother, let users implement it themselves via callback API.

Comments?

--
marko

#70Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#68)
Re: Let's drop V2 protocol

Marko Kreen <markokr@gmail.com> writes:

On Fri, Feb 24, 2012 at 4:26 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

I think it's plausible that very old client libraries could connect to
a modern server.  But it's pretty unlikely to have a 9.2 app contact
an ancient server IMO.

We can drop it from libpq but keep the server-side support,
the codebase is different.

I believe it's still somewhat common among JDBC users to force
V2-protocol connections as a workaround for over-eager prepared
statement planning. Although I think the issue they're trying to dodge
will be fixed as of 9.2, that doesn't mean the server-side support can
go away.

As for taking it out of libpq, I would vote against doing that as long
as we have pg_dump support for pre-7.4 servers. Now, I think it would
be entirely reasonable to kill pg_dump's support for pre-7.3 servers,
because that would simplify life in numerous ways for pg_dump; but 7.4
was not a big compatibility break for pg_dump so it seems a bit
arbitrary to kill its support for 7.3 specifically.

As near as I can tell the argument here is basically that we don't want
to try to fix unfixable bugs in the V2-protocol code. Well, yeah,
they're unfixable ... why do you think we invented V3? But they are
what they are, and as long as you don't run into those limitations the
protocol does still work. There are plenty of features that require V3
already, so I see no reason not to classify the row-processor stuff as
one more.

regards, tom lane

#71Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#70)
Re: Let's drop V2 protocol

On Fri, Feb 24, 2012 at 11:46:19AM -0500, Tom Lane wrote:

As for taking it out of libpq, I would vote against doing that as long
as we have pg_dump support for pre-7.4 servers. Now, I think it would
be entirely reasonable to kill pg_dump's support for pre-7.3 servers,
because that would simplify life in numerous ways for pg_dump; but 7.4
was not a big compatibility break for pg_dump so it seems a bit
arbitrary to kill its support for 7.3 specifically.

So we need to maintain V2 protocol in libpq to specifically support 7.3?

What's so special about 7.3?

As near as I can tell the argument here is basically that we don't want
to try to fix unfixable bugs in the V2-protocol code. Well, yeah,
they're unfixable ... why do you think we invented V3? But they are
what they are, and as long as you don't run into those limitations the
protocol does still work. There are plenty of features that require V3
already, so I see no reason not to classify the row-processor stuff as
one more.

Agreed. But still - having to reorg the never-used V2 code in parallel
to actual development in V3 code makes all changes in the area
unnecessarily harder.

--
marko

#72Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#71)
Re: Let's drop V2 protocol

Marko Kreen <markokr@gmail.com> writes:

On Fri, Feb 24, 2012 at 11:46:19AM -0500, Tom Lane wrote:

As for taking it out of libpq, I would vote against doing that as long
as we have pg_dump support for pre-7.4 servers. Now, I think it would
be entirely reasonable to kill pg_dump's support for pre-7.3 servers,
because that would simplify life in numerous ways for pg_dump; but 7.4
was not a big compatibility break for pg_dump so it seems a bit
arbitrary to kill its support for 7.3 specifically.

So we need to maintain V2 protocol in libpq to specifically support 7.3?

What's so special about 7.3?

From pg_dump's viewpoint, the main thing about 7.3 is it's where we
added server-tracked object dependencies. It also has schemas, though
I don't recall at the moment how much effort pg_dump has to spend on
faking those for an older server.

regards, tom lane

#73Merlin Moncure
mmoncure@gmail.com
In reply to: Tom Lane (#70)
Re: Let's drop V2 protocol

On Fri, Feb 24, 2012 at 10:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I believe it's still somewhat common among JDBC users to force
V2-protocol connections as a workaround for over-eager prepared
statement planning.  Although I think the issue they're trying to dodge
will be fixed as of 9.2, that doesn't mean the server-side support can
go away.

good point. I just went through this. The JDBC driver has a
'prepareThreshold' directive that *mostly* disables server side
prepared statements so you can work with tools like pgbouncer.
Unfortunately, it doesn't work for explicit transaction control
statements so that you have to downgrade jdbc protocol or patch the
driver (which is really the better way to go, but I can understand why
that won't fly for a lot of people).

merlin

#74Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#69)
1 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

On Fri, Feb 24, 2012 at 05:46:16PM +0200, Marko Kreen wrote:

- rename to PQrecvRow() and additionally provide PQgetRow()

I tried it and it seems to work as an API - there is valid behaviour
for both sync and async connections.

Sync connection - PQgetRow() waits for data from network:

    if (!PQsendQuery(db, q))
        die(db, "PQsendQuery");
    while (1) {
        r = PQgetRow(db);
        if (!r)
            break;
        handle(r);
        PQclear(r);
    }
    r = PQgetResult(db);

Async connection - PQgetRow() does PQisBusy() loop internally,
but does not read from network:

    on read event:
        PQconsumeInput(db);
        while (1) {
            r = PQgetRow(db);
            if (!r)
                break;
            handle(r);
            PQclear(r);
        }
        if (!PQisBusy(db))
            r = PQgetResult(db);
        else
            waitForMoredata();

As it seems to simplify life for quite many potential users,
it seems worth including in libpq properly.

Attached patch is on top of v20120223 of the row-processor
patch. The only change in general code is allowing
early exit for synchronous connections, as we now have
a valid use-case for it.

--
marko

Attachments:

pqgetrow.diff (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 0087b43..b2779a8 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -4115,6 +4115,111 @@ int PQflush(PGconn *conn);
    read-ready and then read the response as described above.
   </para>
 
+  <para>
+   Above-mentioned functions always wait until full resultset has arrived
+   before making row data available as PGresult.  Sometimes it's
+   more useful to process rows as soon as they arrive from network.
+   For that, following functions can be used:
+   <variablelist>
+    <varlistentry id="libpq-pqgetrow">
+     <term>
+      <function>PQgetRow</function>
+      <indexterm>
+       <primary>PQgetRow</primary>
+      </indexterm>
+     </term>
+
+     <listitem>
+      <para>
+       Waits for the next row from a prior
+       <function>PQsendQuery</function>,
+       <function>PQsendQueryParams</function>,
+       <function>PQsendQueryPrepared</function> call, and returns it.
+       A null pointer is returned when no more rows are available or
+       some error happened.
+<synopsis>
+PGresult *PQgetRow(PGconn *conn);
+</synopsis>
+      </para>
+
+      <para>
+       If this function returns non-NULL result, it is a
+       <structname>PGresult</structname> that contains exactly 1 row.
+       It needs to be freed later with <function>PQclear</function>.
+      </para>
+      <para>
+       On synchronous connection, the function will wait for more
+       data from network until all resultset is done.  So it returns
+       NULL only if resultset has completely received or some error
+       happened.  In both cases, call <function>PQgetResult</function>
+       next to get final status.
+      </para>
+
+      <para>
+       On asynchronous connection the function does not read more data
+       from network.   So after NULL call <function>PQisBusy</function>
+       to see whether final <structname>PGresult</structname> is available
+       or more data needs to be read from network via
+       <function>PQconsumeInput</function>.  Do not call
+       <function>PQisBusy</function> before <function>PQgetRow</function>
+       has returned NULL, as <function>PQisBusy</function> will parse
+       any available rows and add them to main <function>PGresult</function>
+       that will be returned later by <function>PQgetResult</function>.
+      </para>
+
+     </listitem>
+    </varlistentry>
+
+    <varlistentry id="libpq-pqrecvrow">
+     <term>
+      <function>PQrecvRow</function>
+      <indexterm>
+       <primary>PQrecvRow</primary>
+      </indexterm>
+     </term>
+
+     <listitem>
+      <para>
+       Get row data without constructing PGresult for it.  This is the
+       underlying function for <function>PQgetRow</function>.
+<synopsis>
+int PQrecvRow(PGconn *conn, PGresult **hdr_p, PGrowValue **row_p);
+</synopsis>
+      </para>
+
+      <para>
+       It returns row data as pointers to network buffer.
+       All structures are owned by <application>libpq</application>'s
+       <structname>PGconn</structname> and must not be freed or stored
+       by user.  Instead row data should be copied to user structures, before
+       any <application>libpq</application> result-processing function
+       is called.
+      </para>
+      <para>
+       It returns 1 when row data is available.
+       Argument <parameter>hdr_p</parameter> will contain pointer
+       to empty <structname>PGresult</structname> that describes
+       row contents.  Actual data is in <parameter>row_p</parameter>.
+       For the description of structure <structname>PGrowValue</structname>
+       see <xref linkend="libpq-altrowprocessor">.
+      </para>
+      <para>It returns 0 when no more rows are available.  On synchronous
+       connection, it means resultset is fully arrived.  Call
+       <function>PQgetResult</function> to get final status.
+       On asynchronous connection it can also mean more data
+       needs to be read from network.  Call <function>PQisBusy</function>
+       to see whether <function>PQgetResult</function>
+       or <function>PQconsumeInput</function> needs to be called next.
+      </para>
+      <para>
+       It returns -1 if some network error occurred.
+       See connection status functions in <xref linkend="libpq-status">.
+      </para>
+
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
  </sect1>
 
  <sect1 id="libpq-cancel">
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 7e02497..e9cbe2f 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -164,3 +164,5 @@ PQsetRowProcessor	  161
 PQgetRowProcessor	  162
 PQsetRowProcessorErrMsg	  163
 PQskipResult		  164
+PQrecvRow		  165
+PQgetRow		  166
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index cd287cd..df19824 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1959,6 +1959,131 @@ PQskipResult(PGconn *conn, int skipAll)
 	return ret;
 }
 
+/* temp buffer to pass pointers */
+struct RecvRowBuf
+{
+	PGresult *temp_hdr;
+	PGrowValue *temp_row;
+};
+
+/* set pointers, do early exit from PQisBusy() */
+static int
+recv_row_proc(PGresult *hdr, PGrowValue *row, void *arg)
+{
+	struct RecvRowBuf *buf = arg;
+	buf->temp_hdr = hdr;
+	buf->temp_row = row;
+	return 2;
+}
+
+/*
+ * PQrecvRow
+ *
+ * Wait and return next row in resultset.
+ *
+ * Returns:
+ *   1 - got row data, the pointers are owned by PGconn
+ *   0 - no rows available, either resultset complete
+ *       or more data needed (async-only)
+ *  -1 - some problem, check connection error
+ */
+int
+PQrecvRow(PGconn *conn, PGresult **hdr_p, PGrowValue **row_p)
+{
+	struct RecvRowBuf buf;
+	int rc;
+	int ret = -1;
+	PQrowProcessor oldproc;
+	void *oldarg;
+
+	*hdr_p = NULL;
+	*row_p = NULL;
+
+	/* the query may be still pending, send it */
+	while (1)
+	{
+		rc = PQflush(conn);
+		if (rc < 0)
+			return -1;
+		if (rc == 0)
+			break;
+		if (pqWait(FALSE, TRUE, conn))
+			return -1;
+	}
+
+	/* replace existing row processor */
+	oldproc = PQgetRowProcessor(conn, &oldarg);
+	PQsetRowProcessor(conn, recv_row_proc, &buf);
+
+	/* read data */
+	while (1)
+	{
+		buf.temp_hdr = NULL;
+		buf.temp_row = NULL;
+
+		/* done with resultset? */
+		if (!PQisBusy(conn))
+			break;
+
+		/* new row available? */
+		if (buf.temp_row)
+		{
+			*hdr_p = buf.temp_hdr;
+			*row_p = buf.temp_row;
+			ret = 1;
+			goto done;
+		}
+
+		/*
+		 * More data needed
+		 */
+
+		if (pqIsnonblocking(conn))
+			/* let user worry about new data */
+			break;
+		if (pqWait(TRUE, FALSE, conn))
+			goto done;
+		if (!PQconsumeInput(conn))
+			goto done;
+	}
+	/* no more rows available */
+	ret = 0;
+done:
+	/* restore old row processor */
+	PQsetRowProcessor(conn, oldproc, oldarg);
+	return ret;
+}
+
+/*
+ * PQgetRow
+ *		Returns next available row for resultset.  NULL means
+ *		no row available, either resultset is done
+ *		or more data needed (only if async connection).
+ */
+PGresult *
+PQgetRow(PGconn *conn)
+{
+	PGresult *hdr, *res;
+	PGrowValue *row;
+
+	/* check if row is available */
+	if (PQrecvRow(conn, &hdr, &row) != 1)
+		return NULL;
+
+	/* Now make PGresult out of it */
+	res = PQcopyResult(hdr, PG_COPYRES_ATTRS);
+	if (!res)
+		goto NoMem;
+	if (pqAddRow(res, row, NULL))
+		return res;
+
+NoMem:
+	PQclear(res);
+	printfPQExpBuffer(&conn->errorMessage,
+					  libpq_gettext("out of memory\n"));
+	pqSaveErrorResult(conn);
+	return NULL;
+}
 
 /*
  * PQdescribePrepared
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index 6578019..c922ab7 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -820,7 +820,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 	rp= conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
 	if (rp == 1)
 		return 0;
-	else if (rp == 2 && pqIsnonblocking(conn))
+	else if (rp == 2)
 		/* processor requested early exit */
 		return EOF;
 	else if (rp != 0)
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index a19ee88..aee7768 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -697,7 +697,7 @@ getAnotherTuple(PGconn *conn, int msgLength)
 		/* everything is good */
 		return 0;
 	}
-	if (rp == 2 && pqIsnonblocking(conn))
+	if (rp == 2)
 	{
 		/* processor requested early exit */
 		return EOF;
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index b7370e2..9180ed6 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -400,6 +400,10 @@ extern int PQsendQueryPrepared(PGconn *conn,
 					int resultFormat);
 extern PGresult *PQgetResult(PGconn *conn);
 
+/* fetch single row from resultset */
+extern PGresult *PQgetRow(PGconn *conn);
+extern int PQrecvRow(PGconn *conn, PGresult **hdr_p, PGrowValue **row_p);
+
 /* Routines for managing an asynchronous query */
 extern int	PQisBusy(PGconn *conn);
 extern int	PQconsumeInput(PGconn *conn);
#75Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#65)
Re: Speed dblink using alternate libpq tuple storage

Hello,

On OOM, the old result is freed to have higher chance that
constructing new result succeeds. But if we want to transport
error message, we need to keep old PGresult around. Thus
two separate paths.

Ok, I understood the aim. But now we can use non-local exit to do
that for not only asynchronous reading (PQgetResult()) but
synchronous (PQexec()). If we should provide a means other than
exceptions to do that, I think it should be usable for both
synchronous and asynchronous reading. conn->asyncStatus seems to
be used for the case.

How is the modification below?

- getAnotherTuple() now returns 0 to continue as before, and 1
instead of EOF to signal EOF state, and 2 to instruct to exit
immediately.

- pqParseInput3 set conn->asyncStatus to PGASYNC_BREAK for the
last case,

- then PQgetResult() returns immediately when
asyncStatus == PGASYNC_TUPLES_BREAK after parseInput() returns.

- and PQexecFinish() returns immediately if PQgetResult() returns
with asyncStatus == PGASYNC_TUPLES_BREAK.

- PQgetResult() sets asyncStatus = PGRES_TUPLES_OK if called with
asyncStatus == PGRES_TUPLES_BREAK

- New libpq API PQisBreaked(PGconn*), which returns whether asyncStatus ==
PGRES_TUPLES_BREAK, can be used to check if the transfer was interrupted.

Instead use ("%s", errmsg) as argument there. libpq code
is noisy enough, no need to add more.

Ok. I will do so.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#76Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#75)
Re: Speed dblink using alternate libpq tuple storage

On Mon, Feb 27, 2012 at 05:20:30PM +0900, Kyotaro HORIGUCHI wrote:

Hello,

On OOM, the old result is freed to have higher chance that
constructing new result succeeds. But if we want to transport
error message, we need to keep old PGresult around. Thus
two separate paths.

Ok, I understood the aim. But now we can use non-local exit to do
that for not only asynchronous reading (PQgetResult()) but
synchronous (PQexec()). If we should provide a means other than
exceptions to do that, I think it should be usable for both
synchronous and asynchronous reading. conn->asyncStatus seems to
be used for the case.

How is the modification below?

- getAnotherTuple() now returns 0 to continue as before, and 1
instead of EOF to signal EOF state, and 2 to instruct to exit
immediately.

- pqParseInput3 set conn->asyncStatus to PGASYNC_BREAK for the
last case,

- then PQgetResult() returns immediately when
asyncStatus == PGASYNC_TUPLES_BREAK after parseInput() returns.

- and PQexecFinish() returns immediately if PQgetResult() returns
with asyncStatus == PGASYNC_TUPLES_BREAK.

- PQgetResult() sets asyncStatus = PGRES_TUPLES_OK if called with
asyncStatus == PGRES_TUPLES_BREAK

- New libpq API PQisBreaked(PGconn*), which returns whether asyncStatus ==
PGRES_TUPLES_BREAK, can be used to check if the transfer was interrupted.

I don't get where you are going with such changes. Are you trying to
make V2 code survive row-processor errors? (Only V2 has concept of
"EOF state".) Then forget about it. It's more important to not
destabilize V3 code.

And error from row processor is not something special from other errors.
Why does it need special state? And note that adding new PGASYNC or
PGRES state needs review of all if() and switch() statements in the
code that use those fields... So there must be a serious need for it.
At the moment I don't see such need.

I just asked you to replace ->rowProcessorErrMsg with ->errMsg
to get rid of unnecessary field.

But if you want to do bigger cleanup, then you can instead make
PQsetRowProcessorErrMsg() more generic:

PQsetErrorMessage(PGconn *conn, const char *msg)

Current code has the tiny problem that it adds new special state between
PQsetRowProcessorErrMsg() and return from callback to getAnotherTuple()
where getAnotherTuple() sets actual error state. When the callback
exits via exception, the state can leak to other code. It should not
break anything, but it's still annoying.

Also, with the PQgetRow() patch, the need for doing complex processing
under callback has decreased and the need to set error outside callback
has increased.

As a bonus, such generic error-setting function would lose any extra
special state introduced by row-processor patch.

Previously I mentioned that callback would need to have additional
PGconn* argument to make connection available to callback to use such
generic error setting function, but now I think it does not need it -
simple callbacks don't need to set errors and complex callback can make
the PGconn available via Param. Eg. the internal callback should set
Param to PGconn, instead of keeping NULL there.
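
A minimal sketch of the idea in the paragraph above: the callback takes no connection argument, yet a complex callback can still reach the connection because the caller packs it into the opaque parameter. All types and names here (mock_conn, mock_row, copy_state, mock_set_error, feed_rows) are hypothetical stand-ins, not libpq definitions; the return convention (1 = continue, 0 = error) follows the row-processor patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the connection and a row; NOT libpq types. */
struct mock_conn { char error[64]; };
struct mock_row  { const char *value; int len; };

/* Callback signature without a connection argument, as discussed above. */
typedef int (*row_callback)(const struct mock_row *row, void *param);

/* State of a "complex" callback: it carries the connection itself. */
struct copy_state
{
    struct mock_conn *conn;
    int rows_seen;
    int max_rows;
};

/* Stand-in for a generic error-setting function. */
static void
mock_set_error(struct mock_conn *conn, const char *msg)
{
    strncpy(conn->error, msg, sizeof(conn->error) - 1);
    conn->error[sizeof(conn->error) - 1] = '\0';
}

/* Returns 1 to continue, 0 on error. */
static int
limited_row_callback(const struct mock_row *row, void *param)
{
    struct copy_state *st = param;

    assert(st != NULL && st->conn != NULL);
    (void) row;                 /* row contents are irrelevant to the sketch */

    if (st->rows_seen >= st->max_rows)
    {
        mock_set_error(st->conn, "row limit exceeded");
        return 0;
    }
    st->rows_seen++;
    return 1;
}

/* Tiny driver: feeds rows until the callback stops; returns rows accepted. */
static int
feed_rows(row_callback cb, void *param, const struct mock_row *rows, int nrows)
{
    for (int i = 0; i < nrows; i++)
        if (cb(&rows[i], param) != 1)
            return i;
    return nrows;
}
```

A simple callback that never reports errors would just ignore the parameter, which is why the extra PGconn argument is not needed.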

--
marko

#77Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#76)
Re: Speed dblink using alternate libpq tuple storage

Hello.

I will post a fixed version of the patch later; please wait a
while.

======

It's more important to not destabilize V3 code.

Ok, I withdraw it, agreeing with that point. I also noticed that
my earlier proposal was total crap because I mixed up
asyncStatus with resultStatus in it.

And error from row processor is not something special from
other errors. Why does it need special state?

I'm sorry to have confused the discussion. What I wanted there was a
means, other than exceptions, for the row processor to trigger an
exit from PQexec() without discarding the half-built result,
as in the async case.

I just asked you to replace ->rowProcessorErrMsg with ->errMsg
to get rid of unnecessary field.

Ok, I will remove extra code.

Also, with the PQgetRow() patch, the need for doing complex processing
under callback has decreased and the need to set error outside callback
has increased.

As a bonus, such generic error-setting function would lose any extra
special state introduced by row-processor patch.

That sounds nice. For now I will post the patch without it,
then try to include it.

Previously I mentioned that callback would need to have additional
PGconn* argument to make connection available to callback to use such
generic error setting function, but now I think it does not need it -
simple callbacks don't need to set errors and complex callback can make
the PGconn available via Param. Eg. the internal callback should set
Param to PGconn, instead of keeping NULL there.

I agree with it.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#78Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#77)
4 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

This is the new version of the patch.
It is not rebased to the HEAD because of a build error.

It's better to restore old two-path error handling.

I restored the "OOM and save result" route. But, as the comment in
the original code says, on a REAL OOM it seems necessary to free
as much memory as possible.

So I restored the meaning of rp == 0 && errMsg == NULL as REAL
OOM, which throws the async result away; the result is
preserved if errMsg is not NULL. `unknown error' has been
removed.

As a result, if the row processor returns 0, the parser skips to the
end of rows and returns the working result or an error result
according to whether errMsg is set or not in the row processor.
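
The two failure paths described above can be summarized in a few lines. This is a sketch under assumptions: failure_plan and plan_for_zero_return are illustrative names that do not exist in the patch, and the model covers only the rp == 0 case, using the "out of memory" text the patch assigns when errMsg is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the parser does after rp == 0. */
struct failure_plan
{
    bool        keep_partial_result;  /* false: free the async result first */
    const char *message;              /* text to report on the connection */
};

/*
 * Only meaningful when the row processor returned 0.
 *   err_msg == NULL -> treated as real OOM: discard the result to free memory
 *   err_msg != NULL -> keep the half-built result and report the message
 */
static struct failure_plan
plan_for_zero_return(int rp, const char *err_msg)
{
    struct failure_plan plan;

    assert(rp == 0);

    if (err_msg == NULL)
    {
        plan.keep_partial_result = false;
        plan.message = "out of memory";
    }
    else
    {
        plan.keep_partial_result = true;
        plan.message = err_msg;
    }
    return plan;
}
```

A row processor that wants its caller to see the rows collected so far therefore has to set an error message before returning 0.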

I don't think that should be required. Just use a dummy msg.

Considering the above, pqAddRow is also restored to leave errMsg
NULL on OOM.

There is still one EOF in v3 getAnotherTuple() -
pqGetInt(tupnfields), please turn that one also to
protocolerror.

pqGetInt() returns EOF only when it needs more data from the
network, provided the parameter `bytes' is valid. So a non-zero return
from it should be handled as EOF, not as a protocol error. The
one place I had modified incorrectly is also restored. As a result, the
so-called 'protocol error' path has disappeared entirely.
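
To illustrate why EOF here means "not enough data yet" rather than "malformed message", the following sketch models a bounds-checked reader in the style of the pqSkipnchar() added by the patch. The struct and function names (in_buffer, buf_skip, buf_get_int16) are invented for the example and only mimic the inCursor/inEnd bookkeeping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>              /* for EOF */

/* Hypothetical input buffer: bytes in [cursor, end) are available. */
struct in_buffer
{
    const char *data;
    size_t      cursor;
    size_t      end;
};

/* Skip len bytes; EOF means "wait for more input", cursor is unchanged. */
static int
buf_skip(struct in_buffer *b, size_t len)
{
    assert(b->cursor <= b->end);
    if (len > b->end - b->cursor)
        return EOF;
    b->cursor += len;
    return 0;
}

/* Read a 2-byte big-endian integer; EOF if fewer than 2 bytes are buffered. */
static int
buf_get_int16(struct in_buffer *b, int *out)
{
    assert(b->cursor <= b->end);
    if (b->end - b->cursor < 2)
        return EOF;

    unsigned hi = (unsigned char) b->data[b->cursor];
    unsigned lo = (unsigned char) b->data[b->cursor + 1];

    *out = (int) ((hi << 8) | lo);
    b->cursor += 2;
    return 0;
}
```

Because a short read leaves the cursor untouched, the caller can simply retry the whole message once more bytes have arrived.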

Instead use ("%s", errmsg) as argument there. libpq code
is noisy enough, no need to add more.

done

Is there something left undone?

By the way, I noticed that dblink always reports the current
connection as 'unnamed' in error messages raised from
dblink_record_internal in dblink. dblink_record_internal defines
the local variable conname = NULL and passes it to
dblink_res_error to display the error message, but nothing in
the function ever assigns to it.

The name was shown properly when I added code to set conname
from PG_GETARG_TEXT_PP(0) when available, that is, just after
the DBLINK_GET_CONN/DBLINK_GET_NAMED_CONN calls. That seems to be
dblink's way of doing it... This is not included in this patch.

Furthermore, dblink_res_error looks only at the returned PGresult to
display the error, and always says only `Error occurred on dblink
connection..: could not execute query'..

Is it right to consider this as follows?

- dblink's error handling is wrong. A libpq client should
consult PGconn via PQerrorMessage() if (or regardless of whether?)
PGresult says nothing about the error.
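
One possible reading of that rule, sketched on plain strings so it stays self-contained: in a real client the two inputs would come from PQresultErrorMessage(res) and PQerrorMessage(conn). The helper names and the "unknown error" fallback are assumptions made for the example, not dblink or libpq code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if s is a non-NULL, non-empty string. */
static int
has_text(const char *s)
{
    return s != NULL && strlen(s) > 0;
}

/*
 * Prefer the message attached to the result; fall back to the
 * connection-level message when the result says nothing.
 */
static const char *
pick_error_message(const char *result_msg, const char *conn_msg)
{
    if (has_text(result_msg))
        return result_msg;
    if (has_text(conn_msg))
        return conn_msg;
    return "unknown error";     /* assumed fallback for the sketch */
}

/* Example: an empty result message defers to the connection message. */
static void
check_pick_error_message(void)
{
    assert(strcmp(pick_error_message("", "could not receive data"),
                  "could not receive data") == 0);
}
```

Whether the connection message should be consulted even when the result carries its own message is the open question raised above; the sketch takes the conservative "only if empty" option.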

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_rowproc_20120228.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..239edb8 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,7 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQsetRowProcessor	  161
+PQgetRowProcessor	  162
+PQresultSetErrMsg	  163
+PQskipResult		  164
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)
 	conn->wait_ssl_try = false;
 #endif
 
+	/* set default row processor */
+	PQsetRowProcessor(conn, NULL, NULL);
+
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
 	 * buffers on Unix systems.  That way, when we are sending a large amount
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)
 	initPQExpBuffer(&conn->errorMessage);
 	initPQExpBuffer(&conn->workBuffer);
 
+	/* set up initial row buffer */
+	conn->rowBufLen = 32;
+	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+
 	if (conn->inBuffer == NULL ||
 		conn->outBuffer == NULL ||
+		conn->rowBuf == NULL ||
 		PQExpBufferBroken(&conn->errorMessage) ||
 		PQExpBufferBroken(&conn->workBuffer))
 	{
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)
 		free(conn->inBuffer);
 	if (conn->outBuffer)
 		free(conn->outBuffer);
+	if (conn->rowBuf)
+		free(conn->rowBuf);
 	termPQExpBuffer(&conn->errorMessage);
 	termPQExpBuffer(&conn->workBuffer);
 
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..ce58778 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
+static int	pqAddRow(PGresult *res, PGrowValue *columns, void *param);
 
 
 /* ----------------
@@ -701,7 +702,6 @@ pqClearAsyncResult(PGconn *conn)
 	if (conn->result)
 		PQclear(conn->result);
 	conn->result = NULL;
-	conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +756,6 @@ pqPrepareAsyncResult(PGconn *conn)
 	 */
 	res = conn->result;
 	conn->result = NULL;		/* handing over ownership to caller */
-	conn->curTuple = NULL;		/* just in case */
 	if (!res)
 		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 	else
@@ -828,6 +827,87 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 }
 
 /*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+	conn->rowProcessor = (func ? func : pqAddRow);
+	conn->rowProcessorParam = param;
+}
+
+/*
+ * PQgetRowProcessor
+ *   Get current row processor of conn. set pointer to current parameter for
+ *   row processor to param if not NULL.
+ */
+PQrowProcessor
+PQgetRowProcessor(PGconn *conn, void **param)
+{
+	if (param)
+		*param = conn->rowProcessorParam;
+
+	return conn->rowProcessor;
+}
+
+/*
+ * PQresultSetErrMsg
+ *    Set the error message to PGresult.
+ *
+ *    You can replace the previous message by alternative mes, or clear
+ *    it with NULL.
+ */
+void
+PQresultSetErrMsg(PGresult *res, const char *msg)
+{
+	if (msg)
+		res->errMsg = pqResultStrdup(res, msg);
+	else
+		res->errMsg = NULL;
+}
+
+/*
+ * pqAddRow
+ *	  add a row to the PGresult structure, growing it if necessary
+ *	  Returns TRUE if OK, FALSE if not enough memory to add the row.
+ */
+static int
+pqAddRow(PGresult *res, PGrowValue *columns, void *param)
+{
+	PGresAttValue *tup;
+	int			nfields = res->numAttributes;
+	int			i;
+
+	tup = (PGresAttValue *)
+		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+	if (tup == NULL)
+		return FALSE;
+
+	for (i = 0 ; i < nfields ; i++)
+	{
+		tup[i].len = columns[i].len;
+		if (tup[i].len == NULL_LEN)
+		{
+			tup[i].value = res->null_field;
+		}
+		else
+		{
+			bool isbinary = (res->attDescs[i].format != 0);
+			tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+			if (tup[i].value == NULL)
+				return FALSE;
+
+			memcpy(tup[i].value, columns[i].value, tup[i].len);
+			/* We have to terminate this ourselves */
+			tup[i].value[tup[i].len] = '\0';
+		}
+	}
+
+	return pqAddTuple(res, tup);
+}
+
+/*
  * pqAddTuple
  *	  add a row pointer to the PGresult structure, growing it if necessary
  *	  Returns TRUE if OK, FALSE if not enough memory to add the row
@@ -1223,7 +1303,6 @@ PQsendQueryStart(PGconn *conn)
 
 	/* initialize async result-accumulation state */
 	conn->result = NULL;
-	conn->curTuple = NULL;
 
 	/* ready to send command message */
 	return true;
@@ -1831,6 +1910,55 @@ PQexecFinish(PGconn *conn)
 	return lastResult;
 }
 
+
+/*
+ * Do-nothing row processor for PQskipResult
+ */
+static int
+dummyRowProcessor(PGresult *res, PGrowValue *columns, void *param)
+{
+	return 1;
+}
+
+/*
+ * Exhaust remaining Data Rows in current conn.
+ * 
+ * Exhaust current result if skipAll is false and all succeeding results if
+ * true.
+ */
+int
+PQskipResult(PGconn *conn, int skipAll)
+{
+	PQrowProcessor savedRowProcessor;
+	void * savedRowProcParam;
+	PGresult *res;
+	int ret = 0;
+
+	/* save the current row processor settings and set dummy processor */
+	savedRowProcessor = PQgetRowProcessor(conn, &savedRowProcParam);
+	PQsetRowProcessor(conn, dummyRowProcessor, NULL);
+	
+	/*
+	 * Throw away the remaining rows in current result, or all succeeding
+	 * results if skipAll is not FALSE.
+	 */
+	if (skipAll)
+	{
+		while ((res = PQgetResult(conn)) != NULL)
+			PQclear(res);
+	}
+	else if ((res = PQgetResult(conn)) != NULL)
+	{
+		PQclear(res);
+		ret = 1;
+	}
+	
+	PQsetRowProcessor(conn, savedRowProcessor, savedRowProcParam);
+
+	return ret;
+}
+
+
 /*
  * PQdescribePrepared
  *	  Obtain information about a previously prepared statement
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)
 }
 
 /*
+ * pqSkipnchar:
+ *	skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+	if (len > (size_t) (conn->inEnd - conn->inCursor))
+		return EOF;
+
+	conn->inCursor += len;
+
+	if (conn->Pfdebug)
+		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+				(unsigned long) len);
+
+	return 0;
+}
+
+/*
  * pqPutnchar:
  *	write exactly len bytes to the current message
  */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..36773cb 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -569,6 +569,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, FALSE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -585,6 +587,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, TRUE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -703,52 +707,55 @@ failure:
 
 /*
  * parseInput subroutine to read a 'B' or 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, bool binary)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 
 	/* the backend sends us a bitmap of which attributes are null */
 	char		std_bitmap[64]; /* used unless it doesn't fit */
 	char	   *bitmap = std_bitmap;
 	int			i;
+	int			rp;
 	size_t		nbytes;			/* the number of bytes in bitmap  */
 	char		bmap;			/* One byte of the bitmap */
 	int			bitmap_index;	/* Its index */
 	int			bitcnt;			/* number of bits examined in current byte */
 	int			vlen;			/* length of the current field value */
+	char        *errmsg = libpq_gettext("unknown error\n");
 
-	result->binary = binary;
-
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
+	/* resize row buffer if needed */
+	if (nfields > conn->rowBufLen)
 	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-		/*
-		 * If it's binary, fix the column format indicators.  We assume the
-		 * backend will consistently send either B or D, not a mix.
-		 */
-		if (binary)
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
 		{
-			for (i = 0; i < nfields; i++)
-				result->attDescs[i].format = 1;
+			errmsg = libpq_gettext("out of memory for query result\n");
+			goto error_clearresult;
 		}
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
+	}
+	else
+	{
+		rowbuf = conn->rowBuf;
+	}
+
+	result->binary = binary;
+
+	if (binary)
+	{
+		for (i = 0; i < nfields; i++)
+			result->attDescs[i].format = 1;
 	}
-	tup = conn->curTuple;
 
 	/* Get the null-value bitmap */
 	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
@@ -757,11 +764,15 @@ getAnotherTuple(PGconn *conn, bool binary)
 	{
 		bitmap = (char *) malloc(nbytes);
 		if (!bitmap)
-			goto outOfMemory;
+		{
+			errmsg = libpq_gettext("out of memory for query result\n");
+			goto error_clearresult;
+		}
 	}
 
 	if (pqGetnchar(bitmap, nbytes, conn))
-		goto EOFexit;
+		goto error_clearresult;
+
 
 	/* Scan the fields */
 	bitmap_index = 0;
@@ -771,34 +782,29 @@ getAnotherTuple(PGconn *conn, bool binary)
 	for (i = 0; i < nfields; i++)
 	{
 		if (!(bmap & 0200))
-		{
-			/* if the field value is absent, make it a null string */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-		}
+			vlen = NULL_LEN;
+		else if (pqGetInt(&vlen, 4, conn))
+				goto EOFexit;
 		else
 		{
-			/* get the value length (the first four bytes are for length) */
-			if (pqGetInt(&vlen, 4, conn))
-				goto EOFexit;
 			if (!binary)
 				vlen = vlen - 4;
 			if (vlen < 0)
 				vlen = 0;
-			if (tup[i].value == NULL)
-			{
-				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-				if (tup[i].value == NULL)
-					goto outOfMemory;
-			}
-			tup[i].len = vlen;
-			/* read in the value */
-			if (vlen > 0)
-				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-					goto EOFexit;
-			/* we have to terminate this ourselves */
-			tup[i].value[vlen] = '\0';
 		}
+
+		/*
+		 * rowbuf[i].value always points to the next address of the
+		 * length field even if the value is NULL, to allow safe
+		 * size estimates and data copy.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip the value */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			goto EOFexit;
+
 		/* advance the bitmap stuff */
 		bitcnt++;
 		if (bitcnt == BITS_PER_BYTE)
@@ -811,33 +817,64 @@ getAnotherTuple(PGconn *conn, bool binary)
 			bmap <<= 1;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
-
 	if (bitmap != std_bitmap)
 		free(bitmap);
-	return 0;
+	bitmap = NULL;
 
-outOfMemory:
-	/* Replace partially constructed result with an error result */
+	/* tag the row as parsed */
+	conn->inStart = conn->inCursor;
+
+	/* Pass the completed row values to rowProcessor */
+	rp= conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+	if (rp == 1)
+		return 0;
+	else if (rp == 2 && pqIsnonblocking(conn))
+		/* processor requested early exit */
+		return EOF;
+	else if (rp == 0)
+	{
+		errmsg = result->errMsg;
+		result->errMsg = NULL;
+		if (errmsg == NULL)
+		{
+			/* If errmsg == NULL, we assume that the row processor
+			 * notices out of memory. We should immediately free any
+			 * space to go forward. */
+			errmsg = "out of memory";
+			goto error_clearresult;
+		}
+		/*
+		 * We assume that some ancestor which has a relation with the
+		 * row processor wants the result built halfway when row
+		 * processor sets any errMsg for rp == 0.
+		 */
+		goto error_saveresult;
+	}
 
+	errmsg = libpq_gettext("invalid return value from row processor\n");
+	/* FALL THROUGH */
+error_clearresult:
 	/*
 	 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
 	 * there's not enough memory to concatenate messages...
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
 
+error_saveresult:
+	/*
+	 * If error message is passed from RowProcessor, set it into
+	 * PGconn, assume out of memory if not.
+	 */
+	printfPQExpBuffer(&conn->errorMessage, "%s", errmsg);
+	
 	/*
 	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
 	 * do to recover...
 	 */
 	conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+
 	conn->asyncStatus = PGASYNC_READY;
+
 	/* Discard the failed message --- good idea? */
 	conn->inStart = conn->inEnd;
 
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..2693ce0 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -327,6 +327,9 @@ pqParseInput3(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, msgLength))
 							return;
+
+						/* getAnotherTuple() moves inStart itself */
+						continue;
 					}
 					else if (conn->result != NULL &&
 							 conn->result->resultStatus == PGRES_FATAL_ERROR)
@@ -613,33 +616,23 @@ failure:
 
 /*
  * parseInput subroutine to read a 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, int msgLength)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 	int			tupnfields;		/* # fields from tuple */
 	int			vlen;			/* length of the current field value */
 	int			i;
-
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
-	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-	}
-	tup = conn->curTuple;
+	int			rp;
+	char        *errmsg = libpq_gettext("unknown error\n");
 
 	/* Get the field count and make sure it's what we expect */
 	if (pqGetInt(&tupnfields, 2, conn))
@@ -647,13 +640,22 @@ getAnotherTuple(PGconn *conn, int msgLength)
 
 	if (tupnfields != nfields)
 	{
-		/* Replace partially constructed result with an error result */
-		printfPQExpBuffer(&conn->errorMessage,
-				 libpq_gettext("unexpected field count in \"D\" message\n"));
-		pqSaveErrorResult(conn);
-		/* Discard the failed message by pretending we read it */
-		conn->inCursor = conn->inStart + 5 + msgLength;
-		return 0;
+		errmsg = libpq_gettext("unexpected field count in \"D\" message\n");
+		goto error_and_forward;
+	}
+
+	/* resize row buffer if needed */
+	rowbuf = conn->rowBuf;
+	if (nfields > conn->rowBufLen)
+	{
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
+		{
+			errmsg = libpq_gettext("out of memory for query result\n");
+			goto error_and_forward;
+		}
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
 	}
 
 	/* Scan the fields */
@@ -662,53 +664,88 @@ getAnotherTuple(PGconn *conn, int msgLength)
 		/* get the value length */
 		if (pqGetInt(&vlen, 4, conn))
 			return EOF;
+
 		if (vlen == -1)
-		{
-			/* null field */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-			continue;
-		}
-		if (vlen < 0)
+			vlen = NULL_LEN;
+		else if (vlen < 0)
 			vlen = 0;
-		if (tup[i].value == NULL)
-		{
-			bool		isbinary = (result->attDescs[i].format != 0);
 
-			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-			if (tup[i].value == NULL)
-				goto outOfMemory;
-		}
-		tup[i].len = vlen;
-		/* read in the value */
-		if (vlen > 0)
-			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-				return EOF;
-		/* we have to terminate this ourselves */
-		tup[i].value[vlen] = '\0';
+		/*
+		 * rowbuf[i].value always points just past the length
+		 * field, even if the value is NULL, to allow safe
+		 * size estimates and data copy.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip to the next length field */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			return EOF;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
+	/* mark the row as parsed and check that it was consumed exactly */
+	conn->inStart += 5 + msgLength;
+	if (conn->inCursor != conn->inStart)
+	{
+		errmsg = libpq_gettext("invalid row contents\n");
+		goto error_clearresult;
+	}
 
-	return 0;
+	/* Pass the completed row values to rowProcessor */
+	rp = conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
+	if (rp == 1)
+	{
+		/* everything is good */
+		return 0;
+	}
+	if (rp == 2 && pqIsnonblocking(conn))
+	{
+		/* processor requested early exit */
+		return EOF;
+	}
+
+	/* there was some problem */
+	if (rp == 0)
+	{
+		/*
+		 * Unlink errMsg from result here to use it after
+		 * pqClearAsyncResult() is called.
+		 */
+		errmsg = result->errMsg;
+		result->errMsg = NULL;
+		if (errmsg == NULL)
+		{
+			/* If errmsg == NULL, we assume that the row processor
+			 * ran out of memory.  Free whatever space we can
+			 * immediately so that we can carry on. */
+			errmsg = "out of memory";
+			goto error_clearresult;
+		}
+		/*
+		 * When the row processor sets an errMsg and returns 0, we
+		 * assume that its caller wants to keep the partially built
+		 * result, so pqClearAsyncResult() is skipped.
+		 */
+		goto error_saveresult;
+	}
 
-outOfMemory:
+	errmsg = libpq_gettext("invalid return value from row processor\n");
+	goto error_clearresult;
+
+error_and_forward:
+	/* Discard the failed message by pretending we read it */
+	conn->inCursor = conn->inStart + 5 + msgLength;
 
+error_clearresult:
+	pqClearAsyncResult(conn);
+	
+error_saveresult:
 	/*
 	 * Replace partially constructed result with an error result. First
 	 * discard the old result to try to win back some memory.
 	 */
-	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	printfPQExpBuffer(&conn->errorMessage, "%s", errmsg);
 	pqSaveErrorResult(conn);
-
-	/* Discard the failed message by pretending we read it */
-	conn->inCursor = conn->inStart + 5 + msgLength;
 	return 0;
 }
 
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..810b04e 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,17 @@ typedef struct pgNotify
 	struct pgNotify *next;		/* list link */
 } PGnotify;
 
+/* PGrowValue points to a column value in the network buffer.
+ * The value is a string of length len without null termination.
+ * NULL is represented as len < 0; value then points to the place
+ * where the value would have been.
+ */
+typedef struct pgRowValue
+{
+	int			len;			/* length in bytes of the value */
+	char	   *value;			/* actual value, without null termination */
+} PGrowValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
@@ -416,6 +427,38 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
 			 const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * The columns array contains PQnfields() entries, each one
+ * pointing to a particular column's data in the network buffer.
+ * This function is supposed to copy the data out from there
+ * and store it somewhere.  NULL is signified by len < 0.
+ *
+ * This function must return 1 for success and 0 for failure, and
+ * may set an error message with PQresultSetErrMsg.  If no error
+ * message is set on failure, the caller assumes out of memory.
+ * This function is assumed not to throw any exception.
+ */
+typedef int (*PQrowProcessor)(PGresult *res, PGrowValue *columns,
+                              void *param);
+
+/*
+ * Set alternative row data processor for PGconn.
+ *
+ * Once a row processor is registered, libpq stops storing rows in
+ * the PGresult itself and calls the processor for each row instead.
+ *
+ * func is the row processor function. See the typedef PQrowProcessor.
+ *
+ * rowProcessorParam is the contextual variable that is passed to
+ * the row processor.
+ */
+extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+								   void *rowProcessorParam);
+extern PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+extern int  PQskipResult(PGconn *conn, int skipAll);
+
 /* Force the write buffer to be written (or at least try) */
 extern int	PQflush(PGconn *conn);
 
@@ -454,6 +497,7 @@ extern char *PQcmdTuples(PGresult *res);
 extern char *PQgetvalue(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
+extern void	PQresultSetErrMsg(PGresult *res, const char *msg);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 987311e..9cabd20 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -398,7 +398,6 @@ struct pg_conn
 
 	/* Status for asynchronous result construction */
 	PGresult   *result;			/* result being constructed */
-	PGresAttValue *curTuple;	/* tuple currently being read */
 
 #ifdef USE_SSL
 	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
@@ -443,6 +442,14 @@ struct pg_conn
 
 	/* Buffer for receiving various parts of messages */
 	PQExpBufferData workBuffer; /* expansible string */
+
+	/*
+	 * Read column data from network buffer.
+	 */
+	PQrowProcessor rowProcessor;/* Function pointer */
+	void *rowProcessorParam;	/* Contextual parameter for rowProcessor */
+	PGrowValue *rowBuf;			/* Buffer for passing values to rowProcessor */
+	int rowBufLen;				/* Number of columns allocated in rowBuf */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -560,6 +567,7 @@ extern int	pqGets(PQExpBuffer buf, PGconn *conn);
 extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
 extern int	pqPuts(const char *s, PGconn *conn);
 extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+extern int	pqSkipnchar(size_t len, PGconn *conn);
 extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
 extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
 extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
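
The header comments above say a row processor must copy column data out of the network buffer, and that values are not null-terminated. A minimal sketch of that copy step follows. It is self-contained, so RowValue is a local stand-in that mirrors the PGrowValue struct from the patch, and copy_column is a hypothetical helper name, not part of libpq.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in mirroring PGrowValue from the patch; not the libpq type. */
typedef struct RowValue
{
	int			len;	/* length in bytes, negative for SQL NULL */
	char	   *value;	/* points into the receive buffer, no terminator */
} RowValue;

/*
 * Copy one column out of the receive buffer into a freshly malloc'ed,
 * NUL-terminated string.  Returns 1 on success and 0 on out of memory,
 * following the row processor's return convention.  For SQL NULL, *out
 * is set to NULL and 1 is returned.  The caller frees *out.
 */
int
copy_column(const RowValue *col, char **out)
{
	char	   *buf;

	assert(col != NULL && out != NULL);

	if (col->len < 0)
	{
		*out = NULL;
		return 1;
	}

	buf = malloc((size_t) col->len + 1);
	if (buf == NULL)
		return 0;

	memcpy(buf, col->value, (size_t) col->len);
	buf[col->len] = '\0';
	*out = buf;
	return 1;
}
```

A real processor would loop over all PQnfields() columns and return 0 as soon as one copy fails, leaving the error message unset so that the caller treats it as out of memory.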
libpq_rowproc_doc_20120228.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..5ef89e7 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,332 @@ int PQisthreadsafe();
  </sect1>
 
 
+ <sect1 id="libpq-altrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-altrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   By default, rows are stored into a <type>PGresult</type>
+   until the full result set is received, and the completely filled
+   <type>PGresult</type> is then passed to the user.  This behavior can be
+   changed by registering an alternative row processor function,
+   which sees each row's data as soon as it is received
+   from the network.  It has the option of processing the data
+   immediately, or storing it into a custom container.
+  </para>
+
+  <para>
+   Note that because the row processor sees rows as they arrive, it cannot
+   know whether the SQL statement will finish successfully on the server.
+   Some care must therefore be taken to get proper transactional
+   behavior.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessor">
+    <term>
+     <function>PQsetRowProcessor</function>
+     <indexterm>
+      <primary>PQsetRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object on which to set the row processor function.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>func</parameter></term>
+	   <listitem>
+	     <para>
+	       The row processor function to set. NULL means to use the
+	       default processor.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+	       A pointer to a contextual parameter that is passed
+	       to <parameter>func</parameter>.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqrowprocessor">
+    <term>
+     <type>PQrowProcessor</type>
+     <indexterm>
+      <primary>PQrowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+int (*PQrowProcessor)(PGresult *res, PGrowValue *columns, void *param);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value, -1 if NULL */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+</synopsis>
+     </para>
+
+     <para>
+      The <parameter>columns</parameter> array has PQnfields()
+      elements, each one pointing to a column value in the network buffer.
+      The <parameter>len</parameter> field contains the number of
+      bytes in the value.  If the field value is NULL, then
+      <parameter>len</parameter> is -1 and value points
+      to the position where the value would have been in the buffer.
+      This allows estimating row size by pointer arithmetic.
+     </para>
+
+     <para>
+       This function must process or copy row values away from the network
+       buffer before it returns, as the next row might overwrite them.
+     </para>
+
+     <para>
+       This function must return 1 for success, and 0 for failure.  On
+       failure the caller assumes the error is out of memory and
+       releases the PGresult under construction.  If you set a
+       message with <function>PQresultSetErrMsg</function>, it is set
+       as the PGconn's error message and the PGresult is
+       preserved.  When the non-blocking API is in use, it can also return
+       2 to exit early from the <function>PQisBusy</function> function.
+       The supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values stay valid, so the
+       row can be processed outside of the callback.  The caller is
+       responsible for tracking whether
+       <function>PQisBusy</function> returned early from the
+       callback or for other reasons.  Usually this is done by
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
+     </para>
+
+     <para>
+       The function is allowed to exit via exception (setjmp/longjmp).
+       The connection and row are guaranteed to be in a valid state.
+       The connection can later be closed
+       via <function>PQfinish</function>.  Processing can also be
+       continued without closing the connection: call
+       <function>PQgetResult</function> in synchronous mode, or
+       <function>PQisBusy</function> on an asynchronous connection.
+       Processing then continues with a new row, and the row that raised
+       the exception is skipped.  Alternatively, you can discard all remaining
+       rows by calling <function>PQskipResult</function> without
+       closing the connection.
+     </para>
+
+     <variablelist>
+       <varlistentry>
+
+	 <term><parameter>res</parameter></term>
+	 <listitem>
+	   <para>
+	     A pointer to the <type>PGresult</type> object.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>columns</parameter></term>
+	 <listitem>
+	   <para>
+	     Column values of the row to process.  Column values
+	     are located in the network buffer, so the processor must
+	     copy them out from there.
+	   </para>
+	   <para>
+	     Column values are not null-terminated, so processor cannot
+	     use C string functions on them directly.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>param</parameter></term>
+	 <listitem>
+	   <para>
+	     Extra parameter that was given to <function>PQsetRowProcessor</function>.
+	   </para>
+	 </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqskipresult">
+    <term>
+     <function>PQskipResult</function>
+     <indexterm>
+      <primary>PQskipResult</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+		Discards all the remaining row data
+		after <function>PQexec</function>
+		or <function>PQgetResult</function> exits because of an exception
+		raised in <type>PQrowProcessor</type>, without closing the connection.
+<synopsis>
+int PQskipResult(PGconn *conn, int skipAll);
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	   <term><parameter>skipAll</parameter></term>
+	   <listitem>
+	     <para>
+	       If <parameter>skipAll</parameter> is false (0), skip the
+	       remaining rows in the current result only.  If true
+	       (non-zero), skip the remaining rows in the current result
+	       and all rows in succeeding results.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqresultseterrmsg">
+    <term>
+     <function>PQresultSetErrMsg</function>
+     <indexterm>
+      <primary>PQresultSetErrMsg</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+	Sets the message for an error that occurred
+	in <type>PQrowProcessor</type>.  If this message is not set, the
+	caller assumes the error to be out of memory.
+<synopsis>
+void PQresultSetErrMsg(PGresult *res, const char *msg)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	  <varlistentry>
+	    <term><parameter>res</parameter></term>
+	    <listitem>
+	      <para>
+		A pointer to the <type>PGresult</type> object
+		passed to <type>PQrowProcessor</type>.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	  <varlistentry>
+	    <term><parameter>msg</parameter></term>
+	    <listitem>
+	      <para>
+		The error message. It is copied internally, so the caller
+		does not need to keep the string alive.
+	      </para>
+	      <para>
+		If <parameter>res</parameter> already has a message
+		set, it is overwritten. Pass NULL to cancel the custom
+		message.
+	      </para>
+	    </listitem>
+	  </varlistentry>
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetrowprcessor">
+    <term>
+     <function>PQgetRowProcessor</function>
+     <indexterm>
+      <primary>PQgetRowProcessor</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+       Gets the row processor and its context parameter currently set on
+       the connection.
+<synopsis>
+PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param)
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+              If not NULL, the connection's current row processor
+              parameter is stored here.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
 
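
The documentation above says that pointing value at the would-be position of a NULL column allows estimating row size by pointer arithmetic. A minimal sketch of such an estimate follows, assuming the columns come from one contiguous row message as in the patch. RowValue is a local stand-in for PGrowValue and estimate_row_size is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in mirroring PGrowValue from the patch; not the libpq type. */
typedef struct RowValue
{
	int			len;	/* length in bytes, negative for SQL NULL */
	char	   *value;	/* points into the receive buffer */
} RowValue;

/*
 * Estimate the size of a row as the distance from the start of the first
 * value to the end of the last one.  Because value is set even for NULL
 * columns, this never dereferences anything.  The span also covers the
 * length words that sit between values, so it is an upper bound on the
 * number of payload bytes.
 */
size_t
estimate_row_size(const RowValue *columns, int nfields)
{
	const RowValue *last;
	size_t		span;

	assert(columns != NULL || nfields <= 0);

	if (nfields <= 0)
		return 0;

	last = &columns[nfields - 1];
	span = (size_t) (last->value - columns[0].value);
	if (last->len > 0)
		span += (size_t) last->len;
	return span;
}
```

A processor could use the estimate to reserve one buffer for the whole row before copying the individual columns.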
dblink_use_rowproc_20120228.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..8bf0759 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char** valbuf;
+	int *valbuflen;
+	char **cstrs;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,51 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Rows are stored into storeinfo.tuplestore instead of
+	 * the PGresult returned by PQexec below.
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
-	res = PQexec(conn, buf.data);
+	PG_TRY();
+	{
+		res = PQexec(conn, buf.data);
+	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
+
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, FALSE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +632,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +693,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -660,6 +714,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 		{
 			/* text,text,bool */
 			DBLINK_GET_CONN;
+			conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 			sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
 			fail = PG_GETARG_BOOL(2);
 		}
@@ -675,6 +730,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 			else
 			{
 				DBLINK_GET_CONN;
+				conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 				sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
 			}
 		}
@@ -705,6 +761,8 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 		else
 			/* shouldn't happen */
 			elog(ERROR, "wrong number of arguments");
+
+		conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 	}
 
 	if (!conn)
@@ -715,164 +773,257 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Rows are stored into storeinfo.tuplestore instead of
+	 * the PGresult returned by PQexec/PQgetResult below.
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
-	if (!is_async)
-		res = PQexec(conn, sql);
-	else
+	PG_TRY();
 	{
-		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
+		if (!is_async)
+			res = PQexec(conn, sql);
+		else
+			res = PQgetResult(conn);
 	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, FALSE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred to below. */
+			if (storeinfo.nummismatch)
+			{
+				/* This is only for backward compatibility */
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	int i;
+
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
+	{
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
 
-	Assert(rsinfo->returnMode == SFRM_Materialize);
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
 
-	PG_TRY();
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
+
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->valbuf = NULL;
+	sinfo->valbuflen = NULL;
+	sinfo->cstrs = NULL;
+	/* Preallocate the per-column buffers and the C string array for values. */
+	sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+	if (sinfo->valbuf)
+		sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+	if (sinfo->valbuflen)
+		sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
+
+	if (sinfo->cstrs == NULL)
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		if (sinfo->valbuf)
+			free(sinfo->valbuf);
+		if (sinfo->valbuflen)
+			free(sinfo->valbuflen);
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+	}
 
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbuflen[i] = -1;
+	}
 
-			is_sql_cmd = false;
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
+	if (sinfo->valbuf)
+	{
+		for (i = 0 ; i < sinfo->nattrs ; i++)
+		{
+			if (sinfo->valbuf[i])
+				free(sinfo->valbuf[i]);
 		}
+		free(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+	if (sinfo->valbuflen)
+	{
+		free(sinfo->valbuflen);
+		sinfo->valbuflen = NULL;
+	}
 
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+	if (sinfo->cstrs)
+	{
+		free(sinfo->cstrs);
+		sinfo->cstrs = NULL;
+	}
 
-			values = (char **) palloc(nfields * sizeof(char *));
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+	storeInfo *sinfo = (storeInfo *)param;
+	HeapTuple  tuple;
+	int        fields = PQnfields(res);
+	int        i;
+	char      **cstrs = sinfo->cstrs;
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+	if (sinfo->error_occurred)
+	{
+		PQresultSetErrMsg(res, "storeHandler is called after error\n");
+		return FALSE;
+	}
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
+
+		/* This error will be processed in
+		 * dblink_record_internal(). So do not set error message
+		 * here. */
+		
+		PQresultSetErrMsg(res, "unexpected field count in \"D\" message\n");
+		return FALSE;
+	}
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
+	/*
+	 * Value input functions assume that the input string is
+	 * zero-terminated, so we must make the values so.
+	 */
+	for(i = 0 ; i < fields ; i++)
+	{
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
+		{
+			char *tmp = sinfo->valbuf[i];
+			int tmplen = sinfo->valbuflen[i];
+
+			/*
+			 * Split the calls to malloc and realloc so that things
+			 * work even on systems whose realloc() does not
+			 * accept NULL as the old memory block.
+			 *
+			 * Also try to (re)allocate in bigger steps to
+			 * avoid a flood of allocations on weird data.
+			 */
+			if (tmp == NULL)
+			{
+				tmplen = len + 1;
+				if (tmplen < 64)
+					tmplen = 64;
+				tmp = (char *)malloc(tmplen);
+			}
+			else if (tmplen < len + 1)
+			{
+				if (len + 1 > tmplen * 2)
+					tmplen = len + 1;
+				else
+					tmplen = tmplen * 2;
+				tmp = (char *)realloc(tmp, tmplen);
 			}
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
-		}
+			/*
+			 * sinfo->valbuf[n] will be freed in finishStoreInfo()
+			 * when realloc returns NULL.
+			 */
+			if (tmp == NULL)
+				return FALSE;  /* Inform out of memory to the caller */
 
-		PQclear(res);
-	}
-	PG_CATCH();
-	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+			sinfo->valbuf[i] = tmp;
+			sinfo->valbuflen[i] = tmplen;
+
+			cstrs[i] = sinfo->valbuf[i];
+			memcpy(cstrs[i], columns[i].value, len);
+			cstrs[i][len] = '\0';
+		}
 	}
-	PG_END_TRY();
+
+	/*
+	 * These functions may throw exception. It will be caught in
+	 * dblink_record_internal()
+	 */
+	tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+	tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+	return TRUE;
 }
 
 /*
early_exit_20120228.diff (text/x-patch; charset=us-ascii)
diff --git b/doc/src/sgml/libpq.sgml a/doc/src/sgml/libpq.sgml
index 1245e85..5ef89e7 100644
--- b/doc/src/sgml/libpq.sgml
+++ a/doc/src/sgml/libpq.sgml
@@ -7353,7 +7353,16 @@ typedef struct
        releases the PGresult under construction. If you set any
        message with <function>PQresultSetErrMsg</function>, it is set
        as the PGconn's error message and the PGresult will be
-       preserved.
+       preserved.  When the non-blocking API is in use, it can also
+       return 2 to request an early exit from <function>PQisBusy</function>.
+       The supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values stay valid, so the
+       row can be processed outside of the callback.  The caller is
+       responsible for tracking whether
+       <function>PQisBusy</function> returned early because of the
+       callback or for other reasons.  Usually this is done by
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
      </para>
 
      <para>
diff --git b/src/interfaces/libpq/fe-protocol2.c a/src/interfaces/libpq/fe-protocol2.c
index 6555f85..36773cb 100644
--- b/src/interfaces/libpq/fe-protocol2.c
+++ a/src/interfaces/libpq/fe-protocol2.c
@@ -828,6 +828,9 @@ getAnotherTuple(PGconn *conn, bool binary)
 	rp= conn->rowProcessor(result, rowbuf, conn->rowProcessorParam);
 	if (rp == 1)
 		return 0;
+	else if (rp == 2 && pqIsnonblocking(conn))
+		/* processor requested early exit */
+		return EOF;
 	else if (rp == 0)
 	{
 		errmsg = result->errMsg;
diff --git b/src/interfaces/libpq/fe-protocol3.c a/src/interfaces/libpq/fe-protocol3.c
index 3725de2..2693ce0 100644
--- b/src/interfaces/libpq/fe-protocol3.c
+++ a/src/interfaces/libpq/fe-protocol3.c
@@ -698,6 +698,11 @@ getAnotherTuple(PGconn *conn, int msgLength)
 		/* everything is good */
 		return 0;
 	}
+	if (rp == 2 && pqIsnonblocking(conn))
+	{
+		/* processor requested early exit */
+		return EOF;
+	}
 
 	/* there was some problem */
 	if (rp == 0)
#79Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#78)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Feb 28, 2012 at 05:04:44PM +0900, Kyotaro HORIGUCHI wrote:

There is still one EOF in v3 getAnotherTuple() -
pqGetInt(tupnfields), please turn that one also to
protocolerror.

pqGetInt() returns EOF only when it wants additional reading from
the network, provided the parameter `bytes' is appropriate. A non-zero
return from it should, it seems, be handled as EOF, not as a protocol
error. The one point I had modified in a buggy way is also restored. The
so-called 'protocol error' has vanished in the end.

But it's broken in V3 protocol - getAnotherTuple() will be called
only if the packet is fully read. If the packet contents do not
agree with packet header, it's protocol error. Only valid EOF
return in V3 getAnotherTuple() is when row processor asks
for early exit.

Is there something left undone?

* Convert old EOFs to protocol errors in V3 getAnotherTuple()

* V2 getAnotherTuple() can leak PGresult when handling custom
error from row processor.

* remove pqIsnonblocking(conn) check when row processor returned 2.
I missed that it's valid to call PQisBusy/PQconsumeInput/PQgetResult
on sync connection.

* It seems the return codes from callback should be remapped,
(0, 1, 2) is unusual pattern. Better would be:

-1 - error
0 - stop parsing / early exit ("I'm not done yet")
1 - OK ("I'm done with the row")

* Please drop PQsetRowProcessorErrMsg() / PQresultSetErrMsg().
Main problem is that it needs to be synced with error handling
in rest of libpq, which is unlike the rest of row processor patch,
which consists only of local changes. All solutions here
are either ugly hacks or too complex to be part of this patch.

Also considering that we have working exceptions and PQgetRow,
I don't see much need for custom error messages. If really needed,
it should be introduced as separate patch, as the area of code it
affects is completely different.

Currently the custom error messaging seems to be the blocker for
this patch, because of raised complexity when implementing it and
when reviewing it. Considering how unimportant the provided
functionality is, compared to rest of the patch, I think we should
simply drop it.

My suggestion - check in getAnotherTuple whether resultStatus is
already error and do nothing then. This allows internal pqAddRow
to set regular "out of memory" error. Otherwise give generic
"row processor error".

By the way, I noticed that dblink always says that the current
connection is 'unnamed' in the error messages from
dblink_record_internal@dblink. I could see that
dblink_record_internal defines the local variable conname = NULL
and passes it to dblink_res_error to display the error message, but
nothing is ever assigned to it in the function.

The name was shown properly when I added code to set conname
from PG_GETARG_TEXT_PP(0) if available, that is, just after
DBLINK_GET_CONN/DBLINK_GET_NAMED_CONN. That seems to be
dblink's usual style... This is not included in this patch.

Furthermore, dblink_res_error looks only into the returned PGresult to
display the error and always says only `Error occurred on dblink
connection..: could not execute query'..

Is it right to consider this as follows?

- dblink is wrong in its error handling. A client of libpq should
consult PGconn via PQerrorMessage() if (or regardless of whether?)
PGresult says nothing about the error.

Yes, it seems like a bug.

--
marko

#80Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#79)
Re: Speed dblink using alternate libpq tuple storage

Hello, I'm sorry for the absence.

But it's broken in V3 protocol - getAnotherTuple() will be called
only if the packet is fully read. If the packet contents do not
agree with packet header, it's protocol error. Only valid EOF
return in V3 getAnotherTuple() is when row processor asks
for early exit.

The original code of getAnotherTuple returns EOF when the bytes to
be read are not fully loaded. I understand that this was
written inappropriately (redundant checks?), at least for the
pqGetInt() for the field length in getAnotherTuple. But I don't
understand how it is guaranteed that the rows (or table data) are
fully loaded at the point where getAnotherTuple is called...

Nevertheless, the first pqGetInt() can return EOF when the
previous row is fully loaded but the next row is not, so
the EOF return seems necessary even if each row is passed
only after it is fully loaded.

* Convert old EOFs to protocol errors in V3 getAnotherTuple()

Ok. I will do that.

* V2 getAnotherTuple() can leak PGresult when handling custom
error from row processor.

mmm. I will confirm it.

* remove pqIsnonblocking(conn) check when row processor returned 2.
I missed that it's valid to call PQisBusy/PQconsumeInput/PQgetResult
on sync connection.

mmm. EOF from getAnotherTuple makes PQgetResult try further
reading until asyncStatus != PGASYNC_BUSY, as far as I saw. And it
seemed to do so when I tried to remove 'return 2'. I think that
at least one additional state for asyncStatus is needed for
EOF to work as desired here.

* It seems the return codes from callback should be remapped,
(0, 1, 2) is unusual pattern. Better would be:

-1 - error
0 - stop parsing / early exit ("I'm not done yet")
1 - OK ("I'm done with the row")

I almost agree with it. I will consider the suggestion related to
pqAddRow together.

* Please drop PQsetRowProcessorErrMsg() / PQresultSetErrMsg().
Main problem is that it needs to be synced with error handling
in rest of libpq, which is unlike the rest of row processor patch,
which consists only of local changes. All solutions here
are either ugly hacks or too complex to be part of this patch.

Ok, I will take your advice.

Also considering that we have working exceptions and PQgetRow,
I don't see much need for custom error messages. If really needed,
it should be introduced as separate patch, as the area of code it
affects is completely different.

I agree with it.

Currently the custom error messaging seems to be the blocker for
this patch, because of raised complexity when implementing it and
when reviewing it. Considering how unimportant the provided
functionality is, compared to rest of the patch, I think we should
simply drop it.

Ok.

My suggestion - check in getAnotherTuple whether resultStatus is
already error and do nothing then. This allows internal pqAddRow
to set regular "out of memory" error. Otherwise give generic
"row processor error".

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#81Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#80)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Mar 06, 2012 at 11:13:45AM +0900, Kyotaro HORIGUCHI wrote:

But it's broken in V3 protocol - getAnotherTuple() will be called
only if the packet is fully read. If the packet contents do not
agree with packet header, it's protocol error. Only valid EOF
return in V3 getAnotherTuple() is when row processor asks
for early exit.

The original code of getAnotherTuple returns EOF when the bytes to
be read are not fully loaded. I understand that this was
written inappropriately (redundant checks?), at least for the
pqGetInt() for the field length in getAnotherTuple. But I don't
understand how it is guaranteed that the rows (or table data) are
fully loaded at the point where getAnotherTuple is called...

Look into pqParseInput3():

if (avail < msgLength)
{
...
return;
}

* remove pqIsnonblocking(conn) check when row processor returned 2.
I missed that it's valid to call PQisBusy/PQconsumeInput/PQgetResult
on sync connection.

mmm. EOF from getAnotherTuple makes PQgetResult try further
reading until asyncStatus != PGASYNC_BUSY, as far as I saw. And it
seemed to do so when I tried to remove 'return 2'. I think that
at least one additional state for asyncStatus is needed for
EOF to work as desired here.

No. It's valid to do a PQisBusy() + PQconsumeInput() loop until
PQisBusy() returns 0. Otherwise, yes, PQgetResult() will
block until the final result is available. But that's OK.
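
A minimal sketch of the loop described above. To keep it runnable without a server, the libpq calls are replaced by hypothetical stand-ins (DemoOps, DemoConnState and the demo_* functions are illustrative names, not libpq API); in real code the callbacks would map to PQisBusy(), PQconsumeInput() and a select()/poll() on PQsocket().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the libpq calls named above, so the loop can
 * be shown without a live server.  In real code these would be
 * PQisBusy(conn), PQconsumeInput(conn) and a select()/poll() on
 * PQsocket(conn).
 */
typedef struct
{
	int			(*is_busy) (void *conn);		/* 1 = result not ready yet */
	int			(*consume_input) (void *conn);	/* 1 = ok, 0 = failure */
	int			(*wait_readable) (void *conn);	/* 1 = readable, 0 = failure */
} DemoOps;

/* Returns 1 once is_busy() reports 0, or 0 if waiting or reading failed. */
static int
demo_wait_until_ready(const DemoOps *ops, void *conn)
{
	assert(ops != NULL);

	while (ops->is_busy(conn))
	{
		if (!ops->wait_readable(conn))
			return 0;
		if (!ops->consume_input(conn))
			return 0;
	}
	return 1;
}

/* A fake connection: each consume_input() call delivers one pending chunk. */
typedef struct
{
	int			chunks_pending;
	int			consume_calls;
} DemoConnState;

static int
demo_is_busy(void *conn)
{
	return ((DemoConnState *) conn)->chunks_pending > 0;
}

static int
demo_consume_input(void *conn)
{
	DemoConnState *st = (DemoConnState *) conn;

	st->chunks_pending--;
	st->consume_calls++;
	return 1;
}

static int
demo_wait_readable(void *conn)
{
	(void) conn;				/* the fake socket is always readable */
	return 1;
}
```

Once the loop returns 1, a call corresponding to PQgetResult() would no longer block. If a row processor asks for an early exit, the same loop can simply be entered again afterwards.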

--
marko

#82Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#80)
Re: Speed dblink using alternate libpq tuple storage

Hello,

But I don't understand how it is guaranteed that the rows (or table
data) are fully loaded at the point where getAnotherTuple is called...

I found how pqParseInput ensures that the entire message is loaded
before getAnotherTuple is called.

fe-protocol3.c:107
| avail = conn->inEnd - conn->inCursor;
| if (avail < msgLength)
| {
| if (pqCheckInBufferSpace(conn->inCursor + (size_t)msgLength, conn))

So now I convinced that the whole row data is loaded at the point
that getAnotherTuple is called. I agree that getAnotherTuple
should not return EOF to request for unloaded part of the
message.
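
The check quoted above can be sketched in isolation as follows. This is a simplified illustration, not the libpq code: DemoConn and demo_message_complete are hypothetical names, and the only protocol fact relied on is the V3 message layout (one type byte, then a 4-byte big-endian length that counts itself but not the type byte).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the few PGconn input-buffer fields discussed. */
typedef struct
{
	const unsigned char *inBuffer;	/* raw bytes received so far */
	size_t		inStart;			/* offset of the first unparsed byte */
	size_t		inEnd;				/* offset one past the last received byte */
} DemoConn;

/*
 * Returns 1 if a complete V3 message is buffered at inStart, 0 if more
 * input is needed, -1 if the length word is malformed.  On success
 * *msgLength receives the payload length (length word minus 4), so the
 * message ends at inStart + 5 + *msgLength.
 */
static int
demo_message_complete(const DemoConn *conn, size_t *msgLength)
{
	size_t		avail;
	uint32_t	len;
	const unsigned char *p;

	assert(conn->inStart <= conn->inEnd);
	avail = conn->inEnd - conn->inStart;
	if (avail < 5)
		return 0;				/* type byte + length word not all here */

	p = conn->inBuffer + conn->inStart + 1;
	len = ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
		((uint32_t) p[2] << 8) | (uint32_t) p[3];
	if (len < 4)
		return -1;				/* the length word must count itself */

	if (avail - 1 < len)
		return 0;				/* payload not fully received yet */

	*msgLength = (size_t) len - 4;
	return 1;
}
```

Only when such a check succeeds would the row parser be called, which is why a short read inside the parser can be treated as a protocol error rather than as a request for more data.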

Please wait for a while for the new patch.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#83Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#82)
Re: Speed dblink using alternate libpq tuple storage

Hello,

Nevertheless, one problem is still there: an exit from parseInput()
caused by a non-zero return of getAnotherTuple() results in immediate
re-entry into getAnotherTuple() via parseInput(), with no other
side effect. I will address that in the next patch, though.

So now I convinced that the whole row data is loaded at the point
         ^am

that getAnotherTuple is called. I agree that getAnotherTuple
should not return EOF to request for unloaded part of the
message.

Please wait for a while for the new patch.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#84Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#80)
3 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

Hello, this is the new version of the patch.

# early_exit.diff is not included this time, and maybe not
# later either. The set of return values of PQrowProcessor looks
# unnatural if the 0 is removed.

* Convert old EOFs to protocol errors in V3 getAnotherTuple()

done.

* V2 getAnotherTuple() can leak PGresult when handling custom
error from row processor.

Custom error message is removed from both V2 and V3.

* remove pqIsnonblocking(conn) check when row processor returned 2.
I missed that it's valid to call PQisBusy/PQconsumeInput/PQgetResult
on sync connection.

done. This affects PQisBusy, but PQgetResult won't be affected as
far as I can see. I am not sure about PQconsumeInput()..

* It seems the return codes from callback should be remapped,
(0, 1, 2) is unusual pattern. Better would be:

-1 - error
0 - stop parsing / early exit ("I'm not done yet")
1 - OK ("I'm done with the row")

done.

This might be described more precisely as follows:

| -1 - error
|      - erase the result and change the result status to FATAL_ERROR.
|      - All remaining rows in the current result will be skipped (consumed).
|  0 - stop parsing / early exit ("I'm not done yet")
|      - getAnotherTuple returns EOF without dropping the PGresult.
#      - We expect PQisBusy(), PQconsumeInput()(?) and
#        PQgetResult() to exit immediately, and we can
#        call PQgetResult(), PQskipResult() or
#        PQisBusy() afterwards.
|  1 - OK ("I'm done with the row")
|      - save the result and getAnotherTuple returns 0.

The lines prefixed with '#' are the desirable behavior as I have
understood it from the discussion so far. And I doubt that it works
as we expect for PQgetResult().
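
The mapping from row processor return codes to parser behavior can be sketched as below. This is an illustration under stated assumptions, not the patch itself: DemoParseState, DemoRowProcessor and demo_dispatch_row are hypothetical names, the state struct merely models "result kept" versus "result replaced by a fatal error", and returning EOF on the error paths is assumed from the existing error-exit convention.

```c
#include <assert.h>
#include <stdio.h>				/* for EOF */

/* Hypothetical row processor signature: -1 = error, 0 = suspend, 1 = OK. */
typedef int (*DemoRowProcessor) (void *row, void *param);

/* Hypothetical model of the parser state touched by the dispatch. */
typedef struct
{
	int			has_result;		/* 1 while the partial result is kept */
	int			fatal;			/* 1 once the result is a fatal error */
	const char *errmsg;			/* generic message set on failure */
} DemoParseState;

/*
 * Call the row processor and translate its return code:
 *   1  -> return 0, parsing continues with the next message
 *   0  -> return EOF, parsing stops but the result is kept
 *  -1  -> drop the result, mark it fatal, return EOF
 * Any other value is treated as an error as well.
 */
static int
demo_dispatch_row(DemoRowProcessor proc, void *row, void *param,
				  DemoParseState *st)
{
	assert(proc != NULL);

	switch (proc(row, param))
	{
		case 1:
			return 0;
		case 0:
			return EOF;
		case -1:
			st->errmsg = "error in row processor";
			break;
		default:
			st->errmsg = "invalid return value from row processor";
			break;
	}

	st->has_result = 0;
	st->fatal = 1;
	return EOF;
}

/* Example processors used to exercise each branch. */
static int
demo_count_rows(void *row, void *param)
{
	int		   *n = (int *) param;

	(void) row;
	(*n)++;
	return 1;
}

static int
demo_suspend(void *row, void *param)
{
	(void) row;
	(void) param;
	return 0;
}

static int
demo_fail(void *row, void *param)
{
	(void) row;
	(void) param;
	return -1;
}
```

Note that the early-exit case and the error case both return EOF here, so a caller has to look at the result state to tell them apart.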

* Please drop PQsetRowProcessorErrMsg() / PQresultSetErrMsg().

done.

My suggestion - check in getAnotherTuple whether resultStatus is
already error and do nothing then. This allows internal pqAddRow
to set regular "out of memory" error. Otherwise give generic
"row processor error".

The current implementation already seems to do this in
parseInput3(). Could you give me further explanation?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

libpq_rowproc_20120307.patch (text/x-patch; charset=us-ascii)
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 1af8df6..a6418ec 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -160,3 +160,6 @@ PQconnectStartParams      157
 PQping                    158
 PQpingParams              159
 PQlibVersion              160
+PQsetRowProcessor	  	  161
+PQgetRowProcessor	  	  162
+PQskipResult		  	  163
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 27a9805..4605e49 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2693,6 +2693,9 @@ makeEmptyPGconn(void)
 	conn->wait_ssl_try = false;
 #endif
 
+	/* set default row processor */
+	PQsetRowProcessor(conn, NULL, NULL);
+
 	/*
 	 * We try to send at least 8K at a time, which is the usual size of pipe
 	 * buffers on Unix systems.  That way, when we are sending a large amount
@@ -2711,8 +2714,13 @@ makeEmptyPGconn(void)
 	initPQExpBuffer(&conn->errorMessage);
 	initPQExpBuffer(&conn->workBuffer);
 
+	/* set up initial row buffer */
+	conn->rowBufLen = 32;
+	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+
 	if (conn->inBuffer == NULL ||
 		conn->outBuffer == NULL ||
+		conn->rowBuf == NULL ||
 		PQExpBufferBroken(&conn->errorMessage) ||
 		PQExpBufferBroken(&conn->workBuffer))
 	{
@@ -2814,6 +2822,8 @@ freePGconn(PGconn *conn)
 		free(conn->inBuffer);
 	if (conn->outBuffer)
 		free(conn->outBuffer);
+	if (conn->rowBuf)
+		free(conn->rowBuf);
 	termPQExpBuffer(&conn->errorMessage);
 	termPQExpBuffer(&conn->workBuffer);
 
@@ -5078,3 +5088,4 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index b743566..161d210 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -66,6 +66,7 @@ static PGresult *PQexecFinish(PGconn *conn);
 static int PQsendDescribe(PGconn *conn, char desc_type,
 			   const char *desc_target);
 static int	check_field_number(const PGresult *res, int field_num);
+static int	pqAddRow(PGresult *res, PGrowValue *columns, void *param);
 
 
 /* ----------------
@@ -701,7 +702,6 @@ pqClearAsyncResult(PGconn *conn)
 	if (conn->result)
 		PQclear(conn->result);
 	conn->result = NULL;
-	conn->curTuple = NULL;
 }
 
 /*
@@ -756,7 +756,6 @@ pqPrepareAsyncResult(PGconn *conn)
 	 */
 	res = conn->result;
 	conn->result = NULL;		/* handing over ownership to caller */
-	conn->curTuple = NULL;		/* just in case */
 	if (!res)
 		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
 	else
@@ -828,6 +827,71 @@ pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
 }
 
 /*
+ * PQsetRowProcessor
+ *   Set function that copies column data out from network buffer.
+ */
+void
+PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+{
+	conn->rowProcessor = (func ? func : pqAddRow);
+	conn->rowProcessorParam = param;
+}
+
+/*
+ * PQgetRowProcessor
+ *   Get current row processor of conn. set pointer to current parameter for
+ *   row processor to param if not NULL.
+ */
+PQrowProcessor
+PQgetRowProcessor(PGconn *conn, void **param)
+{
+	if (param)
+		*param = conn->rowProcessorParam;
+
+	return conn->rowProcessor;
+}
+
+/*
+ * pqAddRow
+ *	  add a row to the PGresult structure, growing it if necessary
+ *	  Returns 1 if OK, -1 if error occurred.
+ */
+static int
+pqAddRow(PGresult *res, PGrowValue *columns, void *param)
+{
+	PGresAttValue *tup;
+	int			nfields = res->numAttributes;
+	int			i;
+
+	tup = (PGresAttValue *)
+		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+	if (tup == NULL)
+		return -1;
+
+	for (i = 0 ; i < nfields ; i++)
+	{
+		tup[i].len = columns[i].len;
+		if (tup[i].len == NULL_LEN)
+		{
+			tup[i].value = res->null_field;
+		}
+		else
+		{
+			bool isbinary = (res->attDescs[i].format != 0);
+			tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+			if (tup[i].value == NULL)
+				return -1;
+
+			memcpy(tup[i].value, columns[i].value, tup[i].len);
+			/* We have to terminate this ourselves */
+			tup[i].value[tup[i].len] = '\0';
+		}
+	}
+
+	return (pqAddTuple(res, tup) ? 1 : -1);
+}
+
+/*
  * pqAddTuple
  *	  add a row pointer to the PGresult structure, growing it if necessary
  *	  Returns TRUE if OK, FALSE if not enough memory to add the row
@@ -1223,7 +1287,6 @@ PQsendQueryStart(PGconn *conn)
 
 	/* initialize async result-accumulation state */
 	conn->result = NULL;
-	conn->curTuple = NULL;
 
 	/* ready to send command message */
 	return true;
@@ -1831,6 +1894,55 @@ PQexecFinish(PGconn *conn)
 	return lastResult;
 }
 
+
+/*
+ * Do-nothing row processor for PQskipResult
+ */
+static int
+dummyRowProcessor(PGresult *res, PGrowValue *columns, void *param)
+{
+	return 1;
+}
+
+/*
+ * Exhaust remaining Data Rows in current conn.
+ *
+ * Exhaust current result if skipAll is false and all succeeding results if
+ * true.
+ */
+int
+PQskipResult(PGconn *conn, int skipAll)
+{
+	PQrowProcessor savedRowProcessor;
+	void * savedRowProcParam;
+	PGresult *res;
+	int ret = 0;
+
+	/* save the current row processor settings and set dummy processor */
+	savedRowProcessor = PQgetRowProcessor(conn, &savedRowProcParam);
+	PQsetRowProcessor(conn, dummyRowProcessor, NULL);
+	
+	/*
+	 * Throw away the remaining rows in current result, or all succeeding
+	 * results if skipAll is not FALSE.
+	 */
+	if (skipAll)
+	{
+		while ((res = PQgetResult(conn)) != NULL)
+			PQclear(res);
+	}
+	else if ((res = PQgetResult(conn)) != NULL)
+	{
+		PQclear(res);
+		ret = 1;
+	}
+	
+	PQsetRowProcessor(conn, savedRowProcessor, savedRowProcParam);
+
+	return ret;
+}
+
+
 /*
  * PQdescribePrepared
  *	  Obtain information about a previously prepared statement
diff --git a/src/interfaces/libpq/fe-misc.c b/src/interfaces/libpq/fe-misc.c
index ce0eac3..d11cb3c 100644
--- a/src/interfaces/libpq/fe-misc.c
+++ b/src/interfaces/libpq/fe-misc.c
@@ -219,6 +219,25 @@ pqGetnchar(char *s, size_t len, PGconn *conn)
 }
 
 /*
+ * pqSkipnchar:
+ *	skip len bytes in input buffer.
+ */
+int
+pqSkipnchar(size_t len, PGconn *conn)
+{
+	if (len > (size_t) (conn->inEnd - conn->inCursor))
+		return EOF;
+
+	conn->inCursor += len;
+
+	if (conn->Pfdebug)
+		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+				(unsigned long) len);
+
+	return 0;
+}
+
+/*
  * pqPutnchar:
  *	write exactly len bytes to the current message
  */
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index a7c3899..7fcb10f 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -569,6 +569,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, FALSE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -585,6 +587,8 @@ pqParseInput2(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, TRUE))
 							return;
+						/* getAnotherTuple moves inStart itself */
+						continue;
 					}
 					else
 					{
@@ -703,19 +707,18 @@ failure:
 
 /*
  * parseInput subroutine to read a 'B' or 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
+ * It fills rowbuf with column pointers and then calls row processor.
  * Returns: 0 if completed message, EOF if error or not enough data yet.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, bool binary)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 
 	/* the backend sends us a bitmap of which attributes are null */
 	char		std_bitmap[64]; /* used unless it doesn't fit */
@@ -726,29 +729,32 @@ getAnotherTuple(PGconn *conn, bool binary)
 	int			bitmap_index;	/* Its index */
 	int			bitcnt;			/* number of bits examined in current byte */
 	int			vlen;			/* length of the current field value */
+	char        *errmsg = "unknown error\n";
 
-	result->binary = binary;
-
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
+	/* resize row buffer if needed */
+	if (nfields > conn->rowBufLen)
 	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-
-		/*
-		 * If it's binary, fix the column format indicators.  We assume the
-		 * backend will consistently send either B or D, not a mix.
-		 */
-		if (binary)
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
 		{
-			for (i = 0; i < nfields; i++)
-				result->attDescs[i].format = 1;
+			errmsg = "out of memory for query result\n";
+			goto error_clearresult;
 		}
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
+	}
+	else
+	{
+		rowbuf = conn->rowBuf;
+	}
+
+	result->binary = binary;
+
+	if (binary)
+	{
+		for (i = 0; i < nfields; i++)
+			result->attDescs[i].format = 1;
 	}
-	tup = conn->curTuple;
 
 	/* Get the null-value bitmap */
 	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
@@ -757,11 +763,15 @@ getAnotherTuple(PGconn *conn, bool binary)
 	{
 		bitmap = (char *) malloc(nbytes);
 		if (!bitmap)
-			goto outOfMemory;
+		{
+			errmsg = "out of memory for query result\n";
+			goto error_clearresult;
+		}
 	}
 
 	if (pqGetnchar(bitmap, nbytes, conn))
-		goto EOFexit;
+		goto error_clearresult;
+
 
 	/* Scan the fields */
 	bitmap_index = 0;
@@ -771,34 +781,29 @@ getAnotherTuple(PGconn *conn, bool binary)
 	for (i = 0; i < nfields; i++)
 	{
 		if (!(bmap & 0200))
-		{
-			/* if the field value is absent, make it a null string */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-		}
+			vlen = NULL_LEN;
+		else if (pqGetInt(&vlen, 4, conn))
+				goto EOFexit;
 		else
 		{
-			/* get the value length (the first four bytes are for length) */
-			if (pqGetInt(&vlen, 4, conn))
-				goto EOFexit;
 			if (!binary)
 				vlen = vlen - 4;
 			if (vlen < 0)
 				vlen = 0;
-			if (tup[i].value == NULL)
-			{
-				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
-				if (tup[i].value == NULL)
-					goto outOfMemory;
-			}
-			tup[i].len = vlen;
-			/* read in the value */
-			if (vlen > 0)
-				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-					goto EOFexit;
-			/* we have to terminate this ourselves */
-			tup[i].value[vlen] = '\0';
 		}
+
+		/*
+		 * rowbuf[i].value always points to the next address of the
+		 * length field even if the value is NULL, to allow safe
+		 * size estimates and data copy.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip the value */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+			goto EOFexit;
+
 		/* advance the bitmap stuff */
 		bitcnt++;
 		if (bitcnt == BITS_PER_BYTE)
@@ -811,33 +816,51 @@ getAnotherTuple(PGconn *conn, bool binary)
 			bmap <<= 1;
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
-
 	if (bitmap != std_bitmap)
 		free(bitmap);
-	return 0;
+	bitmap = NULL;
+
+	/* tag the row as parsed */
+	conn->inStart = conn->inCursor;
+
+	/* Pass the completed row values to rowProcessor */
+	switch (conn->rowProcessor(result, rowbuf, conn->rowProcessorParam))
+	{
+		case 1:
+			/* everything is good */
+			return 0;
 
-outOfMemory:
-	/* Replace partially constructed result with an error result */
+		case 0:
+			/* processor requested early exit */
+			return EOF;
+			
+		case -1:
+			errmsg = "error in row processor";
+			goto error_clearresult;
 
+		default:
+			/* Illegal return code */
+			errmsg = "invalid return value from row processor\n";
+			goto error_clearresult;
+	}
+
+error_clearresult:
 	/*
 	 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
 	 * there's not enough memory to concatenate messages...
 	 */
 	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
 
+	printfPQExpBuffer(&conn->errorMessage, "%s", libpq_gettext(errmsg));
+	
 	/*
 	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
 	 * do to recover...
 	 */
 	conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+
 	conn->asyncStatus = PGASYNC_READY;
+
 	/* Discard the failed message --- good idea? */
 	conn->inStart = conn->inEnd;
 
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 892dcbc..b51b04c 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -327,6 +327,9 @@ pqParseInput3(PGconn *conn)
 						/* Read another tuple of a normal query response */
 						if (getAnotherTuple(conn, msgLength))
 							return;
+
+						/* getAnotherTuple() moves inStart itself */
+						continue;
 					}
 					else if (conn->result != NULL &&
 							 conn->result->resultStatus == PGRES_FATAL_ERROR)
@@ -613,47 +616,49 @@ failure:
 
 /*
  * parseInput subroutine to read a 'D' (row data) message.
- * We add another tuple to the existing PGresult structure.
- * Returns: 0 if completed message, EOF if error or not enough data yet.
+ * It fills rowbuf with column pointers and then calls row processor.
+ * Returns: 0 if completed message, 1 if error.
  *
  * Note that if we run out of data, we have to suspend and reprocess
- * the message after more data is received.  We keep a partially constructed
- * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
+ * the message after more data is received.
  */
 static int
 getAnotherTuple(PGconn *conn, int msgLength)
 {
 	PGresult   *result = conn->result;
 	int			nfields = result->numAttributes;
-	PGresAttValue *tup;
+	PGrowValue  *rowbuf;
 	int			tupnfields;		/* # fields from tuple */
 	int			vlen;			/* length of the current field value */
 	int			i;
-
-	/* Allocate tuple space if first time for this data message */
-	if (conn->curTuple == NULL)
-	{
-		conn->curTuple = (PGresAttValue *)
-			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
-		if (conn->curTuple == NULL)
-			goto outOfMemory;
-		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
-	}
-	tup = conn->curTuple;
+	char        *errmsg = "unknown error\n";
 
 	/* Get the field count and make sure it's what we expect */
 	if (pqGetInt(&tupnfields, 2, conn))
-		return EOF;
+	{
+		/* The whole message must already be in the buffer here */
+		errmsg = "protocol error\n";
+		goto error_saveresult;
+	}
 
 	if (tupnfields != nfields)
 	{
-		/* Replace partially constructed result with an error result */
-		printfPQExpBuffer(&conn->errorMessage,
-				 libpq_gettext("unexpected field count in \"D\" message\n"));
-		pqSaveErrorResult(conn);
-		/* Discard the failed message by pretending we read it */
-		conn->inCursor = conn->inStart + 5 + msgLength;
-		return 0;
+		errmsg = "unexpected field count in \"D\" message\n";
+		goto error_and_forward;
+	}
+
+	/* resize row buffer if needed */
+	rowbuf = conn->rowBuf;
+	if (nfields > conn->rowBufLen)
+	{
+		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
+		if (!rowbuf)
+		{
+			errmsg = "out of memory for query result\n";
+			goto error_and_forward;
+		}
+		conn->rowBuf = rowbuf;
+		conn->rowBufLen = nfields;
 	}
 
 	/* Scan the fields */
@@ -661,54 +666,78 @@ getAnotherTuple(PGconn *conn, int msgLength)
 	{
 		/* get the value length */
 		if (pqGetInt(&vlen, 4, conn))
-			return EOF;
-		if (vlen == -1)
 		{
-			/* null field */
-			tup[i].value = result->null_field;
-			tup[i].len = NULL_LEN;
-			continue;
+			/* The whole message must already be loaded into the buffer here */
+			errmsg = "protocol error\n";
+			goto error_saveresult;
 		}
-		if (vlen < 0)
+
+		if (vlen == -1)
+			vlen = NULL_LEN;
+		else if (vlen < 0)
 			vlen = 0;
-		if (tup[i].value == NULL)
-		{
-			bool		isbinary = (result->attDescs[i].format != 0);
 
-			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
-			if (tup[i].value == NULL)
-				goto outOfMemory;
+		/*
+		 * rowbuf[i].value always points to the next address of the
+		 * length field even if the value is NULL, to allow safe
+		 * size estimates and data copy.
+		 */
+		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+		rowbuf[i].len = vlen;
+
+		/* Skip to the next length field */
+		if (vlen > 0 && pqSkipnchar(vlen, conn))
+		{
+			/* The whole message must already be loaded into the buffer here */
+			errmsg = "protocol error\n";
+			goto error_saveresult;
 		}
-		tup[i].len = vlen;
-		/* read in the value */
-		if (vlen > 0)
-			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
-				return EOF;
-		/* we have to terminate this ourselves */
-		tup[i].value[vlen] = '\0';
 	}
 
-	/* Success!  Store the completed tuple in the result */
-	if (!pqAddTuple(result, tup))
-		goto outOfMemory;
-	/* and reset for a new message */
-	conn->curTuple = NULL;
+	/* mark the row as parsed, then check that the cursor ended up at its end */
+	conn->inStart += 5 + msgLength;
+	if (conn->inCursor != conn->inStart)
+	{
+		errmsg = "invalid row contents\n";
+		goto error_clearresult;
+	}
+
+	/* Pass the completed row values to rowProcessor */
+	switch (conn->rowProcessor(result, rowbuf, conn->rowProcessorParam))
+	{
+		case 1:
+			/* everything is good */
+			return 0;
+
+		case 0:
+			/* processor requested early exit - stop parsing without error */
+			return EOF;
+
+		case -1:
+			errmsg = "error in row processor\n";
+			goto error_clearresult;
+
+		default:
+			/* Illegal return code */
+			errmsg = "invalid return value from row processor\n";
+			goto error_clearresult;
+	}
 
-	return 0;
 
-outOfMemory:
+error_and_forward:
+	/* Discard the failed message by pretending we read it */
+	conn->inCursor = conn->inStart + 5 + msgLength;
 
+error_clearresult:
+	pqClearAsyncResult(conn);
+	
+error_saveresult:
 	/*
 	 * Replace partially constructed result with an error result. First
 	 * discard the old result to try to win back some memory.
 	 */
-	pqClearAsyncResult(conn);
-	printfPQExpBuffer(&conn->errorMessage,
-					  libpq_gettext("out of memory for query result\n"));
+	printfPQExpBuffer(&conn->errorMessage, "%s", libpq_gettext(errmsg));
 	pqSaveErrorResult(conn);
-
-	/* Discard the failed message by pretending we read it */
-	conn->inCursor = conn->inStart + 5 + msgLength;
 	return 0;
 }
 
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ef26ab9..e1d3339 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -149,6 +149,17 @@ typedef struct pgNotify
 	struct pgNotify *next;		/* list link */
 } PGnotify;
 
+/* PGrowValue points to a column value in the network buffer.
+ * The value is a string of length len without null termination.
+ * NULL is represented as len < 0; value then points to the place
+ * where the value would have been.
+ */
+typedef struct pgRowValue
+{
+	int			len;			/* length in bytes of the value */
+	char	   *value;			/* actual value, without null termination */
+} PGrowValue;
+
 /* Function types for notice-handling callbacks */
 typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
 typedef void (*PQnoticeProcessor) (void *arg, const char *message);
@@ -416,6 +427,38 @@ extern PGPing PQping(const char *conninfo);
 extern PGPing PQpingParams(const char *const * keywords,
 			 const char *const * values, int expand_dbname);
 
+/*
+ * Typedef for alternative row processor.
+ *
+ * Columns array will contain PQnfields() entries, each one pointing
+ * to particular column data in network buffer.  This function is
+ * supposed to copy data out from there and store somewhere.  NULL is
+ * signified with len<0.
+ *
+ * This function must return 1 for success and -1 for failure; on
+ * failure the caller releases the current PGresult. Returning 0
+ * instructs libpq to exit immediately from the topmost libpq function
+ * without releasing the PGresult being built.
+ */
+typedef int (*PQrowProcessor)(PGresult *res, PGrowValue *columns,
+                              void *param);
+
+/*
+ * Set alternative row data processor for PGconn.
+ *
+ * Registering a row processor disables libpq's own result
+ * storage; the processor is instead called for each row, one by one.
+ *
+ * func is the row processor function. See the typedef PQrowProcessor.
+ *
+ * rowProcessorParam is the contextual variable that is passed to
+ * the row processor.
+ */
+extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+								   void *rowProcessorParam);
+extern PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+extern int  PQskipResult(PGconn *conn, int skipAll);
+
 /* Force the write buffer to be written (or at least try) */
 extern int	PQflush(PGconn *conn);
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 987311e..9cabd20 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -398,7 +398,6 @@ struct pg_conn
 
 	/* Status for asynchronous result construction */
 	PGresult   *result;			/* result being constructed */
-	PGresAttValue *curTuple;	/* tuple currently being read */
 
 #ifdef USE_SSL
 	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
@@ -443,6 +442,14 @@ struct pg_conn
 
 	/* Buffer for receiving various parts of messages */
 	PQExpBufferData workBuffer; /* expansible string */
+
+	/*
+	 * Read column data from network buffer.
+	 */
+	PQrowProcessor rowProcessor;/* Function pointer */
+	void *rowProcessorParam;	/* Contextual parameter for rowProcessor */
+	PGrowValue *rowBuf;			/* Buffer for passing values to rowProcessor */
+	int rowBufLen;				/* Number of columns allocated in rowBuf */
 };
 
 /* PGcancel stores all data necessary to cancel a connection. A copy of this
@@ -560,6 +567,7 @@ extern int	pqGets(PQExpBuffer buf, PGconn *conn);
 extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
 extern int	pqPuts(const char *s, PGconn *conn);
 extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+extern int	pqSkipnchar(size_t len, PGconn *conn);
 extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
 extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
 extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
libpq_rowproc_doc_20120307.patch (text/x-patch; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 72c9384..e9233bd 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7233,6 +7233,281 @@ int PQisthreadsafe();
  </sect1>
 
 
+ <sect1 id="libpq-altrowprocessor">
+  <title>Alternative row processor</title>
+
+  <indexterm zone="libpq-altrowprocessor">
+   <primary>PGresult</primary>
+   <secondary>PGconn</secondary>
+  </indexterm>
+
+  <para>
+   By default, rows are stored in a <type>PGresult</type>
+   until the full result set is received, and the completely filled
+   <type>PGresult</type> is then passed to the user.  This behavior can be
+   changed by registering an alternative row processor function
+   that sees each row as soon as it is received
+   from the network.  It can either process the data
+   immediately or store it in a custom container.
+  </para>
+
+  <para>
+   Note: because the row processor sees rows as they arrive, it cannot know
+   whether the SQL statement will actually finish successfully on the server.
+   Some care must therefore be taken to get proper
+   transactional behavior.
+  </para>
+
+  <variablelist>
+   <varlistentry id="libpq-pqsetrowprocessor">
+    <term>
+     <function>PQsetRowProcessor</function>
+     <indexterm>
+      <primary>PQsetRowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       Sets a callback function to process each row.
+<synopsis>
+void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+</synopsis>
+     </para>
+     
+     <para>
+       <variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object to set the row processor function.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>func</parameter></term>
+	   <listitem>
+	     <para>
+	       Storage handler function to set. NULL means to use the
+	       default processor.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+	       A pointer to contextual parameter passed
+	       to <parameter>func</parameter>.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+       </variablelist>
+     </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqrowprocessor">
+    <term>
+     <type>PQrowProcessor</type>
+     <indexterm>
+      <primary>PQrowProcessor</primary>
+     </indexterm>
+    </term>
+
+    <listitem>
+     <para>
+       The type for the row processor callback function.
+<synopsis>
+int (*PQrowProcessor)(PGresult *res, PGrowValue *columns, void *param);
+
+typedef struct
+{
+    int         len;            /* length in bytes of the value, -1 if NULL */
+    char       *value;          /* actual value, without null termination */
+} PGrowValue;
+</synopsis>
+     </para>
+
+     <para>
+      The <parameter>columns</parameter> array will have PQnfields()
+      elements, each one pointing to a column value in the network buffer.
+      The <parameter>len</parameter> field will contain the number of
+      bytes in the value.  If the field value is NULL then
+      <parameter>len</parameter> will be -1 and value will point
+      to the position where the value would have been in the buffer.
+      This allows estimating row size by pointer arithmetic.
+     </para>
+
+     <para>
+       This function must process or copy row values away from the network
+       buffer before it returns, as the next row might overwrite them.
+     </para>
+
+     <para>
+       This function must return 1 for success and -1 for failure;
+       on failure, the caller releases the PGresult being
+       built.  It can also return 0 to request an early exit
+       from the <function>PQisBusy</function> function.  The
+       supplied <parameter>res</parameter>
+       and <parameter>columns</parameter> values will stay valid so
+       the row can be processed outside of the callback.  The caller is
+       responsible for tracking whether
+       <function>PQisBusy</function> returned early because of the
+       callback or for other reasons.  Usually this should be done by
+       setting cached values to NULL before
+       calling <function>PQisBusy</function>.
+     </para>
+
+     <para>
+       The function is allowed to exit via exception (setjmp/longjmp).
+       The connection and row are guaranteed to be in a valid state.
+       The connection can later be closed
+       via <function>PQfinish</function>.  Processing can also be
+       continued without closing the connection: call
+       <function>PQgetResult</function> in synchronous mode, or
+       <function>PQisBusy</function> on an asynchronous connection.
+       Processing then continues with the next row; the row that raised
+       the exception is skipped.  Alternatively, you can discard all remaining
+       rows by calling <function>PQskipResult</function> without
+       closing the connection.
+     </para>
+
+     <variablelist>
+       <varlistentry>
+
+	 <term><parameter>res</parameter></term>
+	 <listitem>
+	   <para>
+	     A pointer to the <type>PGresult</type> object.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>columns</parameter></term>
+	 <listitem>
+	   <para>
+	     Column values of the row to process.  Column values
+	     are located in network buffer, the processor must
+	     copy them out from there.
+	   </para>
+	   <para>
+	     Column values are not null-terminated, so processor cannot
+	     use C string functions on them directly.
+	   </para>
+	 </listitem>
+       </varlistentry>
+       <varlistentry>
+
+	 <term><parameter>param</parameter></term>
+	 <listitem>
+	   <para>
+	     Extra parameter that was given to <function>PQsetRowProcessor</function>.
+	   </para>
+	 </listitem>
+       </varlistentry>
+     </variablelist>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqskipresult">
+    <term>
+     <function>PQskipResult</function>
+     <indexterm>
+      <primary>PQskipResult</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+		Discards all remaining row data
+		after <function>PQexec</function>
+		or <function>PQgetResult</function> exits because of an exception raised
+		in a <type>PQrowProcessor</type> callback, without closing the connection.
+<synopsis>
+int PQskipResult(PGconn *conn, int skipAll);
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	   <term><parameter>skipAll</parameter></term>
+	   <listitem>
+	     <para>
+	       Skip remaining rows in current result
+	       if <parameter>skipAll</parameter> is false(0). Skip
+	       remaining rows in current result and all rows in
+	       succeeding results if true(non-zero).
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+  <variablelist>
+   <varlistentry id="libpq-pqgetrowprcessor">
+    <term>
+     <function>PQgetRowProcessor</function>
+     <indexterm>
+      <primary>PQgetRowProcessor</primary>
+     </indexterm>
+    </term>
+    <listitem>
+      <para>
+       Gets the row processor and the context parameter currently set on
+       the connection.
+<synopsis>
+PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+</synopsis>
+      </para>
+      <para>
+	<variablelist>
+	 <varlistentry>
+	   <term><parameter>conn</parameter></term>
+	   <listitem>
+	     <para>
+	       The connection object.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	 <varlistentry>
+	   <term><parameter>param</parameter></term>
+	   <listitem>
+	     <para>
+              If not NULL, receives the current row processor parameter
+              of the connection.
+	     </para>
+	   </listitem>
+	 </varlistentry>
+
+	</variablelist>
+      </para>
+    </listitem>
+   </varlistentry>
+  </variablelist>
+
+ </sect1>
+
+
  <sect1 id="libpq-build">
   <title>Building <application>libpq</application> Programs</title>
 
dblink_use_rowproc_20120307.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..09d6de8 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char** valbuf;
+	int *valbuflen;
+	char **cstrs;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -503,6 +519,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +576,51 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
-	res = PQexec(conn, buf.data);
+	PG_TRY();
+	{
+		res = PQexec(conn, buf.data);
+	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
+
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, FALSE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +632,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +693,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -660,6 +714,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 		{
 			/* text,text,bool */
 			DBLINK_GET_CONN;
+			conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 			sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
 			fail = PG_GETARG_BOOL(2);
 		}
@@ -675,6 +730,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 			else
 			{
 				DBLINK_GET_CONN;
+				conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 				sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
 			}
 		}
@@ -705,6 +761,8 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 		else
 			/* shouldn't happen */
 			elog(ERROR, "wrong number of arguments");
+
+		conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 	}
 
 	if (!conn)
@@ -715,164 +773,251 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
 	/* synchronous query, or async result retrieval */
-	if (!is_async)
-		res = PQexec(conn, sql);
-	else
+	PG_TRY();
 	{
-		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
+		if (!is_async)
+			res = PQexec(conn, sql);
+		else
+			res = PQgetResult(conn);
 	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, FALSE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK))
+		{
+			/* finishStoreInfo saves the fields referred to below. */
+			if (storeinfo.nummismatch)
+			{
+				/* This is only for backward compatibility */
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("remote query result rowtype does not match "
+								"the specified FROM clause rowtype")));
+			}
+
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
+	int i;
 
-	Assert(rsinfo->returnMode == SFRM_Materialize);
-
-	PG_TRY();
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
+		case TYPEFUNC_COMPOSITE:
+			/* success */
+			break;
+		case TYPEFUNC_RECORD:
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
 
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
 
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
-			tupdesc = CreateTemplateTupleDesc(1, false);
-			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
-							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+	/* make sure we have a persistent copy of the tupdesc */
+	tupdesc = CreateTupleDescCopy(tupdesc);
 
-			is_sql_cmd = false;
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->nattrs = tupdesc->natts;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->valbuf = NULL;
+	sinfo->valbuflen = NULL;
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+	/* Preallocate memory of same size with c string array for values. */
+	sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
+	if (sinfo->valbuf)
+		sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
+	if (sinfo->valbuflen)
+		sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
-		}
+	if (sinfo->cstrs == NULL)
+	{
+		if (sinfo->valbuf)
+			free(sinfo->valbuf);
+		if (sinfo->valbuflen)
+			free(sinfo->valbuflen);
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+		ereport(ERROR,
+				(errcode(ERRCODE_OUT_OF_MEMORY),
+				 errmsg("out of memory")));
+	}
+
+	for (i = 0 ; i < sinfo->nattrs ; i++)
+	{
+		sinfo->valbuf[i] = NULL;
+		sinfo->valbuflen[i] = -1;
+	}
 
-		if (ntuples > 0)
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
+
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	int i;
+
+	if (sinfo->valbuf)
+	{
+		for (i = 0 ; i < sinfo->nattrs ; i++)
 		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+			if (sinfo->valbuf[i])
+				free(sinfo->valbuf[i]);
+		}
+		free(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-			values = (char **) palloc(nfields * sizeof(char *));
+	if (sinfo->valbuflen)
+	{
+		free(sinfo->valbuflen);
+		sinfo->valbuflen = NULL;
+	}
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+	if (sinfo->cstrs)
+	{
+		free(sinfo->cstrs);
+		sinfo->cstrs = NULL;
+	}
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+/* Prototype of this function is PQrowProcessor */
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+	storeInfo *sinfo = (storeInfo *)param;
+	HeapTuple  tuple;
+	int        fields = PQnfields(res);
+	int        i;
+	char      **cstrs = sinfo->cstrs;
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
-			}
+	if (sinfo->error_occurred)
+		return -1;
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
-		}
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
 
-		PQclear(res);
+		/* This error will be processed in dblink_record_internal() */
+		return -1;
 	}
-	PG_CATCH();
+
+	/*
+	 * Value input functions assume that the input string is
+	 * zero-terminated, so make terminated copies of the values.
+	 */
+	for(i = 0 ; i < fields ; i++)
 	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
+		{
+			char *tmp = sinfo->valbuf[i];
+			int tmplen = sinfo->valbuflen[i];
+
+			/*
+			 * Call malloc and realloc separately so that this
+			 * works even on systems where realloc() does not
+			 * accept NULL as the old memory block.
+			 *
+			 * Also (re)allocate in bigger steps to
+			 * avoid a flood of allocations on unusual data.
+			 */
+			if (tmp == NULL)
+			{
+				tmplen = len + 1;
+				if (tmplen < 64)
+					tmplen = 64;
+				tmp = (char *)malloc(tmplen);
+			}
+			else if (tmplen < len + 1)
+			{
+				if (len + 1 > tmplen * 2)
+					tmplen = len + 1;
+				else
+					tmplen = tmplen * 2;
+				tmp = (char *)realloc(tmp, tmplen);
+			}
+
+			/*
+			 * sinfo->valbuf[n] will be freed in finishStoreInfo()
+			 * when realloc returns NULL.
+			 */
+			if (tmp == NULL)
+				return -1;  /* Report out of memory to the caller */
+
+			sinfo->valbuf[i] = tmp;
+			sinfo->valbuflen[i] = tmplen;
+
+			cstrs[i] = sinfo->valbuf[i];
+			memcpy(cstrs[i], columns[i].value, len);
+			cstrs[i][len] = '\0';
+		}
 	}
-	PG_END_TRY();
+
+	/*
+	 * These functions may throw an exception, which will be caught in
+	 * dblink_record_internal() or dblink_fetch().
+	 */
+	tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+	tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+	return 1;
 }
 
 /*
#85Marko Kreen
markokr@gmail.com
In reply to: Kyotaro HORIGUCHI (#84)
4 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

On Wed, Mar 07, 2012 at 03:14:57PM +0900, Kyotaro HORIGUCHI wrote:

* remove pqIsnonblocking(conn) check when row processor returned 2.
I missed that it's valid to call PQisBusy/PQconsumeInput/PQgetResult
on sync connection.

done. This affects PQisBusy, but PQgetResult won't be affected as
far as I see. I have no idea for PQconsumeInput()..

PQconsumeInput does not parse the row and there is no need to
do anything with PQgetResult().

* It seems the return codes from callback should be remapped,
(0, 1, 2) is unusual pattern. Better would be:

-1 - error
0 - stop parsing / early exit ("I'm not done yet")
1 - OK ("I'm done with the row")

done.

This might be described more precisely as follows,

| -1 - error - erase result and change result status to
| - FATAL_ERROR. All remaining rows in the current result
| - will be skipped (consumed).
| 0 - stop parsing / early exit ("I'm not done yet")
| - getAnotherTuple returns EOF without dropping PGresult.
# - We expect PQisBusy(), PQconsumeInput()(?) and
# - PQgetResult() to exit immediately and we can
# - call PQgetResult(), PQskipResult() or
# - PQisBusy() after.
| 1 - OK ("I'm done with the row")
| - save result and getAnotherTuple returns 0.

The lines prefixed with '#' are the desirable behavior I have
understood from the discussion so far. And I doubt that it works
as we expected for PQgetResult().

No, the desirable behavior is already implemented and documented -
the "stop parsing" return code affects only PQisBusy(), as that
is the function that does the actual parsing.

The main plus of such a scheme is that we do not change the behaviour
of any existing APIs.
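
For illustration only, the -1/0/1 convention above could be exercised by a callback along these lines. This is a minimal sketch, not code from the patch: RowValue mirrors the proposed PGrowValue, while RowContext, copy_row and the stop_after field are hypothetical names, and the PGresult argument is replaced by a plain field count so that the snippet compiles without libpq.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in mirroring the PGrowValue struct proposed in the patch. */
typedef struct
{
    int   len;    /* length in bytes, negative for SQL NULL */
    char *value;  /* points into the network buffer, not null-terminated */
} RowValue;

/* Hypothetical per-query context that would be passed as "param". */
typedef struct
{
    int    ncols;       /* number of columns the caller expects */
    char **cstrs;       /* one heap copy per column, NULL for SQL NULL */
    int    stop_after;  /* ask for early exit after this many rows, 0 = never */
    int    rows_seen;   /* rows copied so far */
} RowContext;

/*
 * Returns 1 when the row was consumed, 0 to ask the parser to stop early,
 * and -1 on error (column count mismatch or out of memory).
 */
static int
copy_row(const RowValue *columns, int nfields, void *param)
{
    RowContext *ctx = param;
    int         i;

    assert(ctx != NULL);

    if (nfields != ctx->ncols)
        return -1;

    for (i = 0; i < nfields; i++)
    {
        /* Drop the copy left over from the previous row, if any. */
        free(ctx->cstrs[i]);
        ctx->cstrs[i] = NULL;

        if (columns[i].len < 0)
            continue;           /* SQL NULL stays a NULL pointer */

        /* The buffer data is not terminated, so copy and terminate it. */
        ctx->cstrs[i] = malloc((size_t) columns[i].len + 1);
        if (ctx->cstrs[i] == NULL)
            return -1;
        memcpy(ctx->cstrs[i], columns[i].value, (size_t) columns[i].len);
        ctx->cstrs[i][columns[i].len] = '\0';
    }

    ctx->rows_seen++;
    if (ctx->stop_after > 0 && ctx->rows_seen >= ctx->stop_after)
        return 0;
    return 1;
}
```

A caller would keep the cstrs array alive across rows and free the remaining copies once the result is finished.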

* Please drop PQsetRowProcessorErrMsg() / PQresultSetErrMsg().

done.

You optimized libpq_gettext() calls, but it's wrong - they must
wrap the actual strings so that the strings can be extracted
for translating.

Fixed in attached patch.

My suggestion - check in getAnotherTuple whether resultStatus is
already error and do nothing then. This allows internal pqAddRow
to set regular "out of memory" error. Otherwise give generic
"row processor error".

The current implementation already seems to do this in
parseInput3(). Could you give me further explanation?

The suggestion was about getAnotherTuple() - currently it always
sets "error in row processor". With such a check, the callback
can set the error result itself. Currently only callbacks that
live inside libpq can set errors, but if we happen to expose an
error-setting function in the outside API, then getAnotherTuple()
would already be ready for it.
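
A rough sketch of that check, under stated assumptions: ConnSketch and finish_row_error are hypothetical stand-ins rather than libpq internals, and the "already set" test is modelled as a non-empty message buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the connection's error buffer; not PGconn. */
typedef struct
{
    char errmsg[128];
} ConnSketch;

/*
 * Called when the row processor returned -1.  A message the callback
 * already stored is kept; otherwise a generic one is filled in.
 */
static const char *
finish_row_error(ConnSketch *conn)
{
    assert(conn != NULL);

    if (conn->errmsg[0] == '\0')
        snprintf(conn->errmsg, sizeof(conn->errmsg), "%s",
                 "error in row processor\n");
    return conn->errmsg;
}
```

With this ordering a callback that reports "out of memory" keeps its specific message, and only silent failures get the generic text.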

See attached patch, I did some generic comment/docs cleanups
but also minor code cleanups:

- PQsetRowProcessor(NULL,NULL) sets Param to PGconn instead of NULL;
this makes PGconn available to pqAddRow.

- pqAddRow sets "out of memory" error itself on PGconn.

- getAnotherTuple(): when the callback returns -1, it checks whether an
error message is already set and, if so, leaves it alone.

- put libpq_gettext() back around strings.

- dropped the error_saveresult label, it was an unnecessary branch.

- made functions survive conn==NULL.

- dblink: changed skipAll parameter for PQskipResult() to TRUE,
as dblink uses PQexec which can send several queries.

- dblink: refreshed regtest result, as now we get actual
connection name in error message.

- Synced PQgetRow patch with return value changes.

- Synced demos at https://github.com/markokr/libpq-rowproc-demos
with return value changes.
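
For readers following the pqAddRow changes in the list above, here is a small standalone sketch of what a row processor has to do with each column: copy the bytes out of the network buffer and terminate them itself. `RowValue` mirrors the PGrowValue layout from the patch but is redeclared here so the example compiles without libpq headers; `copy_column` is a hypothetical helper, not part of the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Same layout as PGrowValue in the patch; redeclared for a standalone build. */
typedef struct
{
    int   len;    /* length in bytes, -1 if the column is NULL */
    char *value;  /* points into the network buffer, not null-terminated */
} RowValue;

/*
 * Copy one column into a freshly malloc'ed, null-terminated string.
 * Returns NULL for SQL NULL or on allocation failure; *is_null tells
 * the two cases apart.  The caller frees the result.
 */
static char *
copy_column(const RowValue *col, int *is_null)
{
    char *out;

    *is_null = (col->len < 0);
    if (*is_null)
        return NULL;

    out = malloc((size_t) col->len + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, col->value, (size_t) col->len);
    out[col->len] = '\0';       /* the buffer itself has no terminator */
    return out;
}
```

A real callback would run something like this for each of the PQnfields() columns before returning, since the next row may overwrite the buffer.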

I'm pretty happy with the current state. So tagging it
ReadyForCommitter.

--
marko

Attachments:

libpq_rowproc_doc_2012-03-07v2.diff (text/x-diff; charset=us-ascii)
*** a/doc/src/sgml/libpq.sgml
--- b/doc/src/sgml/libpq.sgml
***************
*** 7233,7238 **** int PQisthreadsafe();
--- 7233,7527 ----
   </sect1>
  
  
+  <sect1 id="libpq-altrowprocessor">
+   <title>Alternative row processor</title>
+ 
+   <indexterm zone="libpq-altrowprocessor">
+    <primary>PGresult</primary>
+    <secondary>PGconn</secondary>
+   </indexterm>
+ 
+   <para>
+    As the standard usage, rows are stored into <type>PGresult</type>
+    until full resultset is received.  Then such completely-filled
+    <type>PGresult</type> is passed to user.  This behavior can be
+    changed by registering alternative row processor function,
+    that will see each row data as soon as it is received
+    from network.  It has the option of processing the data
+    immediately, or storing it into custom container.
+   </para>
+ 
+   <para>
+    Note - as row processor sees rows as they arrive, it cannot know
+    whether the SQL statement actually finishes successfully on server
+    or not.  So some care must be taken to get proper
+    transactionality.
+   </para>
+ 
+   <variablelist>
+    <varlistentry id="libpq-pqsetrowprocessor">
+     <term>
+      <function>PQsetRowProcessor</function>
+      <indexterm>
+       <primary>PQsetRowProcessor</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        Sets a callback function to process each row.
+ <synopsis>
+ void PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param);
+ </synopsis>
+      </para>
+      
+      <para>
+        <variablelist>
+          <varlistentry>
+            <term><parameter>conn</parameter></term>
+            <listitem>
+              <para>
+                The connection object to set the row processor function.
+              </para>
+            </listitem>
+          </varlistentry>
+          <varlistentry>
+            <term><parameter>func</parameter></term>
+            <listitem>
+              <para>
+                Storage handler function to set. NULL means to use the
+                default processor.
+              </para>
+            </listitem>
+          </varlistentry>
+          <varlistentry>
+            <term><parameter>param</parameter></term>
+            <listitem>
+              <para>
+                A pointer to contextual parameter passed
+                to <parameter>func</parameter>.
+              </para>
+            </listitem>
+          </varlistentry>
+        </variablelist>
+      </para>
+     </listitem>
+    </varlistentry>
+ 
+    <varlistentry id="libpq-pqrowprocessor">
+     <term>
+      <type>PQrowProcessor</type>
+      <indexterm>
+       <primary>PQrowProcessor</primary>
+      </indexterm>
+     </term>
+ 
+     <listitem>
+      <para>
+        The type for the row processor callback function.
+ <synopsis>
+ int (*PQrowProcessor)(PGresult *res, PGrowValue *columns, void *param);
+ 
+ typedef struct
+ {
+     int         len;            /* length in bytes of the value, -1 if NULL */
+     char       *value;          /* actual value, without null termination */
+ } PGrowValue;
+ </synopsis>
+      </para>
+ 
+      <para>
+       The <parameter>columns</parameter> array will have PQnfields()
+       elements, each one pointing to column value in network buffer.
+       The <parameter>len</parameter> field will contain number of
+       bytes in value.  If the field value is NULL then
+       <parameter>len</parameter> will be -1 and value will point
+       to position where the value would have been in buffer.
+       This allows estimating row size by pointer arithmetic.
+      </para>
+ 
+      <para>
+        This function must process or copy away row values in network
+        buffer before it returns, as next row might overwrite them.
+      </para>
+ 
+      <para>
+        This function can return 1 for success, and -1 for failure.
+        It can also return 0 to stop parsing network data and immediately exit
+        from the <function>PQisBusy</function> function.  The supplied
+        <parameter>res</parameter> and <parameter>columns</parameter> values
+        will stay valid so row can be processed outside of callback.  Caller is
+        responsible for tracking whether the <function>PQisBusy</function>
+        returned because callback requested it or for other reasons.  Usually
+        this should happen via setting cached values to NULL before calling
+        <function>PQisBusy</function>.
+      </para>
+ 
+      <para>
+        The function is allowed to exit via exception (eg: setjmp/longjmp).
+        The connection and row are guaranteed to be in valid state.
+        The connection can later be closed
+        via <function>PQfinish</function>.  Processing can also be
+        continued without closing the connection, either with 
+        <function>PQgetResult</function> or <function>PQisBusy</function>.
+        Then processing will continue with next row, old row that got
+        exception will be skipped.  Or you can discard all remaining
+        rows by calling <function>PQskipResult</function>.
+      </para>
+ 
+      <variablelist>
+ 
+        <varlistentry>
+          <term><parameter>res</parameter></term>
+          <listitem>
+            <para>
+              A pointer to <type>PGresult</type> object that describes
+              the row structure.
+            </para>
+            <para>
+              This object is owned by <type>PGconn</type>
+              so callback must not free it.
+            </para>
+          </listitem>
+        </varlistentry>
+ 
+        <varlistentry>
+          <term><parameter>columns</parameter></term>
+          <listitem>
+            <para>
+              Column values of the row to process.  Column values
+              are located in network buffer, the processor must
+              copy them out from there.
+            </para>
+            <para>
+              Column values are not null-terminated, so processor cannot
+              use C string functions on them directly.
+            </para>
+            <para>
+              This pointer is owned by <type>PGconn</type>
+              so callback must not free it.
+            </para>
+          </listitem>
+        </varlistentry>
+ 
+        <varlistentry>
+          <term><parameter>param</parameter></term>
+          <listitem>
+            <para>
+              Extra parameter that was given to
+ 	     <function>PQsetRowProcessor</function>.
+            </para>
+          </listitem>
+        </varlistentry>
+ 
+      </variablelist>
+     </listitem>
+    </varlistentry>
+ 
+    <varlistentry id="libpq-pqgetrowprocessor">
+     <term>
+      <function>PQgetRowProcessor</function>
+      <indexterm>
+       <primary>PQgetRowProcessor</primary>
+      </indexterm>
+     </term>
+     <listitem>
+       <para>
+        Get row processor and its context parameter currently set to
+        the connection.
+ <synopsis>
+ PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+ </synopsis>
+       </para>
+       <para>
+         <variablelist>
+          <varlistentry>
+            <term><parameter>conn</parameter></term>
+            <listitem>
+              <para>
+                The connection object.
+              </para>
+            </listitem>
+          </varlistentry>
+ 
+          <varlistentry>
+            <term><parameter>param</parameter></term>
+            <listitem>
+              <para>
+               Location to store the extra parameter of current row processor.
+               If NULL, then parameter is not stored.
+              </para>
+            </listitem>
+          </varlistentry>
+ 
+         </variablelist>
+       </para>
+     </listitem>
+    </varlistentry>
+ 
+    <varlistentry id="libpq-pqskipresult">
+     <term>
+      <function>PQskipResult</function>
+      <indexterm>
+       <primary>PQskipResult</primary>
+      </indexterm>
+     </term>
+     <listitem>
+       <para>
+         Discard all the remaining rows in incoming resultset.
+ <synopsis>
+ int PQskipResult(PGconn *conn, int skipAll);
+ </synopsis>
+       </para>
+       <para>
+        The function is a simple convenience method to drop one or more
+        incoming resultsets.  It is useful when handling exceptions
+        from a row processor.  It is implemented by simply calling
+        <function>PQgetResult</function> once or in a loop.
+       </para>
+       <para>
+        It returns 1 if the last <function>PQgetResult</function>
+        returned non-NULL.  This can happen only when
+        <parameter>skipAll</parameter> is false.
+       </para>
+ 
+       <para>
+         <variablelist>
+          <varlistentry>
+            <term><parameter>conn</parameter></term>
+            <listitem>
+              <para>
+                The connection object.
+              </para>
+            </listitem>
+          </varlistentry>
+ 
+          <varlistentry>
+            <term><parameter>skipAll</parameter></term>
+            <listitem>
+              <para>
+                If true, then <function>PQgetResult</function>
+                is called until it returns NULL.  That means
+                there are no more incoming resultsets.
+              </para>
+              <para>
+                If false, <function>PQgetResult</function> is called
+                only once.  The return value shows whether
+                there was result available or not.
+              </para>
+            </listitem>
+          </varlistentry>
+ 
+         </variablelist>
+       </para>
+     </listitem>
+    </varlistentry>
+ 
+   </variablelist>
+ 
+  </sect1>
+ 
+ 
   <sect1 id="libpq-build">
    <title>Building <application>libpq</application> Programs</title>
  
libpq_rowproc_2012-03-07v2.diff (text/x-diff; charset=us-ascii)
*** a/src/interfaces/libpq/exports.txt
--- b/src/interfaces/libpq/exports.txt
***************
*** 160,162 **** PQconnectStartParams      157
--- 160,165 ----
  PQping                    158
  PQpingParams              159
  PQlibVersion              160
+ PQsetRowProcessor	  	  161
+ PQgetRowProcessor	  	  162
+ PQskipResult		  	  163
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***************
*** 2693,2698 **** makeEmptyPGconn(void)
--- 2693,2701 ----
  	conn->wait_ssl_try = false;
  #endif
  
+ 	/* set default row processor */
+ 	PQsetRowProcessor(conn, NULL, NULL);
+ 
  	/*
  	 * We try to send at least 8K at a time, which is the usual size of pipe
  	 * buffers on Unix systems.  That way, when we are sending a large amount
***************
*** 2711,2718 **** makeEmptyPGconn(void)
--- 2714,2726 ----
  	initPQExpBuffer(&conn->errorMessage);
  	initPQExpBuffer(&conn->workBuffer);
  
+ 	/* set up initial row buffer */
+ 	conn->rowBufLen = 32;
+ 	conn->rowBuf = (PGrowValue *)malloc(conn->rowBufLen * sizeof(PGrowValue));
+ 
  	if (conn->inBuffer == NULL ||
  		conn->outBuffer == NULL ||
+ 		conn->rowBuf == NULL ||
  		PQExpBufferBroken(&conn->errorMessage) ||
  		PQExpBufferBroken(&conn->workBuffer))
  	{
***************
*** 2814,2819 **** freePGconn(PGconn *conn)
--- 2822,2829 ----
  		free(conn->inBuffer);
  	if (conn->outBuffer)
  		free(conn->outBuffer);
+ 	if (conn->rowBuf)
+ 		free(conn->rowBuf);
  	termPQExpBuffer(&conn->errorMessage);
  	termPQExpBuffer(&conn->workBuffer);
  
***************
*** 5078,5080 **** PQregisterThreadLock(pgthreadlock_t newhandler)
--- 5088,5091 ----
  
  	return prev;
  }
+ 
*** a/src/interfaces/libpq/fe-exec.c
--- b/src/interfaces/libpq/fe-exec.c
***************
*** 66,71 **** static PGresult *PQexecFinish(PGconn *conn);
--- 66,72 ----
  static int PQsendDescribe(PGconn *conn, char desc_type,
  			   const char *desc_target);
  static int	check_field_number(const PGresult *res, int field_num);
+ static int	pqAddRow(PGresult *res, PGrowValue *columns, void *param);
  
  
  /* ----------------
***************
*** 701,707 **** pqClearAsyncResult(PGconn *conn)
  	if (conn->result)
  		PQclear(conn->result);
  	conn->result = NULL;
- 	conn->curTuple = NULL;
  }
  
  /*
--- 702,707 ----
***************
*** 756,762 **** pqPrepareAsyncResult(PGconn *conn)
  	 */
  	res = conn->result;
  	conn->result = NULL;		/* handing over ownership to caller */
- 	conn->curTuple = NULL;		/* just in case */
  	if (!res)
  		res = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
  	else
--- 756,761 ----
***************
*** 828,833 **** pqInternalNotice(const PGNoticeHooks *hooks, const char *fmt,...)
--- 827,921 ----
  }
  
  /*
+  * PQsetRowProcessor
+  *   Set function that copies column data out from network buffer.
+  */
+ void
+ PQsetRowProcessor(PGconn *conn, PQrowProcessor func, void *param)
+ {
+ 	if (!conn)
+ 		return;
+ 
+ 	if (func)
+ 	{
+ 		/* set custom row processor */
+ 		conn->rowProcessor = func;
+ 		conn->rowProcessorParam = param;
+ 	}
+ 	else
+ 	{
+ 		/* set default row processor */
+ 		conn->rowProcessor = pqAddRow;
+ 		conn->rowProcessorParam = conn;
+ 	}
+ }
+ 
+ /*
+  * PQgetRowProcessor
+  *   Get current row processor of conn. set pointer to current parameter for
+  *   row processor to param if not NULL.
+  */
+ PQrowProcessor
+ PQgetRowProcessor(PGconn *conn, void **param)
+ {
+ 	if (!conn)
+ 		return NULL;
+ 
+ 	if (param)
+ 		*param = conn->rowProcessorParam;
+ 
+ 	return conn->rowProcessor;
+ }
+ 
+ /*
+  * pqAddRow
+  *	  add a row to the PGresult structure, growing it if necessary
+  *	  Returns 1 if OK, -1 if error occurred.
+  */
+ static int
+ pqAddRow(PGresult *res, PGrowValue *columns, void *param)
+ {
+ 	PGconn		*conn = param;
+ 	PGresAttValue *tup;
+ 	int			nfields = res->numAttributes;
+ 	int			i;
+ 
+ 	tup = (PGresAttValue *)
+ 		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
+ 	if (tup == NULL)
+ 		goto no_memory;
+ 
+ 	for (i = 0 ; i < nfields ; i++)
+ 	{
+ 		tup[i].len = columns[i].len;
+ 		if (tup[i].len == NULL_LEN)
+ 		{
+ 			tup[i].value = res->null_field;
+ 		}
+ 		else
+ 		{
+ 			bool isbinary = (res->attDescs[i].format != 0);
+ 			tup[i].value = (char *)pqResultAlloc(res, tup[i].len + 1, isbinary);
+ 			if (tup[i].value == NULL)
+ 				goto no_memory;
+ 
+ 			/* copy and zero-terminate */
+ 			memcpy(tup[i].value, columns[i].value, tup[i].len);
+ 			tup[i].value[tup[i].len] = '\0';
+ 		}
+ 	}
+ 
+ 	if (pqAddTuple(res, tup))
+ 		return 1;
+ 
+ no_memory:
+ 	printfPQExpBuffer(&conn->errorMessage,
+ 					  libpq_gettext("out of memory for query result\n"));
+ 	pqSaveErrorResult(conn);
+ 	return -1;
+ }
+ 
+ /*
   * pqAddTuple
   *	  add a row pointer to the PGresult structure, growing it if necessary
   *	  Returns TRUE if OK, FALSE if not enough memory to add the row
***************
*** 1223,1229 **** PQsendQueryStart(PGconn *conn)
  
  	/* initialize async result-accumulation state */
  	conn->result = NULL;
- 	conn->curTuple = NULL;
  
  	/* ready to send command message */
  	return true;
--- 1311,1316 ----
***************
*** 1831,1836 **** PQexecFinish(PGconn *conn)
--- 1918,1975 ----
  	return lastResult;
  }
  
+ 
+ /*
+  * Do-nothing row processor for PQskipResult
+  */
+ static int
+ dummyRowProcessor(PGresult *res, PGrowValue *columns, void *param)
+ {
+ 	return 1;
+ }
+ 
+ /*
+  * Exhaust remaining Data Rows in current connection.
+  * 
+  * Exhaust only one resultset if skipAll is false and all
+  * succeeding results if true.
+  */
+ int
+ PQskipResult(PGconn *conn, int skipAll)
+ {
+ 	PQrowProcessor savedRowProcessor;
+ 	void * savedRowProcParam;
+ 	PGresult *res;
+ 	int ret = 0;
+ 
+ 	if (!conn)
+ 		return 0;
+ 
+ 	/* save the current row processor settings and set dummy processor */
+ 	savedRowProcessor = PQgetRowProcessor(conn, &savedRowProcParam);
+ 	PQsetRowProcessor(conn, dummyRowProcessor, NULL);
+ 	
+ 	/*
+ 	 * Throw away the remaining rows in current result, or all succeeding
+ 	 * results if skipAll is not FALSE.
+ 	 */
+ 	if (skipAll)
+ 	{
+ 		while ((res = PQgetResult(conn)) != NULL)
+ 			PQclear(res);
+ 	}
+ 	else if ((res = PQgetResult(conn)) != NULL)
+ 	{
+ 		PQclear(res);
+ 		ret = 1;
+ 	}
+ 	
+ 	PQsetRowProcessor(conn, savedRowProcessor, savedRowProcParam);
+ 
+ 	return ret;
+ }
+ 
+ 
  /*
   * PQdescribePrepared
   *	  Obtain information about a previously prepared statement
*** a/src/interfaces/libpq/fe-misc.c
--- b/src/interfaces/libpq/fe-misc.c
***************
*** 219,224 **** pqGetnchar(char *s, size_t len, PGconn *conn)
--- 219,243 ----
  }
  
  /*
+  * pqSkipnchar:
+  *	skip len bytes in input buffer.
+  */
+ int
+ pqSkipnchar(size_t len, PGconn *conn)
+ {
+ 	if (len > (size_t) (conn->inEnd - conn->inCursor))
+ 		return EOF;
+ 
+ 	conn->inCursor += len;
+ 
+ 	if (conn->Pfdebug)
+ 		fprintf(conn->Pfdebug, "From backend (%lu skipped)\n",
+ 				(unsigned long) len);
+ 
+ 	return 0;
+ }
+ 
+ /*
   * pqPutnchar:
   *	write exactly len bytes to the current message
   */
*** a/src/interfaces/libpq/fe-protocol2.c
--- b/src/interfaces/libpq/fe-protocol2.c
***************
*** 569,574 **** pqParseInput2(PGconn *conn)
--- 569,576 ----
  						/* Read another tuple of a normal query response */
  						if (getAnotherTuple(conn, FALSE))
  							return;
+ 						/* getAnotherTuple moves inStart itself */
+ 						continue;
  					}
  					else
  					{
***************
*** 585,590 **** pqParseInput2(PGconn *conn)
--- 587,594 ----
  						/* Read another tuple of a normal query response */
  						if (getAnotherTuple(conn, TRUE))
  							return;
+ 						/* getAnotherTuple moves inStart itself */
+ 						continue;
  					}
  					else
  					{
***************
*** 703,721 **** failure:
  
  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGresAttValue *tup;
  
  	/* the backend sends us a bitmap of which attributes are null */
  	char		std_bitmap[64]; /* used unless it doesn't fit */
--- 707,724 ----
  
  /*
   * parseInput subroutine to read a 'B' or 'D' (row data) message.
!  * It fills rowbuf with column pointers and then calls row processor.
   * Returns: 0 if completed message, EOF if error or not enough data yet.
   *
   * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.
   */
  static int
  getAnotherTuple(PGconn *conn, bool binary)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGrowValue  *rowbuf;
  
  	/* the backend sends us a bitmap of which attributes are null */
  	char		std_bitmap[64]; /* used unless it doesn't fit */
***************
*** 726,754 **** getAnotherTuple(PGconn *conn, bool binary)
  	int			bitmap_index;	/* Its index */
  	int			bitcnt;			/* number of bits examined in current byte */
  	int			vlen;			/* length of the current field value */
  
! 	result->binary = binary;
! 
! 	/* Allocate tuple space if first time for this data message */
! 	if (conn->curTuple == NULL)
  	{
! 		conn->curTuple = (PGresAttValue *)
! 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
! 		if (conn->curTuple == NULL)
! 			goto outOfMemory;
! 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
! 
! 		/*
! 		 * If it's binary, fix the column format indicators.  We assume the
! 		 * backend will consistently send either B or D, not a mix.
! 		 */
! 		if (binary)
  		{
! 			for (i = 0; i < nfields; i++)
! 				result->attDescs[i].format = 1;
  		}
  	}
- 	tup = conn->curTuple;
  
  	/* Get the null-value bitmap */
  	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
--- 729,760 ----
  	int			bitmap_index;	/* Its index */
  	int			bitcnt;			/* number of bits examined in current byte */
  	int			vlen;			/* length of the current field value */
+ 	const char *errmsg = libpq_gettext("unknown error\n");
  
! 	/* resize row buffer if needed */
! 	if (nfields > conn->rowBufLen)
  	{
! 		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
! 		if (!rowbuf)
  		{
! 			errmsg = libpq_gettext("out of memory for query result\n");
! 			goto error_clearresult;
  		}
+ 		conn->rowBuf = rowbuf;
+ 		conn->rowBufLen = nfields;
+ 	}
+ 	else
+ 	{
+ 		rowbuf = conn->rowBuf;
+ 	}
+ 
+ 	result->binary = binary;
+ 
+ 	if (binary)
+ 	{
+ 		for (i = 0; i < nfields; i++)
+ 			result->attDescs[i].format = 1;
  	}
  
  	/* Get the null-value bitmap */
  	nbytes = (nfields + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
***************
*** 757,767 **** getAnotherTuple(PGconn *conn, bool binary)
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 			goto outOfMemory;
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
! 		goto EOFexit;
  
  	/* Scan the fields */
  	bitmap_index = 0;
--- 763,777 ----
  	{
  		bitmap = (char *) malloc(nbytes);
  		if (!bitmap)
! 		{
! 			errmsg = libpq_gettext("out of memory for query result\n");
! 			goto error_clearresult;
! 		}
  	}
  
  	if (pqGetnchar(bitmap, nbytes, conn))
! 		goto error_clearresult;
! 
  
  	/* Scan the fields */
  	bitmap_index = 0;
***************
*** 771,804 **** getAnotherTuple(PGconn *conn, bool binary)
  	for (i = 0; i < nfields; i++)
  	{
  		if (!(bmap & 0200))
! 		{
! 			/* if the field value is absent, make it a null string */
! 			tup[i].value = result->null_field;
! 			tup[i].len = NULL_LEN;
! 		}
  		else
  		{
- 			/* get the value length (the first four bytes are for length) */
- 			if (pqGetInt(&vlen, 4, conn))
- 				goto EOFexit;
  			if (!binary)
  				vlen = vlen - 4;
  			if (vlen < 0)
  				vlen = 0;
- 			if (tup[i].value == NULL)
- 			{
- 				tup[i].value = (char *) pqResultAlloc(result, vlen + 1, binary);
- 				if (tup[i].value == NULL)
- 					goto outOfMemory;
- 			}
- 			tup[i].len = vlen;
- 			/* read in the value */
- 			if (vlen > 0)
- 				if (pqGetnchar((char *) (tup[i].value), vlen, conn))
- 					goto EOFexit;
- 			/* we have to terminate this ourselves */
- 			tup[i].value[vlen] = '\0';
  		}
  		/* advance the bitmap stuff */
  		bitcnt++;
  		if (bitcnt == BITS_PER_BYTE)
--- 781,809 ----
  	for (i = 0; i < nfields; i++)
  	{
  		if (!(bmap & 0200))
! 			vlen = NULL_LEN;
! 		else if (pqGetInt(&vlen, 4, conn))
! 				goto EOFexit;
  		else
  		{
  			if (!binary)
  				vlen = vlen - 4;
  			if (vlen < 0)
  				vlen = 0;
  		}
+ 
+ 		/*
+ 		 * rowbuf[i].value always points to the next address of the
+ 		 * length field even if the value is NULL, to allow safe
+ 		 * size estimates and data copy.
+ 		 */
+ 		rowbuf[i].value = conn->inBuffer + conn->inCursor;
+ 		rowbuf[i].len = vlen;
+ 
+ 		/* Skip the value */
+ 		if (vlen > 0 && pqSkipnchar(vlen, conn))
+ 			goto EOFexit;
+ 
  		/* advance the bitmap stuff */
  		bitcnt++;
  		if (bitcnt == BITS_PER_BYTE)
***************
*** 811,843 **** getAnotherTuple(PGconn *conn, bool binary)
  			bmap <<= 1;
  	}
  
- 	/* Success!  Store the completed tuple in the result */
- 	if (!pqAddTuple(result, tup))
- 		goto outOfMemory;
- 	/* and reset for a new message */
- 	conn->curTuple = NULL;
- 
  	if (bitmap != std_bitmap)
  		free(bitmap);
! 	return 0;
  
! outOfMemory:
! 	/* Replace partially constructed result with an error result */
  
  	/*
  	 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
  	 * there's not enough memory to concatenate messages...
  	 */
  	pqClearAsyncResult(conn);
- 	printfPQExpBuffer(&conn->errorMessage,
- 					  libpq_gettext("out of memory for query result\n"));
  
  	/*
  	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
  	 * do to recover...
  	 */
  	conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
  	conn->asyncStatus = PGASYNC_READY;
  	/* Discard the failed message --- good idea? */
  	conn->inStart = conn->inEnd;
  
--- 816,869 ----
  			bmap <<= 1;
  	}
  
  	if (bitmap != std_bitmap)
  		free(bitmap);
! 	bitmap = NULL;
  
! 	/* tag the row as parsed */
! 	conn->inStart = conn->inCursor;
  
+ 	/* Pass the completed row values to rowProcessor */
+ 	switch (conn->rowProcessor(result, rowbuf, conn->rowProcessorParam))
+ 	{
+ 		case 1:
+ 			/* everything is good */
+ 			return 0;
+ 
+ 		case 0:
+ 			/* processor requested early exit */
+ 			return EOF;
+ 			
+ 		case -1:
+ 			/* if row processor already set error, then simply exit */
+ 			if (conn->result->resultStatus != PGRES_TUPLES_OK)
+ 				return EOF;
+ 			errmsg = libpq_gettext("error in row processor\n");
+ 			break;
+ 
+ 		default:
+ 			/* Illegal return code */
+ 			errmsg = libpq_gettext("invalid return value from row processor\n");
+ 			break;
+ 	}
+ 
+ error_clearresult:
  	/*
  	 * we do NOT use pqSaveErrorResult() here, because of the likelihood that
  	 * there's not enough memory to concatenate messages...
  	 */
  	pqClearAsyncResult(conn);
  
+ 	printfPQExpBuffer(&conn->errorMessage, "%s", errmsg);
+ 	
  	/*
  	 * XXX: if PQmakeEmptyPGresult() fails, there's probably not much we can
  	 * do to recover...
  	 */
  	conn->result = PQmakeEmptyPGresult(conn, PGRES_FATAL_ERROR);
+ 
  	conn->asyncStatus = PGASYNC_READY;
+ 
  	/* Discard the failed message --- good idea? */
  	conn->inStart = conn->inEnd;
  
*** a/src/interfaces/libpq/fe-protocol3.c
--- b/src/interfaces/libpq/fe-protocol3.c
***************
*** 327,332 **** pqParseInput3(PGconn *conn)
--- 327,335 ----
  						/* Read another tuple of a normal query response */
  						if (getAnotherTuple(conn, msgLength))
  							return;
+ 
+ 						/* getAnotherTuple() moves inStart itself */
+ 						continue;
  					}
  					else if (conn->result != NULL &&
  							 conn->result->resultStatus == PGRES_FATAL_ERROR)
***************
*** 613,659 **** failure:
  
  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * We add another tuple to the existing PGresult structure.
!  * Returns: 0 if completed message, EOF if error or not enough data yet.
!  *
!  * Note that if we run out of data, we have to suspend and reprocess
!  * the message after more data is received.  We keep a partially constructed
!  * tuple in conn->curTuple, and avoid reallocating already-allocated storage.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGresAttValue *tup;
  	int			tupnfields;		/* # fields from tuple */
  	int			vlen;			/* length of the current field value */
  	int			i;
! 
! 	/* Allocate tuple space if first time for this data message */
! 	if (conn->curTuple == NULL)
! 	{
! 		conn->curTuple = (PGresAttValue *)
! 			pqResultAlloc(result, nfields * sizeof(PGresAttValue), TRUE);
! 		if (conn->curTuple == NULL)
! 			goto outOfMemory;
! 		MemSet(conn->curTuple, 0, nfields * sizeof(PGresAttValue));
! 	}
! 	tup = conn->curTuple;
  
  	/* Get the field count and make sure it's what we expect */
  	if (pqGetInt(&tupnfields, 2, conn))
! 		return EOF;
  
  	if (tupnfields != nfields)
  	{
! 		/* Replace partially constructed result with an error result */
! 		printfPQExpBuffer(&conn->errorMessage,
! 				 libpq_gettext("unexpected field count in \"D\" message\n"));
! 		pqSaveErrorResult(conn);
! 		/* Discard the failed message by pretending we read it */
! 		conn->inCursor = conn->inStart + 5 + msgLength;
! 		return 0;
  	}
  
  	/* Scan the fields */
--- 616,661 ----
  
  /*
   * parseInput subroutine to read a 'D' (row data) message.
!  * It fills rowbuf with column pointers and then calls row processor.
!  * Returns: 0 if completed message, EOF to cancel parsing.
   */
  static int
  getAnotherTuple(PGconn *conn, int msgLength)
  {
  	PGresult   *result = conn->result;
  	int			nfields = result->numAttributes;
! 	PGrowValue  *rowbuf;
  	int			tupnfields;		/* # fields from tuple */
  	int			vlen;			/* length of the current field value */
  	int			i;
! 	const char *errmsg = libpq_gettext("unknown error\n");
  
  	/* Get the field count and make sure it's what we expect */
  	if (pqGetInt(&tupnfields, 2, conn))
! 	{
! 		/* Whole the message must be loaded on the buffer here */
! 		errmsg = libpq_gettext("protocol error\n");
! 		goto error_and_forward;
! 	}
  
  	if (tupnfields != nfields)
  	{
! 		errmsg = libpq_gettext("unexpected field count in \"D\" message\n");
! 		goto error_and_forward;
! 	}
! 
! 	/* resize row buffer if needed */
! 	rowbuf = conn->rowBuf;
! 	if (nfields > conn->rowBufLen)
! 	{
! 		rowbuf = realloc(conn->rowBuf, nfields * sizeof(PGrowValue));
! 		if (!rowbuf)
! 		{
! 			errmsg = libpq_gettext("out of memory for query result\n");
! 			goto error_and_forward;
! 		}
! 		conn->rowBuf = rowbuf;
! 		conn->rowBufLen = nfields;
  	}
  
  	/* Scan the fields */
***************
*** 661,714 **** getAnotherTuple(PGconn *conn, int msgLength)
  	{
  		/* get the value length */
  		if (pqGetInt(&vlen, 4, conn))
- 			return EOF;
- 		if (vlen == -1)
  		{
! 			/* null field */
! 			tup[i].value = result->null_field;
! 			tup[i].len = NULL_LEN;
! 			continue;
  		}
! 		if (vlen < 0)
  			vlen = 0;
- 		if (tup[i].value == NULL)
- 		{
- 			bool		isbinary = (result->attDescs[i].format != 0);
  
! 			tup[i].value = (char *) pqResultAlloc(result, vlen + 1, isbinary);
! 			if (tup[i].value == NULL)
! 				goto outOfMemory;
  		}
- 		tup[i].len = vlen;
- 		/* read in the value */
- 		if (vlen > 0)
- 			if (pqGetnchar((char *) (tup[i].value), vlen, conn))
- 				return EOF;
- 		/* we have to terminate this ourselves */
- 		tup[i].value[vlen] = '\0';
  	}
  
! 	/* Success!  Store the completed tuple in the result */
! 	if (!pqAddTuple(result, tup))
! 		goto outOfMemory;
! 	/* and reset for a new message */
! 	conn->curTuple = NULL;
  
! 	return 0;
  
! outOfMemory:
  
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
  	 */
  	pqClearAsyncResult(conn);
! 	printfPQExpBuffer(&conn->errorMessage,
! 					  libpq_gettext("out of memory for query result\n"));
  	pqSaveErrorResult(conn);
- 
- 	/* Discard the failed message by pretending we read it */
- 	conn->inCursor = conn->inStart + 5 + msgLength;
  	return 0;
  }
  
--- 663,743 ----
  	{
  		/* get the value length */
  		if (pqGetInt(&vlen, 4, conn))
  		{
! 			/* Whole the message must be loaded on the buffer here */
! 			errmsg = libpq_gettext("protocol error\n");
! 			goto error_and_forward;
  		}
! 
! 		if (vlen == -1)
! 			vlen = NULL_LEN;
! 		else if (vlen < 0)
  			vlen = 0;
  
! 		/*
! 		 * rowbuf[i].value always points to the next address of the
! 		 * length field even if the value is NULL, to allow safe
! 		 * size estimates and data copy.
! 		 */
! 		rowbuf[i].value = conn->inBuffer + conn->inCursor;
! 		rowbuf[i].len = vlen;
! 
! 		/* Skip to the next length field */
! 		if (vlen > 0 && pqSkipnchar(vlen, conn))
! 		{
! 			/* Whole the message must be loaded on the buffer here */
! 			errmsg = libpq_gettext("protocol error\n");
! 			goto error_and_forward;
  		}
  	}
  
! 	/* tag the row as parsed, check if correctly */
! 	conn->inStart += 5 + msgLength;
! 	if (conn->inCursor != conn->inStart)
! 	{
! 		errmsg = libpq_gettext("invalid row contents\n");
! 		goto error_clearresult;
! 	}
  
! 	/* Pass the completed row values to rowProcessor */
! 	switch (conn->rowProcessor(result, rowbuf, conn->rowProcessorParam))
! 	{
! 		case 1:
! 			/* everything is good */
! 			return 0;
! 
! 		case 0:
! 			/* processor requested early exit - stop parsing without error*/
! 			return EOF;
! 
! 		case -1:
! 			/* if row processor already set error, then simply exit */
! 			if (conn->result->resultStatus != PGRES_TUPLES_OK)
! 				return EOF;
  
! 			/* set generic row processor error */
! 			errmsg = libpq_gettext("error in row processor\n");
! 			goto error_clearresult;
  
+ 		default:
+ 			/* Illegal return code */
+ 			errmsg = libpq_gettext("invalid return value from row processor\n");
+ 			goto error_clearresult;
+ 	}
+ 
+ 
+ error_and_forward:
+ 	/* Discard the failed message by pretending we read it */
+ 	conn->inCursor = conn->inStart + 5 + msgLength;
+ 
+ error_clearresult:
  	/*
  	 * Replace partially constructed result with an error result. First
  	 * discard the old result to try to win back some memory.
  	 */
  	pqClearAsyncResult(conn);
! 	printfPQExpBuffer(&conn->errorMessage, "%s", errmsg);
  	pqSaveErrorResult(conn);
  	return 0;
  }
  
*** a/src/interfaces/libpq/libpq-fe.h
--- b/src/interfaces/libpq/libpq-fe.h
***************
*** 149,154 **** typedef struct pgNotify
--- 149,165 ----
  	struct pgNotify *next;		/* list link */
  } PGnotify;
  
+ /* PGrowValue points to a column value in the network buffer.
+  * The value is a string of length len without null termination.
+  * NULL is represented as len < 0; value then points to the place
+  * where the value would have been.
+  */
+ typedef struct pgRowValue
+ {
+ 	int			len;			/* length in bytes of the value */
+ 	char	   *value;			/* actual value, without null termination */
+ } PGrowValue;
+ 
  /* Function types for notice-handling callbacks */
  typedef void (*PQnoticeReceiver) (void *arg, const PGresult *res);
  typedef void (*PQnoticeProcessor) (void *arg, const char *message);
***************
*** 416,421 **** extern PGPing PQping(const char *conninfo);
--- 427,464 ----
  extern PGPing PQpingParams(const char *const * keywords,
  			 const char *const * values, int expand_dbname);
  
+ /*
+  * Typedef for alternative row processor.
+  *
+  * Columns array will contain PQnfields() entries, each one pointing
+  * to particular column data in network buffer.  This function is
+  * supposed to copy data out from there and store somewhere.  NULL is
+  * signified with len<0.
+  *
+  * This function must return 1 for success or -1 for failure, in
+  * which case the caller releases the current PGresult.  Returning 0
+  * instructs libpq to exit immediately from the topmost libpq function
+  * without releasing the PGresult under work.
+  */
+ typedef int (*PQrowProcessor)(PGresult *res, PGrowValue *columns,
+                               void *param);
+ 
+ /*
+  * Set alternative row data processor for PGconn.
+  *
+  * By registering this function, libpq stops storing rows in the
+  * PGresult itself and calls the function for each row, one by one.
+  *
+  * func is the row processor function.  See the typedef PQrowProcessor.
+  *
+  * rowProcessorParam is the contextual variable that is passed to
+  * the row processor.
+  */
+ extern void PQsetRowProcessor(PGconn *conn, PQrowProcessor func,
+ 								   void *rowProcessorParam);
+ extern PQrowProcessor PQgetRowProcessor(PGconn *conn, void **param);
+ extern int  PQskipResult(PGconn *conn, int skipAll);
+ 
  /* Force the write buffer to be written (or at least try) */
  extern int	PQflush(PGconn *conn);
  
*** a/src/interfaces/libpq/libpq-int.h
--- b/src/interfaces/libpq/libpq-int.h
***************
*** 398,404 **** struct pg_conn
  
  	/* Status for asynchronous result construction */
  	PGresult   *result;			/* result being constructed */
- 	PGresAttValue *curTuple;	/* tuple currently being read */
  
  #ifdef USE_SSL
  	bool		allow_ssl_try;	/* Allowed to try SSL negotiation */
--- 398,403 ----
***************
*** 441,446 **** struct pg_conn
--- 440,453 ----
  
  	/* Buffer for receiving various parts of messages */
  	PQExpBufferData workBuffer; /* expansible string */
+ 
+ 	/*
+ 	 * Read column data from network buffer.
+ 	 */
+ 	PQrowProcessor rowProcessor;/* Function pointer */
+ 	void *rowProcessorParam;	/* Contextual parameter for rowProcessor */
+ 	PGrowValue *rowBuf;			/* Buffer for passing values to rowProcessor */
+ 	int rowBufLen;				/* Number of columns allocated in rowBuf */
  };
  
  /* PGcancel stores all data necessary to cancel a connection. A copy of this
***************
*** 558,563 **** extern int	pqGets(PQExpBuffer buf, PGconn *conn);
--- 565,571 ----
  extern int	pqGets_append(PQExpBuffer buf, PGconn *conn);
  extern int	pqPuts(const char *s, PGconn *conn);
  extern int	pqGetnchar(char *s, size_t len, PGconn *conn);
+ extern int	pqSkipnchar(size_t len, PGconn *conn);
  extern int	pqPutnchar(const char *s, size_t len, PGconn *conn);
  extern int	pqGetInt(int *result, size_t bytes, PGconn *conn);
  extern int	pqPutInt(int value, size_t bytes, PGconn *conn);
dblink_rowproc_2012-03-07v2.diff (text/x-diff; charset=us-ascii)
*** a/contrib/dblink/dblink.c
--- b/contrib/dblink/dblink.c
***************
*** 63,73 **** typedef struct remoteConn
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
- static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
--- 63,85 ----
  	bool		newXactForCursor;		/* Opened a transaction for a cursor */
  } remoteConn;
  
+ typedef struct storeInfo
+ {
+ 	Tuplestorestate *tuplestore;
+ 	int nattrs;
+ 	MemoryContext oldcontext;
+ 	AttInMetadata *attinmeta;
+ 	char** valbuf;
+ 	int *valbuflen;
+ 	char **cstrs;
+ 	bool error_occurred;
+ 	bool nummismatch;
+ } storeInfo;
+ 
  /*
   * Internal declarations
   */
  static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
  static remoteConn *getConnectionByName(const char *name);
  static HTAB *createConnHash(void);
  static void createNewConnection(const char *name, remoteConn *rconn);
***************
*** 90,95 **** static char *escape_param_str(const char *from);
--- 102,111 ----
  static void validate_pkattnums(Relation rel,
  				   int2vector *pkattnums_arg, int32 pknumatts_arg,
  				   int **pkattnums, int *pknumatts);
+ static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+ static void finishStoreInfo(storeInfo *sinfo);
+ static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+ 
  
  /* Global */
  static remoteConn *pconn = NULL;
***************
*** 503,508 **** dblink_fetch(PG_FUNCTION_ARGS)
--- 519,525 ----
  	char	   *curname = NULL;
  	int			howmany = 0;
  	bool		fail = true;	/* default to backward compatible */
+ 	storeInfo   storeinfo;
  
  	DBLINK_INIT;
  
***************
*** 559,573 **** dblink_fetch(PG_FUNCTION_ARGS)
  	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
  
  	/*
  	 * Try to execute the query.  Note that since libpq uses malloc, the
  	 * PGresult will be long-lived even though we are still in a short-lived
  	 * memory context.
  	 */
! 	res = PQexec(conn, buf.data);
  	if (!res ||
  		(PQresultStatus(res) != PGRES_COMMAND_OK &&
  		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
  		dblink_res_error(conname, res, "could not fetch from cursor", fail);
  		return (Datum) 0;
  	}
--- 576,626 ----
  	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
  
  	/*
+ 	 * Result is stored into storeinfo.tuplestore instead of
+ 	 * the PGresult returned by PQexec below
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+ 
+ 	/*
  	 * Try to execute the query.  Note that since libpq uses malloc, the
  	 * PGresult will be long-lived even though we are still in a short-lived
  	 * memory context.
  	 */
! 	PG_TRY();
! 	{
! 		res = PQexec(conn, buf.data);
! 	}
! 	PG_CATCH();
! 	{
! 		ErrorData *edata;
! 
! 		finishStoreInfo(&storeinfo);
! 		edata = CopyErrorData();
! 		FlushErrorState();
! 
! 		/* Skip remaining results when storeHandler raises exception. */
! 		PQskipResult(conn, TRUE);
! 		ReThrowError(edata);
! 	}
! 	PG_END_TRY();
! 
! 	finishStoreInfo(&storeinfo);
! 
  	if (!res ||
  		(PQresultStatus(res) != PGRES_COMMAND_OK &&
  		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
+ 		/* finishStoreInfo saves the fields referred to below. */
+ 		if (storeinfo.nummismatch)
+ 		{
+ 			/* This is only for backward compatibility */
+ 			ereport(ERROR,
+ 					(errcode(ERRCODE_DATATYPE_MISMATCH),
+ 					 errmsg("remote query result rowtype does not match "
+ 							"the specified FROM clause rowtype")));
+ 		}
+ 
  		dblink_res_error(conname, res, "could not fetch from cursor", fail);
  		return (Datum) 0;
  	}
***************
*** 579,586 **** dblink_fetch(PG_FUNCTION_ARGS)
  				(errcode(ERRCODE_INVALID_CURSOR_NAME),
  				 errmsg("cursor \"%s\" does not exist", curname)));
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
--- 632,639 ----
  				(errcode(ERRCODE_INVALID_CURSOR_NAME),
  				 errmsg("cursor \"%s\" does not exist", curname)));
  	}
+ 	PQclear(res);
  
  	return (Datum) 0;
  }
  
***************
*** 640,645 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
--- 693,699 ----
  	remoteConn *rconn = NULL;
  	bool		fail = true;	/* default to backward compatible */
  	bool		freeconn = false;
+ 	storeInfo   storeinfo;
  
  	/* check to see if caller supports us returning a tuplestore */
  	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
***************
*** 660,665 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
--- 714,720 ----
  		{
  			/* text,text,bool */
  			DBLINK_GET_CONN;
+ 			conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
  			sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
  			fail = PG_GETARG_BOOL(2);
  		}
***************
*** 675,680 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
--- 730,736 ----
  			else
  			{
  				DBLINK_GET_CONN;
+ 				conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
  				sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
  			}
  		}
***************
*** 705,710 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
--- 761,768 ----
  		else
  			/* shouldn't happen */
  			elog(ERROR, "wrong number of arguments");
+ 
+ 		conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
  	}
  
  	if (!conn)
***************
*** 715,878 **** dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
  	/* synchronous query, or async result retrieval */
! 	if (!is_async)
! 		res = PQexec(conn, sql);
! 	else
  	{
! 		res = PQgetResult(conn);
! 		/* NULL means we're all done with the async results */
! 		if (!res)
! 			return (Datum) 0;
  	}
  
! 	/* if needed, close the connection to the database and cleanup */
! 	if (freeconn)
! 		PQfinish(conn);
  
! 	if (!res ||
! 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 		 PQresultStatus(res) != PGRES_TUPLES_OK))
  	{
! 		dblink_res_error(conname, res, "could not execute query", fail);
! 		return (Datum) 0;
  	}
  
- 	materializeResult(fcinfo, res);
  	return (Datum) 0;
  }
  
- /*
-  * Materialize the PGresult to return them as the function result.
-  * The res will be released in this function.
-  */
  static void
! materializeResult(FunctionCallInfo fcinfo, PGresult *res)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
  
! 	Assert(rsinfo->returnMode == SFRM_Materialize);
! 
! 	PG_TRY();
  	{
! 		TupleDesc	tupdesc;
! 		bool		is_sql_cmd = false;
! 		int			ntuples;
! 		int			nfields;
  
! 		if (PQresultStatus(res) == PGRES_COMMAND_OK)
! 		{
! 			is_sql_cmd = true;
  
! 			/*
! 			 * need a tuple descriptor representing one TEXT column to return
! 			 * the command status string as our result tuple
! 			 */
! 			tupdesc = CreateTemplateTupleDesc(1, false);
! 			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
! 							   TEXTOID, -1, 0);
! 			ntuples = 1;
! 			nfields = 1;
! 		}
! 		else
! 		{
! 			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
  
! 			is_sql_cmd = false;
  
! 			/* get a tuple descriptor for our result type */
! 			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
! 			{
! 				case TYPEFUNC_COMPOSITE:
! 					/* success */
! 					break;
! 				case TYPEFUNC_RECORD:
! 					/* failed to determine actual type of RECORD */
! 					ereport(ERROR,
! 							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 						errmsg("function returning record called in context "
! 							   "that cannot accept type record")));
! 					break;
! 				default:
! 					/* result type isn't composite */
! 					elog(ERROR, "return type must be a row type");
! 					break;
! 			}
  
! 			/* make sure we have a persistent copy of the tupdesc */
! 			tupdesc = CreateTupleDescCopy(tupdesc);
! 			ntuples = PQntuples(res);
! 			nfields = PQnfields(res);
! 		}
  
! 		/*
! 		 * check result and tuple descriptor have the same number of columns
! 		 */
! 		if (nfields != tupdesc->natts)
! 			ereport(ERROR,
! 					(errcode(ERRCODE_DATATYPE_MISMATCH),
! 					 errmsg("remote query result rowtype does not match "
! 							"the specified FROM clause rowtype")));
  
! 		if (ntuples > 0)
  		{
! 			AttInMetadata *attinmeta;
! 			Tuplestorestate *tupstore;
! 			MemoryContext oldcontext;
! 			int			row;
! 			char	  **values;
! 
! 			attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 
! 			oldcontext = MemoryContextSwitchTo(
! 									rsinfo->econtext->ecxt_per_query_memory);
! 			tupstore = tuplestore_begin_heap(true, false, work_mem);
! 			rsinfo->setResult = tupstore;
! 			rsinfo->setDesc = tupdesc;
! 			MemoryContextSwitchTo(oldcontext);
  
! 			values = (char **) palloc(nfields * sizeof(char *));
  
! 			/* put all tuples into the tuplestore */
! 			for (row = 0; row < ntuples; row++)
! 			{
! 				HeapTuple	tuple;
  
! 				if (!is_sql_cmd)
! 				{
! 					int			i;
  
! 					for (i = 0; i < nfields; i++)
! 					{
! 						if (PQgetisnull(res, row, i))
! 							values[i] = NULL;
! 						else
! 							values[i] = PQgetvalue(res, row, i);
! 					}
! 				}
! 				else
! 				{
! 					values[0] = PQcmdStatus(res);
! 				}
  
! 				/* build the tuple and put it into the tuplestore. */
! 				tuple = BuildTupleFromCStrings(attinmeta, values);
! 				tuplestore_puttuple(tupstore, tuple);
! 			}
  
! 			/* clean up and return the tuplestore */
! 			tuplestore_donestoring(tupstore);
! 		}
  
! 		PQclear(res);
  	}
! 	PG_CATCH();
  	{
! 		/* be sure to release the libpq result */
! 		PQclear(res);
! 		PG_RE_THROW();
  	}
! 	PG_END_TRY();
  }
  
  /*
--- 773,1023 ----
  	rsinfo->setResult = NULL;
  	rsinfo->setDesc = NULL;
  
+ 
+ 	/*
+ 	 * Result is stored into storeinfo.tuplestore instead of
+ 	 * the PGresult returned by PQexec/PQgetResult below
+ 	 */
+ 	initStoreInfo(&storeinfo, fcinfo);
+ 	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+ 
  	/* synchronous query, or async result retrieval */
! 	PG_TRY();
  	{
! 		if (!is_async)
! 			res = PQexec(conn, sql);
! 		else
! 			res = PQgetResult(conn);
  	}
+ 	PG_CATCH();
+ 	{
+ 		ErrorData *edata;
  
! 		finishStoreInfo(&storeinfo);
! 		edata = CopyErrorData();
! 		FlushErrorState();
  
! 		/* Skip remaining results when storeHandler raises exception. */
! 		PQskipResult(conn, TRUE);
! 		ReThrowError(edata);
! 	}
! 	PG_END_TRY();
! 
! 	finishStoreInfo(&storeinfo);
! 
! 	/* NULL res from async get means we're all done with the results */
! 	if (res || !is_async)
  	{
! 		if (freeconn)
! 			PQfinish(conn);
! 
! 		if (!res ||
! 			(PQresultStatus(res) != PGRES_COMMAND_OK &&
! 			 PQresultStatus(res) != PGRES_TUPLES_OK))
! 		{
! 			/* finishStoreInfo saves the fields referred to below. */
! 			if (storeinfo.nummismatch)
! 			{
! 				/* This is only for backward compatibility */
! 				ereport(ERROR,
! 						(errcode(ERRCODE_DATATYPE_MISMATCH),
! 						 errmsg("remote query result rowtype does not match "
! 								"the specified FROM clause rowtype")));
! 			}
! 
! 			dblink_res_error(conname, res, "could not execute query", fail);
! 			return (Datum) 0;
! 		}
  	}
+ 	PQclear(res);
  
  	return (Datum) 0;
  }
  
  static void
! initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
  {
  	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+ 	TupleDesc	tupdesc;
+ 	int i;
  
! 	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
  	{
! 		case TYPEFUNC_COMPOSITE:
! 			/* success */
! 			break;
! 		case TYPEFUNC_RECORD:
! 			/* failed to determine actual type of RECORD */
! 			ereport(ERROR,
! 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! 					 errmsg("function returning record called in context "
! 							"that cannot accept type record")));
! 			break;
! 		default:
! 			/* result type isn't composite */
! 			elog(ERROR, "return type must be a row type");
! 			break;
! 	}
  
! 	sinfo->oldcontext = MemoryContextSwitchTo(
! 		rsinfo->econtext->ecxt_per_query_memory);
  
! 	/* make sure we have a persistent copy of the tupdesc */
! 	tupdesc = CreateTupleDescCopy(tupdesc);
  
! 	sinfo->error_occurred = FALSE;
! 	sinfo->nummismatch = FALSE;
! 	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
! 	sinfo->nattrs = tupdesc->natts;
! 	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
! 	sinfo->valbuf = NULL;
! 	sinfo->valbuflen = NULL;
  
! 	/* Preallocate memory of same size with c string array for values. */
! 	sinfo->valbuf = (char **)malloc(sinfo->nattrs * sizeof(char*));
! 	if (sinfo->valbuf)
! 		sinfo->valbuflen = (int *)malloc(sinfo->nattrs * sizeof(int));
! 	if (sinfo->valbuflen)
! 		sinfo->cstrs = (char **)malloc(sinfo->nattrs * sizeof(char*));
  
! 	if (sinfo->cstrs == NULL)
! 	{
! 		if (sinfo->valbuf)
! 			free(sinfo->valbuf);
! 		if (sinfo->valbuflen)
! 			free(sinfo->valbuflen);
  
! 		ereport(ERROR,
! 				(errcode(ERRCODE_OUT_OF_MEMORY),
! 				 errmsg("out of memory")));
! 	}
! 
! 	for (i = 0 ; i < sinfo->nattrs ; i++)
! 	{
! 		sinfo->valbuf[i] = NULL;
! 		sinfo->valbuflen[i] = -1;
! 	}
  
! 	rsinfo->setResult = sinfo->tuplestore;
! 	rsinfo->setDesc = tupdesc;
! }
! 
! static void
! finishStoreInfo(storeInfo *sinfo)
! {
! 	int i;
! 
! 	if (sinfo->valbuf)
! 	{
! 		for (i = 0 ; i < sinfo->nattrs ; i++)
  		{
! 			if (sinfo->valbuf[i])
! 				free(sinfo->valbuf[i]);
! 		}
! 		free(sinfo->valbuf);
! 		sinfo->valbuf = NULL;
! 	}
  
! 	if (sinfo->valbuflen)
! 	{
! 		free(sinfo->valbuflen);
! 		sinfo->valbuflen = NULL;
! 	}
  
! 	if (sinfo->cstrs)
! 	{
! 		free(sinfo->cstrs);
! 		sinfo->cstrs = NULL;
! 	}
  
! 	MemoryContextSwitchTo(sinfo->oldcontext);
! }
  
! /* Prototype of this function is PQrowProcessor */
! static int
! storeHandler(PGresult *res, PGrowValue *columns, void *param)
! {
! 	storeInfo *sinfo = (storeInfo *)param;
! 	HeapTuple  tuple;
! 	int        fields = PQnfields(res);
! 	int        i;
! 	char      **cstrs = sinfo->cstrs;
  
! 	if (sinfo->error_occurred)
! 		return -1;
  
! 	if (sinfo->nattrs != fields)
! 	{
! 		sinfo->error_occurred = TRUE;
! 		sinfo->nummismatch = TRUE;
! 		finishStoreInfo(sinfo);
  
! 		/* This error will be processed in dblink_record_internal() */
! 		return -1;
  	}
! 
! 	/*
! 	 * Value input functions assume that the input string is
! 	 * zero-terminated, so copy each value and terminate it.
! 	 */
! 	for(i = 0 ; i < fields ; i++)
  	{
! 		int len = columns[i].len;
! 		if (len < 0)
! 			cstrs[i] = NULL;
! 		else
! 		{
! 			char *tmp = sinfo->valbuf[i];
! 			int tmplen = sinfo->valbuflen[i];
! 
! 			/*
! 			 * Divide calls to malloc and realloc so that things will
! 			 * go fine even on the systems of which realloc() does not
! 			 * accept NULL as old memory block.
! 			 *
! 			 * Also try to (re)allocate in bigger steps to
! 			 * avoid flood of allocations on weird data.
! 			 */
! 			if (tmp == NULL)
! 			{
! 				tmplen = len + 1;
! 				if (tmplen < 64)
! 					tmplen = 64;
! 				tmp = (char *)malloc(tmplen);
! 			}
! 			else if (tmplen < len + 1)
! 			{
! 				if (len + 1 > tmplen * 2)
! 					tmplen = len + 1;
! 				else
! 					tmplen = tmplen * 2;
! 				tmp = (char *)realloc(tmp, tmplen);
! 			}
! 
! 			/*
! 			 * sinfo->valbuf[n] will be freed in finishStoreInfo()
! 			 * when realloc returns NULL.
! 			 */
! 			if (tmp == NULL)
! 				return -1;  /* Inform out of memory to the caller */
! 
! 			sinfo->valbuf[i] = tmp;
! 			sinfo->valbuflen[i] = tmplen;
! 
! 			cstrs[i] = sinfo->valbuf[i];
! 			memcpy(cstrs[i], columns[i].value, len);
! 			cstrs[i][len] = '\0';
! 		}
  	}
! 
! 	/*
! 	 * These functions may throw exception. It will be caught in
! 	 * dblink_record_internal()
! 	 */
! 	tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
! 	tuplestore_puttuple(sinfo->tuplestore, tuple);
! 
! 	return 1;
  }
  
  /*
*** a/contrib/dblink/expected/dblink.out
--- b/contrib/dblink/expected/dblink.out
***************
*** 371,377 **** SELECT *
  FROM dblink('myconn','SELECT * FROM foobar',false) AS t(a int, b text, c text[])
  WHERE t.a > 7;
  NOTICE:  relation "foobar" does not exist
! CONTEXT:  Error occurred on dblink connection named "unnamed": could not execute query.
   a | b | c 
  ---+---+---
  (0 rows)
--- 371,377 ----
  FROM dblink('myconn','SELECT * FROM foobar',false) AS t(a int, b text, c text[])
  WHERE t.a > 7;
  NOTICE:  relation "foobar" does not exist
! CONTEXT:  Error occurred on dblink connection named "myconn": could not execute query.
   a | b | c 
  ---+---+---
  (0 rows)
pqgetrow-v2.diff (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 3d8aaf0..07e8900 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -4115,6 +4115,112 @@ int PQflush(PGconn *conn);
    read-ready and then read the response as described above.
   </para>
 
+  <para>
+   The above-mentioned functions always wait until the full result set has
+   arrived before making row data available as a PGresult.  Sometimes it is
+   more useful to process rows as soon as they arrive from the network.
+   For that, the following functions can be used:
+   <variablelist>
+    <varlistentry id="libpq-pqgetrow">
+     <term>
+      <function>PQgetRow</function>
+      <indexterm>
+       <primary>PQgetRow</primary>
+      </indexterm>
+     </term>
+
+     <listitem>
+      <para>
+       Waits for the next row from a prior
+       <function>PQsendQuery</function>,
+       <function>PQsendQueryParams</function>,
+       <function>PQsendQueryPrepared</function> call, and returns it.
+       A null pointer is returned when no more rows are available or
+       some error happened.
+<synopsis>
+PGresult *PQgetRow(PGconn *conn);
+</synopsis>
+      </para>
+
+      <para>
+       If this function returns non-NULL result, it is a
+       <structname>PGresult</structname> that contains exactly 1 row.
+       It needs to be freed later with <function>PQclear</function>.
+      </para>
+      <para>
+       On a synchronous connection, the function will wait for more
+       data from the network until the whole result set is done.  So it
+       returns NULL only if the result set has been completely received or
+       some error happened.  In both cases, call <function>PQgetResult</function>
+       next to get the final status.
+      </para>
+
+      <para>
+       On an asynchronous connection the function does not read more data
+       from the network.  So after NULL, call <function>PQisBusy</function>
+       to see whether the final <structname>PGresult</structname> is available
+       or more data needs to be read from the network via
+       <function>PQconsumeInput</function>.  Do not call
+       <function>PQisBusy</function> before <function>PQgetRow</function>
+       has returned NULL, as <function>PQisBusy</function> will parse
+       any available rows and add them to the main <structname>PGresult</structname>
+       that will be returned later by <function>PQgetResult</function>.
+      </para>
+
+     </listitem>
+    </varlistentry>
+
+    <varlistentry id="libpq-pqrecvrow">
+     <term>
+      <function>PQrecvRow</function>
+      <indexterm>
+       <primary>PQrecvRow</primary>
+      </indexterm>
+     </term>
+
+     <listitem>
+      <para>
+       Get row data without constructing PGresult for it.  This is the
+       underlying function for <function>PQgetRow</function>.
+<synopsis>
+int PQrecvRow(PGconn *conn, PGresult **hdr_p, PGrowValue **row_p);
+</synopsis>
+      </para>
+
+      <para>
+       It returns row data as pointers to network buffer.
+       All structures are owned by <application>libpq</application>'s
+       <structname>PGconn</structname> and must not be freed or stored
+       by user.  Instead row data should be copied to user structures, before
+       any <application>libpq</application> result-processing function
+       is called.
+      </para>
+      <para>
+       It returns 1 when row data is available.
+       Argument <parameter>hdr_p</parameter> will contain pointer
+       to empty <structname>PGresult</structname> that describes
+       row contents.  Actual data is in <parameter>row_p</parameter>.
+       For the description of structure <structname>PGrowValue</structname>
+       see <xref linkend="libpq-altrowprocessor">.
+      </para>
+      <para>It returns 0 when no more rows are available.  On a synchronous
+       connection, it means the result set has fully arrived.  Call
+       <function>PQgetResult</function> to get final status.
+       On asynchronous connection it can also mean more data
+       needs to be read from network.  Call <function>PQisBusy</function>
+       to see whether <function>PQgetResult</function>
+       or <function>PQconsumeInput</function> needs to be called next.
+      </para>
+      <para>
+       It returns -1 if some network error occurred.
+       Use connection status functions described in <xref linkend="libpq-status">
+       to check connection state.
+      </para>
+
+     </listitem>
+    </varlistentry>
+   </variablelist>
+  </para>
  </sect1>
 
  <sect1 id="libpq-cancel">
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index a6418ec..0433e4a 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -163,3 +163,5 @@ PQlibVersion              160
 PQsetRowProcessor	  	  161
 PQgetRowProcessor	  	  162
 PQskipResult		  	  163
+PQrecvRow				  164
+PQgetRow				  165
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 52beb07..ffcfd46 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1969,6 +1969,133 @@ PQskipResult(PGconn *conn, int skipAll)
 	return ret;
 }
 
+/* temp buffer to pass pointers */
+struct RecvRowBuf
+{
+	PGresult *temp_hdr;
+	PGrowValue *temp_row;
+};
+
+/* set pointers, do early exit from PQisBusy() */
+static int
+recv_row_proc(PGresult *hdr, PGrowValue *row, void *arg)
+{
+	struct RecvRowBuf *buf = arg;
+	buf->temp_hdr = hdr;
+	buf->temp_row = row;
+	return 0;
+}
+
+/*
+ * PQrecvRow
+ *
+ * Wait and return next row in resultset.
+ *
+ * Returns:
+ *   1 - got row data, the pointers are owned by PGconn
+ *   0 - no rows available, either resultset complete
+ *       or more data needed (async-only)
+ *  -1 - some problem, check connection error
+ */
+int
+PQrecvRow(PGconn *conn, PGresult **hdr_p, PGrowValue **row_p)
+{
+	struct RecvRowBuf buf;
+	int rc;
+	int ret = -1;
+	PQrowProcessor oldproc;
+	void *oldarg;
+
+	*hdr_p = NULL;
+	*row_p = NULL;
+
+	/* the query may be still pending, send it */
+	while (1)
+	{
+		rc = PQflush(conn);
+		if (rc < 0)
+			return -1;
+		if (rc == 0)
+			break;
+		if (pqWait(FALSE, TRUE, conn))
+			return -1;
+	}
+
+	/* replace existing row processor */
+	oldproc = PQgetRowProcessor(conn, &oldarg);
+	PQsetRowProcessor(conn, recv_row_proc, &buf);
+
+	/* read data */
+	while (1)
+	{
+		buf.temp_hdr = NULL;
+		buf.temp_row = NULL;
+
+		/* done with resultset? */
+		if (!PQisBusy(conn))
+			break;
+
+		/* new row available? */
+		if (buf.temp_row)
+		{
+			*hdr_p = buf.temp_hdr;
+			*row_p = buf.temp_row;
+			ret = 1;
+			goto done;
+		}
+
+		/*
+		 * More data needed
+		 */
+
+		if (pqIsnonblocking(conn))
+			/* let user worry about new data */
+			break;
+		if (pqWait(TRUE, FALSE, conn))
+			goto done;
+		if (!PQconsumeInput(conn))
+			goto done;
+	}
+	/* no more rows available */
+	ret = 0;
+done:
+	/* restore old row processor */
+	PQsetRowProcessor(conn, oldproc, oldarg);
+	return ret;
+}
+
+/*
+ * PQgetRow
+ *		Returns next available row for resultset.  NULL means
+ *		no row available, either resultset is done
+ *		or more data needed (only if async connection).
+ */
+PGresult *
+PQgetRow(PGconn *conn)
+{
+	PGresult *hdr, *res;
+	PGrowValue *row;
+
+	/* check if row is available */
+	if (PQrecvRow(conn, &hdr, &row) != 1)
+		return NULL;
+
+	/* Now make PGresult out of it */
+	res = PQcopyResult(hdr, PG_COPYRES_ATTRS);
+	if (!res)
+	{
+		printfPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("out of memory\n"));
+		pqSaveErrorResult(conn);
+		return NULL;
+	}
+
+	/* add the row, pqAddRow sets error itself */
+	if (pqAddRow(res, row, NULL))
+		return res;
+	PQclear(res);
+	return NULL;
+}
 
 /*
  * PQdescribePrepared
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index e1d3339..50872a5 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -400,6 +400,10 @@ extern int PQsendQueryPrepared(PGconn *conn,
 					int resultFormat);
 extern PGresult *PQgetResult(PGconn *conn);
 
+/* fetch single row from resultset */
+extern PGresult *PQgetRow(PGconn *conn);
+extern int PQrecvRow(PGconn *conn, PGresult **hdr_p, PGrowValue **row_p);
+
 /* Routines for managing an asynchronous query */
 extern int	PQisBusy(PGconn *conn);
 extern int	PQconsumeInput(PGconn *conn);
#86Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Marko Kreen (#85)
Re: Speed dblink using alternate libpq tuple storage

Hello,

# - We expect PQisBusy(), PQconsumeInput()(?) and
# - PQgetResult() to exit immediately and we can
# - call PQgetResult(), PQskipResult() or
# - PQisBusy() after.
| 1 - OK ("I'm done with the row")
| - save result and getAnotherTuple returns 0.

The lines prefixed with '#' are the desirable behavior as I have
understood it from the discussion so far. And I doubt that it works
as we expect for PQgetResult().

No, the desirable behavior is already implemented and documented -
the "stop parsing" return code affects only PQisBusy(), as that
is the function that does the actual parsing.

I am satisfied with your answer. Thank you.
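
To make the point above concrete for later readers, here is a minimal,
self-contained sketch of a parsing loop that honours the "stop parsing"
return code. It does not use libpq at all: parse_rows, row_cb and
stop_after_one are hypothetical names made up for this illustration, and
only the meaning of the return codes (1 = continue, 0 = stop parsing
without error, anything else = failure) follows the patch under
discussion.

```c
/* Hypothetical stand-in for a row processor: returns 1, 0 or -1. */
typedef int (*row_cb)(int row, void *param);

/*
 * Hypothetical model of the parsing step that PQisBusy() performs.
 * Feeds the buffered rows [*next, nrows) to cb and returns:
 *   1 - all buffered rows were consumed
 *   0 - cb asked for an early exit; *next points past the row it saw
 *  -1 - cb reported a failure (or returned an illegal value)
 */
static int
parse_rows(int nrows, int *next, row_cb cb, void *param)
{
	while (*next < nrows)
	{
		/* the row counts as read before the callback result is examined */
		int		rc = cb((*next)++, param);

		if (rc == 0)
			return 0;			/* stop parsing, no error */
		if (rc != 1)
			return -1;			/* -1 or any illegal return value */
	}
	return 1;
}

/* Example callback: remember the row and ask the parser to stop. */
static int
stop_after_one(int row, void *param)
{
	*(int *) param = row;
	return 0;
}
```

Calling parse_rows() repeatedly with stop_after_one hands back one row
per call and resumes where it left off, which is roughly how an early
exit from the parser lets the caller pick up rows one at a time. Only the
parsing function needs to know about the 0 return code; the other entry
points are unaffected in this model.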

The main plus of such a scheme is that we do not change the behaviour
of any existing APIs.

I agree with the policy.

You optimized libpq_gettext() calls, but it's wrong - they must
wrap the actual strings so that the strings can be extracted
for translating.

Ouch. I remember having had a build error about that before...

Fixed in attached patch.

I'm sorry to annoy you.
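
For the record, the reason the strings must be wrapped directly can be
shown with a small sketch. Everything here is an assumption made for
illustration: stub_gettext is a hypothetical stand-in for
libpq_gettext() with a one-entry pretend catalog, and the two helper
functions are invented names. The runtime lookup works the same in both
helpers; the difference is that a tool scanning the source for marked
string literals can only find the text in the first one.

```c
#include <string.h>

/*
 * Hypothetical stub standing in for libpq_gettext(): looks the message
 * up in a one-entry "catalog" and falls back to the original text.
 */
static const char *
stub_gettext(const char *msgid)
{
	if (strcmp(msgid, "protocol error\n") == 0)
		return "erreur de protocole\n";		/* pretend translation */
	return msgid;
}

/* Extractable: the literal is the direct argument of the marker call. */
static const char *
protocol_error_message(void)
{
	return stub_gettext("protocol error\n");
}

/*
 * Not extractable: the marker call only sees a variable, so a tool that
 * scans the source for marked literals never learns the string exists.
 */
static const char *
translate_late(const char *errmsg)
{
	return stub_gettext(errmsg);
}
```

A string that was never extracted has no catalog entry, so
translate_late("invalid row contents\n") simply returns the English text
in this model.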

My suggestion - check in getAnotherTuple whether resultStatus is
already error and do nothing then. This allows internal pqAddRow
to set regular "out of memory" error. Otherwise give generic
"row processor error".

..

The suggestion was about getAnotherTuple() - currently it always
sets "error in row processor". With such a check, the callback
can set the error result itself. Currently only callbacks that
live inside libpq can set errors, but if we happen to expose an
error-setting function in the outside API, then getAnotherTuple()
would already be ready for it.

I see. And I found it implemented in your patch.
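
As I read it, the check amounts to the following. This is only a sketch
under stated assumptions: struct result, enum status and
handle_processor_failure are hypothetical, simplified stand-ins and not
libpq's real types; only the rule itself (keep an error the callback
already recorded, otherwise set the generic message) comes from the
discussion above.

```c
/* Hypothetical, simplified stand-ins for libpq's result state. */
enum status
{
	STATUS_TUPLES_OK,
	STATUS_FATAL_ERROR
};

struct result
{
	enum status status;
	const char *errmsg;		/* unset while no error has been recorded */
};

/*
 * Handle a -1 return from the row processor: keep an error that the
 * callback already recorded, otherwise fall back to a generic message.
 */
static void
handle_processor_failure(struct result *res)
{
	if (res->status != STATUS_TUPLES_OK)
		return;				/* callback set its own error; leave it */

	res->status = STATUS_FATAL_ERROR;
	res->errmsg = "error in row processor";
}
```

With this shape, an internal callback that has already reported "out of
memory" keeps that message, while a callback that merely returns -1 gets
the generic one.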

See attached patch, I did some generic comment/docs cleanups
but also minor code cleanups:

- PQsetRowProcessor(NULL,NULL) sets Param to PGconn, instead of NULL,
- pqAddRow sets "out of memory" error itself on PGconn.
- getAnotherTuple(): when callback returns -1, it checks if error
- dropped the error_saveresult label, it was unnecessary branch.

Ok, I've confirmed them.

- put libpq_gettext() back around strings.
- made functions survive conn==NULL.
- dblink: refreshed regtest result, as now we get actual

Thank you for fixing my bugs.

- dblink: changed skipAll parameter for PQskipResult() to TRUE,
as dblink uses PQexec which can send several queries.

I agree with the change. I intended to allow receiving the results
that follow after skipping the current result on failure. But that
seems not only unlikely, but also dangerous.

- Synced PQgetRow patch with return value changes.

- Synced demos at https://github.com/markokr/libpq-rowproc-demos
with return value changes.

I'm pretty happy with current state. So tagging it
ReadyForCommitter.

Thank you very much for all your help.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#87Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#85)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

[ patches against libpq and dblink ]

I think this patch needs a lot of work.

AFAICT it breaks async processing entirely, because many changes have been
made that fail to distinguish "insufficient data available as yet" from
"hard error". As an example, this code at the beginning of
getAnotherTuple:

/* Get the field count and make sure it's what we expect */
if (pqGetInt(&tupnfields, 2, conn))
! return EOF;

is considering three cases: it got a 2-byte integer (and can continue on),
or there aren't yet 2 more bytes available in the buffer, in which case it
should return EOF without doing anything, or pqGetInt detected a hard
error and updated the connection error state accordingly, in which case
again there is nothing to do except return EOF. In the patched code we
have:

/* Get the field count and make sure it's what we expect */
if (pqGetInt(&tupnfields, 2, conn))
! {
! /* Whole the message must be loaded on the buffer here */
! errmsg = libpq_gettext("protocol error\n");
! goto error_and_forward;
! }

which handles neither the second nor third case correctly: it thinks that
"data not here yet" is a hard error, and then makes sure it is an error by
destroying the parsing state :-(. And if in fact pqGetInt did log an
error, that possibly-useful error message is overwritten with an entirely
useless "protocol error" text.
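To make the three cases concrete, here is a minimal sketch. It is not libpq code: the ToyBuf type, the ReadStatus enum and both function names are hypothetical, and real pqGetInt only returns 0 or EOF. The sketch only shows a reader that reports "need more data" separately from "hard error" and leaves the cursor untouched when it cannot make progress.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-way outcome; libpq itself only returns 0 or EOF. */
typedef enum { READ_OK, READ_NEED_MORE, READ_HARD_ERROR } ReadStatus;

/* Toy input buffer (an assumption, not the PGconn layout). */
typedef struct
{
	const unsigned char *data;
	size_t		cursor;		/* next byte to parse */
	size_t		end;		/* one past the last byte received so far */
} ToyBuf;

/* Read a big-endian 2-byte integer; consume bytes only on success. */
static ReadStatus
toy_get_int16(ToyBuf *buf, int *out)
{
	assert(buf != NULL && out != NULL);

	if (buf->cursor > buf->end)
		return READ_HARD_ERROR;		/* buffer state is corrupted */
	if (buf->end - buf->cursor < 2)
		return READ_NEED_MORE;		/* cursor untouched, retry later */

	*out = (buf->data[buf->cursor] << 8) | buf->data[buf->cursor + 1];
	buf->cursor += 2;
	return READ_OK;
}

/* Check the field count, passing "need more" through unchanged. */
static ReadStatus
toy_parse_field_count(ToyBuf *buf, int expected, int *nfields)
{
	size_t		saved = buf->cursor;
	ReadStatus	st = toy_get_int16(buf, nfields);

	if (st != READ_OK)
		return st;					/* do not turn "not yet" into an error */
	if (*nfields != expected)
	{
		buf->cursor = saved;		/* leave parsing state as we found it */
		return READ_HARD_ERROR;
	}
	return READ_OK;
}
```

A caller that gets READ_NEED_MORE would simply wait for more input and call the parser again from the same cursor position.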

I don't think the error return cases for the row processor have been
thought out too well either. The row processor is not in charge of what
happens to the PGresult, and it certainly has no business telling libpq to
just "exit immediately from the topmost libpq function". If we do that
we'll probably lose sync with the data stream and be unable to recover use
of the connection at all. Also, do we need to consider any error cases
for the row processor other than out-of-memory? If so it might be a good
idea for it to have some ability to store a custom error message into the
PGconn, which it cannot do given the current function API.

In the same vein, I am fairly uncomfortable with the blithe assertion that
a row processor can safely longjmp out of libpq. This was never foreseen
in the original library coding and there are any number of places that
might break, now or in the future. Do we really need to allow it?
If we do, it would be a good idea to decorate the libpq functions that are
now expected to possibly longjmp with comments saying so. Tracing all the
potential call paths that might be aborted by a longjmp is an essential
activity anyway.

Another design deficiency is PQskipResult(). This is badly designed for
async operation because once you call it, it will absolutely not give back
control until it's read the whole query result. (It'd be better to have
it set some kind of flag that makes future library calls discard data
until the query result end is reached.) Something else that's not very
well-designed about it is that it might throw away more than just incoming
tuples. As an example, suppose that the incoming data at the time you
call it consists of half a dozen rows followed by an ErrorResponse. The
ErrorResponse will be eaten and discarded, leaving the application no clue
why its transaction has been aborted, or perhaps even the entire session
cancelled. What we probably want here is just to transiently install a
row processor that discards all incoming data, but the application is
still expected to eventually fetch a PGresult that will tell it whether
the server side of the query completed or not.
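The "discard rows but still report the outcome" idea can be sketched with a toy model. Nothing below is libpq API: ToyConn, ToyStatus and toy_handle_message are hypothetical, and the model assumes only three message types, 'D' for a data row, 'E' for an error response and 'C' for command completion. The point is that a discard flag drops rows without swallowing the error.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TOY_RESULT_PENDING, TOY_RESULT_OK, TOY_RESULT_ERROR } ToyStatus;

/* Hypothetical connection state for this sketch only. */
typedef struct
{
	int			discard_rows;	/* set when the caller wants to skip rows */
	int			rows_stored;
	int			rows_dropped;
	ToyStatus	status;			/* what the final result will report */
} ToyConn;

/* Process one incoming message; never blocks, never eats errors. */
static void
toy_handle_message(ToyConn *c, char type)
{
	assert(c != NULL);

	switch (type)
	{
		case 'D':				/* data row: keep or drop per the flag */
			if (c->discard_rows)
				c->rows_dropped++;
			else
				c->rows_stored++;
			break;
		case 'E':				/* error: always recorded, even when discarding */
			c->status = TOY_RESULT_ERROR;
			break;
		case 'C':				/* completion: success unless an error was seen */
			if (c->status == TOY_RESULT_PENDING)
				c->status = TOY_RESULT_OK;
			break;
		default:
			break;
	}
}
```

Because the flag is consulted per message, control returns to the caller after every message, which is the property an async-friendly skip would need.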

In the dblink patch, given that you have to worry about elogs coming out
of BuildTupleFromCStrings and tuplestore_puttuple anyway, it's not clear
what is the benefit of using malloc rather than palloc to manage the
intermediate buffers in storeHandler --- what that seems to mostly
accomplish is increase the risk of permanent memory leaks. I don't see
much value in per-column buffers either; it'd likely be cheaper to just
palloc one workspace that's big enough for all the column strings
together. And speaking of leaks, doesn't storeHandler leak the
constructed tuple on each call, not to mention whatever might be leaked by
the called datatype input functions?
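The single-workspace suggestion could look roughly like the sketch below. It is an illustration under assumptions: ToyRowValue is a hypothetical stand-in for the patch's value/len pairs, a negative len is assumed to mean SQL NULL, and malloc stands in for palloc so the example compiles outside the backend.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical column value: not NUL-terminated; len < 0 assumed to mean NULL. */
typedef struct
{
	const char *value;
	int			len;
} ToyRowValue;

/*
 * Copy all columns into one workspace as NUL-terminated strings.
 * cstrs[i] points into the workspace, or is NULL for a NULL column.
 * Returns the workspace (release with a single free), or NULL on OOM.
 */
static char *
toy_pack_columns(const ToyRowValue *cols, int ncols, char **cstrs)
{
	size_t		total = 0;
	char	   *work;
	char	   *p;
	int			i;

	assert(ncols >= 0);

	/* first pass: total size, one extra byte per non-NULL column */
	for (i = 0; i < ncols; i++)
		if (cols[i].len >= 0)
			total += (size_t) cols[i].len + 1;

	work = malloc(total > 0 ? total : 1);
	if (work == NULL)
		return NULL;

	/* second pass: copy each value and terminate it */
	p = work;
	for (i = 0; i < ncols; i++)
	{
		if (cols[i].len < 0)
		{
			cstrs[i] = NULL;
			continue;
		}
		memcpy(p, cols[i].value, (size_t) cols[i].len);
		p[cols[i].len] = '\0';
		cstrs[i] = p;
		p += cols[i].len + 1;
	}
	return work;
}
```

With one allocation per row there is exactly one block to release (or to leave to a memory context reset), instead of one per column.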

It also seems to me that the dblink patch breaks the case formerly handled
in materializeResult() of a PGRES_COMMAND_OK rather than PGRES_TUPLES_OK
result. The COMMAND case is supposed to be converted to a single-column
text result, and I sure don't see where that's happening now.

BTW, am I right in thinking that some of the hunks in this patch are meant
to fix a pre-existing bug of failing to report the correct connection
name? If so, it'd likely be better to split those out and submit as a
separate patch, instead of conflating them with a feature addition.

Lastly, as to the pqgetrow patch, the lack of any demonstrated test case
for these functions makes me uncomfortable as to whether they are well
designed. Again, I'm unconvinced that the error handling is good or that
they work sanely in async mode. I'm inclined to just drop these for this
go-round, and to stick to providing the features that we can test via the
dblink patch.

regards, tom lane

#88Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Tom Lane (#87)
Re: Speed dblink using alternate libpq tuple storage

Thank you for picking up.

is considering three cases: it got a 2-byte integer (and can continue on),
or there aren't yet 2 more bytes available in the buffer, in which case it
should return EOF without doing anything, or pqGetInt detected a hard
error and updated the connection error state accordingly, in which case
again there is nothing to do except return EOF. In the patched code we
have:

...

which handles neither the second nor third case correctly: it thinks that
"data not here yet" is a hard error, and then makes sure it is an error by
destroying the parsing state :-(.

Marko and I think that, in protocol 3, all bytes of the incoming
message are guaranteed to have been loaded when entering
getAnotherTuple(). The following part of pqParseInput3() ensures
this.

| if (avail < msgLength)
| {
| /*
| * Before returning, enlarge the input buffer if needed to hold
| * the whole message.
| (snipped)..
| */
| if (pqCheckInBufferSpace(conn->inCursor + (size_t) msgLength,
| conn))
| {
| /*
| * XXX add some better recovery code...
| (snipped)..
| */
| handleSyncLoss(conn, id, msgLength);
| }
| return;

So, if the cursor state is broken just after exiting
getAnotherTuple(), it had already been broken BEFORE entering
getAnotherTuple() according to the current design. That is what
'protocol error' means. pqGetInt there should not detect any
errors except for a broken message.
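The "whole message is buffered before dispatch" check can be sketched as follows. This is a toy, not the pqParseInput3 code: toy_message_complete is a hypothetical name, and it assumes the protocol 3 framing of one type byte followed by a 4-byte big-endian length that counts itself but not the type byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if a complete message starts at buf (total size in *msg_len),
 * 0 if more bytes must arrive first, -1 if the length field is invalid.
 */
static int
toy_message_complete(const unsigned char *buf, size_t avail, size_t *msg_len)
{
	uint32_t	len;

	assert(buf != NULL && msg_len != NULL);

	if (avail < 5)
		return 0;				/* type byte plus length not here yet */

	len = ((uint32_t) buf[1] << 24) | ((uint32_t) buf[2] << 16) |
		((uint32_t) buf[3] << 8) | (uint32_t) buf[4];

	if (len < 4)
		return -1;				/* length must at least cover itself */
	if (avail - 1 < len)
		return 0;				/* body incomplete: wait, do not parse */

	*msg_len = 1 + (size_t) len;
	return 1;
}
```

Only when this kind of check succeeds would a per-message parser be entered, so a short read inside that parser would indicate a malformed message rather than missing data.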

error, that possibly-useful error message is overwritten with an entirely
useless "protocol error" text.

Plus, current pqGetInt seems to set its own error message only
for a wrong 'bytes' parameter.

On the other hand, in protocol 2 (to be removed?) the error
handling mechanism is affected, because a full load of the message
is not guaranteed.

I don't think the error return cases for the row processor have been
thought out too well either. The row processor is not in charge of what
happens to the PGresult,

The default row processor stuffs the PGresult with tuples; another
(say, that of dblink) leaves it empty. Each row processor manages
the PGresult to its own extent.

and it certainly has no business telling libpq to just "exit
immediately from the topmost libpq function". If we do that
we'll probably lose sync with the data stream and be unable to
recover use of the connection at all.

I don't think PGresult has any role in the error handling system
in the current implementation. The phrase 'exit immediately from
the topmost libpq function' should not appear in the patch.

The exit routes from the row processor are the following:

- Do longjmp (or the PG_TRY-CATCH mechanism) out of the row
processor.

- The row processor returns 0 when entered from PQisBusy(), which
makes PQisBusy() exit immediately.

Cursor consistency is kept in both cases. The cursor is already
positioned just past the last byte of the current message.

Also, do we need to consider any error cases for the row
processor other than out-of-memory? If so it might be a good
idea for it to have some ability to store a custom error
message into the PGconn, which it cannot do given the current
function API.

There does not seem to be a strong need for it in dblink or
PQgetRow compared to the additional complexity expected around
it. So this patch does not include it.

In the same vein, I am fairly uncomfortable with the blithe assertion that
a row processor can safely longjmp out of libpq. This was never foreseen
in the original library coding and there are any number of places that
might break, now or in the future. Do we really need to allow it?

To prevent the row processor from longjmp'ing out, I enclosed the
functions that can potentially throw an exception in a PG_TRY-CATCH
clause in the early version. This was totally safe, but the penalty
was not negligible because the TRY-CATCH was entered for every row.

If we do, it would be a good idea to decorate the libpq
functions that are now expected to possibly longjmp with
comments saying so. Tracing all the potential call paths that
might be aborted by a longjmp is an essential activity anyway.

Leaving the future aside and concerning the present, I can show
you the trail of my confirmation process.

- There is no difference between with and without the patch at
the level of getAnotherTuple() from the view of consistency.

- Assume pqParseInput3 detects that the next message has not come
yet after getAnotherTuple returned. It exits immediately on reading
the length of the next message. This is the same condition as
after a longjmp.

if (pqGetInt(&msgLength, 4, conn))
return;

- parseInput passes it through and immediately exits in a
consistent state.

- The caller of PQgetResult, PQisBusy, PQskipResult, PQnotifies,
PQputCopyData, pqHandleSendFailure finally regains control. I
am convinced that the async status at that time must be
PGASYNC_BUSY and the conn cursor is in a consistent state.

So the documentation encourages the ancestor of the row processor
to call PQfinish, PQgetResult, or PQskipResult after a longjmp.
These functions should resolve the intermediate state described
above, created by the longjmp, by restarting parsing of the
stream afterwards.

And about the future: although it goes without saying that any
change to the code can break something, a longjmp stepping over
libpq internal code seems complicated enough to cause trouble. I
will mark such functions with 'This function is skipped over by
the longjmp invoked in the descendents.' (Better expressions are
welcome..)

Another design deficiency is PQskipResult(). This is badly
designed for async operation because once you call it, it will
absolutely not give back control until it's read the whole
query result. (It'd be better to have it set some kind of flag
that makes future library calls discard data until the query
result end is reached.)

If this function is called just after a longjmp out of the row
processor, the async status of the connection at that time must be
PGASYNC_BUSY. So I think this function should always return, even
if the longjmp takes place at the last row in a result. There
must be a following 'C' message unless the stream is broken.

Something else that's not very well-designed about it is that
it might throw away more than just incoming tuples. As an
example, suppose that the incoming data at the time you call it
consists of half a dozen rows followed by an ErrorResponse.
The ErrorResponse will be eaten and discarded, leaving the
application no clue why its transaction has been aborted, or
perhaps even the entire session cancelled.

If the caller needs to see the ErrorResponses that follow, I think
calling PQgetResult is enough.

What we probably want here is just to transiently install a
row processor that discards all incoming data, but the
application is still expected to eventually fetch a PGresult
that will tell it whether the server side of the query
completed or not.

The all-discarding row processor should not be visible to the
user. Users could design a row processor to behave so if
they want. This is merely a shortcut, but it seems difficult to
do without.

In the dblink patch, ... it's not clear what is the benefit of
using malloc rather than palloc to manage the intermediate
buffers in storeHandler --- what that seems to mostly
accomplish is increase the risk of permanent memory leaks.

Hmm. I thought that palloc is heavier than malloc, and the
allocated blocks looked well enough controlled that they would be
unlikely to leak. I will change to use palloc, taking into account
the discussion about block granularity below.

I don't see much value in per-column buffers either; it'd
likely be cheaper to just palloc one workspace that's big
enough for all the column strings together.

I disliked counting the total length prior to copying the
contents in the early version. In the latest one the total
memory requirement is easily obtained. So it seems good to
allocate it all together. I'll do so.

And speaking of leaks, doesn't storeHandler leak the
constructed tuple on each call, not to mention whatever might
be leaked by the called datatype input functions?

I copied the process in the original materializeResult into
storeHandler. tuplestore_donestoring was removed because it is
obsolete. So I think there is no problem there. Am I wrong?

It also seems to me that the dblink patch breaks the case
formerly handled in materializeResult() of a PGRES_COMMAND_OK
rather than PGRES_TUPLES_OK result. The COMMAND case is
supposed to be converted to a single-column text result, and I
sure don't see where that's happening now.

I'm sorry. I will restore that route.

BTW, am I right in thinking that some of the hunks in this
patch are meant to fix a pre-existing bug of failing to report
the correct connection name? If so, it'd likely be better to
split those out and submit as a separate patch, instead of
conflating them with a feature addition.

Ok. I will split it.

Lastly, as to the pqgetrow patch, the lack of any demonstrated
test case for these functions makes me uncomfortable as to
whether they are well designed. Again, I'm unconvinced that
the error handling is good or that they work sanely in async
mode. I'm inclined to just drop these for this go-round, and
to stick to providing the features that we can test via the
dblink patch.

Testing pqGetRow via dblink?

By 'drop these', do you mean pqGetRow? If so, this part can be
dropped separately from the rest.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#89Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#87)
1 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

I saw Kyotaro already answered, but I'll give my view as well.

On Thu, Mar 22, 2012 at 06:07:16PM -0400, Tom Lane wrote:

AFAICT it breaks async processing entirely, because many changes have been
made that fail to distinguish "insufficient data available as yet" from
"hard error". As an example, this code at the beginning of
getAnotherTuple:

/* Get the field count and make sure it's what we expect */
if (pqGetInt(&tupnfields, 2, conn))
! return EOF;

is considering three cases: it got a 2-byte integer (and can continue on),
or there aren't yet 2 more bytes available in the buffer, in which case it
should return EOF without doing anything, or pqGetInt detected a hard
error and updated the connection error state accordingly, in which case
again there is nothing to do except return EOF. In the patched code we
have:

/* Get the field count and make sure it's what we expect */
if (pqGetInt(&tupnfields, 2, conn))
! {
! /* Whole the message must be loaded on the buffer here */
! errmsg = libpq_gettext("protocol error\n");
! goto error_and_forward;
! }

which handles neither the second nor third case correctly: it thinks that
"data not here yet" is a hard error, and then makes sure it is an error by
destroying the parsing state :-(. And if in fact pqGetInt did log an
error, that possibly-useful error message is overwritten with an entirely
useless "protocol error" text.

No, "protocol error" really is the only error case here.

- pqGetInt() does not set errors.

- V3 getAnotherTuple() is called only if packet is fully in buffer.

I don't think the error return cases for the row processor have been
thought out too well either. The row processor is not in charge of what
happens to the PGresult, and it certainly has no business telling libpq to
just "exit immediately from the topmost libpq function". If we do that
we'll probably lose sync with the data stream and be unable to recover use
of the connection at all. Also, do we need to consider any error cases
for the row processor other than out-of-memory?

No, the rule is *not* "exit to topmost", but "exit PQisBusy()".

This is exactly so that any code that does not expect row-processor
behaviour continues to work.

Also, from the programmer's POV, this means the row-processor callback
causes minimal changes to existing APIs.

If so it might be a good
idea for it to have some ability to store a custom error message into the
PGconn, which it cannot do given the current function API.

There already was such a function, but it was a row-processor-specific hack
that could leak out and create confusion. I rejected it. Instead there
should be a generic error-setting function, equivalent to the current libpq
internal error setting.

But such a generic error-setting function would need a review of all libpq
error states, as it allows an error state to appear in new situations. Also
we need to have well-defined behaviour of client-side errors vs. incoming
server errors.

Considering that even the current cut-down patch is troubling committers,
I would definitely suggest postponing such a generic error setter to 9.3.

Especially as it does not change anything coding-style-wise.

In the same vein, I am fairly uncomfortable with the blithe assertion that
a row processor can safely longjmp out of libpq. This was never foreseen
in the original library coding and there are any number of places that
might break, now or in the future. Do we really need to allow it?
If we do, it would be a good idea to decorate the libpq functions that are
now expected to possibly longjmp with comments saying so. Tracing all the
potential call paths that might be aborted by a longjmp is an essential
activity anyway.

I think we *should* allow exceptions, but in a limited number of APIs.

Basically, the usefulness for users vs. effort from our side
is clearly on the side of providing it.

But it's up to us to define what *limited* means (what needs the
least effort from us), so that later when users want to use exceptions
in a callback, they need to pick the right API.

Currently it seems only PQexec() + multiple SELECTs can give trouble,
as the previous PGresult is kept on the stack. Should we unsupport
PQexec or multiple SELECTs?

But such a case is broken even without exceptions - or at least
very confusing. Not sure what to do with it.

In any case, "decorating" libpq functions is the wrong approach. This
suggests that the caller of e.g. PQexec() needs to take care of any possible
behaviour of an unknown callback. This will not work. Instead the allowed
functions should simply be listed in the row-processor documentation.

Basically a custom callback should always be matched by a caller that
knows about it and knows how to handle it. Not sure how to put
such a suggestion into the documentation tho'.

Another design deficiency is PQskipResult(). This is badly designed for
async operation because once you call it, it will absolutely not give back
control until it's read the whole query result. (It'd be better to have
it set some kind of flag that makes future library calls discard data
until the query result end is reached.) Something else that's not very
well-designed about it is that it might throw away more than just incoming
tuples. As an example, suppose that the incoming data at the time you
call it consists of half a dozen rows followed by an ErrorResponse. The
ErrorResponse will be eaten and discarded, leaving the application no clue
why its transaction has been aborted, or perhaps even the entire session
cancelled. What we probably want here is just to transiently install a
row processor that discards all incoming data, but the application is
still expected to eventually fetch a PGresult that will tell it whether
the server side of the query completed or not.

I guess it's designed for rolling the connection forward in an exception
handler... And it's blocking-only indeed. Considering that the better
approach is to drop the connection instead of trying to save it,
its usefulness is questionable.

I'm OK with dropping it.

Lastly, as to the pqgetrow patch, the lack of any demonstrated test case
for these functions makes me uncomfortable as to whether they are well
designed. Again, I'm unconvinced that the error handling is good or that
they work sanely in async mode. I'm inclined to just drop these for this
go-round, and to stick to providing the features that we can test via the
dblink patch.

I simplified the github test cases and attached them as a patch.

Could you please give more concrete critique of the API?

The main idea behind PQgetRow is that it does not replace any existing
API function, but instead acts as an addition:

* Sync case: PQsendQuery() + PQgetResult - PQgetRow should be called
before PQgetResult until it returns NULL, then proceed with PQgetResult
to get final state.

* Async case: PQsendQuery() + PQconsumeInput() + PQisBusy() + PQgetResult().
Here PQgetRow() should be called before PQisBusy() until it returns
NULL, then proceed with PQisBusy() as usual.

It only returns rows, never any error state PGresults.

The main advantage of including PQgetRow() together with the low-level
rowproc API is that it allows de-emphasizing the more complex parts of
the rowproc API (exceptions, early exit, skipresult, custom error msg),
and dropping/undocumenting them or simply calling them postgres-internal-only.

--
marko

Attachments:

rowproc-demos.diff (text/x-diff; charset=us-ascii)
diff --git a/src/test/examples/Makefile b/src/test/examples/Makefile
index bbc6ee1..e0d6c41 100644
--- a/src/test/examples/Makefile
+++ b/src/test/examples/Makefile
@@ -14,9 +14,11 @@ override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 override LDLIBS := $(libpq_pgport) $(LDLIBS)
 
 
-PROGS = testlibpq testlibpq2 testlibpq3 testlibpq4 testlo
+PROGS = testlibpq testlibpq2 testlibpq3 testlibpq4 testlo \
+	rowproc-sync rowproc-async getrow-sync getrow-async
 
 all: $(PROGS)
 
 clean:
-	rm -f $(PROGS)
+	rm -f $(PROGS) *.o
+
diff --git a/src/test/examples/getrow-async.c b/src/test/examples/getrow-async.c
new file mode 100644
index 0000000..d74f77a
--- /dev/null
+++ b/src/test/examples/getrow-async.c
@@ -0,0 +1,196 @@
+/*
+ * PQgetRow async demo.
+ *
+ * usage: getrow-async [connstr [query]]
+ */
+
+#include <sys/select.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <libpq-fe.h>
+
+struct Context {
+	PGconn *db;
+	int count;
+};
+
+static void die(PGconn *db, const char *msg)
+{
+	fprintf(stderr, "%s: %s", msg, PQerrorMessage(db));
+	exit(1);
+}
+
+/* wait for event on socket */
+static void db_wait(PGconn *db, int for_write)
+{
+	int fd = PQsocket(db);
+	fd_set fds;
+	int res;
+
+retry:
+	FD_ZERO(&fds);
+	FD_SET(fd, &fds);
+	if (for_write)
+		res = select(fd+1, NULL, &fds, NULL, NULL);
+	else
+		res = select(fd+1, &fds, NULL, NULL, NULL);
+
+	if (res == 0)
+		goto retry;
+	if (res < 0 && errno == EINTR)
+		goto retry;
+	if (res < 0)
+	{
+		fprintf(stderr, "select() failed: %s\n", strerror(errno));
+		exit(1);
+	}
+}
+
+static void proc_row(struct Context *ctx, PGresult *res)
+{
+	const char *val = PQgetvalue(res, 0, 0);
+	ctx->count++;
+	if (0)
+		printf("column#0: %s\n", val ? val : "NULL");
+}
+
+static void proc_result(struct Context *ctx, PGresult *r)
+{
+	ExecStatusType s;
+
+	s = PQresultStatus(r);
+	if (s == PGRES_TUPLES_OK)
+		printf("query successful, got %d rows\n", ctx->count);
+	else
+		printf("%s: %s\n", PQresStatus(s), PQerrorMessage(ctx->db));
+	PQclear(r);
+}
+
+/*
+ * Handle socket read event
+ *
+ * Returns:
+ * -1 - error
+ *  0 - need to read more data
+ *  1 - all done
+ */
+
+static int socket_read_cb(struct Context *ctx)
+{
+	PGresult *r;
+
+	/* read incoming data */
+	if (!PQconsumeInput(ctx->db))
+		return -1;
+
+	/*
+	 * One query may result in several PGresults,
+	 * first loop is over all PGresults.
+	 */
+	while (1) {
+		/*
+		 * Process all rows already in buffer.
+		 */
+		while (1) {
+			r = PQgetRow(ctx->db);
+			if (!r)
+				break;
+
+			proc_row(ctx, r);
+
+			PQclear(r);
+		}
+
+		/* regular async logic follows */
+
+		/* Need more data from network */
+		if (PQisBusy(ctx->db))
+			return 0;
+
+		/* we have complete PGresult ready */
+		r = PQgetResult(ctx->db);
+		if (r == NULL) {
+			/* all results have arrived */
+			return 1;
+		} else {
+			/* process final resultset status */
+			proc_result(ctx, r);
+		}
+	}
+}
+
+static void exec_query(struct Context *ctx, const char *q)
+{
+	int res;
+	int waitWrite;
+	PGconn *db = ctx->db;
+
+	ctx->count = 0;
+
+	/* launch query */
+	if (!PQsendQuery(ctx->db, q))
+		die(ctx->db, "PQsendQuery");
+
+	/* flush query */
+	res = PQflush(db);
+	if (res < 0)
+		die(db, "flush 1");
+	waitWrite = res > 0;
+
+	/* read data */
+	while (1) {
+		/* sleep until event */
+		db_wait(ctx->db, waitWrite);
+
+		/* got event, process it */
+		if (waitWrite) {
+			/* still more to flush? */
+			res = PQflush(db);
+			if (res < 0)
+				die(db, "flush 2");
+			waitWrite = res > 0;
+		} else {
+			/* read result */
+			res = socket_read_cb(ctx);
+			if (res < 0)
+				die(db, "socket_read_cb");
+			if (res > 0)
+				return;
+			waitWrite = 0;
+		}
+	}
+}
+
+int main(int argc, char *argv[])
+{
+	const char *connstr;
+	const char *q;
+	PGconn *db;
+	struct Context ctx;
+
+	connstr = "dbname=postgres";
+	if (argc > 1)
+		connstr = argv[1];
+
+	q = "show all";
+	if (argc > 2)
+		q = argv[2];
+
+	db = PQconnectdb(connstr);
+	if (!db || PQstatus(db) == CONNECTION_BAD)
+		die(db, "connect");
+
+	/* set up socket */
+	PQsetnonblocking(db, 1);
+
+	ctx.db = db;
+	exec_query(&ctx, q);
+
+	PQfinish(db);
+
+	return 0;
+}
+
diff --git a/src/test/examples/getrow-sync.c b/src/test/examples/getrow-sync.c
new file mode 100644
index 0000000..a7ed4f6
--- /dev/null
+++ b/src/test/examples/getrow-sync.c
@@ -0,0 +1,83 @@
+/*
+ * PQgetRow sync demo.
+ *
+ * usage: getrow-sync [connstr [query]]
+ */
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <libpq-fe.h>
+
+struct Context {
+	PGconn *db;
+	int count;
+};
+
+static void die(PGconn *db, const char *msg)
+{
+	fprintf(stderr, "%s: %s\n", msg, PQerrorMessage(db));
+	exit(1);
+}
+
+static void exec_query(struct Context *ctx, const char *q)
+{
+	PGconn *db = ctx->db;
+	PGresult *r;
+	ExecStatusType s;
+
+	ctx->count = 0;
+
+	if (!PQsendQuery(db, q))
+		die(db, "PQsendQuery");
+
+	/* loop with PQgetRow until final PGresult is available */
+	while (1) {
+		r = PQgetRow(db);
+		if (!r)
+			break;
+		ctx->count++;
+		PQclear(r);
+	}
+
+	/* final PGresult, either PGRES_TUPLES_OK or error */
+	r = PQgetResult(db);
+	s = PQresultStatus(r);
+	if (s == PGRES_TUPLES_OK)
+		printf("query successful, got %d rows\n", ctx->count);
+	else
+		printf("%s: %s\n", PQresStatus(s), PQerrorMessage(db));
+
+	PQclear(r);
+}
+
+
+int main(int argc, char *argv[])
+{
+	const char *connstr;
+	const char *q;
+	PGconn *db;
+	struct Context ctx;
+
+	connstr = "dbname=postgres";
+	if (argc > 1)
+		connstr = argv[1];
+
+	q = "show all";
+	if (argc > 2)
+		q = argv[2];
+
+	db = PQconnectdb(connstr);
+	if (!db || PQstatus(db) == CONNECTION_BAD)
+		die(db, "connect");
+
+	ctx.db = db;
+	exec_query(&ctx, q);
+
+	PQfinish(db);
+
+	return 0;
+}
+
diff --git a/src/test/examples/rowproc-async.c b/src/test/examples/rowproc-async.c
new file mode 100644
index 0000000..88b1672
--- /dev/null
+++ b/src/test/examples/rowproc-async.c
@@ -0,0 +1,189 @@
+/*
+ * Row processor async demo.
+ *
+ * usage: rowproc-async [connstr [query]]
+ */
+
+
+#include <sys/select.h>
+#include <errno.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include <libpq-fe.h>
+
+struct Context {
+	PGconn *db;
+	int count;
+};
+
+/* print db error message and die */
+static void die(PGconn *db, const char *msg)
+{
+	fprintf(stderr, "%s: %s", msg, PQerrorMessage(db));
+	exit(1);
+}
+
+/* wait for event on socket */
+static void db_wait(PGconn *db, int for_write)
+{
+	int fd = PQsocket(db);
+	fd_set fds;
+	int res;
+
+retry:
+	FD_ZERO(&fds);
+	FD_SET(fd, &fds);
+	if (for_write)
+		res = select(fd+1, NULL, &fds, NULL, NULL);
+	else
+		res = select(fd+1, &fds, NULL, NULL, NULL);
+
+	if (res == 0)
+		goto retry;
+	if (res < 0 && errno == EINTR)
+		goto retry;
+	if (res < 0)
+	{
+		fprintf(stderr, "select() failed: %s\n", strerror(errno));
+		exit(1);
+	}
+}
+
+/* do something with one row */
+static void proc_row(struct Context *ctx, PGresult *res, PGrowValue *columns)
+{
+	ctx->count++;
+
+	if (0)
+		printf("column: %.*s\n",
+			   columns[0].len,
+			   columns[0].value);
+}
+
+/* do something with resultset final status */
+static void proc_result(struct Context *ctx, PGresult *r)
+{
+	ExecStatusType s;
+
+	s = PQresultStatus(r);
+	if (s == PGRES_TUPLES_OK)
+		printf("query successful, got %d rows\n", ctx->count);
+	else
+		printf("%s: %s\n", PQresStatus(s), PQerrorMessage(ctx->db));
+	PQclear(r);
+}
+
+/* custom callback */
+static int my_handler(PGresult *res, PGrowValue *columns, void *arg)
+{
+	struct Context *ctx = arg;
+
+	proc_row(ctx, res, columns);
+
+	return 1;
+}
+
+/* this handles socket read event */
+static int socket_read_cb(struct Context *ctx, PGconn *db)
+{
+	PGresult *r;
+
+	/* read incoming data */
+	if (!PQconsumeInput(db))
+		return -1;
+
+	/*
+	 * one query may result in several PGresult's,
+	 * wrap everything in one big loop.
+	 */
+	while (1) {
+		/* need to wait for more data from network */
+		if (PQisBusy(db))
+			return 0;
+
+		/* we have complete PGresult ready */
+		r = PQgetResult(db);
+		if (r == NULL) {
+			/* all results have arrived */
+			return 1;
+		} else {
+			proc_result(ctx, r);
+		}
+	}
+}
+
+/* run query with custom callback */
+static void exec_query(struct Context *ctx, PGconn *db, const char *q)
+{
+	int res;
+	int waitWrite;
+
+	ctx->count = 0;
+
+	/* set up socket */
+	PQsetnonblocking(db, 1);
+
+	PQsetRowProcessor(db, my_handler, ctx);
+
+	/* launch query */
+	if (!PQsendQuery(db, q))
+		die(db, "PQsendQuery");
+
+	/* see if it is sent */
+	res = PQflush(db); /* -1:err, 0:ok, 1:more */
+	if (res < 0)
+		die(db, "flush 1");
+	waitWrite = res > 0;
+
+	/* read data */
+	while (1) {
+		db_wait(db, waitWrite);
+
+		/* got event, process it */
+		if (waitWrite) {
+			res = PQflush(db); // -1:err, 0:ok, 1:more
+			if (res < 0)
+				die(db, "flush 2");
+			waitWrite = res > 0;
+		} else {
+			res = socket_read_cb(ctx, db);
+			if (res < 0)
+				die(db, "socket_read_cb");
+			if (res > 0)
+				break;	/* leave the loop so the row processor is reset below */
+			waitWrite = 0;
+		}
+	}
+
+	PQsetRowProcessor(ctx->db, NULL, NULL);
+}
+
+int main(int argc, char *argv[])
+{
+	const char *connstr;
+	const char *q;
+	PGconn *db;
+	struct Context ctx;
+
+	connstr = "dbname=postgres";
+	if (argc > 1)
+		connstr = argv[1];
+
+	q = "show all";
+	if (argc > 2)
+		q = argv[2];
+
+	db = PQconnectdb(connstr);
+	if (!db || PQstatus(db) == CONNECTION_BAD)
+		die(db, "connect");
+	ctx.db = db;
+
+	exec_query(&ctx, db, q);
+
+	PQfinish(db);
+
+	return 0;
+}
+
diff --git a/src/test/examples/rowproc-sync.c b/src/test/examples/rowproc-sync.c
new file mode 100644
index 0000000..b29db48
--- /dev/null
+++ b/src/test/examples/rowproc-sync.c
@@ -0,0 +1,80 @@
+/*
+ * Row processor sync demo.
+ *
+ * usage: rowproc-sync [connstr [query]]
+ */
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdarg.h>
+#include <string.h>
+#include <setjmp.h>
+
+#include <libpq-fe.h>
+
+struct Context {
+	PGconn *db;
+	int count;
+};
+
+static void die(PGconn *db, const char *msg)
+{
+	fprintf(stderr, "%s: %s", msg, PQerrorMessage(db));
+	exit(1);
+}
+
+static int my_handler(PGresult *res, PGrowValue *columns, void *arg)
+{
+	struct Context *ctx = arg;
+
+	ctx->count++;
+
+	return 1;
+}
+
+static void exec_query(struct Context *ctx, const char *q)
+{
+	PGresult *r;
+
+	ctx->count = 0;
+	PQsetRowProcessor(ctx->db, my_handler, ctx);
+
+	r = PQexec(ctx->db, q);
+
+	/* check final result */
+	if (!r || PQresultStatus(r) != PGRES_TUPLES_OK)
+		die(ctx->db, "select");
+	else
+		printf("query successful, got %d rows\n", ctx->count);
+	PQclear(r);
+
+	PQsetRowProcessor(ctx->db, NULL, NULL);
+}
+
+
+int main(int argc, char *argv[])
+{
+	const char *connstr;
+	const char *q;
+	struct Context ctx;
+
+	connstr = "dbname=postgres";
+	if (argc > 1)
+		connstr = argv[1];
+
+	q = "show all";
+	if (argc > 2)
+		q = argv[2];
+
+	ctx.db = PQconnectdb(connstr);
+	if (!ctx.db || PQstatus(ctx.db) == CONNECTION_BAD)
+		die(ctx.db, "connect");
+
+	exec_query(&ctx, q);
+
+	PQfinish(ctx.db);
+
+	return 0;
+}
+
#90Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#88)
2 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

Hello. This is a new version of the patch for dblink using the row processor.

- Use palloc to allocate temporary memory blocks.

- Memory for a row is now allocated in a single block. A block of an
initial size is preallocated, and using palloc simplified the
reallocation code.

- Resurrected the code path for invoking commands, with a small
adjustment of the behavior on error.

- The modification that fixes the missing connection name bug has been
moved out into a separate patch.

- Comments on the functions skipped over by longjmp are
withheld, following the discussion with Marko.

- rebased to e8476f46fc847060250c92ec9b310559293087fc
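
The buffer handling described in the list above (one preallocated block that grows on demand, with each column copied in as a NUL-terminated string) can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the patch's code: RowValue, ValueBuf and pack_row are hypothetical names, and realloc stands in for palloc/repalloc so that the sketch compiles outside the backend.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patch's PGrowValue: len < 0 marks NULL. */
typedef struct RowValue
{
	int			len;
	const char *value;		/* not NUL-terminated */
} RowValue;

/* Hypothetical reusable buffer, grown only when a row does not fit. */
typedef struct ValueBuf
{
	char	   *data;
	size_t		cap;
} ValueBuf;

/*
 * Copy every non-NULL column into buf as a NUL-terminated string and
 * point cstrs[i] at it; NULL columns get cstrs[i] = NULL.
 * Returns 0 on success, -1 if the buffer could not be grown.
 */
static int
pack_row(ValueBuf *buf, const RowValue *cols, int ncols, char **cstrs)
{
	size_t		need = 0;
	char	   *p;
	int			i;

	/* Value length plus one terminator byte per non-NULL column. */
	for (i = 0; i < ncols; i++)
		if (cols[i].len >= 0)
			need += (size_t) cols[i].len + 1;

	if (need > buf->cap)
	{
		size_t		newcap = buf->cap ? buf->cap : 64;
		char	   *newdata;

		/* Grow by doubling, refusing sizes that would overflow size_t. */
		while (newcap < need)
		{
			if (newcap > SIZE_MAX / 2)
				return -1;
			newcap *= 2;
		}
		newdata = realloc(buf->data, newcap);
		if (newdata == NULL)
			return -1;
		buf->data = newdata;
		buf->cap = newcap;
	}

	p = buf->data;
	for (i = 0; i < ncols; i++)
	{
		if (cols[i].len < 0)
		{
			cstrs[i] = NULL;
			continue;
		}
		cstrs[i] = p;
		memcpy(p, cols[i].value, (size_t) cols[i].len);
		p += cols[i].len;
		*p++ = '\0';
	}
	return 0;
}
```

The pointers in cstrs are only valid until the next call, because the buffer is reused for every row; that matches a handler that builds and stores a tuple before returning.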

dblink_use_rowproc_20120326.patch - dblink row processor patch.
dblink_connname_20120326.patch - dblink connname fix patch.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

dblink_connname_20120326.patch (text/x-patch; charset=us-ascii)
diff --git b/contrib/dblink/dblink.c a/contrib/dblink/dblink.c
index dd73aa5..b79f0c0 100644
--- b/contrib/dblink/dblink.c
+++ a/contrib/dblink/dblink.c
@@ -733,6 +733,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 			else
 			{
 				DBLINK_GET_CONN;
+				conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 				sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
 			}
 		}
@@ -763,6 +764,8 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 		else
 			/* shouldn't happen */
 			elog(ERROR, "wrong number of arguments");
+
+		conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 	}
 
 	if (!conn)
dblink_use_rowproc_20120326.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..dd73aa5 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char* valbuf;
+	int valbuflen;
+	char **cstrs;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -111,6 +127,9 @@ typedef struct remoteConnHashEnt
 /* initial number of connection hashes */
 #define NUMCONN 16
 
+/* Initial block size for value buffer in storeHandler */
+#define INITBUFLEN 64
+
 /* general utility */
 #define xpfree(var_) \
 	do { \
@@ -503,6 +522,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +579,51 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
-	res = PQexec(conn, buf.data);
+	PG_TRY();
+	{
+		res = PQexec(conn, buf.data);
+	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
+
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, TRUE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +635,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +696,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -660,6 +717,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 		{
 			/* text,text,bool */
 			DBLINK_GET_CONN;
+			conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 			sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
 			fail = PG_GETARG_BOOL(2);
 		}
@@ -715,164 +773,229 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
-	/* synchronous query, or async result retrieval */
-	if (!is_async)
-		res = PQexec(conn, sql);
-	else
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	PG_TRY();
 	{
-		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
+		/* synchronous query, or async result retrieval */
+		if (!is_async)
+			res = PQexec(conn, sql);
+		else
+			res = PQgetResult(conn);
 	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, TRUE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		/*
+		 * Exclude a mismatch in the number of columns here so as to
+		 * behave as before.
+		 */
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK &&
+			 !storeinfo.nummismatch))
+		{
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
+
+		/* Set command return status when the query was a command. */
+		if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		{
+			char *values[1];
+			HeapTuple tuple;
+			AttInMetadata *attinmeta;
+			ReturnSetInfo *rcinfo = (ReturnSetInfo*)fcinfo->resultinfo;
+			
+			values[0] = PQcmdStatus(res);
+			attinmeta = TupleDescGetAttInMetadata(rcinfo->setDesc);
+			tuple = BuildTupleFromCStrings(attinmeta, values);
+			tuplestore_puttuple(rcinfo->setResult, tuple);
+		}
+		else if (get_call_result_type(fcinfo, NULL, NULL) == TYPEFUNC_RECORD)
+		{
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+		}
+
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
 	}
 
-	materializeResult(fcinfo, res);
+	if (res)
+		PQclear(res);
+
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
 
-	Assert(rsinfo->returnMode == SFRM_Materialize);
-
-	PG_TRY();
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
+		
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
-
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
-
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
+		case TYPEFUNC_COMPOSITE:
+			tupdesc = CreateTupleDescCopy(tupdesc);
+			sinfo->nattrs = tupdesc->natts;
+			break;
+		case TYPEFUNC_RECORD:
 			tupdesc = CreateTemplateTupleDesc(1, false);
 			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
 							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
-
-			is_sql_cmd = false;
+			sinfo->nattrs = 1;
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+	/* make sure we have a persistent copy of the tupdesc */
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
-		}
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->valbuflen = INITBUFLEN;
+	sinfo->valbuf = (char *)palloc(sinfo->valbuflen);
+	sinfo->cstrs = (char **)palloc(sinfo->nattrs * sizeof(char *));
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	if (sinfo->valbuf)
+	{
+		pfree(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-			values = (char **) palloc(nfields * sizeof(char *));
+	if (sinfo->cstrs)
+	{
+		pfree(sinfo->cstrs);
+		sinfo->cstrs = NULL;
+	}
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+/* Prototype of this function is PQrowProcessor */
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+	storeInfo *sinfo = (storeInfo *)param;
+	HeapTuple  tuple;
+	int        newbuflen;
+	int        fields = PQnfields(res);
+	int        i;
+	char       **cstrs = sinfo->cstrs;
+	char       *pbuf;
+
+	if (sinfo->error_occurred)
+		return -1;
+
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+		/* This error will be processed in dblink_record_internal() */
+		return -1;
+	}
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
-			}
+	/*
+	 * Value input functions assume that the input string is
+	 * zero-terminated, so we must make the values so.
+	 */
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
-		}
+	/*
+     * The length of the buffer for each field is value length + 1 for
+     * zero-termination
+     */
+	newbuflen = fields;
+	for(i = 0 ; i < fields ; i++)
+		newbuflen += columns[i].len;
+
+	if (newbuflen > sinfo->valbuflen)
+	{
+		/*
+		 * Try to (re)allocate in bigger steps to avoid flood of allocations
+		 * on weird data.
+		 */
+		if (newbuflen < sinfo->valbuflen * 2)
+			newbuflen = sinfo->valbuflen * 2;
 
-		PQclear(res);
+		sinfo->valbuf = (char *)repalloc(sinfo->valbuf, newbuflen);
+		sinfo->valbuflen = newbuflen;
 	}
-	PG_CATCH();
+
+	pbuf = sinfo->valbuf;
+	for(i = 0 ; i < fields ; i++)
 	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
+		{
+			cstrs[i] = pbuf;
+			memcpy(pbuf, columns[i].value, len);
+			pbuf += len;
+			*pbuf++ = '\0';
+		}
 	}
-	PG_END_TRY();
+
+	/*
+	 * These functions may throw exception. It will be caught in
+	 * dblink_record_internal()
+	 */
+	tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+	tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+	return 1;
 }
 
 /*
#91Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#90)
2 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

I'm sorry, I coded a silly bug.

The previous patch had a bug in the realloc size calculation, and the
separation of the 'connname patch' was incomplete in the regression test.
Both are fixed in this patch.
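
For illustration, the size calculation can also be written so that it never depends on the doubled value wrapping negative, which C does not guarantee for signed integers. The following is only a sketch with a hypothetical helper name (grow_buflen); it is not the code in the attached patch, which detects the overflow after the multiplication instead.

```c
#include <limits.h>

/*
 * Return cur doubled one or more times until it is >= need,
 * or -1 if cur is not positive, need is negative, or the result
 * would exceed INT_MAX.  The limit is tested before each doubling,
 * so no signed overflow can occur.
 */
static int
grow_buflen(int cur, int need)
{
	int			len = cur;

	if (cur <= 0 || need < 0)
		return -1;

	do
	{
		if (len > INT_MAX / 2)
			return -1;
		len *= 2;
	} while (len < need);

	return len;
}
```

A caller would treat -1 as the "row too large" error and otherwise pass the result to repalloc.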

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

dblink_use_rowproc_20120327.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 36a8e3e..4de28ef 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -63,11 +63,23 @@ typedef struct remoteConn
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
 } remoteConn;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	char* valbuf;
+	int valbuflen;
+	char **cstrs;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 /*
  * Internal declarations
  */
 static Datum dblink_record_internal(FunctionCallInfo fcinfo, bool is_async);
-static void materializeResult(FunctionCallInfo fcinfo, PGresult *res);
 static remoteConn *getConnectionByName(const char *name);
 static HTAB *createConnHash(void);
 static void createNewConnection(const char *name, remoteConn *rconn);
@@ -90,6 +102,10 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo);
+static void finishStoreInfo(storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
+
 
 /* Global */
 static remoteConn *pconn = NULL;
@@ -111,6 +127,9 @@ typedef struct remoteConnHashEnt
 /* initial number of connection hashes */
 #define NUMCONN 16
 
+/* Initial block size for value buffer in storeHandler */
+#define INITBUFLEN 64
+
 /* general utility */
 #define xpfree(var_) \
 	do { \
@@ -503,6 +522,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeinfo;
 
 	DBLINK_INIT;
 
@@ -559,15 +579,51 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
-	res = PQexec(conn, buf.data);
+	PG_TRY();
+	{
+		res = PQexec(conn, buf.data);
+	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
+
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, TRUE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -579,8 +635,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -640,6 +696,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
+	storeInfo   storeinfo;
 
 	/* check to see if caller supports us returning a tuplestore */
 	if (rsinfo == NULL || !IsA(rsinfo, ReturnSetInfo))
@@ -660,6 +717,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 		{
 			/* text,text,bool */
 			DBLINK_GET_CONN;
+			conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 			sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
 			fail = PG_GETARG_BOOL(2);
 		}
@@ -715,164 +773,234 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	rsinfo->setResult = NULL;
 	rsinfo->setDesc = NULL;
 
-	/* synchronous query, or async result retrieval */
-	if (!is_async)
-		res = PQexec(conn, sql);
-	else
+
+	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result returned by PQexec/PQgetResult below
+	 */
+	initStoreInfo(&storeinfo, fcinfo);
+	PQsetRowProcessor(conn, storeHandler, &storeinfo);
+
+	PG_TRY();
 	{
-		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
+		/* synchronous query, or async result retrieval */
+		if (!is_async)
+			res = PQexec(conn, sql);
+		else
+			res = PQgetResult(conn);
 	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+		finishStoreInfo(&storeinfo);
+		edata = CopyErrorData();
+		FlushErrorState();
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, TRUE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+
+	finishStoreInfo(&storeinfo);
+
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		/*
+		 * Exclude a mismatch in the number of columns here so as to
+		 * behave as before.
+		 */
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK &&
+			 !storeinfo.nummismatch))
+		{
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
+
+		/* Set command return status when the query was a command. */
+		if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		{
+			char *values[1];
+			HeapTuple tuple;
+			AttInMetadata *attinmeta;
+			ReturnSetInfo *rcinfo = (ReturnSetInfo*)fcinfo->resultinfo;
+			
+			values[0] = PQcmdStatus(res);
+			attinmeta = TupleDescGetAttInMetadata(rcinfo->setDesc);
+			tuple = BuildTupleFromCStrings(attinmeta, values);
+			tuplestore_puttuple(rcinfo->setResult, tuple);
+		}
+		else if (get_call_result_type(fcinfo, NULL, NULL) == TYPEFUNC_RECORD)
+		{
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+		}
+
+		/* finishStoreInfo saves the fields referred to below. */
+		if (storeinfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
 	}
 
-	materializeResult(fcinfo, res);
+	if (res)
+		PQclear(res);
+
 	return (Datum) 0;
 }
 
-/*
- * Materialize the PGresult to return them as the function result.
- * The res will be released in this function.
- */
 static void
-materializeResult(FunctionCallInfo fcinfo, PGresult *res)
+initStoreInfo(storeInfo *sinfo, FunctionCallInfo fcinfo)
 {
 	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc	tupdesc;
 
-	Assert(rsinfo->returnMode == SFRM_Materialize);
-
-	PG_TRY();
+	sinfo->oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
+		
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
 	{
-		TupleDesc	tupdesc;
-		bool		is_sql_cmd = false;
-		int			ntuples;
-		int			nfields;
-
-		if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			is_sql_cmd = true;
-
-			/*
-			 * need a tuple descriptor representing one TEXT column to return
-			 * the command status string as our result tuple
-			 */
+		case TYPEFUNC_COMPOSITE:
+			tupdesc = CreateTupleDescCopy(tupdesc);
+			sinfo->nattrs = tupdesc->natts;
+			break;
+		case TYPEFUNC_RECORD:
 			tupdesc = CreateTemplateTupleDesc(1, false);
 			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
 							   TEXTOID, -1, 0);
-			ntuples = 1;
-			nfields = 1;
-		}
-		else
-		{
-			Assert(PQresultStatus(res) == PGRES_TUPLES_OK);
+			sinfo->nattrs = 1;
+			break;
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
+	}
 
-			is_sql_cmd = false;
+	/* make sure we have a persistent copy of the tupdesc */
 
-			/* get a tuple descriptor for our result type */
-			switch (get_call_result_type(fcinfo, NULL, &tupdesc))
-			{
-				case TYPEFUNC_COMPOSITE:
-					/* success */
-					break;
-				case TYPEFUNC_RECORD:
-					/* failed to determine actual type of RECORD */
-					ereport(ERROR,
-							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-						errmsg("function returning record called in context "
-							   "that cannot accept type record")));
-					break;
-				default:
-					/* result type isn't composite */
-					elog(ERROR, "return type must be a row type");
-					break;
-			}
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->valbuflen = INITBUFLEN;
+	sinfo->valbuf = (char *)palloc(sinfo->valbuflen);
+	sinfo->cstrs = (char **)palloc(sinfo->nattrs * sizeof(char *));
 
-			/* make sure we have a persistent copy of the tupdesc */
-			tupdesc = CreateTupleDescCopy(tupdesc);
-			ntuples = PQntuples(res);
-			nfields = PQnfields(res);
-		}
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
+}
 
-		/*
-		 * check result and tuple descriptor have the same number of columns
-		 */
-		if (nfields != tupdesc->natts)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("remote query result rowtype does not match "
-							"the specified FROM clause rowtype")));
+static void
+finishStoreInfo(storeInfo *sinfo)
+{
+	if (sinfo->valbuf)
+	{
+		pfree(sinfo->valbuf);
+		sinfo->valbuf = NULL;
+	}
 
-		if (ntuples > 0)
-		{
-			AttInMetadata *attinmeta;
-			Tuplestorestate *tupstore;
-			MemoryContext oldcontext;
-			int			row;
-			char	  **values;
-
-			attinmeta = TupleDescGetAttInMetadata(tupdesc);
-
-			oldcontext = MemoryContextSwitchTo(
-									rsinfo->econtext->ecxt_per_query_memory);
-			tupstore = tuplestore_begin_heap(true, false, work_mem);
-			rsinfo->setResult = tupstore;
-			rsinfo->setDesc = tupdesc;
-			MemoryContextSwitchTo(oldcontext);
+	if (sinfo->cstrs)
+	{
+		pfree(sinfo->cstrs);
+		sinfo->cstrs = NULL;
+	}
 
-			values = (char **) palloc(nfields * sizeof(char *));
+	MemoryContextSwitchTo(sinfo->oldcontext);
+}
 
-			/* put all tuples into the tuplestore */
-			for (row = 0; row < ntuples; row++)
-			{
-				HeapTuple	tuple;
+/* Prototype of this function is PQrowProcessor */
+static int
+storeHandler(PGresult *res, PGrowValue *columns, void *param)
+{
+	storeInfo *sinfo = (storeInfo *)param;
+	HeapTuple  tuple;
+	int        newbuflen;
+	int        fields = PQnfields(res);
+	int        i;
+	char       **cstrs = sinfo->cstrs;
+	char       *pbuf;
+
+	if (sinfo->error_occurred)
+		return -1;
+
+	if (sinfo->nattrs != fields)
+	{
+		sinfo->error_occurred = TRUE;
+		sinfo->nummismatch = TRUE;
+		finishStoreInfo(sinfo);
 
-				if (!is_sql_cmd)
-				{
-					int			i;
+		/* This error will be processed in dblink_record_internal() */
+		return -1;
+	}
 
-					for (i = 0; i < nfields; i++)
-					{
-						if (PQgetisnull(res, row, i))
-							values[i] = NULL;
-						else
-							values[i] = PQgetvalue(res, row, i);
-					}
-				}
-				else
-				{
-					values[0] = PQcmdStatus(res);
-				}
+	/*
+	 * Value input functions assume that the input string is
+	 * zero-terminated, so we must make the values so.
+	 */
 
-				/* build the tuple and put it into the tuplestore. */
-				tuple = BuildTupleFromCStrings(attinmeta, values);
-				tuplestore_puttuple(tupstore, tuple);
-			}
+	/*
+     * The length of the buffer for each field is value length + 1 for
+     * zero-termination
+     */
+	newbuflen = fields;
+	for(i = 0 ; i < fields ; i++)
+		newbuflen += columns[i].len;
+
+	if (newbuflen > sinfo->valbuflen)
+	{
+		int tmplen = sinfo->valbuflen * 2;
+		/*
+		 * Try to (re)allocate in bigger steps to avoid flood of allocations
+		 * on weird data.
+		 */
+		while (newbuflen > tmplen && tmplen >= 0)
+			tmplen *= 2;
 
-			/* clean up and return the tuplestore */
-			tuplestore_donestoring(tupstore);
-		}
+		/* Check whether the integer wrapped around. */
+		if (tmplen < 0)
+			elog(ERROR, "Buffer size for one row exceeds integer limit");
 
-		PQclear(res);
+		sinfo->valbuf = (char *)repalloc(sinfo->valbuf, tmplen);
+		sinfo->valbuflen = tmplen;
 	}
-	PG_CATCH();
+
+	pbuf = sinfo->valbuf;
+	for(i = 0 ; i < fields ; i++)
 	{
-		/* be sure to release the libpq result */
-		PQclear(res);
-		PG_RE_THROW();
+		int len = columns[i].len;
+		if (len < 0)
+			cstrs[i] = NULL;
+		else
+		{
+			cstrs[i] = pbuf;
+			memcpy(pbuf, columns[i].value, len);
+			pbuf += len;
+			*pbuf++ = '\0';
+		}
 	}
-	PG_END_TRY();
+
+	/*
+	 * These functions may throw exception. It will be caught in
+	 * dblink_record_internal()
+	 */
+	tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+	tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+	return 1;
 }
 
 /*
dblink_connname_20120327.patch (text/x-patch; charset=us-ascii)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 4de28ef..05d7e98 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -733,6 +733,7 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 			else
 			{
 				DBLINK_GET_CONN;
+				conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 				sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
 			}
 		}
@@ -763,6 +764,8 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 		else
 			/* shouldn't happen */
 			elog(ERROR, "wrong number of arguments");
+
+		conname = text_to_cstring(PG_GETARG_TEXT_PP(0));
 	}
 
 	if (!conn)
diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index 511dd5e..2dcba15 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -371,7 +371,7 @@ SELECT *
 FROM dblink('myconn','SELECT * FROM foobar',false) AS t(a int, b text, c text[])
 WHERE t.a > 7;
 NOTICE:  relation "foobar" does not exist
-CONTEXT:  Error occurred on dblink connection named "unnamed": could not execute query.
+CONTEXT:  Error occurred on dblink connection named "myconn": could not execute query.
  a | b | c 
 ---+---+---
 (0 rows)
#92Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#89)
Re: Speed dblink using alternate libpq tuple storage

On Sat, Mar 24, 2012 at 02:22:24AM +0200, Marko Kreen wrote:

Main advantage of including PQgetRow() together with low-level
rowproc API is that it allows de-emphasizing more complex parts of
rowproc API (exceptions, early exit, skipresult, custom error msg).
And drop/undocument them or simply call them postgres-internal-only.

I thought more about exceptions and PQgetRow and found
interesting pattern:

- Exceptions are problematic if always allowed. Although PQexec() is
easy to fix in current code, trying to keep to promise of "exceptions
are allowed from everywhere" adds non-trivial maintainability overhead
to future libpq changes, so instead we should simply fix documentation.
Especially as I cannot see any usage scenario that would benefit from
such promise.

- Multiple SELECTs from PQexec() are problematic even without
exceptions: additional documentation is needed how to detect
that rows are coming from new statement.

Now the interesting one:

- PQregisterProcessor() API allows changing the callback permanently.
Thus breaking any user code which simply calls PQexec()
and expects regular PGresult back. Again, nothing to fix
code-wise, need to document that callback should be set
only for current query, later changed back.

My conclusion is that row-processor API is low-level expert API and
quite easy to misuse. It would be preferable to have something more
robust as end-user API, the PQgetRow() is my suggestion for that.
Thus I see 3 choices:

1) Push row-processor as main API anyway and describe all dangerous
scenarios in documentation.
2) Have both PQgetRow() and row-processor available in <libpq-fe.h>,
PQgetRow() as preferred API and row-processor for expert usage,
with proper documentation what works and what does not.
3) Have PQgetRow() in <libpq-fe.h>, move row-processor to <libpq-int.h>.

I guess this needs committer decision which way to go?

Second conclusion is that current dblink row-processor usage is broken
when user uses multiple SELECTs in SQL as dblink uses plain PQexec().
Simplest fix would be to use PQexecParams() instead, but if keeping old
behaviour is important, then dblink needs to emulate PQexec() resultset
behaviour with row-processor or PQgetRow().

--
marko

#93Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kyotaro HORIGUCHI (#91)
Re: Speed dblink using alternate libpq tuple storage

Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp> writes:

I'm sorry to have coded a silly bug.
The previous patch has a bug in realloc size calculation.
And separation of the 'conname patch' was incomplete in regtest.
It is fixed in this patch.

I've applied a modified form of the conname update patch. It seemed to
me that the fault is really in the DBLINK_GET_CONN and
DBLINK_GET_NAMED_CONN macros, which ought to be responsible for setting
the surrounding function's conname variable along with conn, rconn, etc.
There was actually a second error of the same type visible in the dblink
regression test, which is also fixed by this more general method.

regards, tom lane

#94Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#92)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

My conclusion is that row-processor API is low-level expert API and
quite easy to misuse. It would be preferable to have something more
robust as end-user API, the PQgetRow() is my suggestion for that.
Thus I see 3 choices:

1) Push row-processor as main API anyway and describe all dangerous
scenarios in documentation.
2) Have both PQgetRow() and row-processor available in <libpq-fe.h>,
PQgetRow() as preferred API and row-processor for expert usage,
with proper documentation what works and what does not.
3) Have PQgetRow() in <libpq-fe.h>, move row-processor to <libpq-int.h>.

I still am failing to see the use-case for PQgetRow. ISTM the entire
point of a special row processor is to reduce the per-row processing
overhead, but PQgetRow greatly increases that overhead. And it doesn't
reduce complexity much either IMO: you still have all the primary risk
factors arising from processing rows in advance of being sure that the
whole query completed successfully. Plus it conflates "no more data"
with "there was an error receiving the data" or "there was an error on
the server side". PQrecvRow alleviates the per-row-overhead aspect of
that but doesn't really do a thing from the complexity standpoint;
it doesn't look to me to be noticeably easier to use than a row
processor callback.

I think PQgetRow and PQrecvRow just add more API calls without making
any fundamental improvements, and so we could do without them. "There's
more than one way to do it" is not necessarily a virtue.

Second conclusion is that current dblink row-processor usage is broken
when user uses multiple SELECTs in SQL as dblink uses plain PQexec().

Yeah. Perhaps we should tweak the row-processor callback API so that
it gets an explicit notification that "this is a new resultset".
Duplicating PQexec's behavior would then involve having the dblink row
processor throw away any existing tuplestore and start over when it
gets such a call.

There's multiple ways to express that but the most convenient thing
from libpq's viewpoint, I think, is to have a callback that occurs
immediately after collecting a RowDescription message, before any
rows have arrived. So maybe we could express that as a callback
with valid "res" but "columns" set to NULL?

A different approach would be to add a row counter to the arguments
provided to the row processor; then you'd know a new resultset had
started if you saw rowcounter == 0. This might have another advantage
of not requiring the row processor to count the rows for itself, which
I think many row processors would otherwise have to do.
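
The row-counter variant described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the callback signature, the RowSink struct, and the name counting_row_processor are hypothetical and are not part of libpq or of the submitted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-query state, standing in for dblink's tuplestore. */
typedef struct RowSink
{
	int			resultsets_seen;	/* result sets started so far */
	int			rows_kept;			/* rows kept from the current result set */
} RowSink;

/*
 * Hypothetical row processor that is told the row's position within its
 * result set.  Returns 1 to ask the caller to keep going.
 */
int
counting_row_processor(int rowcounter, void *param)
{
	RowSink    *sink = (RowSink *) param;

	assert(sink != NULL && rowcounter >= 0);

	if (rowcounter == 0)
	{
		/* First row of a new result set: throw away the previous one. */
		sink->resultsets_seen++;
		sink->rows_kept = 0;
	}
	sink->rows_kept++;
	return 1;
}
```

Feeding it three rows and then two rows from a second statement leaves only the last two rows in the sink, which is the PQexec-like "last result wins" behaviour. A result set with no rows produces no call at all, so this sink would never notice it.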

regards, tom lane

#95Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Tom Lane (#93)
Re: Speed dblink using alternate libpq tuple storage

I've applied a modified form of the conname update patch. It seemed to
me that the fault is really in the DBLINK_GET_CONN and
DBLINK_GET_NAMED_CONN macros, which ought to be responsible for setting
the surrounding function's conname variable along with conn, rconn, etc.
There was actually a second error of the same type visible in the dblink
regression test, which is also fixed by this more general method.

Come to think of it, that patch is merely a fix-up for a bug that,
to all appearances, got mixed in with the other parts of the dblink
patch. I totally agree with you. It should be dropped for
now and done another time.

I'm sorry for bothering you with such a damn thing.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#96Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#94)
Re: Speed dblink using alternate libpq tuple storage

On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote:

Marko Kreen <markokr@gmail.com> writes:

My conclusion is that row-processor API is low-level expert API and
quite easy to misuse. It would be preferable to have something more
robust as end-user API, the PQgetRow() is my suggestion for that.
Thus I see 3 choices:

1) Push row-processor as main API anyway and describe all dangerous
scenarios in documentation.
2) Have both PQgetRow() and row-processor available in <libpq-fe.h>,
PQgetRow() as preferred API and row-processor for expert usage,
with proper documentation what works and what does not.
3) Have PQgetRow() in <libpq-fe.h>, move row-processor to <libpq-int.h>.

I still am failing to see the use-case for PQgetRow. ISTM the entire
point of a special row processor is to reduce the per-row processing
overhead, but PQgetRow greatly increases that overhead.

No, decreasing CPU overhead is a minor win. I guess in a realistic
application, like dblink, you can't even measure the difference.

The *major* win comes from avoiding buffering of all rows in PGresult.
Of course, this is noticeable only with bigger resultsets.
I guess such buffering pessimizes memory usage: code always
works on cold cache. And simply keeping RSS low is good for
long-term health of a process.

Second major win is avoiding the need to use a cursor with small chunks
to access a resultset of unknown size, which stalls the application
until the next block arrives from the network.

The "PGresult *PQgetRow()" is for applications that do not convert
rows immediately to some internal format, but keep using PGresult.
So they can be converted to row-by-row processing with minimal
changes to actual code.

Note that the PGrowValue is a temporary struct that application *must*
move data away from. If app internally uses PGresult, then it's
pretty annoying to invent a new internal format for long-term
storage.

But maybe I'm overestimating the number of such applications.

And it doesn't
reduce complexity much either IMO: you still have all the primary risk
factors arising from processing rows in advance of being sure that the
whole query completed successfully.

It avoids the complexity of:

* How to pass error from callback to upper code

* Needing to know how exceptions behave

* How to use early exit to pass rows to upper code one-by-one,
(by storing the PGresult and PGrowValue in temp place
and later checking their values)

* How to detect that new resultset has started. (keeping track
of previous PGresult or noting some quirky API behaviour
we may invent for such case)

* Needing to make sure the callback does not leak to call-sites
that expect regular libpq behaviour.
("Always call PQregisterRowProcessor(db, NULL, NULL) after query finishes" )
["But now I'm in exception handler, how do I find the connection?"]

I've now reviewed the callback code and even done some coding with it
and IMHO it's too low-level to be end-user-API.

Yes, the "query-may-still-fail" complexity remains, but that's not unlike
the usual "multi-statement-transaction-is-not-guaranteed-to-succeed"
complexity.

Another complexity that remains is "how-to-skip-current-resultset",
but that is a problem only on sync connections and the answer is
simple - "call PQgetResult()". Or "call PQgetRow/PQrecvRow" if
user wants to avoid buffering.

Plus it conflates "no more data"
with "there was an error receiving the data" or "there was an error on
the server side".

Well, current PQgetRow() is written with style: "return only single-row
PGresult, to see errors user must call PQgetResult()". Basically
so that the user is forced to fall back to the familiar libpq usage pattern.

It can be changed, so that PQgetRow() also returns errors.

Or we can drop it and just keep PQrecvRow().

PQrecvRow alleviates the per-row-overhead aspect of
that but doesn't really do a thing from the complexity standpoint;
it doesn't look to me to be noticeably easier to use than a row
processor callback.

I think PQgetRow and PQrecvRow just add more API calls without making
any fundamental improvements, and so we could do without them. "There's
more than one way to do it" is not necessarily a virtue.

Please re-read the above list of problematic situations that this API
fixes. Then, if you still think that PQrecvRow() is pointless, sure,
let's drop it.

We can also postpone it to 9.3, to poll users whether they want
an easier API, or whether maximum performance is important. (PQrecvRow()
*does* have a few cycles of overhead compared to callbacks.)

Only option that we have on the table for 9.2 but not later
is moving the callback API to <libpq-int.h>.

Second conclusion is that current dblink row-processor usage is broken
when user uses multiple SELECTs in SQL as dblink uses plain PQexec().

Yeah. Perhaps we should tweak the row-processor callback API so that
it gets an explicit notification that "this is a new resultset".
Duplicating PQexec's behavior would then involve having the dblink row
processor throw away any existing tuplestore and start over when it
gets such a call.

There's multiple ways to express that but the most convenient thing
from libpq's viewpoint, I think, is to have a callback that occurs
immediately after collecting a RowDescription message, before any
rows have arrived. So maybe we could express that as a callback
with valid "res" but "columns" set to NULL?

A different approach would be to add a row counter to the arguments
provided to the row processor; then you'd know a new resultset had
started if you saw rowcounter == 0. This might have another advantage
of not requiring the row processor to count the rows for itself, which
I think many row processors would otherwise have to do.

Try to imagine what the final documentation will look like.

Then imagine documentation for PQrecvRow() / PQgetRow().

--
marko

#97Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#96)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote:

Marko Kreen <markokr@gmail.com> writes:

Second conclusion is that current dblink row-processor usage is broken
when user uses multiple SELECTs in SQL as dblink uses plain PQexec().

Yeah. Perhaps we should tweak the row-processor callback API so that
it gets an explicit notification that "this is a new resultset".
Duplicating PQexec's behavior would then involve having the dblink row
processor throw away any existing tuplestore and start over when it
gets such a call.

There's multiple ways to express that but the most convenient thing
from libpq's viewpoint, I think, is to have a callback that occurs
immediately after collecting a RowDescription message, before any
rows have arrived. So maybe we could express that as a callback
with valid "res" but "columns" set to NULL?

A different approach would be to add a row counter to the arguments
provided to the row processor; then you'd know a new resultset had
started if you saw rowcounter == 0. This might have another advantage
of not requiring the row processor to count the rows for itself, which
I think many row processors would otherwise have to do.

Try to imagine what the final documentation will look like.

Then imagine documentation for PQrecvRow() / PQgetRow().

What's your point, exactly? PQrecvRow() / PQgetRow() aren't going to
make that any better as currently defined, because there's noplace to
indicate "this is a new resultset" in those APIs either.

regards, tom lane

#98Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#97)
Re: Speed dblink using alternate libpq tuple storage

On Fri, Mar 30, 2012 at 11:59:12AM -0400, Tom Lane wrote:

Marko Kreen <markokr@gmail.com> writes:

On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote:

Marko Kreen <markokr@gmail.com> writes:

Second conclusion is that current dblink row-processor usage is broken
when user uses multiple SELECTs in SQL as dblink uses plain PQexec().

Yeah. Perhaps we should tweak the row-processor callback API so that
it gets an explicit notification that "this is a new resultset".
Duplicating PQexec's behavior would then involve having the dblink row
processor throw away any existing tuplestore and start over when it
gets such a call.

There's multiple ways to express that but the most convenient thing
from libpq's viewpoint, I think, is to have a callback that occurs
immediately after collecting a RowDescription message, before any
rows have arrived. So maybe we could express that as a callback
with valid "res" but "columns" set to NULL?

A different approach would be to add a row counter to the arguments
provided to the row processor; then you'd know a new resultset had
started if you saw rowcounter == 0. This might have another advantage
of not requiring the row processor to count the rows for itself, which
I think many row processors would otherwise have to do.

Try to imagine what the final documentation will look like.

Then imagine documentation for PQrecvRow() / PQgetRow().

What's your point, exactly? PQrecvRow() / PQgetRow() aren't going to
make that any better as currently defined, because there's noplace to
indicate "this is a new resultset" in those APIs either.

Have you looked at the examples? PQgetResult() is pretty good hint
that one resultset finished...

--
marko

#99Marko Kreen
markokr@gmail.com
In reply to: Marko Kreen (#98)
Re: Speed dblink using alternate libpq tuple storage

On Fri, Mar 30, 2012 at 7:04 PM, Marko Kreen <markokr@gmail.com> wrote:

Have you looked at the examples?  PQgetResult() is pretty good hint
that one resultset finished...

Ok, the demos are scattered around this long thread and hard to find,
so here is a summary of links:

Original design mail:

http://archives.postgresql.org/message-id/20120224154616.GA16985@gmail.com

First patch with quick demos:

http://archives.postgresql.org/message-id/20120226221922.GA6981@gmail.com

Demos as diff:

http://archives.postgresql.org/message-id/20120324002224.GA19635@gmail.com

Demos/experiments/tests (bit messier than the demos-as-diffs):

https://github.com/markokr/libpq-rowproc-demos

Note - the point is that user *must* call PQgetResult() when resultset ends.
Thus also the "PQgetRow() does not return errors" decision.

I'll put this mail into commitfest page too, seems I've forgotten to
put some mails there.

--
marko

#100Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#85)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

On Wed, Mar 07, 2012 at 03:14:57PM +0900, Kyotaro HORIGUCHI wrote:

My suggestion - check in getAnotherTuple whether resultStatus is
already error and do nothing then. This allows internal pqAddRow
to set regular "out of memory" error. Otherwise give generic
"row processor error".

The current implementation already seems to do this in
parseInput3(). Could you give me further explanation?

The suggestion was about getAnotherTuple() - currently it always
sets "error in row processor". With such a check, the callback
can set the error result itself. Currently only callbacks that
live inside libpq can set errors, but if we happen to expose
error-setting function in outside API, then the getAnotherTuple()
would already be ready for it.

I'm pretty dissatisfied with the error reporting situation for row
processors. You can't just decide not to solve it, which seems to be
the current state of affairs. What I'm inclined to do is to add a
"char **" parameter to the row processor, and say that when the
processor returns -1 it can store an error message string there.
If it does so, that's what we report. If it doesn't (which we'd detect
by presetting the value to NULL), then use a generic "error in row
processor" message. This is cheap and doesn't prevent the row processor
from using some application-specific error reporting method if it wants;
but it does allow the processor to make use of libpq's error mechanism
when that's preferable.
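
A rough sketch of that calling convention, for illustration only. The names MockRowProcessor, call_row_processor and the two example processors are hypothetical, the row itself is reduced to a row_ok flag, and the parameter is written as "const char **" on the assumption that only constant strings are stored there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: returns 1 on success, -1 on failure. */
typedef int (*MockRowProcessor) (int row_ok, void *param, const char **errmsg);

/* Constant strings only, so nobody has to allocate or free anything. */
const char	GENERIC_ROWPROC_ERROR[] = "error in row processor";
const char	OOM_ERROR[] = "out of memory";

/* A processor that fails and stores its own message. */
int
verbose_processor(int row_ok, void *param, const char **errmsg)
{
	(void) param;
	if (row_ok)
		return 1;
	*errmsg = OOM_ERROR;
	return -1;
}

/* A processor that fails without saying why. */
int
silent_processor(int row_ok, void *param, const char **errmsg)
{
	(void) param;
	(void) errmsg;
	return row_ok ? 1 : -1;
}

/*
 * Caller side: preset the message to NULL, call the processor, and fall
 * back to the generic text if it returned -1 without storing anything.
 * Returns NULL on success, otherwise the message to report.
 */
const char *
call_row_processor(MockRowProcessor proc, int row_ok, void *param)
{
	const char *errmsg = NULL;

	assert(proc != NULL);
	if (proc(row_ok, param, &errmsg) != -1)
		return NULL;
	return errmsg != NULL ? errmsg : GENERIC_ROWPROC_ERROR;
}
```

Because the caller never frees the string, the processor can point it at a constant or at storage it owns itself.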

regards, tom lane

#101Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#100)
Re: Speed dblink using alternate libpq tuple storage

On Fri, Mar 30, 2012 at 05:18:42PM -0400, Tom Lane wrote:

Marko Kreen <markokr@gmail.com> writes:

On Wed, Mar 07, 2012 at 03:14:57PM +0900, Kyotaro HORIGUCHI wrote:

My suggestion - check in getAnotherTuple whether resultStatus is
already error and do nothing then. This allows internal pqAddRow
to set regular "out of memory" error. Otherwise give generic
"row processor error".

The current implementation already seems to do this in
parseInput3(). Could you give me further explanation?

The suggestion was about getAnotherTuple() - currently it always
sets "error in row processor". With such a check, the callback
can set the error result itself. Currently only callbacks that
live inside libpq can set errors, but if we happen to expose
error-setting function in outside API, then the getAnotherTuple()
would already be ready for it.

I'm pretty dissatisfied with the error reporting situation for row
processors. You can't just decide not to solve it, which seems to be
the current state of affairs. What I'm inclined to do is to add a
"char **" parameter to the row processor, and say that when the
processor returns -1 it can store an error message string there.
If it does so, that's what we report. If it doesn't (which we'd detect
by presetting the value to NULL), then use a generic "error in row
processor" message. This is cheap and doesn't prevent the row processor
from using some application-specific error reporting method if it wants;
but it does allow the processor to make use of libpq's error mechanism
when that's preferable.

Yeah.

But such API seems to require specifying allocator, which seems ugly.
I think it would be better to just use Kyotaro's original idea
of PQsetRowProcessorError() which is nicer to use.

Few thoughts on the issue:

----------

As libpq already provides quite good coverage of PGresult
manipulation APIs, then how about:

void PQsetResultError(PGresult *res, const char *msg);

that does:

res->errMsg = pqResultStrdup(msg);
res->resultStatus = PGRES_FATAL_ERROR;

that would also cause minimal fuss in getAnotherTuple().

------------

I would actually like even more:

void PQsetConnectionError(PGconn *conn, const char *msg);

that does full-blown libpq error logic. Thus it would be
useful everywhere in libpq. But it seems a bit too disruptive,
so I would like an ACK from somebody who knows libpq better.
(well, from you...)

-----------

Another thought - if we have API to set error from *outside*
of row-processor callback, that would immediately solve the
"how to skip incoming resultset without buffering it" problem.

And it would be usable for PQgetRow()/PQrecvRow() too.

--
marko

#102Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#101)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

On Fri, Mar 30, 2012 at 05:18:42PM -0400, Tom Lane wrote:

I'm pretty dissatisfied with the error reporting situation for row
processors. You can't just decide not to solve it, which seems to be
the current state of affairs. What I'm inclined to do is to add a
"char **" parameter to the row processor, and say that when the
processor returns -1 it can store an error message string there.

But such API seems to require specifying allocator, which seems ugly.

Not if the message is a constant string, which seems like the typical
situation (think "out of memory"). If the row processor does need a
buffer for a constructed string, it could make use of some space in its
"void *param" area, for instance.

I think it would be better to just use Kyotaro's original idea
of PQsetRowProcessorError() which is nicer to use.

I don't particularly care for that idea because it opens up all sorts of
potential issues when such a function is called at the wrong time.
Moreover, you have to remember that the typical situation here is that
we're going to be out of memory or otherwise in trouble, which means
you've got to be really circumspect about what you assume will work.
Row processors that think they can do a lot of fancy message
construction should be discouraged, and an API that requires
construction of a new PGresult in order to return an error is right out.
(This is why getAnotherTuple is careful to clear the failed result
before it tries to build a new one. But that trick isn't going to be
available to an external row processor.)

regards, tom lane

#103Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#102)
Re: Speed dblink using alternate libpq tuple storage

On Sat, Mar 31, 2012 at 1:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Marko Kreen <markokr@gmail.com> writes:

On Fri, Mar 30, 2012 at 05:18:42PM -0400, Tom Lane wrote:

I'm pretty dissatisfied with the error reporting situation for row
processors.  You can't just decide not to solve it, which seems to be
the current state of affairs.  What I'm inclined to do is to add a
"char **" parameter to the row processor, and say that when the
processor returns -1 it can store an error message string there.

But such API seems to require specifying allocator, which seems ugly.

Not if the message is a constant string, which seems like the typical
situation (think "out of memory").  If the row processor does need a
buffer for a constructed string, it could make use of some space in its
"void *param" area, for instance.

If it's specified as string that libpq does not own, then I'm fine with it.

I think it would be better to just use Kyotaro's original idea
of PQsetRowProcessorError() which is nicer to use.

I don't particularly care for that idea because it opens up all sorts of
potential issues when such a function is called at the wrong time.
Moreover, you have to remember that the typical situation here is that
we're going to be out of memory or otherwise in trouble, which means
you've got to be really circumspect about what you assume will work.
Row processors that think they can do a lot of fancy message
construction should be discouraged, and an API that requires
construction of a new PGresult in order to return an error is right out.
(This is why getAnotherTuple is careful to clear the failed result
before it tries to build a new one.  But that trick isn't going to be
available to an external row processor.)

Kyotaro's original idea was to assume out-of-memory if error
string was not set, thus the callback needed to set the string
only when it really had something to say.

--
marko

#104Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#103)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

On Sat, Mar 31, 2012 at 1:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Not if the message is a constant string, which seems like the typical
situation (think "out of memory"). If the row processor does need a
buffer for a constructed string, it could make use of some space in its
"void *param" area, for instance.

If it's specified as string that libpq does not own, then I'm fine with it.

Check. Let's make it "const char **" in fact, just to be clear on that.

(This is why getAnotherTuple is careful to clear the failed result
before it tries to build a new one. But that trick isn't going to be
available to an external row processor.)

Kyotaro's original idea was to assume out-of-memory if error
string was not set, thus the callback needed to set the string
only when it really had something to say.

Hmm. We could still do that in conjunction with the idea of returning
the string from the row processor, but I'm not sure if it's useful or
just overly cute.

[ thinks... ] A small advantage of assuming NULL means that is that
we could postpone the libpq_gettext("out of memory") call until after
clearing the overflowed PGresult, which would greatly improve the odds
of getting a nicely translated result and not just ASCII. Might be
worth it just for that.

regards, tom lane

#105Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#96)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

On Thu, Mar 29, 2012 at 06:56:30PM -0400, Tom Lane wrote:

Yeah. Perhaps we should tweak the row-processor callback API so that
it gets an explicit notification that "this is a new resultset".
Duplicating PQexec's behavior would then involve having the dblink row
processor throw away any existing tuplestore and start over when it
gets such a call.

There's multiple ways to express that but the most convenient thing
from libpq's viewpoint, I think, is to have a callback that occurs
immediately after collecting a RowDescription message, before any
rows have arrived. So maybe we could express that as a callback
with valid "res" but "columns" set to NULL?

A different approach would be to add a row counter to the arguments
provided to the row processor; then you'd know a new resultset had
started if you saw rowcounter == 0. This might have another advantage
of not requiring the row processor to count the rows for itself, which
I think many row processors would otherwise have to do.

I had been leaning towards the second approach with a row counter,
because it seemed cleaner, but I thought of another consideration that
makes the first way seem better. Suppose that your row processor has to
do some setup work at the start of a result set, and that work needs to
see the resultset properties (eg number of columns) so it can't be done
before starting PQgetResult. In the patch as submitted, the
only way to manage that is to keep enough state to recognize that the
current row processor call is the first one, which we realized is
inadequate for multiple-result-set cases. With a row counter argument
you can do the setup whenever rowcount == 0, which fixes that. But
neither of these methods deals correctly with an empty result set!
To make that work, you need to add extra logic after the PQgetResult
call to do the setup work the row processor should have done but never
got a chance to. So that's ugly, and it makes for an easy-to-miss bug.
A call that occurs when we receive RowDescription, independently of
whether the result set contains any rows, makes this a lot cleaner.
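
To make the empty-result-set point concrete, here is a small mock of the "callback with columns set to NULL" convention. It is only a sketch: MockColumn stands in for PGrowValue, and SetupState, describing_row_processor and deliver_resultset are hypothetical names, not libpq API.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PGrowValue: one column value and its length. */
typedef struct MockColumn
{
	const char *value;
	int			len;
} MockColumn;

/* Hypothetical application state. */
typedef struct SetupState
{
	int			setups_done;	/* per-result-set setup runs */
	int			rows_seen;		/* rows in the current result set */
} SetupState;

/*
 * Hypothetical row processor.  A call with columns == NULL means "a new
 * result set has been described"; any other call carries one data row.
 */
int
describing_row_processor(int nfields, const MockColumn *columns, void *param)
{
	SetupState *st = (SetupState *) param;

	assert(st != NULL && nfields >= 0);

	if (columns == NULL)
	{
		/* Setup can look at nfields here, before any row arrives. */
		st->setups_done++;
		st->rows_seen = 0;
		return 1;
	}
	st->rows_seen++;
	return 1;
}

/* Mock of the library side: one description call, then nrows row calls. */
void
deliver_resultset(int nfields, int nrows, void *param)
{
	MockColumn	cols[1] = {{"x", 1}};
	int			i;

	describing_row_processor(nfields, NULL, param);
	for (i = 0; i < nrows; i++)
		describing_row_processor(nfields, cols, param);
}
```

With this shape, delivering a result set of zero rows still runs the setup once, so no extra logic is needed after PQgetResult for the empty case.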

regards, tom lane

#106Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#105)
Re: Speed dblink using alternate libpq tuple storage

I've been thinking some more about the early-termination cases (where
the row processor returns zero or longjmps), and I believe I have got
proposals that smooth off most of the rough edges there.

First off, returning zero is really pretty unsafe as it stands, because
it only works more-or-less-sanely if the connection is being used in
async style. If the row processor returns zero within a regular
PQgetResult call, that will cause PQgetResult to block waiting for more
input. Which might not be forthcoming, if we're in the last bufferload
of a query response from the server. Even in the async case, I think
it's a bad design to have PQisBusy return true when the row processor
requested stoppage. In that situation, there is work available for the
outer application code to do, whereas normally PQisBusy == true means
we're still waiting for the server.

I think we can fix this by introducing a new PQresultStatus, called say
PGRES_SUSPENDED, and having PQgetResult return an empty PGresult with
status PGRES_SUSPENDED after the row processor has returned zero.
Internally, there'd also be a new asyncStatus PGASYNC_SUSPENDED,
which we'd set before exiting from the getAnotherTuple call. This would
cause PQisBusy and PQgetResult to do the right things. In PQgetResult,
we'd switch back to PGASYNC_BUSY state after producing a PGRES_SUSPENDED
result, so that subsequent calls would resume parsing input.

With this design, a suspending row processor can be used safely in
either async or non-async mode. It does cost an extra PGresult creation
and deletion per cycle, but that's not much more than a malloc and free.

Also, we can document that a longjmp out of the row processor leaves the
library in the same state as if the row processor had returned zero and
a PGRES_SUSPENDED result had been returned to the application; which
will be a true statement in all cases, sync or async.
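
The proposed state transitions can be sketched as a toy state machine. This is a mock under stated assumptions, not libpq code: all MOCK_* names are hypothetical, every row is assumed to be already buffered, and the row processor is reduced to "return zero after every suspend_every rows".

```c
#include <assert.h>
#include <stddef.h>

typedef enum
{
	MOCK_ASYNC_IDLE,
	MOCK_ASYNC_BUSY,
	MOCK_ASYNC_SUSPENDED,
	MOCK_ASYNC_READY
} MockAsyncStatus;

typedef enum
{
	MOCK_RES_NONE,				/* stands in for a NULL PGresult */
	MOCK_RES_TUPLES_OK,
	MOCK_RES_SUSPENDED
} MockResultStatus;

typedef struct MockConn
{
	MockAsyncStatus asyncStatus;
	int			rows_pending;	/* buffered rows not yet parsed */
	int			suspend_every;	/* processor returns 0 after this many rows */
} MockConn;

/* Parse buffered rows until they run out or the processor asks to stop. */
void
mock_parse_input(MockConn *conn)
{
	int			handled = 0;

	assert(conn != NULL);
	while (conn->asyncStatus == MOCK_ASYNC_BUSY)
	{
		if (conn->rows_pending == 0)
		{
			conn->asyncStatus = MOCK_ASYNC_READY;
			break;
		}
		conn->rows_pending--;
		handled++;
		if (conn->suspend_every > 0 && handled == conn->suspend_every)
			conn->asyncStatus = MOCK_ASYNC_SUSPENDED;	/* processor returned 0 */
	}
}

/* Not busy once suspended: the application has work it can do. */
int
mock_is_busy(MockConn *conn)
{
	mock_parse_input(conn);
	return conn->asyncStatus == MOCK_ASYNC_BUSY;
}

MockResultStatus
mock_get_result(MockConn *conn)
{
	mock_parse_input(conn);
	switch (conn->asyncStatus)
	{
		case MOCK_ASYNC_SUSPENDED:
			conn->asyncStatus = MOCK_ASYNC_BUSY;	/* next call resumes parsing */
			return MOCK_RES_SUSPENDED;
		case MOCK_ASYNC_READY:
			conn->asyncStatus = MOCK_ASYNC_IDLE;
			return MOCK_RES_TUPLES_OK;
		default:
			return MOCK_RES_NONE;
	}
}
```

With five buffered rows and a processor that stops after every second row, repeated mock_get_result calls yield two suspended results, then the final result, then nothing.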

I also mentioned earlier that I wasn't too thrilled with the design of
PQskipResult; in particular that it would encourage application writers
to miss server-sent error results, which would inevitably be a bad idea.
I think what we ought to do is define (and implement) it as being
exactly equivalent to PQgetResult, except that it temporarily installs
a dummy row processor so that data rows are discarded rather than
accumulated. Then, the documented way to clean up after deciding to
abandon a suspended query will be to do PQskipResult until it returns
null, paying normal attention to any result statuses other than
PGRES_TUPLES_OK. This is still not terribly helpful for async-mode
applications, but what they'd probably end up doing is installing their
own dummy row processors and then flushing results as part of their
normal outer loop. The only thing we could do for them is to expose
a dummy row processor, which seems barely worthwhile given that it's
a one-line function.
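
As a rough illustration of that definition, the sketch below models "get a result, but with a dummy row processor temporarily installed". Everything here is a hypothetical toy (ToyConn, toy_get_result, toy_skip_result, the Status values), not the libpq implementation, and it assumes a row processor returns 1 to mean "row handled".

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins only; none of these are real libpq types or functions. */
typedef enum { ST_NONE, ST_TUPLES_OK, ST_ERROR } Status;
typedef int (*RowProc)(int row, void *param);

/* One toy "result": some data rows followed by a final status. */
typedef struct { int nrows; Status final; } ToyResult;

typedef struct {
    const ToyResult *results;
    int              nresults;
    int              next;
    RowProc          proc;
    void            *param;
} ToyConn;

/* The dummy processor: accept the row and do nothing with it. */
static int discard_row(int row, void *param)
{
    (void) row;
    (void) param;
    return 1;
}

/* An ordinary processor, used here only to show it is bypassed. */
static int count_row(int row, void *param)
{
    (void) row;
    (*(int *) param)++;
    return 1;
}

/* Feed each row of the next result to the installed processor. */
static Status toy_get_result(ToyConn *c)
{
    const ToyResult *r;
    int              i;

    assert(c->proc != NULL);
    if (c->next == c->nresults)
        return ST_NONE;
    r = &c->results[c->next++];
    for (i = 0; i < r->nrows; i++)
        c->proc(i, c->param);
    return r->final;
}

/* Same as toy_get_result, except that data rows are discarded. */
static Status toy_skip_result(ToyConn *c)
{
    RowProc saved_proc = c->proc;
    void   *saved_param = c->param;
    Status  st;

    c->proc = discard_row;
    c->param = NULL;
    st = toy_get_result(c);
    c->proc = saved_proc;
    c->param = saved_param;
    return st;
}

/* Drain everything; return 1 if any result was not ST_TUPLES_OK. */
static int drain_and_check(ToyConn *c)
{
    int    saw_problem = 0;
    Status st;

    while ((st = toy_skip_result(c)) != ST_NONE) {
        if (st != ST_TUPLES_OK)
            saw_problem = 1;
    }
    return saw_problem;
}
```

The point of drain_and_check is that the caller still sees every result status, so a server-sent error is not silently lost while the rows are thrown away.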

I remain of the opinion that PQgetRow/PQrecvRow aren't adding much
usability-wise.

regards, tom lane

#107Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#106)
Re: Speed dblink using alternate libpq tuple storage

On Sun, Apr 01, 2012 at 05:51:19PM -0400, Tom Lane wrote:

I've been thinking some more about the early-termination cases (where
the row processor returns zero or longjmps), and I believe I have got
proposals that smooth off most of the rough edges there.

First off, returning zero is really pretty unsafe as it stands, because
it only works more-or-less-sanely if the connection is being used in
async style. If the row processor returns zero within a regular
PQgetResult call, that will cause PQgetResult to block waiting for more
input. Which might not be forthcoming, if we're in the last bufferload
of a query response from the server. Even in the async case, I think
it's a bad design to have PQisBusy return true when the row processor
requested stoppage. In that situation, there is work available for the
outer application code to do, whereas normally PQisBusy == true means
we're still waiting for the server.

I think we can fix this by introducing a new PQresultStatus, called say
PGRES_SUSPENDED, and having PQgetResult return an empty PGresult with
status PGRES_SUSPENDED after the row processor has returned zero.
Internally, there'd also be a new asyncStatus PGASYNC_SUSPENDED,
which we'd set before exiting from the getAnotherTuple call. This would
cause PQisBusy and PQgetResult to do the right things. In PQgetResult,
we'd switch back to PGASYNC_BUSY state after producing a PGRES_SUSPENDED
result, so that subsequent calls would resume parsing input.

With this design, a suspending row processor can be used safely in
either async or non-async mode. It does cost an extra PGresult creation
and deletion per cycle, but that's not much more than a malloc and free.

I added extra magic to PQisBusy(), you are adding extra magic to
PQgetResult(). Not much difference.

Seems we both lost sight of actual usage scenario for the early-exit
logic - that both callback and upper-level code *must* cooperate
for it to be useful. Instead, we designed API for non-cooperating case,
which is wrong.

So the proper approach would be to have new API call, designed to
handle it, and allow early-exit only from there.

That would also avoid any breakage of old APIs. Also it would avoid
any accidental data loss, if the user code does not have exactly
right sequence of calls.

How about PQisBusy2(), which returns '2' when early-exit is requested?
Please suggest something better...

--
marko

#108Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#107)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

Seems we both lost sight of actual usage scenario for the early-exit
logic - that both callback and upper-level code *must* cooperate
for it to be useful. Instead, we designed API for non-cooperating case,
which is wrong.

Exactly. So you need an extra result state, or something isomorphic.

So the proper approach would be to have new API call, designed to
handle it, and allow early-exit only from there.

That would also avoid any breakage of old APIs. Also it would avoid
any accidental data loss, if the user code does not have exactly
right sequence of calls.

How about PQisBusy2(), which returns '2' when early-exit is requested?
Please suggest something better...

My proposal is way better than that. You apparently aren't absorbing my
point, which is that making this behavior unusable with every existing
API (whether intentionally or by oversight) isn't an improvement.
The row processor needs to be able to do this *without* assuming a
particular usage style, and most particularly it should not force people
to use async mode.

An alternative that I'd prefer to that one is to get rid of the
suspension return mode altogether. However, that leaves us needing
to document what it means to longjmp out of a row processor without
having any comparable API concept, so I don't really find it better.

regards, tom lane

#109Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#108)
Re: Speed dblink using alternate libpq tuple storage

I've whacked the libpq part of this patch around to the point where I'm
reasonably satisfied with it (attached), and am now looking at the
dblink part. I soon realized that there's a rather nasty issue with the
dblink patch, which is that it fundamentally doesn't work for async
operations. In an async setup what you would do is dblink_send_query(),
then periodically poll with dblink_is_busy(), then when it says the
query is done, collect the results with dblink_get_result(). The
trouble with this is that PQisBusy will invoke the standard row
processor, so by the time dblink_get_result runs it's way too late to
switch row processors.
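
The ordering problem can be shown with a small toy model. This is a sketch only: ToyConn, toy_send_query, toy_is_busy and toy_get_result_with_custom are invented names that mimic the roles of dblink_send_query, dblink_is_busy and dblink_get_result, and all rows are assumed to have arrived already.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*RowSink)(int row, void *param);

/* Toy connection; not a real libpq or dblink structure. */
typedef struct {
    const int *rows;
    int        nrows;
    int        next;     /* index of the next unparsed row */
    RowSink    sink;     /* currently installed "row processor" */
    void      *param;
    int        stored;   /* rows kept by the default processor */
} ToyConn;

/* Default processor: accumulate the row inside the connection's result. */
static void default_sink(int row, void *param)
{
    (void) row;
    ((ToyConn *) param)->stored++;
}

/* Custom processor: count rows delivered straight to the caller's store. */
static void custom_sink(int row, void *param)
{
    (void) row;
    (*(int *) param)++;
}

static void toy_send_query(ToyConn *c, const int *rows, int nrows)
{
    c->rows = rows;
    c->nrows = nrows;
    c->next = 0;
    c->stored = 0;
    c->sink = default_sink;
    c->param = c;
}

/* Polling parses input with whatever processor is installed right now. */
static int toy_is_busy(ToyConn *c)
{
    assert(c->sink != NULL);
    while (c->next < c->nrows)
        c->sink(c->rows[c->next++], c->param);
    return 0;
}

/*
 * Install the custom processor only at result-collection time.
 * Returns the number of rows the custom processor actually saw.
 */
static int toy_get_result_with_custom(ToyConn *c)
{
    int delivered = 0;

    c->sink = custom_sink;
    c->param = &delivered;
    toy_is_busy(c);
    c->sink = default_sink;
    c->param = c;
    return delivered;
}
```

If toy_is_busy runs before toy_get_result_with_custom, every row has already gone to default_sink and the custom processor sees none of them.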

I thought about fixing that by installing dblink's custom row processor
permanently, but that doesn't really work because we don't know the
expected tuple column datatypes until we see the call environment for
dblink_get_result().

A hack on top of that hack would be to collect the data into a
tuplestore that contains all text columns, and then convert to the
correct rowtype during dblink_get_result, but that seems rather ugly
and not terribly high-performance.

What I'm currently thinking we should do is just use the old method
for async queries, and only optimize the synchronous case.

I thought for awhile that this might represent a generic deficiency
in the whole concept of a row processor, but probably it's mostly
down to dblink's rather bizarre API. It would be unusual I think for
people to want a row processor that couldn't know what to do until
after the entire query result is received.

regards, tom lane

#110Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#108)
Re: Speed dblink using alternate libpq tuple storage

On Sun, Apr 01, 2012 at 07:23:06PM -0400, Tom Lane wrote:

Marko Kreen <markokr@gmail.com> writes:

So the proper approach would be to have new API call, designed to
handle it, and allow early-exit only from there.

That would also avoid any breakage of old APIs. Also it would avoid
any accidental data loss, if the user code does not have exactly
right sequence of calls.

How about PQisBusy2(), which returns '2' when early-exit is requested?
Please suggest something better...

My proposal is way better than that. You apparently aren't absorbing my
point, which is that making this behavior unusable with every existing
API (whether intentionally or by oversight) isn't an improvement.
The row processor needs to be able to do this *without* assuming a
particular usage style,

I don't get what kind of usage scenario you think of when you
"early-exit without assuming anything about upper-level usage."

Could you show example code where it is useful?

The fact remains that upper-level code must cooperate with callback.
Why is it useful to hijack PQgetResult() to do so? Especially as the
PGresult it returns is not useful in any way and the callback still needs
to use side channel to pass actual values to upper level.

IMHO it's much better to remove the concept of early-exit from public
API completely and instead give "get" style API that does the early-exit
internally. See below for example.

and most particularly it should not force people
to use async mode.

Seems our concept of "async mode" is different. I associate
PQisnonblocking() with it. And old code, eg. PQrecvRow()
works fine in both modes.

An alternative that I'd prefer to that one is to get rid of the
suspension return mode altogether.

That might be a good idea. I would prefer to postpone controversial
features instead of having a hurried design for them.

Also - is there really a need to make the callback API ready for *all*
potential usage scenarios? IMHO it would be better to make sure
it's good enough that higher-level and easier-to-use APIs can
be built on top of it. And that's it.

However, that leaves us needing
to document what it means to longjmp out of a row processor without
having any comparable API concept, so I don't really find it better.

Why do you think any new API concept is needed? Why is the following rule
not enough (for a sync connection):

"After an exception you need to call PQgetResult() / PQfinish()".

The only thing that needs fixing is storing lastResult under PGconn
to support exceptions for PQexec(). (If we need to support
exceptions everywhere.)

---------------

Again, note that if we would provide PQrecvRow()-style API, then we *don't*
need early-exit in callback API. Nor exceptions...

If the current PQrecvRow() / PQgetRow() are too bloated for you, then how about
a thin wrapper around PQisBusy():

/* Row processor: note that a row arrived, then request early exit. */
static int hasrow_cb(PGresult *res, PGrowValue *columns, void *param)
{
    int *gotrow = param;

    *gotrow = 1;
    return 0;
}

/* 0 - need more data, 1 - have result, 2 - have row */
int PQhasRowOrResult(PGconn *conn, PGresult **hdr, PGrowValue **cols)
{
    int gotrow = 0;
    PQrowProcessor oldproc;
    void *oldarg;
    int busy;

    /* call PQisBusy with our callback */
    oldproc = PQgetRowProcessor(conn, &oldarg);
    PQsetRowProcessor(conn, hasrow_cb, &gotrow);
    busy = PQisBusy(conn);
    PQsetRowProcessor(conn, oldproc, oldarg);

    if (gotrow)
    {
        *hdr = conn->result;
        *cols = conn->rowBuf;
        return 2;
    }
    return busy ? 0 : 1;
}

Also, instead of hasrow_cb(), we could have an integer flag under PGconn that
getAnotherTuple() checks, thus getting rid of it even in the internal
callback API.

Yes, it requires an async-style usage pattern, but it works for both
sync and async connections. And it can be used to build an even
simpler API on top of it.

Summary: it would be better to keep "early-exit" an internal detail,
as the actual feature needed is processing rows outside of the callback.
So why not provide an API for it?

--
marko

#111Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#110)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

The fact remains that upper-level code must cooperate with callback.
Why is it useful to hijack PQgetResult() to do so?

Because that's the defined communications channel. We're not
"hijacking" it. If we're going to start using pejorative words here,
I will assert that your proposal hijacks PQisBusy to make it do
something substantially different than the traditional understanding
of it (more about that below).

Seems our concept of "async mode" is different. I associate
PQisnonblocking() with it.

Well, there are really four levels to the API design:

* Plain old PQexec.
* Break down PQexec into PQsendQuery and PQgetResult.
* Avoid waiting in PQgetResult by testing PQisBusy.
* Avoid waiting in PQsendQuery (ie, avoid the risk of blocking
on socket writes) by using PQisnonblocking.

Any given app might choose to run at any one of those four levels,
although the first one probably isn't interesting for an app that would
care about using a suspending row processor. But I see no reason that
we should design the suspension feature such that it doesn't work at the
second level. PQisBusy is, and should be, an optional state-testing
function, not something that changes the set of things you can do with
the connection.

IMHO it's much better to remove the concept of early-exit from public
API completely and instead give "get" style API that does the early-exit
internally.

If the row processor has an early-exit option, it hardly seems
reasonable to me to claim that that's not part of the public API.

In particular, I flat out will not accept a design in which that option
doesn't work unless the current call came via PQisBusy, much less some
entirely new call like PQhasRowOrResult. It's unusably fragile (ie,
timing sensitive) if that has to be true.

There's another way in which I think your proposal breaks the existing
API, which is that without an internal PGASYNC_SUSPENDED state that is
cleared by PQgetResult, it's unsafe to probe PQisBusy multiple times
before doing something useful. That shouldn't be the case. PQisBusy
is supposed to test whether data is ready for the application to do
something with --- it should not throw away data until a separate call
has been made to consume the data. Changing its behavior like that
would risk creating subtle bugs in existing event-loop logic. A
possibly useful analogy is that select() and poll() don't clear a
socket's read-ready state, you have to do a separate read() to do that.
There are good reasons for that separation of actions.
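
The poll()/read() half of that analogy is easy to check with a few lines of POSIX C. This is a small sketch, assuming a POSIX system; is_readable and probe_demo are illustrative helper names, not part of any library.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Return 1 if fd has data to read right now, 0 if not, -1 on error. */
static int is_readable(int fd)
{
    struct pollfd p;
    int           rc;

    assert(fd >= 0);
    p.fd = fd;
    p.events = POLLIN;
    p.revents = 0;
    rc = poll(&p, 1, 0);            /* timeout 0: just test, never block */
    if (rc < 0)
        return -1;
    return (rc > 0 && (p.revents & POLLIN)) ? 1 : 0;
}

/*
 * Probe a pipe twice, then consume the byte and probe again.
 * Returns first*100 + second*10 + after, or -1 on failure.
 */
static int probe_demo(void)
{
    int  fds[2];
    char byte = 'x';
    int  first, second, after;
    int  result = -1;

    if (pipe(fds) != 0)
        return -1;

    if (write(fds[1], &byte, 1) == 1) {
        first = is_readable(fds[0]);    /* data is ready */
        second = is_readable(fds[0]);   /* probing again changes nothing */
        if (read(fds[0], &byte, 1) == 1) {
            /* only read() consumes the data; the pipe is now empty */
            after = is_readable(fds[0]);
            result = first * 100 + second * 10 + after;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Here the two probes both report "ready", and the state only changes once read() has consumed the byte, which is the separation of testing and consuming described above.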

Again, note that if we would provide PQrecvRow()-style API, then we *don't*
need early-exit in callback API. Nor exceptions...

AFAICS, we *do* need exceptions for dblink's usage. So most of what's
at stake here is not avoidable, unless you're proposing we put this
whole set of patches off till 9.3 so we can think it over some more.

regards, tom lane

#112Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#111)
Re: Speed dblink using alternate libpq tuple storage

On Tue, Apr 03, 2012 at 05:32:25PM -0400, Tom Lane wrote:

Marko Kreen <markokr@gmail.com> writes:

The fact remains that upper-level code must cooperate with callback.
Why is it useful to hijack PQgetResult() to do so?

Because that's the defined communications channel. We're not
"hijacking" it. If we're going to start using pejorative words here,
I will assert that your proposal hijacks PQisBusy to make it do
something substantially different than the traditional understanding
of it (more about that below).

And I would agree with it - and I already did:

Seems we both lost sight of actual usage scenario for the early-exit
logic - that both callback and upper-level code *must* cooperate
for it to be useful. Instead, we designed API for non-cooperating case,
which is wrong.

Seems our concept of "async mode" is different. I associate
PQisnonblocking() with it.

Well, there are really four levels to the API design:

* Plain old PQexec.
* Break down PQexec into PQsendQuery and PQgetResult.
* Avoid waiting in PQgetResult by testing PQisBusy.
* Avoid waiting in PQsendQuery (ie, avoid the risk of blocking
on socket writes) by using PQisnonblocking.

Any given app might choose to run at any one of those four levels,
although the first one probably isn't interesting for an app that would
care about using a suspending row processor. But I see no reason that
we should design the suspension feature such that it doesn't work at the
second level. PQisBusy is, and should be, an optional state-testing
function, not something that changes the set of things you can do with
the connection.

That's actually a nice overview. I think our basic disagreement comes
from how we map the early-exit into those modes.

I want to think of the early-exit row-processing as 5th and 6th modes:

* Row-by-row processing on sync connection (PQsendQuery() + ???)
* Row-by-row processing on async connection (PQsendQuery() + ???)

But instead you want it to work with almost no changes to the existing modes.

And I don't like that because, as I've said previously, the upper
level must know about the callback and handle it properly, so it does
not make sense to keep the existing loop structure set in stone.

Instead we should design a new mode that is logical for the user
and also logical inside libpq.

IMHO it's much better to remove the concept of early-exit from public
API completely and instead give "get" style API that does the early-exit
internally.

If the row processor has an early-exit option, it hardly seems
reasonable to me to claim that that's not part of the public API.

Please keep in mind the final goal - nobody is interested in
"early-exit" on its own; the interesting goal is processing rows
outside of libpq.

And the early-exit return code is a clumsy hack to make it possible
with callbacks.

But why should we provide a complex way of achieving something
when we can provide an easy and direct way?

In particular, I flat out will not accept a design in which that option
doesn't work unless the current call came via PQisBusy, much less some
entirely new call like PQhasRowOrResult. It's unusably fragile (ie,
timing sensitive) if that has to be true.

Agreed for PQisBusy, but why is PQhasRowOrResult() fragile?

It's easy to make PQhasRowOrResult more robust - let it set a flag under
PGconn and let getAnotherTuple() do the early exit on its own, thus keeping
the callback completely out of the loop. That avoids any chance that a user
callback can accidentally trigger the behaviour.

There's another way in which I think your proposal breaks the existing
API, which is that without an internal PGASYNC_SUSPENDED state that is
cleared by PQgetResult, it's unsafe to probe PQisBusy multiple times
before doing something useful. That shouldn't be the case. PQisBusy
is supposed to test whether data is ready for the application to do
something with --- it should not throw away data until a separate call
has been made to consume the data. Changing its behavior like that
would risk creating subtle bugs in existing event-loop logic. A
possibly useful analogy is that select() and poll() don't clear a
socket's read-ready state, you have to do a separate read() to do that.
There are good reasons for that separation of actions.

It does look like an argument for new "modes" for row-by-row processing,
preferably with new API calls.

Because adding "magic" to existing APIs seems only to cause confusion.

Again, note that if we would provide PQrecvRow()-style API, then we *don't*
need early-exit in callback API. Nor exceptions...

AFAICS, we *do* need exceptions for dblink's usage. So most of what's
at stake here is not avoidable, unless you're proposing we put this
whole set of patches off till 9.3 so we can think it over some more.

My point was that if we have an API for processing rows outside libpq,
the need for exceptions for callbacks is mostly gone. Converting dblink
to PQhasRowOrResult() should not be hard.

I don't mind keeping exceptions, but only if we don't need to add extra
complexity just for them.

--
marko

#113Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#112)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

On Tue, Apr 03, 2012 at 05:32:25PM -0400, Tom Lane wrote:

Well, there are really four levels to the API design:
* Plain old PQexec.
* Break down PQexec into PQsendQuery and PQgetResult.
* Avoid waiting in PQgetResult by testing PQisBusy.
* Avoid waiting in PQsendQuery (ie, avoid the risk of blocking
on socket writes) by using PQisnonblocking.

That's actually a nice overview. I think our basic disagreement comes
from how we map the early-exit into those modes.
I want to think of the early-exit row-processing as 5th and 6th modes:

* Row-by-row processing on sync connection (PQsendQuery() + ???)
* Row-by-row processing on async connection (PQsendQuery() + ???)

But instead you want it to work with almost no changes to the existing modes.

Well, the trouble with the proposed PQgetRow/PQrecvRow is that they only
work safely at the second API level. They're completely unsafe to use
with PQisBusy, and I think that is a show-stopper. In your own terms,
the "6th mode" doesn't work.

More generally, it's not very safe to change the row processor while a
query is in progress. PQskipResult can get away with doing so, but only
because the entire point of that function is to lose data, and we don't
much care whether some rows already got handled differently. For every
other use-case, you have to set up the row processor in advance and
leave it in place, which is a guideline that PQgetRow/PQrecvRow violate.

So I think the only way to use row-by-row processing is to permanently
install a row processor that normally returns zero. It's possible that
we could provide a predefined row processor that acts that way and
invite people to install it. However, I think it's premature to suppose
that we know all the details of how somebody might want to use this.
In particular the notion of cloning the PGresult for each row seems
expensive and not obviously more useful than direct access to the
network buffer. So I'd rather leave it as-is and see if any common
usage patterns arise, then add support for those patterns.

In particular, I flat out will not accept a design in which that option
doesn't work unless the current call came via PQisBusy, much less some
entirely new call like PQhasRowOrResult. It's unusably fragile (ie,
timing sensitive) if that has to be true.

Agreed for PQisBusy, but why is PQhasRowOrResult() fragile?

Because it breaks if you use PQisBusy *anywhere* in the application.
That's not just a bug hazard but a loss of functionality. I think it's
important to have a pure "is data available" state test function that
doesn't cause data to be consumed from the connection, and there's no
way to have that if there are API functions that change the row
processor setting mid-query. (Another way to say this is that PQisBusy
ought to be idempotent from the standpoint of the application --- we
know that it does perform work inside libpq, but it doesn't change the
state of the connection so far as the app can tell, and so it doesn't
matter if you call it zero, one, or N times between other calls.)

regards, tom lane

#114Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Tom Lane (#109)
1 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

Hello. This is the new version of the dblink patch.

- Calling dblink_is_busy now prevents the row processor from being used.

- Fixed some PGresult leaks.

- Rebased to current head.

A hack on top of that hack would be to collect the data into a
tuplestore that contains all text columns, and then convert to the
correct rowtype during dblink_get_result, but that seems rather ugly
and not terribly high-performance.

What I'm currently thinking we should do is just use the old method
for async queries, and only optimize the synchronous case.

OK, I agree with you, except on the performance issue. I have given up on
using the row processor for async queries when dblink_is_busy is called.

I thought for awhile that this might represent a generic deficiency
in the whole concept of a row processor, but probably it's mostly
down to dblink's rather bizarre API.  It would be unusual I think for
people to want a row processor that couldn't know what to do until
after the entire query result is received.

I hope so. Thank you.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachments:

dblink_rowproc_20120405.patch (application/octet-stream)
diff --git a/contrib/dblink/dblink.c b/contrib/dblink/dblink.c
index 46c7cc5..ffdf9e3 100644
--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -56,11 +56,27 @@
 
 PG_MODULE_MAGIC;
 
+typedef struct storeInfo
+{
+	Tuplestorestate *tuplestore;
+	int nattrs;
+	MemoryContext oldcontext;
+	AttInMetadata *attinmeta;
+	TupleDesc tupdesc;
+	char* valbuf;
+	int valbuflen;
+	char **cstrs;
+	bool error_occurred;
+	bool nummismatch;
+} storeInfo;
+
 typedef struct remoteConn
 {
 	PGconn	   *conn;			/* Hold the remote connection */
 	int			openCursorCount;	/* The number of open cursors */
 	bool		newXactForCursor;		/* Opened a transaction for a cursor */
+	bool        materialize_needed; /* Materialize result if true  */
+
 } remoteConn;
 
 /*
@@ -91,11 +107,12 @@ static char *escape_param_str(const char *from);
 static void validate_pkattnums(Relation rel,
 				   int2vector *pkattnums_arg, int32 pknumatts_arg,
 				   int **pkattnums, int *pknumatts);
+static void initStoreInfo(FunctionCallInfo fcinfo, storeInfo *sinfo);
+static int storeHandler(PGresult *res, PGrowValue *columns, void *param);
 
 /* Global */
 static remoteConn *pconn = NULL;
 static HTAB *remoteConnHash = NULL;
-
 /*
  *	Following is list that holds multiple remote connections.
  *	Calling convention of each dblink function changes to accept
@@ -112,6 +129,9 @@ typedef struct remoteConnHashEnt
 /* initial number of connection hashes */
 #define NUMCONN 16
 
+/* Initial block size for value buffer in storeHandler */
+#define INITBUFLEN 64
+
 /* general utility */
 #define xpfree(var_) \
 	do { \
@@ -201,6 +221,7 @@ typedef struct remoteConnHashEnt
 				pconn->conn = NULL; \
 				pconn->openCursorCount = 0; \
 				pconn->newXactForCursor = FALSE; \
+				pconn->materialize_needed = false;	\
 			} \
 	} while (0)
 
@@ -229,8 +250,11 @@ dblink_connect(PG_FUNCTION_ARGS)
 		conname_or_str = text_to_cstring(PG_GETARG_TEXT_PP(0));
 
 	if (connname)
+	{
 		rconn = (remoteConn *) MemoryContextAlloc(TopMemoryContext,
 												  sizeof(remoteConn));
+		rconn->materialize_needed = false;
+	}
 
 	/* first check for valid foreign data server */
 	connstr = get_connect_string(conname_or_str);
@@ -504,6 +528,7 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	char	   *curname = NULL;
 	int			howmany = 0;
 	bool		fail = true;	/* default to backward compatible */
+	storeInfo   storeInfo;
 
 	prepTuplestoreResult(fcinfo);
 
@@ -557,15 +582,52 @@ dblink_fetch(PG_FUNCTION_ARGS)
 	appendStringInfo(&buf, "FETCH %d FROM %s", howmany, curname);
 
 	/*
+	 * Result is stored into storeinfo.tuplestore instead of
+	 * res->result retuned by PQexec below
+	 */
+	initStoreInfo(fcinfo, &storeInfo);
+	PQsetRowProcessor(conn, storeHandler, &storeInfo);
+	
+	/*
 	 * Try to execute the query.  Note that since libpq uses malloc, the
 	 * PGresult will be long-lived even though we are still in a short-lived
 	 * memory context.
 	 */
-	res = PQexec(conn, buf.data);
+	PG_TRY();
+	{
+		res = PQexec(conn, buf.data);
+	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
+
+		PQsetRowProcessor(conn, NULL, NULL);
+		edata = CopyErrorData();
+		FlushErrorState();
+
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, TRUE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+	PQsetRowProcessor(conn, NULL, NULL);
+
 	if (!res ||
 		(PQresultStatus(res) != PGRES_COMMAND_OK &&
 		 PQresultStatus(res) != PGRES_TUPLES_OK))
 	{
+		if (storeInfo.nummismatch)
+		{
+			if (res)
+				PQclear(res);
+
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
+
 		dblink_res_error(conname, res, "could not fetch from cursor", fail);
 		return (Datum) 0;
 	}
@@ -577,8 +639,8 @@ dblink_fetch(PG_FUNCTION_ARGS)
 				(errcode(ERRCODE_INVALID_CURSOR_NAME),
 				 errmsg("cursor \"%s\" does not exist", curname)));
 	}
+	PQclear(res);
 
-	materializeResult(fcinfo, res);
 	return (Datum) 0;
 }
 
@@ -616,6 +678,8 @@ dblink_send_query(PG_FUNCTION_ARGS)
 	if (retval != 1)
 		elog(NOTICE, "%s", PQerrorMessage(conn));
 
+	rconn->materialize_needed = false;
+	
 	PG_RETURN_INT32(retval);
 }
 
@@ -638,11 +702,12 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 	remoteConn *rconn = NULL;
 	bool		fail = true;	/* default to backward compatible */
 	bool		freeconn = false;
-
-	prepTuplestoreResult(fcinfo);
+	storeInfo   storeInfo;
 
 	DBLINK_INIT;
 
+	prepTuplestoreResult(fcinfo);
+
 	if (!is_async)
 	{
 		if (PG_NARGS() == 3)
@@ -698,31 +763,97 @@ dblink_record_internal(FunctionCallInfo fcinfo, bool is_async)
 
 	if (!conn)
 		DBLINK_CONN_NOT_AVAIL;
-
-	/* synchronous query, or async result retrieval */
-	if (!is_async)
-		res = PQexec(conn, sql);
+	
+	if (!is_async || (rconn && !rconn->materialize_needed))
+	{
+		/*
+		 * Result is stored into storeinfo.tuplestore instead of
+		 * res->result retuned by PQexec/PQgetResult below
+		 */
+		initStoreInfo(fcinfo, &storeInfo);
+		PQsetRowProcessor(conn, storeHandler, &storeInfo);
+	}
 	else
+		storeInfo.nummismatch = false;
+
+	PG_TRY();
 	{
-		res = PQgetResult(conn);
-		/* NULL means we're all done with the async results */
-		if (!res)
-			return (Datum) 0;
+		/* synchronous query, or async result retrieval */
+		if (!is_async)
+			res = PQexec(conn, sql);
+		else
+			res = PQgetResult(conn);
 	}
+	PG_CATCH();
+	{
+		ErrorData *edata;
 
-	/* if needed, close the connection to the database and cleanup */
-	if (freeconn)
-		PQfinish(conn);
+		PQsetRowProcessor(conn, NULL, NULL);
+		edata = CopyErrorData();
+		FlushErrorState();
 
-	if (!res ||
-		(PQresultStatus(res) != PGRES_COMMAND_OK &&
-		 PQresultStatus(res) != PGRES_TUPLES_OK))
+		/* Skip remaining results when storeHandler raises exception. */
+		PQskipResult(conn, TRUE);
+		ReThrowError(edata);
+	}
+	PG_END_TRY();
+	PQsetRowProcessor(conn, NULL, NULL);
+
+	/* NULL res from async get means we're all done with the results */
+	if (res || !is_async)
 	{
-		dblink_res_error(conname, res, "could not execute query", fail);
-		return (Datum) 0;
+		if (freeconn)
+			PQfinish(conn);
+
+		/*
+		 * exclude mismatch of the numbers of the colums here so as to
+		 * behave as before.
+		 */
+		if (!res ||
+			(PQresultStatus(res) != PGRES_COMMAND_OK &&
+			 PQresultStatus(res) != PGRES_TUPLES_OK &&
+			 !storeInfo.nummismatch))
+		{
+			dblink_res_error(conname, res, "could not execute query", fail);
+			return (Datum) 0;
+		}
+
+		/*
+		 * Materialize result if command result or materiarize is needed.
+		 * Current libpq and dblink API design does not allow to use row
+		 * processor for asynchronous query when dblink_is_busy is called prior
+		 * to dblink_get_result.
+		 */
+		if (PQresultStatus(res) == PGRES_COMMAND_OK ||
+			(rconn && rconn->materialize_needed))
+		{
+			materializeResult(fcinfo, res);
+			return (Datum) 0;
+		}
+		else if (get_call_result_type(fcinfo, NULL, NULL) == TYPEFUNC_RECORD)
+		{
+			PQclear(res);
+
+			/* failed to determine actual type of RECORD */
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("function returning record called in context "
+							"that cannot accept type record")));
+		}
+
+		if (storeInfo.nummismatch)
+		{
+			/* This is only for backward compatibility */
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("remote query result rowtype does not match "
+							"the specified FROM clause rowtype")));
+		}
 	}
 
-	materializeResult(fcinfo, res);
+	if (res)
+		PQclear(res);
+
 	return (Datum) 0;
 }
 
@@ -890,375 +1021,518 @@ materializeResult(FunctionCallInfo fcinfo, PGresult *res)
 	PG_END_TRY();
 }
 
-/*
- * List all open dblink connections by name.
- * Returns an array of all connection names.
- * Takes no params
- */
-PG_FUNCTION_INFO_V1(dblink_get_connections);
-Datum
-dblink_get_connections(PG_FUNCTION_ARGS)
-{
-	HASH_SEQ_STATUS status;
-	remoteConnHashEnt *hentry;
-	ArrayBuildState *astate = NULL;
-
-	if (remoteConnHash)
-	{
-		hash_seq_init(&status, remoteConnHash);
-		while ((hentry = (remoteConnHashEnt *) hash_seq_search(&status)) != NULL)
-		{
-			/* stash away current value */
-			astate = accumArrayResult(astate,
-									  CStringGetTextDatum(hentry->name),
-									  false, TEXTOID, CurrentMemoryContext);
-		}
-	}
-
-	if (astate)
-		PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate,
-											  CurrentMemoryContext));
-	else
-		PG_RETURN_NULL();
-}
-
-/*
- * Checks if a given remote connection is busy
- *
- * Returns 1 if the connection is busy, 0 otherwise
- * Params:
- *	text connection_name - name of the connection to check
- *
- */
-PG_FUNCTION_INFO_V1(dblink_is_busy);
-Datum
-dblink_is_busy(PG_FUNCTION_ARGS)
-{
-	char	   *conname = NULL;
-	PGconn	   *conn = NULL;
-	remoteConn *rconn = NULL;
-
-	DBLINK_INIT;
-	DBLINK_GET_NAMED_CONN;
-
-	PQconsumeInput(conn);
-	PG_RETURN_INT32(PQisBusy(conn));
-}
-
-/*
- * Cancels a running request on a connection
- *
- * Returns text:
- *	"OK" if the cancel request has been sent correctly,
- *		an error message otherwise
- *
- * Params:
- *	text connection_name - name of the connection to check
- *
- */
-PG_FUNCTION_INFO_V1(dblink_cancel_query);
-Datum
-dblink_cancel_query(PG_FUNCTION_ARGS)
-{
-	int			res = 0;
-	char	   *conname = NULL;
-	PGconn	   *conn = NULL;
-	remoteConn *rconn = NULL;
-	PGcancel   *cancel;
-	char		errbuf[256];
-
-	DBLINK_INIT;
-	DBLINK_GET_NAMED_CONN;
-	cancel = PQgetCancel(conn);
-
-	res = PQcancel(cancel, errbuf, 256);
-	PQfreeCancel(cancel);
-
-	if (res == 1)
-		PG_RETURN_TEXT_P(cstring_to_text("OK"));
-	else
-		PG_RETURN_TEXT_P(cstring_to_text(errbuf));
-}
-
-
-/*
- * Get error message from a connection
- *
- * Returns text:
- *	"OK" if no error, an error message otherwise
- *
- * Params:
- *	text connection_name - name of the connection to check
- *
- */
-PG_FUNCTION_INFO_V1(dblink_error_message);
-Datum
-dblink_error_message(PG_FUNCTION_ARGS)
-{
-	char	   *msg;
-	char	   *conname = NULL;
-	PGconn	   *conn = NULL;
-	remoteConn *rconn = NULL;
-
-	DBLINK_INIT;
-	DBLINK_GET_NAMED_CONN;
-
-	msg = PQerrorMessage(conn);
-	if (msg == NULL || msg[0] == '\0')
-		PG_RETURN_TEXT_P(cstring_to_text("OK"));
-	else
-		PG_RETURN_TEXT_P(cstring_to_text(msg));
-}
-
-/*
- * Execute an SQL non-SELECT command
- */
-PG_FUNCTION_INFO_V1(dblink_exec);
-Datum
-dblink_exec(PG_FUNCTION_ARGS)
+static void
+initStoreInfo(FunctionCallInfo fcinfo, storeInfo *sinfo)
 {
-	text	   *volatile sql_cmd_status = NULL;
-	PGconn	   *volatile conn = NULL;
-	volatile bool freeconn = false;
-
-	DBLINK_INIT;
-
-	PG_TRY();
-	{
-		char	   *msg;
-		PGresult   *res = NULL;
-		char	   *connstr = NULL;
-		char	   *sql = NULL;
-		char	   *conname = NULL;
-		remoteConn *rconn = NULL;
-		bool		fail = true;	/* default to backward compatible behavior */
-
-		if (PG_NARGS() == 3)
-		{
-			/* must be text,text,bool */
-			DBLINK_GET_CONN;
-			sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
-			fail = PG_GETARG_BOOL(2);
-		}
-		else if (PG_NARGS() == 2)
-		{
-			/* might be text,text or text,bool */
-			if (get_fn_expr_argtype(fcinfo->flinfo, 1) == BOOLOID)
-			{
-				conn = pconn->conn;
-				sql = text_to_cstring(PG_GETARG_TEXT_PP(0));
-				fail = PG_GETARG_BOOL(1);
-			}
-			else
-			{
-				DBLINK_GET_CONN;
-				sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
-			}
-		}
-		else if (PG_NARGS() == 1)
-		{
-			/* must be single text argument */
-			conn = pconn->conn;
-			sql = text_to_cstring(PG_GETARG_TEXT_PP(0));
-		}
-		else
-			/* shouldn't happen */
-			elog(ERROR, "wrong number of arguments");
-
-		if (!conn)
-			DBLINK_CONN_NOT_AVAIL;
+	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
+	TupleDesc       tupdesc = NULL;
+	MemoryContext   oldcontext;
 
-		res = PQexec(conn, sql);
-		if (!res ||
-			(PQresultStatus(res) != PGRES_COMMAND_OK &&
-			 PQresultStatus(res) != PGRES_TUPLES_OK))
-		{
-			dblink_res_error(conname, res, "could not execute command", fail);
+	oldcontext = MemoryContextSwitchTo(
+		rsinfo->econtext->ecxt_per_query_memory);
 
-			/*
-			 * and save a copy of the command status string to return as our
-			 * result tuple
-			 */
-			sql_cmd_status = cstring_to_text("ERROR");
-		}
-		else if (PQresultStatus(res) == PGRES_COMMAND_OK)
-		{
-			/*
-			 * and save a copy of the command status string to return as our
-			 * result tuple
-			 */
-			sql_cmd_status = cstring_to_text(PQcmdStatus(res));
-			PQclear(res);
-		}
-		else
-		{
-			PQclear(res);
-			ereport(ERROR,
-				  (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
-				   errmsg("statement returning results not allowed")));
-		}
-	}
-	PG_CATCH();
+	switch (get_call_result_type(fcinfo, NULL, &tupdesc))
 	{
-		/* if needed, close the connection to the database */
-		if (freeconn)
-			PQfinish(conn);
-		PG_RE_THROW();
-	}
-	PG_END_TRY();
-
-	/* if needed, close the connection to the database */
-	if (freeconn)
-		PQfinish(conn);
-
-	PG_RETURN_TEXT_P(sql_cmd_status);
-}
-
-
-/*
- * dblink_get_pkey
- *
- * Return list of primary key fields for the supplied relation,
- * or NULL if none exists.
- */
-PG_FUNCTION_INFO_V1(dblink_get_pkey);
-Datum
-dblink_get_pkey(PG_FUNCTION_ARGS)
-{
-	int16		numatts;
-	char	  **results;
-	FuncCallContext *funcctx;
-	int32		call_cntr;
-	int32		max_calls;
-	AttInMetadata *attinmeta;
-	MemoryContext oldcontext;
-
-	/* stuff done only on the first call of the function */
-	if (SRF_IS_FIRSTCALL())
-	{
-		Relation	rel;
-		TupleDesc	tupdesc;
-
-		/* create a function context for cross-call persistence */
-		funcctx = SRF_FIRSTCALL_INIT();
-
-		/*
-		 * switch to memory context appropriate for multiple function calls
-		 */
-		oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
-
-		/* open target relation */
-		rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
-
-		/* get the array of attnums */
-		results = get_pkey_attnames(rel, &numatts);
-
-		relation_close(rel, AccessShareLock);
-
-		/*
-		 * need a tuple descriptor representing one INT and one TEXT column
-		 */
-		tupdesc = CreateTemplateTupleDesc(2, false);
-		TupleDescInitEntry(tupdesc, (AttrNumber) 1, "position",
-						   INT4OID, -1, 0);
-		TupleDescInitEntry(tupdesc, (AttrNumber) 2, "colname",
-						   TEXTOID, -1, 0);
-
-		/*
-		 * Generate attribute metadata needed later to produce tuples from raw
-		 * C strings
-		 */
-		attinmeta = TupleDescGetAttInMetadata(tupdesc);
-		funcctx->attinmeta = attinmeta;
-
-		if ((results != NULL) && (numatts > 0))
-		{
-			funcctx->max_calls = numatts;
-
-			/* got results, keep track of them */
-			funcctx->user_fctx = results;
-		}
-		else
-		{
-			/* fast track when no results */
-			MemoryContextSwitchTo(oldcontext);
-			SRF_RETURN_DONE(funcctx);
-		}
-
-		MemoryContextSwitchTo(oldcontext);
+		case TYPEFUNC_COMPOSITE:
+			tupdesc = CreateTupleDescCopy(tupdesc);
+			sinfo->nattrs = tupdesc->natts;
+			break;
+				
+		case TYPEFUNC_RECORD:
+			tupdesc = CreateTemplateTupleDesc(1, false);
+			TupleDescInitEntry(tupdesc, (AttrNumber) 1, "status",
+							   TEXTOID, -1, 0);
+			sinfo->nattrs = 1;
+			break;
+				
+		default:
+			/* result type isn't composite */
+			elog(ERROR, "return type must be a row type");
+			break;
 	}
 
-	/* stuff done on every call of the function */
-	funcctx = SRF_PERCALL_SETUP();
-
-	/*
-	 * initialize per-call variables
-	 */
-	call_cntr = funcctx->call_cntr;
-	max_calls = funcctx->max_calls;
-
-	results = (char **) funcctx->user_fctx;
-	attinmeta = funcctx->attinmeta;
-
-	if (call_cntr < max_calls)	/* do when there is more left to send */
-	{
-		char	  **values;
-		HeapTuple	tuple;
-		Datum		result;
+	sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+	sinfo->error_occurred = FALSE;
+	sinfo->nummismatch = FALSE;
+	sinfo->tuplestore = tuplestore_begin_heap(true, false, work_mem);
+	sinfo->valbuflen = INITBUFLEN;
+	sinfo->valbuf = (char *)palloc(sinfo->valbuflen);
+	sinfo->cstrs = (char **)palloc(sinfo->nattrs * sizeof(char *));
 
-		values = (char **) palloc(2 * sizeof(char *));
-		values[0] = (char *) palloc(12);		/* sign, 10 digits, '\0' */
-
-		sprintf(values[0], "%d", call_cntr + 1);
-
-		values[1] = results[call_cntr];
-
-		/* build the tuple */
-		tuple = BuildTupleFromCStrings(attinmeta, values);
-
-		/* make the tuple into a datum */
-		result = HeapTupleGetDatum(tuple);
+	rsinfo->setResult = sinfo->tuplestore;
+	rsinfo->setDesc = tupdesc;
 
-		SRF_RETURN_NEXT(funcctx, result);
-	}
-	else
-	{
-		/* do when there is no more left */
-		SRF_RETURN_DONE(funcctx);
-	}
+	MemoryContextSwitchTo(oldcontext);
 }
 
-
-/*
- * dblink_build_sql_insert
- *
- * Used to generate an SQL insert statement
- * based on an existing tuple in a local relation.
- * This is useful for selectively replicating data
- * to another server via dblink.
- *
- * API:
- * <relname> - name of local table of interest
- * <pkattnums> - an int2vector of attnums which will be used
- * to identify the local tuple of interest
- * <pknumatts> - number of attnums in pkattnums
- * <src_pkattvals_arry> - text array of key values which will be used
- * to identify the local tuple of interest
- * <tgt_pkattvals_arry> - text array of key values which will be used
- * to build the string for execution remotely. These are substituted
- * for their counterparts in src_pkattvals_arry
- */
-PG_FUNCTION_INFO_V1(dblink_build_sql_insert);
-Datum
-dblink_build_sql_insert(PG_FUNCTION_ARGS)
-{
-	text	   *relname_text = PG_GETARG_TEXT_P(0);
-	int2vector *pkattnums_arg = (int2vector *) PG_GETARG_POINTER(1);
-	int32		pknumatts_arg = PG_GETARG_INT32(2);
-	ArrayType  *src_pkattvals_arry = PG_GETARG_ARRAYTYPE_P(3);
-	ArrayType  *tgt_pkattvals_arry = PG_GETARG_ARRAYTYPE_P(4);
-	Relation	rel;
-	int		   *pkattnums;
+ /* Prototype of this function is PQrowProcessor */
+ static int
+ storeHandler(PGresult *res, PGrowValue *columns, void *param)
+ {
+	 storeInfo *sinfo = (storeInfo *)param;
+	 HeapTuple  tuple;
+	 int        newbuflen;
+	 int        fields = PQnfields(res);
+	 int        i;
+	 char       **cstrs = sinfo->cstrs;
+	 char       *pbuf;
+
+	 if (sinfo->error_occurred)
+		 return -1;
+
+	 if (sinfo->nattrs == 0)
+	 {
+		 int i;
+		 TupleDesc tupdesc = CreateTemplateTupleDesc(fields, false);
+
+		 sinfo->nattrs = fields;
+		 for (i = 1 ; i <= fields ; i++)
+			 TupleDescInitEntry(tupdesc, (AttrNumber)i, "hoge",
+								TEXTOID, -1, 0);
+		 sinfo->attinmeta = TupleDescGetAttInMetadata(tupdesc);
+		 sinfo->tupdesc = tupdesc;
+	 }
+
+	 if (sinfo->nattrs != fields)
+	 {
+		 sinfo->error_occurred = TRUE;
+		 sinfo->nummismatch = TRUE;
+
+		 /* This error will be processed in dblink_record_internal() */
+		 return -1;
+	 }
+
+	 /*
+	  * Value input functions assume that the input string is
+	  * zero-terminated, so we must terminate each value ourselves.
+	  */
+
+	 /*
+	  * The length of the buffer for each field is value length + 1 for
+	  * zero-termination
+	  */
+	 newbuflen = fields;
+	 for(i = 0 ; i < fields ; i++)
+		 newbuflen += columns[i].len;
+
+	 if (newbuflen > sinfo->valbuflen)
+	 {
+		 int tmplen = sinfo->valbuflen * 2;
+		 /*
+		  * Try to (re)allocate in bigger steps to avoid flood of allocations
+		  * on weird data.
+		  */
+		 while (newbuflen > tmplen && tmplen >= 0)
+			 tmplen *= 2;
+
+		 /* Check whether the integer wrapped around. */
+		 if (tmplen < 0)
+			 elog(ERROR, "Buffer size for one row exceeds integer limit");
+		 sinfo->valbuf = (char *)repalloc(sinfo->valbuf, tmplen);
+		 sinfo->valbuflen = tmplen;
+	 }
+
+	 pbuf = sinfo->valbuf;
+	 for(i = 0 ; i < fields ; i++)
+	 {
+		 int len = columns[i].len;
+		 if (len < 0)
+			 cstrs[i] = NULL;
+		 else
+		 {
+			 cstrs[i] = pbuf;
+			 memcpy(pbuf, columns[i].value, len);
+			 pbuf += len;
+			 *pbuf++ = '\0';
+		 }
+	 }
+
+	 /*
+	  * These functions may throw an error, which will be caught in
+	  * dblink_record_internal().
+	  */
+	 tuple = BuildTupleFromCStrings(sinfo->attinmeta, cstrs);
+	 tuplestore_puttuple(sinfo->tuplestore, tuple);
+
+	 return 1;
+ }
+
+ /*
+  * List all open dblink connections by name.
+  * Returns an array of all connection names.
+  * Takes no params
+  */
+ PG_FUNCTION_INFO_V1(dblink_get_connections);
+ Datum
+ dblink_get_connections(PG_FUNCTION_ARGS)
+ {
+	 HASH_SEQ_STATUS status;
+	 remoteConnHashEnt *hentry;
+	 ArrayBuildState *astate = NULL;
+
+	 if (remoteConnHash)
+	 {
+		 hash_seq_init(&status, remoteConnHash);
+		 while ((hentry = (remoteConnHashEnt *) hash_seq_search(&status)) != NULL)
+		 {
+			 /* stash away current value */
+			 astate = accumArrayResult(astate,
+									   CStringGetTextDatum(hentry->name),
+									   false, TEXTOID, CurrentMemoryContext);
+		 }
+	 }
+
+	 if (astate)
+		 PG_RETURN_ARRAYTYPE_P(makeArrayResult(astate,
+											   CurrentMemoryContext));
+	 else
+		 PG_RETURN_NULL();
+ }
+
+ /*
+  * Checks if a given remote connection is busy
+  *
+  * Returns 1 if the connection is busy, 0 otherwise
+  * Params:
+  *	text connection_name - name of the connection to check
+  *
+  */
+ PG_FUNCTION_INFO_V1(dblink_is_busy);
+ Datum
+ dblink_is_busy(PG_FUNCTION_ARGS)
+ {
+	 char	   *conname = NULL;
+	 PGconn	   *conn = NULL;
+	 remoteConn *rconn = NULL;
+
+	 DBLINK_INIT;
+	 DBLINK_GET_NAMED_CONN;
+	 
+	 /*
+	  * In the current implementation, dblink_is_busy itself reads the result,
+	  * so storeHandler cannot be used afterwards. Materializing here would need
+	  * dblink_get_result's return type information, which is not available.
+	  */
+	 rconn->materialize_needed = true;
+	 
+	 PQconsumeInput(conn);
+	 PG_RETURN_INT32(PQisBusy(conn));
+ }
+
+ /*
+  * Cancels a running request on a connection
+  *
+  * Returns text:
+  *	"OK" if the cancel request has been sent correctly,
+  *		an error message otherwise
+  *
+  * Params:
+  *	text connection_name - name of the connection to check
+  *
+  */
+ PG_FUNCTION_INFO_V1(dblink_cancel_query);
+ Datum
+ dblink_cancel_query(PG_FUNCTION_ARGS)
+ {
+	 int			res = 0;
+	 char	   *conname = NULL;
+	 PGconn	   *conn = NULL;
+	 remoteConn *rconn = NULL;
+	 PGcancel   *cancel;
+	 char		errbuf[256];
+
+	 DBLINK_INIT;
+	 DBLINK_GET_NAMED_CONN;
+	 cancel = PQgetCancel(conn);
+
+	 res = PQcancel(cancel, errbuf, 256);
+	 PQfreeCancel(cancel);
+
+	 if (res == 1)
+		 PG_RETURN_TEXT_P(cstring_to_text("OK"));
+	 else
+		 PG_RETURN_TEXT_P(cstring_to_text(errbuf));
+ }
+
+
+ /*
+  * Get error message from a connection
+  *
+  * Returns text:
+  *	"OK" if no error, an error message otherwise
+  *
+  * Params:
+  *	text connection_name - name of the connection to check
+  *
+  */
+ PG_FUNCTION_INFO_V1(dblink_error_message);
+ Datum
+ dblink_error_message(PG_FUNCTION_ARGS)
+ {
+	 char	   *msg;
+	 char	   *conname = NULL;
+	 PGconn	   *conn = NULL;
+	 remoteConn *rconn = NULL;
+
+	 DBLINK_INIT;
+	 DBLINK_GET_NAMED_CONN;
+
+	 msg = PQerrorMessage(conn);
+	 if (msg == NULL || msg[0] == '\0')
+		 PG_RETURN_TEXT_P(cstring_to_text("OK"));
+	 else
+		 PG_RETURN_TEXT_P(cstring_to_text(msg));
+ }
+
+ /*
+  * Execute an SQL non-SELECT command
+  */
+ PG_FUNCTION_INFO_V1(dblink_exec);
+ Datum
+ dblink_exec(PG_FUNCTION_ARGS)
+ {
+	 text	   *volatile sql_cmd_status = NULL;
+	 PGconn	   *volatile conn = NULL;
+	 volatile bool freeconn = false;
+
+	 DBLINK_INIT;
+
+	 PG_TRY();
+	 {
+		 char	   *msg;
+		 PGresult   *res = NULL;
+		 char	   *connstr = NULL;
+		 char	   *sql = NULL;
+		 char	   *conname = NULL;
+		 remoteConn *rconn = NULL;
+		 bool		fail = true;	/* default to backward compatible behavior */
+
+		 if (PG_NARGS() == 3)
+		 {
+			 /* must be text,text,bool */
+			 DBLINK_GET_CONN;
+			 sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
+			 fail = PG_GETARG_BOOL(2);
+		 }
+		 else if (PG_NARGS() == 2)
+		 {
+			 /* might be text,text or text,bool */
+			 if (get_fn_expr_argtype(fcinfo->flinfo, 1) == BOOLOID)
+			 {
+				 conn = pconn->conn;
+				 sql = text_to_cstring(PG_GETARG_TEXT_PP(0));
+				 fail = PG_GETARG_BOOL(1);
+			 }
+			 else
+			 {
+				 DBLINK_GET_CONN;
+				 sql = text_to_cstring(PG_GETARG_TEXT_PP(1));
+			 }
+		 }
+		 else if (PG_NARGS() == 1)
+		 {
+			 /* must be single text argument */
+			 conn = pconn->conn;
+			 sql = text_to_cstring(PG_GETARG_TEXT_PP(0));
+		 }
+		 else
+			 /* shouldn't happen */
+			 elog(ERROR, "wrong number of arguments");
+
+		 if (!conn)
+			 DBLINK_CONN_NOT_AVAIL;
+
+		 res = PQexec(conn, sql);
+		 if (!res ||
+			 (PQresultStatus(res) != PGRES_COMMAND_OK &&
+			  PQresultStatus(res) != PGRES_TUPLES_OK))
+		 {
+			 dblink_res_error(conname, res, "could not execute command", fail);
+
+			 /*
+			  * and save a copy of the command status string to return as our
+			  * result tuple
+			  */
+			 sql_cmd_status = cstring_to_text("ERROR");
+		 }
+		 else if (PQresultStatus(res) == PGRES_COMMAND_OK)
+		 {
+			 /*
+			  * and save a copy of the command status string to return as our
+			  * result tuple
+			  */
+			 sql_cmd_status = cstring_to_text(PQcmdStatus(res));
+			 PQclear(res);
+		 }
+		 else
+		 {
+			 PQclear(res);
+			 ereport(ERROR,
+				   (errcode(ERRCODE_S_R_E_PROHIBITED_SQL_STATEMENT_ATTEMPTED),
+					errmsg("statement returning results not allowed")));
+		 }
+	 }
+	 PG_CATCH();
+	 {
+		 /* if needed, close the connection to the database */
+		 if (freeconn)
+			 PQfinish(conn);
+		 PG_RE_THROW();
+	 }
+	 PG_END_TRY();
+
+	 /* if needed, close the connection to the database */
+	 if (freeconn)
+		 PQfinish(conn);
+
+	 PG_RETURN_TEXT_P(sql_cmd_status);
+ }
+
+
+ /*
+  * dblink_get_pkey
+  *
+  * Return list of primary key fields for the supplied relation,
+  * or NULL if none exists.
+  */
+ PG_FUNCTION_INFO_V1(dblink_get_pkey);
+ Datum
+ dblink_get_pkey(PG_FUNCTION_ARGS)
+ {
+	 int16		numatts;
+	 char	  **results;
+	 FuncCallContext *funcctx;
+	 int32		call_cntr;
+	 int32		max_calls;
+	 AttInMetadata *attinmeta;
+	 MemoryContext oldcontext;
+
+	 /* stuff done only on the first call of the function */
+	 if (SRF_IS_FIRSTCALL())
+	 {
+		 Relation	rel;
+		 TupleDesc	tupdesc;
+
+		 /* create a function context for cross-call persistence */
+		 funcctx = SRF_FIRSTCALL_INIT();
+
+		 /*
+		  * switch to memory context appropriate for multiple function calls
+		  */
+		 oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
+
+		 /* open target relation */
+		 rel = get_rel_from_relname(PG_GETARG_TEXT_P(0), AccessShareLock, ACL_SELECT);
+
+		 /* get the array of attnums */
+		 results = get_pkey_attnames(rel, &numatts);
+
+		 relation_close(rel, AccessShareLock);
+
+		 /*
+		  * need a tuple descriptor representing one INT and one TEXT column
+		  */
+		 tupdesc = CreateTemplateTupleDesc(2, false);
+		 TupleDescInitEntry(tupdesc, (AttrNumber) 1, "position",
+							INT4OID, -1, 0);
+		 TupleDescInitEntry(tupdesc, (AttrNumber) 2, "colname",
+							TEXTOID, -1, 0);
+
+		 /*
+		  * Generate attribute metadata needed later to produce tuples from raw
+		  * C strings
+		  */
+		 attinmeta = TupleDescGetAttInMetadata(tupdesc);
+		 funcctx->attinmeta = attinmeta;
+
+		 if ((results != NULL) && (numatts > 0))
+		 {
+			 funcctx->max_calls = numatts;
+
+			 /* got results, keep track of them */
+			 funcctx->user_fctx = results;
+		 }
+		 else
+		 {
+			 /* fast track when no results */
+			 MemoryContextSwitchTo(oldcontext);
+			 SRF_RETURN_DONE(funcctx);
+		 }
+
+		 MemoryContextSwitchTo(oldcontext);
+	 }
+
+	 /* stuff done on every call of the function */
+	 funcctx = SRF_PERCALL_SETUP();
+
+	 /*
+	  * initialize per-call variables
+	  */
+	 call_cntr = funcctx->call_cntr;
+	 max_calls = funcctx->max_calls;
+
+	 results = (char **) funcctx->user_fctx;
+	 attinmeta = funcctx->attinmeta;
+
+	 if (call_cntr < max_calls)	/* do when there is more left to send */
+	 {
+		 char	  **values;
+		 HeapTuple	tuple;
+		 Datum		result;
+
+		 values = (char **) palloc(2 * sizeof(char *));
+		 values[0] = (char *) palloc(12);		/* sign, 10 digits, '\0' */
+
+		 sprintf(values[0], "%d", call_cntr + 1);
+
+		 values[1] = results[call_cntr];
+
+		 /* build the tuple */
+		 tuple = BuildTupleFromCStrings(attinmeta, values);
+
+		 /* make the tuple into a datum */
+		 result = HeapTupleGetDatum(tuple);
+
+		 SRF_RETURN_NEXT(funcctx, result);
+	 }
+	 else
+	 {
+		 /* do when there is no more left */
+		 SRF_RETURN_DONE(funcctx);
+	 }
+ }
+
+
+ /*
+  * dblink_build_sql_insert
+  *
+  * Used to generate an SQL insert statement
+  * based on an existing tuple in a local relation.
+  * This is useful for selectively replicating data
+  * to another server via dblink.
+  *
+  * API:
+  * <relname> - name of local table of interest
+  * <pkattnums> - an int2vector of attnums which will be used
+  * to identify the local tuple of interest
+  * <pknumatts> - number of attnums in pkattnums
+  * <src_pkattvals_arry> - text array of key values which will be used
+  * to identify the local tuple of interest
+  * <tgt_pkattvals_arry> - text array of key values which will be used
+  * to build the string for execution remotely. These are substituted
+  * for their counterparts in src_pkattvals_arry
+  */
+ PG_FUNCTION_INFO_V1(dblink_build_sql_insert);
+ Datum
+ dblink_build_sql_insert(PG_FUNCTION_ARGS)
+ {
+	 text	   *relname_text = PG_GETARG_TEXT_P(0);
+	 int2vector *pkattnums_arg = (int2vector *) PG_GETARG_POINTER(1);
+	 int32		pknumatts_arg = PG_GETARG_INT32(2);
+	 ArrayType  *src_pkattvals_arry = PG_GETARG_ARRAYTYPE_P(3);
+	 ArrayType  *tgt_pkattvals_arry = PG_GETARG_ARRAYTYPE_P(4);
+	 Relation	rel;
+	 int		   *pkattnums;
 	int			pknumatts;
 	char	  **src_pkattvals;
 	char	  **tgt_pkattvals;
#115Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kyotaro HORIGUCHI (#114)
Re: Speed dblink using alternate libpq tuple storage

Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp> writes:

What I'm currently thinking we should do is just use the old method
for async queries, and only optimize the synchronous case.

OK, I agree with you, apart from the performance issue. I give up on using
the row processor for async queries where dblink_is_busy is called.

Yeah, that seems like a reasonable idea.

Given the lack of consensus around the suspension API, maybe the best
way to get the underlying libpq patch to a committable state is to take
it out --- that is, remove the "return zero" option for row processors.
Since we don't have a test case for it in dblink, it's hard to escape
the feeling that we may be expending a lot of effort for something that
nobody really wants, and/or misdesigning it for lack of a concrete use
case. Is anybody going to be really unhappy if that part of the patch
gets left behind?

regards, tom lane

#116Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#115)
Re: Speed dblink using alternate libpq tuple storage

On Wed, Apr 4, 2012 at 10:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Given the lack of consensus around the suspension API, maybe the best
way to get the underlying libpq patch to a committable state is to take
it out --- that is, remove the "return zero" option for row processors.
Since we don't have a test case for it in dblink, it's hard to escape
the feeling that we may be expending a lot of effort for something that
nobody really wants, and/or misdesigning it for lack of a concrete use
case.  Is anybody going to be really unhappy if that part of the patch
gets left behind?

Agreed.

--
marko

#117Kyotaro HORIGUCHI
horiguchi.kyotaro@oss.ntt.co.jp
In reply to: Kyotaro HORIGUCHI (#114)
Re: Speed dblink using alternate libpq tuple storage

I'm afraid I may not be re-initializing materialize_needed for the next
query in the latest dblink patch.
I will check and send another version in a few hours if needed.

# I need to catch the train I usually get on..

Hello, this is the new version of the dblink patch.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

#118Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#116)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

On Wed, Apr 4, 2012 at 10:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Given the lack of consensus around the suspension API, maybe the best
way to get the underlying libpq patch to a committable state is to take
it out --- that is, remove the "return zero" option for row processors.

Agreed.

Done that way.

regards, tom lane

#119Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kyotaro HORIGUCHI (#117)
Re: Speed dblink using alternate libpq tuple storage

Kyotaro HORIGUCHI <horiguchi.kyotaro@oss.ntt.co.jp> writes:

I'm afraid I may not be re-initializing materialize_needed for the next
query in the latest dblink patch.
I will check and send another version in a few hours if needed.

I've committed a revised version of the previous patch. I'm not sure
that the case of dblink_is_busy not being used is interesting enough
to justify contorting the logic, and I'm worried about introducing
corner case bugs for that.

regards, tom lane

#120Kyotaro HORIGUCHI
horiguchi.kyotaro@lab.ntt.co.jp
In reply to: Tom Lane (#119)
Re: Speed dblink using alternate libpq tuple storage

I'm afraid I may not be re-initializing materialize_needed for the next
query in the latest dblink patch.

I've found no need to worry about the re-initializing issue.

I've committed a revised version of the previous patch.

Thank you for that.

I'm not sure that the case of dblink_is_busy not being used is
interesting enough to justify contorting the logic, and I'm
worried about introducing corner case bugs for that.

I'm worried about ending up in an indeterminate state when sync and async
queries are mixed, or when the API call sequence for an async query falls
outside my expectations (which are rather narrow).

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.

#121Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#118)
1 attachment(s)
Re: Speed dblink using alternate libpq tuple storage

On Wed, Apr 04, 2012 at 06:41:00PM -0400, Tom Lane wrote:

Marko Kreen <markokr@gmail.com> writes:

On Wed, Apr 4, 2012 at 10:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Given the lack of consensus around the suspension API, maybe the best
way to get the underlying libpq patch to a committable state is to take
it out --- that is, remove the "return zero" option for row processors.

Agreed.

Done that way.

Minor cleanups:

* Change callback return value to be 'bool': 0 is error.
Currently the accepted return codes are 1 and -1,
which is weird.

If we happen to have the 'early-exit' logic in the future,
it should not work via callback return code. So keeping the 0
in reserve is unnecessary.

* Support exceptions in multi-statement PQexec() by storing
finished result under PGconn temporarily. Without doing it,
the result can leak if callback longjmps while processing
next result.

* Add <caution> to docs for permanently keeping custom callback.

This API fragility is also the reason why early exit (if it appears)
should not work via the callback; it should get a safer API instead.

--
marko
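
The leak described in the second bullet above can be illustrated with a small self-contained sketch. It is not libpq code: `DemoConn`, `DemoResult`, `demo_exec` and `demo_close` are hypothetical stand-ins, and a plain `longjmp` stands in for an error raised inside the row callback. The idea is the one the attached patch applies with `conn->tempResult`: park the finished result on the connection while the next one is being processed, so that cleanup code can still reach it if control never returns.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

typedef struct DemoResult
{
	int			id;
} DemoResult;

typedef struct DemoConn
{
	DemoResult *temp_result;	/* finished result parked for safety */
	int			live_results;	/* allocation counter, for the demo only */
} DemoConn;

static jmp_buf demo_escape;		/* where the simulated error lands */

static DemoResult *
demo_make_result(DemoConn *conn, int id)
{
	DemoResult *res = malloc(sizeof(DemoResult));

	assert(res != NULL);
	res->id = id;
	conn->live_results++;
	return res;
}

static void
demo_clear(DemoConn *conn, DemoResult *res)
{
	if (res != NULL)
	{
		free(res);
		conn->live_results--;
	}
}

/* Runs two "statements"; the callback for the second one may bail out. */
static DemoResult *
demo_exec(DemoConn *conn, int fail_on_second)
{
	DemoResult *first = demo_make_result(conn, 1);
	DemoResult *second;

	conn->temp_result = first;	/* still reachable if we never return */
	if (fail_on_second)
		longjmp(demo_escape, 1);	/* simulated error in the row callback */

	second = demo_make_result(conn, 2);
	conn->temp_result = NULL;	/* the caller owns the final result again */
	demo_clear(conn, first);	/* the older result is superseded */
	return second;
}

/* Connection cleanup frees whatever was parked when the error struck. */
static void
demo_close(DemoConn *conn)
{
	demo_clear(conn, conn->temp_result);
	conn->temp_result = NULL;
}
```

A caller would wrap `demo_exec()` in `setjmp(demo_escape)` and call `demo_close()` on the error path. Without the `temp_result` slot, the first result would be unreachable after the jump.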

Attachments:

rowproc-cleanups.diff (text/x-diff; charset=us-ascii)
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 0ec501e..1e7678c 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -5608,6 +5608,16 @@ defaultNoticeProcessor(void *arg, const char *message)
 
   <caution>
    <para>
+    It is dangerous to leave a custom row processor attached to a connection
+    permanently, as there might be occasional queries that expect
+    regular PGresult behaviour.  So unless it is certain that all code
+    is aware of the custom callback, it is better to restore the default
+    row processor after the query is finished.
+   </para>
+  </caution>
+
+  <caution>
+   <para>
     The row processor function sees the rows before it is known whether the
     query will succeed overall, since the server might return some rows before
     encountering an error.  For proper transactional behavior, it must be
@@ -5674,8 +5684,8 @@ typedef struct pgDataValue
    <parameter>errmsgp</parameter> is an output parameter used only for error
    reporting.  If the row processor needs to report an error, it can set
    <literal>*</><parameter>errmsgp</parameter> to point to a suitable message
-   string (and then return <literal>-1</>).  As a special case, returning
-   <literal>-1</> without changing <literal>*</><parameter>errmsgp</parameter>
+   string (and then return <literal>0</>).  As a special case, returning
+   <literal>0</> without changing <literal>*</><parameter>errmsgp</parameter>
    from its initial value of NULL is taken to mean <quote>out of memory</>.
   </para>
 
@@ -5702,10 +5712,10 @@ typedef struct pgDataValue
 
   <para>
    The row processor function must return either <literal>1</> or
-   <literal>-1</>.
+   <literal>0</>.
    <literal>1</> is the normal, successful result value; <application>libpq</>
    will continue with receiving row values from the server and passing them to
-   the row processor.  <literal>-1</> indicates that the row processor has
+   the row processor.  <literal>0</> indicates that the row processor has
    encountered an error.  In that case,
    <application>libpq</> will discard all remaining rows in the result set
    and then return a <literal>PGRES_FATAL_ERROR</> <type>PGresult</type> to
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 03fd6e4..90f6d6a 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -2897,6 +2897,11 @@ closePGconn(PGconn *conn)
 										 * absent */
 	conn->asyncStatus = PGASYNC_IDLE;
 	pqClearAsyncResult(conn);	/* deallocate result */
+	if (conn->tempResult)
+	{
+		PQclear(conn->tempResult);
+		conn->tempResult = NULL;
+	}
 	pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
 	conn->addrlist = NULL;
 	conn->addr_cur = NULL;
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 86f157c..554df94 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -1028,7 +1028,7 @@ PQgetRowProcessor(const PGconn *conn, void **param)
 /*
  * pqStdRowProcessor
  *	  Add the received row to the PGresult structure
- *	  Returns 1 if OK, -1 if error occurred.
+ *	  Returns 1 if OK, 0 if error occurred.
  *
  * Note: "param" should point to the PGconn, but we don't actually need that
  * as of the current coding.
@@ -1059,7 +1059,7 @@ pqStdRowProcessor(PGresult *res, const PGdataValue *columns,
 	tup = (PGresAttValue *)
 		pqResultAlloc(res, nfields * sizeof(PGresAttValue), TRUE);
 	if (tup == NULL)
-		return -1;
+		return 0;
 
 	for (i = 0; i < nfields; i++)
 	{
@@ -1078,7 +1078,7 @@ pqStdRowProcessor(PGresult *res, const PGdataValue *columns,
 
 			val = (char *) pqResultAlloc(res, clen + 1, isbinary);
 			if (val == NULL)
-				return -1;
+				return 0;
 
 			/* copy and zero-terminate the data (even if it's binary) */
 			memcpy(val, columns[i].value, clen);
@@ -1091,7 +1091,7 @@ pqStdRowProcessor(PGresult *res, const PGdataValue *columns,
 
 	/* And add the tuple to the PGresult's tuple array */
 	if (!pqAddTuple(res, tup))
-		return -1;
+		return 0;
 
 	/* Success */
 	return 1;
@@ -1954,6 +1954,13 @@ PQexecFinish(PGconn *conn)
 	PGresult   *result;
 	PGresult   *lastResult;
 
+	/* Make sure old result is NULL. */
+	if (conn->tempResult)
+	{
+		PQclear(conn->tempResult);
+		conn->tempResult = NULL;
+	}
+
 	/*
 	 * For backwards compatibility, return the last result if there are more
 	 * than one --- but merge error messages if we get more than one error
@@ -1991,7 +1998,14 @@ PQexecFinish(PGconn *conn)
 			result->resultStatus == PGRES_COPY_BOTH ||
 			conn->status == CONNECTION_BAD)
 			break;
+
+		/*
+		 * As we know previous result is stored from this function,
+		 * there is no need for extra cleanup here.
+		 */
+		conn->tempResult = lastResult;
 	}
+	conn->tempResult = NULL;
 
 	return lastResult;
 }
diff --git a/src/interfaces/libpq/fe-protocol2.c b/src/interfaces/libpq/fe-protocol2.c
index 43f9954..f155678 100644
--- a/src/interfaces/libpq/fe-protocol2.c
+++ b/src/interfaces/libpq/fe-protocol2.c
@@ -761,7 +761,7 @@ getRowDescriptions(PGconn *conn)
 			/* everything is good */
 			return 0;
 
-		case -1:
+		case 0:
 			/* error, report the errmsg below */
 			break;
 
@@ -950,7 +950,7 @@ getAnotherTuple(PGconn *conn, bool binary)
 			/* everything is good */
 			return 0;
 
-		case -1:
+		case 0:
 			/* error, report the errmsg below */
 			break;
 
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index a773d7a..9124a87 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -589,7 +589,7 @@ getRowDescriptions(PGconn *conn, int msgLength)
 			/* everything is good */
 			return 0;
 
-		case -1:
+		case 0:
 			/* error, report the errmsg below */
 			break;
 
@@ -793,7 +793,7 @@ getAnotherTuple(PGconn *conn, int msgLength)
 			/* everything is good */
 			return 0;
 
-		case -1:
+		case 0:
 			/* error, report the errmsg below */
 			break;
 
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 0b6e676..7f0f8fd 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -404,6 +404,9 @@ struct pg_conn
 	PGdataValue *rowBuf;		/* array for passing values to rowProcessor */
 	int			rowBufLen;		/* number of entries allocated in rowBuf */
 
+	/* support rowproc exceptions in multi-resultset functions (PQexec) */
+	PGresult	*tempResult;	/* temp result storage */
+
 	/* Status for asynchronous result construction */
 	PGresult   *result;			/* result being constructed */
 
#122Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#121)
Re: Speed dblink using alternate libpq tuple storage

Marko Kreen <markokr@gmail.com> writes:

> Minor cleanups:
>
> * Change callback return value to be 'bool': 0 is error.
>   Currently the accepted return codes are 1 and -1,
>   which is weird.

No, I did it that way intentionally, with the thought that we might add
back return code zero (or other return codes) in the future. If we go
with bool then sensible expansion is impossible, or at least ugly.
(I think it was you that objected to 0/1/2 in the first place, no?)

> If we happen to have the 'early-exit' logic in the future,
> it should not work via callback return code.

This assumption seems unproven to me, and even if it were,
it doesn't mean we will never have any other exit codes.

> * Support exceptions in multi-statement PQexec() by storing
>   finished result under PGconn temporarily. Without doing it,
>   the result can leak if callback longjmps while processing
>   next result.

I'm unconvinced this is a good thing either.

regards, tom lane
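
To make the return-code question above concrete, here is a minimal sketch. It assumes a hypothetical callback type `row_cb` and a hypothetical dispatcher `dispatch_row`; neither is libpq's actual API. It accepts only 1 and -1 and treats every other value, including 0, as reserved, which is the room for later expansion argued for above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type for illustration; not libpq's signature. */
typedef int (*row_cb) (const char *row, void *param);

/* The two codes accepted today; 0 is deliberately left unassigned. */
enum { ROW_OK = 1, ROW_ERROR = -1 };

/*
 * Call the row callback and map its return code.
 * Returns 0 on success, -1 on failure with *errmsg set.
 */
static int
dispatch_row(row_cb cb, const char *row, void *param, const char **errmsg)
{
	switch (cb(row, param))
	{
		case ROW_OK:
			/* everything is good */
			return 0;

		case ROW_ERROR:
			*errmsg = "row processor reported an error";
			break;

		default:
			/* reserved codes (such as 0) are rejected until defined */
			*errmsg = "unrecognized return value from row processor";
			break;
	}
	return -1;
}

/* Example callbacks (hypothetical). */
static int
count_rows(const char *row, void *param)
{
	(void) row;
	++*(int *) param;
	return ROW_OK;
}

static int
reject_rows(const char *row, void *param)
{
	(void) row;
	(void) param;
	return ROW_ERROR;
}

static int
return_zero(const char *row, void *param)
{
	(void) row;
	(void) param;
	return 0;
}
```

With a bool return type the `default` branch would have nothing to catch; with an int, a new code can later be given its own `case` without changing what 1 and -1 mean.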

#123Marko Kreen
markokr@gmail.com
In reply to: Tom Lane (#122)
Re: Speed dblink using alternate libpq tuple storage

On Thu, Apr 05, 2012 at 12:49:37PM -0400, Tom Lane wrote:

> Marko Kreen <markokr@gmail.com> writes:
>
>> Minor cleanups:
>>
>> * Change callback return value to be 'bool': 0 is error.
>>   Currently the accepted return codes are 1 and -1,
>>   which is weird.
>
> No, I did it that way intentionally, with the thought that we might add
> back return code zero (or other return codes) in the future. If we go
> with bool then sensible expansion is impossible, or at least ugly.
> (I think it was you that objected to 0/1/2 in the first place, no?)

Well, I was the one who added the 0/1/2 in the first place,
then I noticed that -1/0/1 works better as a triple.

But the early-exit from callback creates unnecessary fragility
so now I'm convinced we don't want to do it that way.

>> If we happen to have the 'early-exit' logic in the future,
>> it should not work via callback return code.
>
> This assumption seems unproven to me, and even if it were,
> it doesn't mean we will never have any other exit codes.

I cannot even imagine why we would want new codes, so it seems
like an unnecessary mess in the API.

But it's a minor issue, so if you intend to keep it, I won't
push it further.

>> * Support exceptions in multi-statement PQexec() by storing
>>   finished result under PGconn temporarily. Without doing it,
>>   the result can leak if callback longjmps while processing
>>   next result.
>
> I'm unconvinced this is a good thing either.

This is a less minor issue. Do we support longjmp() or not?

Why are you convinced that it's not needed?

--
marko
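
The leak scenario debated above can be sketched without libpq. This is only an illustration under stated assumptions: `Result`, `Conn`, `get_result`, `exec_finish` and `demo_leak_check` are hypothetical stand-ins, and `temp_result` plays the role of the `tempResult` field the patch adds to `PGconn`. The "row processor" is simulated by a `longjmp()` out of `get_result()`.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-ins for PGresult and PGconn. */
typedef struct Result { int id; } Result;
typedef struct Conn { Result *temp_result; } Conn;

static jmp_buf on_error;
static int	live_results = 0;	/* results allocated and not yet freed */
static Conn demo_conn = { NULL };

static Result *
result_new(int id)
{
	Result	   *r = malloc(sizeof *r);

	if (r == NULL)
		abort();
	r->id = id;
	live_results++;
	return r;
}

static void
result_free(Result *r)
{
	if (r != NULL)
	{
		free(r);
		live_results--;
	}
}

/* Yields results 1 and 2, then NULL; jumps out when id == fail_on. */
static Result *
get_result(int id, int fail_on)
{
	if (id == fail_on)
		longjmp(on_error, 1);
	return (id <= 2) ? result_new(id) : NULL;
}

static Result *
exec_finish(Conn *conn, int fail_on)
{
	Result	   *last = NULL;
	Result	   *res;
	int			id = 1;

	/* Free anything left behind by an earlier interrupted call. */
	result_free(conn->temp_result);
	conn->temp_result = NULL;

	while ((res = get_result(id++, fail_on)) != NULL)
	{
		result_free(last);
		last = res;
		/* Keep it reachable in case the next get_result() jumps out. */
		conn->temp_result = last;
	}
	conn->temp_result = NULL;
	return last;
}

/* Returns the number of results still allocated at the end. */
static int
demo_leak_check(void)
{
	Result	   *final;

	if (setjmp(on_error) == 0)
		exec_finish(&demo_conn, 2);	/* jumps out while fetching result 2 */

	/* Result 1 is still allocated, but demo_conn.temp_result points at it. */
	final = exec_finish(&demo_conn, 0);	/* frees it, then runs normally */
	result_free(final);
	return live_results;
}
```

Without the `temp_result` assignment inside the loop, result 1 would be unreachable after the jump, because the only pointer to it lived in a local variable of the abandoned stack frame.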