WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement

Started by Ivan Trofimovabout 2 years ago5 messages
#1Ivan Trofimov
i.trofimow@yandex.ru
1 attachment(s)

Hi!

Currently libpq sends B(ind), D(escribe), E(execute), S(ync) when
executing a prepared statement.

The response for that D message is a RowDescription, which doesn't
change during prepared

statement lifetime (with the attributes format being an exception, as
they aren't know before execution) .

In a presumably very common case of repeatedly executing the same
statement, this leads to

both client and server parsing/sending exactly the same RowDescritpion
data over and over again.

Instead, library user could acquire a statement result RowDescription
once (via PQdescribePrepared),

and reuse it in subsequent calls to PQexecPrepared and/or its async
friends. Doing it this way saves

a measurable amount of CPU for both client and server and saves a lot of
network traffic, for example:

when selecting a single row from a table with 30 columns, where each
column has 10-symbols name, and

every value in a row is a 10-symbols TEXT, i'm seeing an amount of bytes
sent to client decreased

by a factor of 2.8, and the CPU time client spends in userland decreased
by a factor of ~1.5.

The patch attached adds a family of functions

PQsendQueryPreparedPredescribed, PQgetResultPredescribed,
PQisBusyPredescribed,

which allow a user to do just that.

If the idea seems reasonable i'd be happy to extend these to
PQexecPrepared as well and cover it with tests.

P.S. This is my first time ever sending a patch via email, so please
don't hesitate to point at mistakes

i'm doing in the process.

Attachments:

PQsendQueryPreparedPredescribed.patchtext/x-patch; charset=UTF-8; name=PQsendQueryPreparedPredescribed.patchDownload
diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index 850734ac96..989354ca06 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -191,3 +191,6 @@ PQclosePrepared           188
 PQclosePortal             189
 PQsendClosePrepared       190
 PQsendClosePortal         191
+PQsendQueryPreparedPredescribed     192
+PQgetResultPredescribed             193
+PQisBusyPredescribed                194
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 04610ccf5e..a18bead9e6 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -72,8 +72,19 @@ static int	PQsendQueryGuts(PGconn *conn,
 							const char *const *paramValues,
 							const int *paramLengths,
 							const int *paramFormats,
-							int resultFormat);
-static void parseInput(PGconn *conn);
+							int resultFormat,
+							bool sendDescribe);
+static int	PQsendQueryPreparedInternal(PGconn *conn,
+										const char *stmtName,
+										int nParams,
+										const char *const *paramValues,
+										const int *paramLengths,
+										const int *paramFormats,
+										int resultFormat,
+										bool sendDescribe);
+static int	PQisBusyInternal(PGconn *conn, PGresult *description);
+static PGresult *PQgetResultInternal(PGconn *conn, PGresult *description);
+static void parseInput(PGconn *conn, PGresult *description);
 static PGresult *getCopyResult(PGconn *conn, ExecStatusType copytype);
 static bool PQexecStart(PGconn *conn);
 static PGresult *PQexecFinish(PGconn *conn);
@@ -1528,7 +1539,8 @@ PQsendQueryParams(PGconn *conn,
 						   paramValues,
 						   paramLengths,
 						   paramFormats,
-						   resultFormat);
+						   resultFormat,
+						   true /* send Describe */ );
 }
 
 /*
@@ -1643,6 +1655,26 @@ PQsendQueryPrepared(PGconn *conn,
 					const int *paramLengths,
 					const int *paramFormats,
 					int resultFormat)
+{
+	return PQsendQueryPreparedInternal(conn,
+									   stmtName,
+									   nParams,
+									   paramValues,
+									   paramLengths,
+									   paramFormats,
+									   resultFormat,
+									   true /* send Describe */ );
+}
+
+int
+PQsendQueryPreparedInternal(PGconn *conn,
+							const char *stmtName,
+							int nParams,
+							const char *const *paramValues,
+							const int *paramLengths,
+							const int *paramFormats,
+							int resultFormat,
+							bool sendDescribe)
 {
 	if (!PQsendQueryStart(conn, true))
 		return 0;
@@ -1668,7 +1700,50 @@ PQsendQueryPrepared(PGconn *conn,
 						   paramValues,
 						   paramLengths,
 						   paramFormats,
-						   resultFormat);
+						   resultFormat,
+						   sendDescribe);
+}
+
+int
+PQsendQueryPreparedPredescribed(PGconn *conn,
+								const char *stmtName,
+								int nParams,
+								const char *const *paramValues,
+								const int *paramLengths,
+								const int *paramFormats,
+								int resultFormat,
+								PGresult *description)
+{
+	int			i;
+	int			nFields;
+	int			send_result;
+
+	if (!description)
+	{
+		libpq_append_conn_error(conn, "query result description is a null pointer");
+		return 0;
+	}
+
+	send_result = PQsendQueryPreparedInternal(conn,
+											  stmtName,
+											  nParams,
+											  paramValues,
+											  paramLengths,
+											  paramFormats,
+											  resultFormat,
+											  false /* send Describe */ );
+
+	/* We only need to adjust attributes format if send succeeded */
+	if (send_result && description->attDescs != NULL)
+	{
+		nFields = description->numAttributes;
+		for (i = 0; i < nFields; i++)
+		{
+			description->attDescs[i].format = resultFormat;
+		}
+	}
+
+	return send_result;
 }
 
 /*
@@ -1766,7 +1841,8 @@ PQsendQueryGuts(PGconn *conn,
 				const char *const *paramValues,
 				const int *paramLengths,
 				const int *paramFormats,
-				int resultFormat)
+				int resultFormat,
+				bool sendDescribe)
 {
 	int			i;
 	PGcmdQueueEntry *entry;
@@ -1873,12 +1949,15 @@ PQsendQueryGuts(PGconn *conn,
 	if (pqPutMsgEnd(conn) < 0)
 		goto sendFailed;
 
-	/* construct the Describe Portal message */
-	if (pqPutMsgStart(PqMsg_Describe, conn) < 0 ||
-		pqPutc('P', conn) < 0 ||
-		pqPuts("", conn) < 0 ||
-		pqPutMsgEnd(conn) < 0)
-		goto sendFailed;
+	if (sendDescribe)
+	{
+		/* construct the Describe Portal message */
+		if (pqPutMsgStart(PqMsg_Describe, conn) < 0 ||
+			pqPutc('P', conn) < 0 ||
+			pqPuts("", conn) < 0 ||
+			pqPutMsgEnd(conn) < 0)
+			goto sendFailed;
+	}
 
 	/* construct the Execute message */
 	if (pqPutMsgStart(PqMsg_Execute, conn) < 0 ||
@@ -1990,9 +2069,9 @@ PQconsumeInput(PGconn *conn)
  * Note that this function will NOT attempt to read more data from the backend.
  */
 static void
-parseInput(PGconn *conn)
+parseInput(PGconn *conn, PGresult *description)
 {
-	pqParseInput3(conn);
+	pqParseInput3Predescribed(conn, description);
 }
 
 /*
@@ -2002,12 +2081,18 @@ parseInput(PGconn *conn)
 
 int
 PQisBusy(PGconn *conn)
+{
+	return PQisBusyInternal(conn, NULL);
+}
+
+static int
+PQisBusyInternal(PGconn *conn, PGresult *description)
 {
 	if (!conn)
 		return false;
 
 	/* Parse any available data, if our state permits. */
-	parseInput(conn);
+	parseInput(conn, description);
 
 	/*
 	 * PQgetResult will return immediately in all states except BUSY.  Also,
@@ -2020,6 +2105,12 @@ PQisBusy(PGconn *conn)
 	return conn->asyncStatus == PGASYNC_BUSY && conn->status != CONNECTION_BAD;
 }
 
+int
+PQisBusyPredescribed(PGconn *conn, PGresult *description)
+{
+	return PQisBusyInternal(conn, description);
+}
+
 /*
  * PQgetResult
  *	  Get the next PGresult produced by a query.  Returns NULL if no
@@ -2033,6 +2124,12 @@ PQisBusy(PGconn *conn)
  */
 PGresult *
 PQgetResult(PGconn *conn)
+{
+	return PQgetResultInternal(conn, NULL);
+}
+
+static PGresult *
+PQgetResultInternal(PGconn *conn, PGresult *description)
 {
 	PGresult   *res;
 
@@ -2040,7 +2137,7 @@ PQgetResult(PGconn *conn)
 		return NULL;
 
 	/* Parse any available data, if our state permits. */
-	parseInput(conn);
+	parseInput(conn, description);
 
 	/* If not ready to return something, block until we are. */
 	while (conn->asyncStatus == PGASYNC_BUSY)
@@ -2078,7 +2175,7 @@ PQgetResult(PGconn *conn)
 		}
 
 		/* Parse it. */
-		parseInput(conn);
+		parseInput(conn, description);
 
 		/*
 		 * If we had a write error, but nothing above obtained a query result
@@ -2182,6 +2279,12 @@ PQgetResult(PGconn *conn)
 	return res;
 }
 
+PGresult *
+PQgetResultPredescribed(PGconn *conn, PGresult *description)
+{
+	return PQgetResultInternal(conn, description);
+}
+
 /*
  * getCopyResult
  *	  Helper for PQgetResult: generate result for COPY-in-progress cases
@@ -2638,7 +2741,7 @@ PQnotifies(PGconn *conn)
 		return NULL;
 
 	/* Parse any available data to see if we can extract NOTIFY messages. */
-	parseInput(conn);
+	parseInput(conn, NULL);
 
 	event = conn->notifyHead;
 	if (event)
@@ -2677,7 +2780,7 @@ PQputCopyData(PGconn *conn, const char *buffer, int nbytes)
 	 * input data into the input buffer happens down inside pqSendSome, but
 	 * it's not authorized to get rid of the data again.)
 	 */
-	parseInput(conn);
+	parseInput(conn, NULL);
 
 	if (nbytes > 0)
 	{
diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 5613c56b14..8fa5853f84 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -55,6 +55,7 @@ static void reportErrorPosition(PQExpBuffer msg, const char *query,
 								int loc, int encoding);
 static int	build_startup_packet(const PGconn *conn, char *packet,
 								 const PQEnvironmentOption *options);
+static void pqParseInput3Internal(PGconn *conn, PGresult *description);
 
 
 /*
@@ -62,8 +63,8 @@ static int	build_startup_packet(const PGconn *conn, char *packet,
  * until input is exhausted or a stopping state is reached.
  * Note that this function will NOT attempt to read more data from the backend.
  */
-void
-pqParseInput3(PGconn *conn)
+static void
+pqParseInput3Internal(PGconn *conn, PGresult *description)
 {
 	char		id;
 	int			msgLength;
@@ -402,11 +403,38 @@ pqParseInput3(PGconn *conn)
 					}
 					else
 					{
-						/* Set up to report error at end of query */
-						libpq_append_conn_error(conn, "server sent data (\"D\" message) without prior row description (\"T\" message)");
-						pqSaveErrorResult(conn);
-						/* Discard the unexpected message */
-						conn->inCursor += msgLength;
+						if (!description)
+						{
+							/* Set up to report error at end of query */
+							libpq_append_conn_error(conn,
+													"server sent data (\"D\" message) without prior row description (\"T\" message)");
+							pqSaveErrorResult(conn);
+							/* Discard the unexpected message */
+							conn->inCursor += msgLength;
+						}
+						else
+						{
+							/*
+							 * We didn't receive row description from the
+							 * server, but we have externally provided
+							 * description, use that.
+							 */
+							if (conn->result == NULL)
+							{
+								conn->result = PQmakeEmptyPGresult(conn, PGRES_TUPLES_OK);
+							}
+							if (!conn->result)
+							{
+								libpq_append_conn_error(conn, "out of memory");
+								pqSaveErrorResult(conn);
+							}
+							else
+							{
+								PQsetResultAttrs(conn->result, description->numAttributes, description->attDescs);
+								if (getAnotherTuple(conn, msgLength))
+									return;
+							}
+						}
 					}
 					break;
 				case PqMsg_CopyInResponse:
@@ -478,6 +506,18 @@ pqParseInput3(PGconn *conn)
 	}
 }
 
+void
+pqParseInput3(PGconn *conn)
+{
+	pqParseInput3Internal(conn, NULL);
+}
+
+void
+pqParseInput3Predescribed(PGconn *conn, PGresult *description)
+{
+	pqParseInput3Internal(conn, description);
+}
+
 /*
  * handleSyncLoss: clean up after loss of message-boundary sync
  *
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index 97762d56f5..573b68fea2 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -462,11 +462,21 @@ extern int	PQsendQueryPrepared(PGconn *conn,
 								const int *paramLengths,
 								const int *paramFormats,
 								int resultFormat);
+extern int	PQsendQueryPreparedPredescribed(PGconn *conn,
+											const char *stmtName,
+											int nParams,
+											const char *const *paramValues,
+											const int *paramLengths,
+											const int *paramFormats,
+											int resultFormat,
+											PGresult *description);
 extern int	PQsetSingleRowMode(PGconn *conn);
 extern PGresult *PQgetResult(PGconn *conn);
+extern PGresult *PQgetResultPredescribed(PGconn *conn, PGresult *description);
 
 /* Routines for managing an asynchronous query */
 extern int	PQisBusy(PGconn *conn);
+extern int	PQisBusyPredescribed(PGconn *conn, PGresult *description);
 extern int	PQconsumeInput(PGconn *conn);
 
 /* Routines for pipeline mode management */
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index c745facfec..df91c39fbe 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -706,6 +706,7 @@ extern int	PQsendQueryContinue(PGconn *conn, const char *query);
 extern char *pqBuildStartupPacket3(PGconn *conn, int *packetlen,
 								   const PQEnvironmentOption *options);
 extern void pqParseInput3(PGconn *conn);
+extern void pqParseInput3Predescribed(PGconn *conn, PGresult *description);
 extern int	pqGetErrorNotice3(PGconn *conn, bool isError);
 extern void pqBuildErrorMessage3(PQExpBuffer msg, const PGresult *res,
 								 PGVerbosity verbosity, PGContextVisibility show_context);
#2Andrey M. Borodin
x4mmm@yandex-team.ru
In reply to: Ivan Trofimov (#1)
Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement

Hi Ivan,

thank you for the patch.

On 22 Nov 2023, at 03:58, Ivan Trofimov <i.trofimow@yandex.ru> wrote:

Currently libpq sends B(ind), D(escribe), E(execute), S(ync) when executing a prepared statement.
The response for that D message is a RowDescription, which doesn't change during prepared
statement lifetime (with the attributes format being an exception, as they aren't know before execution) .

From my POV the idea seems reasonable (though I’m not a real libpq expert).
BTW some drivers also send Describe even before Bind. This creates some fuss for routing connection poolers.

In a presumably very common case of repeatedly executing the same statement, this leads to
both client and server parsing/sending exactly the same RowDescritpion data over and over again.
Instead, library user could acquire a statement result RowDescription once (via PQdescribePrepared),
and reuse it in subsequent calls to PQexecPrepared and/or its async friends.

But what if query result structure changes? Will we detect this error gracefully and return correct error?

Best regards, Andrey Borodin.

#3Ivan Trofimov
i.trofimow@yandex.ru
In reply to: Andrey M. Borodin (#2)
Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement

In a presumably very common case of repeatedly executing the same statement, this leads to
both client and server parsing/sending exactly the same RowDescritpion data over and over again.
Instead, library user could acquire a statement result RowDescription once (via PQdescribePrepared),
and reuse it in subsequent calls to PQexecPrepared and/or its async friends.

But what if query result structure changes? Will we detect this error gracefully and return correct error?

Afaik changing prepared statement result structure is prohibited by
Postgres server-side, and should always lead to "ERROR: cached plan
must not change result type", see src/test/regress/sql/plancache.sql.

So yes, from the libpq point of view this is just an server error, which
would be given to the user, the patch shouldn't change any behavior here.

The claim about this always being a server-side error better be
reassured from someone from the Postgres team, of course.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ivan Trofimov (#3)
Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement

Ivan Trofimov <i.trofimow@yandex.ru> writes:

Afaik changing prepared statement result structure is prohibited by
Postgres server-side, and should always lead to "ERROR: cached plan
must not change result type", see src/test/regress/sql/plancache.sql.

Independently of whether we're willing to guarantee that that will
never change, I think this patch is basically a bad idea as presented.
It adds a whole new set of programmer-error possibilities, and I doubt
it saves enough in typical cases to justify creating such a foot-gun.

Moreover, it will force us to devote attention to the problem of
keeping libpq itself from misbehaving badly in the inevitable
situation that somebody passes the wrong tuple descriptor.
That is developer effort we could better spend elsewhere.

I say this as somebody who deliberately designed the v3 protocol
to allow clients to skip Describe steps if they were confident
they knew the query result type. I am not disavowing that choice;
I just think that successful use of that option requires a client-
side coding structure that allows tying a previously-obtained
tuple descriptor to the current query with confidence. The proposed
API fails badly at that, or at least leaves it up to the end-user
programmer while providing no tools to help her get it right.

Instead, I'm tempted to suggest having PQprepare/PQexecPrepared
maintain a cache that maps statement name to result tupdesc, so that
this is all handled internally to libpq. The main hole in that idea
is that it's possible to issue PREPARE, DEALLOCATE, etc via PQexec, so
that a user could possibly redefine a prepared statement without libpq
noticing it. Maybe that's not a big issue. For a little more safety,
we could add some extra step that the library user has to take to
enable caching of result tupdescs, whereupon it's definitely caller
error if she does that and then changes the statement behind our back.

BTW, the submitted patch lacks both documentation and tests.
For a feature like this, there is much to be said for writing
the documentation *first*. Having to explain how to use something
often helps you figure out weak spots in your initial design.

regards, tom lane

#5Ivan Trofimov
i.trofimow@yandex.ru
In reply to: Tom Lane (#4)
Re: WIP: libpq: add a possibility to not send D(escribe) when executing a prepared statement

Hi Tom! Thank you for considering this.

It adds a whole new set of programmer-error possibilities, and I doubt
it saves enough in typical cases to justify creating such a foot-gun.

Although I agree it adds a considerable amount of complexity, I'd argue
it doesn't bring the complexity to a new level, since matching queries
against responses is a concept users of asynchronous processing are
already familiar with, especially so when pipelining is in play.

In case of a single-row select this can easily save as much as a half of
the network traffic, which is likely to be encrypted/decrypted through
multiple hops (a connection-pooler, for example), and has to be
serialized/parsed on a server, a client, a pooler etc.
For example, i have a service which bombards its Postgres database with
~20kRPS of "SELECT * FROM users WHERE id=$1", with "users" being a table
of just a bunch of textual ids, a couple of timestamps and some enums in
it, and for that service alone this change would save
~10Megabytes of server-originated traffic per second, and i have
hundreds of such services at my workplace.

I can provide more elaborate network/CPU measurements of different
workloads if needed.

Instead, I'm tempted to suggest having PQprepare/PQexecPrepared
maintain a cache that maps statement name to result tupdesc, so that
this is all handled internally to libpq

From a perspective of someone who maintains a library built on top of
libpq and is familiar with other such libraries, I think this is much
easier done on the level above libpq, simply because there is more
control of when and how invalidation/eviction is done, and the level
above also has a more straightforward way to access the cache across
different asynchronous processing points.

I just think that successful use of that option requires a client-
side coding structure that allows tying a previously-obtained
tuple descriptor to the current query with confidence. The proposed
API fails badly at that, or at least leaves it up to the end-user
programmer while providing no tools to help her get it right

I understand your concerns of usability/safety of what I propose, and I
think I have an idea of how to make this much less of a foot-gun: what
if we add a new function

PGresult *
PQexecPreparedPredescribed(PGconn *conn,
const char *stmtName,
PGresult* description,
...);
which requires both a prepared statement and its tuple descriptor (or
these two could even be tied together by a struct), and exposes its
implementation (basically what I've prototyped in the patch) in the
libpq-int.h?

This way users of synchronous API get a nice thing too, which is
arguably pretty hard to misuse:
if the description isn't available upfront then there's no point to
reach for the function added since PQexecPrepared is strictly better
performance/usability-wise, and if the description is available it's
most likely cached alongside the statement.
If a user still manages to provide an erroneous description, well,
they either get a parsing error or the erroneous description back,
I don't see how libpq could misbehave badly here.

Exposure of the implementation in the internal includes gives a
possibility for users to juggle the actual foot-gun, but implies they
know very well what they are doing, and are ready to be on their own.

What do you think of such approach?