Libpq COPY optimization

Started by Alon Goldshuvabout 20 years ago7 messages
#1Alon Goldshuv
agoldshuv@greenplum.com
1 attachment(s)

The following is a suggestion for optimizing the libpq COPY FROM call for
better performance. I submitted a similar suggestion awhile ago, but it
wasn't safe enough. This one is better. It shows a pretty significant
improvement while maintaining deadlock prevention.

The change is localized to PQputCopyData, not requiring
an alternate version of pqPutMsgEnd.

The change is that before returning, PQputCopyData would process inbound
NOTICEs *only if* the buffer was flushed. (and drop the other
PQconsumeInput/parseInput calls in PQputCopyData)

At the end of the copy operation, libpq's caller will call
PQputCopyEnd followed by PQgetResult. PQputCopyEnd flushes
the buffer, but I don't think we need to add any PQconsumeInput/
parseInput there, because any remaining inbound NOTICEs will be
handled immediately afterward when libpq's caller calls PQgetResult.

Alon.

Attachments:

interrupt-checking-patch-for-libpq.patch.txtapplication/octet-stream; name=interrupt-checking-patch-for-libpq.patch.txtDownload
Index: cdb-pg/src/interfaces/libpq/fe-exec.c
===================================================================
RCS file: /opt/cvsroot/cdb2/cdb-pg/src/interfaces/libpq/fe-exec.c,v
diff -u -N -r1.2 -r1.3
--- cdb-pg/src/interfaces/libpq/fe-exec.c	30 Jan 2005 03:27:47 -0000	1.2
+++ cdb-pg/src/interfaces/libpq/fe-exec.c	28 Oct 2005 14:30:50 -0000	1.3
@@ -1465,18 +1465,10 @@
 		return -1;
 	}
 
-	/*
-	 * Check for NOTICE messages coming back from the server.  Since the
-	 * server might generate multiple notices during the COPY, we have to
-	 * consume those in a reasonably prompt fashion to prevent the comm
-	 * buffers from filling up and possibly blocking the server.
-	 */
-	if (!PQconsumeInput(conn))
-		return -1;				/* I/O failure */
-	parseInput(conn);
-
 	if (nbytes > 0)
 	{
+		int oc = conn->outCount + nbytes;
+				
 		/*
 		 * Try to flush any previously sent data in preference to growing
 		 * the output buffer.  If we can't enlarge the buffer enough to
@@ -1485,9 +1477,10 @@
 		 * protocol 2.0 case.)
 		 */
 		if ((conn->outBufSize - conn->outCount - 5) < nbytes)
-		{
+		{			
 			if (pqFlush(conn) < 0)
 				return -1;
+						
 			if (pqCheckOutBufferSpace(conn->outCount + 5 + nbytes, conn))
 				return pqIsnonblocking(conn) ? 0 : -1;
 		}
@@ -1506,6 +1499,21 @@
 				pqPutMsgEnd(conn) < 0)
 				return -1;
 		}
+		
+		/* if some data was sent to the server */
+		if (oc > conn->outCount)
+		{
+			/*
+			 * Check for NOTICE messages coming back from the server.  Since the
+			 * server might generate multiple notices during the COPY, we have to
+			 * consume those in a reasonably prompt fashion to prevent the comm
+			 * buffers from filling up and possibly blocking the server.
+			 */			
+			if (!PQconsumeInput(conn))
+				return -1;
+			parseInput(conn);
+		}
+		
 	}
 	return 1;
 }
#2Luke Lonergan
llonergan@greenplum.com
In reply to: Alon Goldshuv (#1)
Re: Libpq COPY optimization

A note on this ­ we see a huge performance benefit from this change on
Solaris, so much so that it should be mandatory for that platform. Solaris¹
error handling is deeper, and so the repeated redundant interrupts that this
patch avoids causes 60% of the CPU to be consumed during COPY. After this
patch, this work disappears from the profile and the COPY speed is much
improved.

There are also big speed improvements on Linux, on the order of 40% at high
speeds. Using Bizgres MPP, this change allowed us to increase from 47 MB/s
loading to 70+ MB/s loading speed (250GB/hour).

- Luke

On 1/8/06 6:00 AM, "Alon Goldshuv" <agoldshuv@greenplum.com> wrote:

Show quoted text

The following is a suggestion for optimizing the libpq COPY FROM call for
better performance. I submitted a similar suggestion awhile ago, but it
wasn't safe enough. This one is better. It shows a pretty significant
improvement while maintaining deadlock prevention.

The change is localized to PQputCopyData, not requiring
an alternate version of pqPutMsgEnd.

The change is that before returning, PQputCopyData would process inbound
NOTICEs *only if* the buffer was flushed. (and drop the other
PQconsumeInput/parseInput calls in PQputCopyData)

At the end of the copy operation, libpq's caller will call
PQputCopyEnd followed by PQgetResult. PQputCopyEnd flushes
the buffer, but I don't think we need to add any PQconsumeInput/
parseInput there, because any remaining inbound NOTICEs will be
handled immediately afterward when libpq's caller calls PQgetResult.

Alon.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alon Goldshuv (#1)
Re: Libpq COPY optimization

"Alon Goldshuv" <agoldshuv@greenplum.com> writes:

The following is a suggestion for optimizing the libpq COPY FROM call for
better performance. I submitted a similar suggestion awhile ago, but it
wasn't safe enough. This one is better.

It's not enough better, because it will still deadlock given a
sufficiently large message-to-send. I don't think you can postpone
the clearing-input action until after all the data is sent.

regards, tom lane

#4Alon Goldshuv
agoldshuv@greenplum.com
In reply to: Tom Lane (#3)
Re: Libpq COPY optimization

Tom,

It's not enough better, because it will still deadlock given a
sufficiently large message-to-send. I don't think you can postpone
the clearing-input action until after all the data is sent.

regards, tom lane

Please help me understand this better. It appears to me that when the
client->backend pipe fills up, pqSendSome() consumes any incoming
NOTICE/WARNING messages before waiting, which should prevent deadlock. I'll
look at the code again, maybe I missed something.

Thx,
Alon.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Alon Goldshuv (#4)
Re: Libpq COPY optimization

"Alon Goldshuv" <agoldshuv@greenplum.com> writes:

Please help me understand this better. It appears to me that when the
client->backend pipe fills up, pqSendSome() consumes any incoming
NOTICE/WARNING messages before waiting, which should prevent deadlock.

Hm, I had forgotten that the low-level pqSendSome routine does that.
That makes the PQconsumeInput call in PQputCopyData redundant (or
almost; see below). The parseInput call is still needed, because it's
there to pull NOTICE messages out of the input buffer and get rid of
them, rather than possibly having the input buffer grow to exceed
memory. But when there's nothing for it to do, parseInput is cheap
enough that there's no real need to bypass it.

In short, if you just remove the PQconsumeInput call I think you'll find
that it does what you want.

The only case where it's helpful to have it there is if there's a
incomplete message in the input buffer, as parseInput isn't quite so
fast if it has to determine that the message is incomplete. Without
the PQconsumeInput call, the incomplete-message state could persist
for a long time, and you'd pay the parseInput overhead each time through
PQputCopyData. However, that's certainly not the normal situation,
so I think we could leave that case slightly pessimal. It's certainly
true that that path in parseInput is a lot faster than a kernel call,
so it'd still be better than it is now.

regards, tom lane

#6Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Tom Lane (#5)
Re: Libpq COPY optimization

Can I get an updated patch for this?

---------------------------------------------------------------------------

Tom Lane wrote:

"Alon Goldshuv" <agoldshuv@greenplum.com> writes:

Please help me understand this better. It appears to me that when the
client->backend pipe fills up, pqSendSome() consumes any incoming
NOTICE/WARNING messages before waiting, which should prevent deadlock.

Hm, I had forgotten that the low-level pqSendSome routine does that.
That makes the PQconsumeInput call in PQputCopyData redundant (or
almost; see below). The parseInput call is still needed, because it's
there to pull NOTICE messages out of the input buffer and get rid of
them, rather than possibly having the input buffer grow to exceed
memory. But when there's nothing for it to do, parseInput is cheap
enough that there's no real need to bypass it.

In short, if you just remove the PQconsumeInput call I think you'll find
that it does what you want.

The only case where it's helpful to have it there is if there's a
incomplete message in the input buffer, as parseInput isn't quite so
fast if it has to determine that the message is incomplete. Without
the PQconsumeInput call, the incomplete-message state could persist
for a long time, and you'd pay the parseInput overhead each time through
PQputCopyData. However, that's certainly not the normal situation,
so I think we could leave that case slightly pessimal. It's certainly
true that that path in parseInput is a lot faster than a kernel call,
so it'd still be better than it is now.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#7Alon Goldshuv
agoldshuv@greenplum.com
In reply to: Bruce Momjian (#6)
Re: Libpq COPY optimization

I'll send it to -patches shortly

Alon.

On 1/23/06 10:20 PM, "Bruce Momjian" <pgman@candle.pha.pa.us> wrote:

Show quoted text

Can I get an updated patch for this?

---------------------------------------------------------------------------

Tom Lane wrote:

"Alon Goldshuv" <agoldshuv@greenplum.com> writes:

Please help me understand this better. It appears to me that when the
client->backend pipe fills up, pqSendSome() consumes any incoming
NOTICE/WARNING messages before waiting, which should prevent deadlock.

Hm, I had forgotten that the low-level pqSendSome routine does that.
That makes the PQconsumeInput call in PQputCopyData redundant (or
almost; see below). The parseInput call is still needed, because it's
there to pull NOTICE messages out of the input buffer and get rid of
them, rather than possibly having the input buffer grow to exceed
memory. But when there's nothing for it to do, parseInput is cheap
enough that there's no real need to bypass it.

In short, if you just remove the PQconsumeInput call I think you'll find
that it does what you want.

The only case where it's helpful to have it there is if there's a
incomplete message in the input buffer, as parseInput isn't quite so
fast if it has to determine that the message is incomplete. Without
the PQconsumeInput call, the incomplete-message state could persist
for a long time, and you'd pay the parseInput overhead each time through
PQputCopyData. However, that's certainly not the normal situation,
so I think we could leave that case slightly pessimal. It's certainly
true that that path in parseInput is a lot faster than a kernel call,
so it'd still be better than it is now.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match