Libpq COPY optimization

Started by Alon Goldshuvabout 20 years ago7 messages

agoldshuv@greenplum.com

about 20 years ago

1 attachment(s)

The following is a suggestion for optimizing the libpq COPY FROM call for
better performance. I submitted a similar suggestion awhile ago, but it
wasn't safe enough. This one is better. It shows a pretty significant
improvement while maintaining deadlock prevention.

The change is localized to PQputCopyData, not requiring
an alternate version of pqPutMsgEnd.

The change is that before returning, PQputCopyData would process inbound
NOTICEs *only if* the buffer was flushed. (and drop the other
PQconsumeInput/parseInput calls in PQputCopyData)

At the end of the copy operation, libpq's caller will call
PQputCopyEnd followed by PQgetResult. PQputCopyEnd flushes
the buffer, but I don't think we need to add any PQconsumeInput/
parseInput there, because any remaining inbound NOTICEs will be
handled immediately afterward when libpq's caller calls PQgetResult.

Alon.

Attachments:

interrupt-checking-patch-for-libpq.patch.txtapplication/octet-stream; name=interrupt-checking-patch-for-libpq.patch.txtDownload

Index: cdb-pg/src/interfaces/libpq/fe-exec.c
===================================================================
RCS file: /opt/cvsroot/cdb2/cdb-pg/src/interfaces/libpq/fe-exec.c,v
diff -u -N -r1.2 -r1.3
--- cdb-pg/src/interfaces/libpq/fe-exec.c	30 Jan 2005 03:27:47 -0000	1.2
+++ cdb-pg/src/interfaces/libpq/fe-exec.c	28 Oct 2005 14:30:50 -0000	1.3
@@ -1465,18 +1465,10 @@
 		return -1;
 	}
 
-	/*
-	 * Check for NOTICE messages coming back from the server.  Since the
-	 * server might generate multiple notices during the COPY, we have to
-	 * consume those in a reasonably prompt fashion to prevent the comm
-	 * buffers from filling up and possibly blocking the server.
-	 */
-	if (!PQconsumeInput(conn))
-		return -1;				/* I/O failure */
-	parseInput(conn);
-
 	if (nbytes > 0)
 	{
+		int oc = conn->outCount + nbytes;
+				
 		/*
 		 * Try to flush any previously sent data in preference to growing
 		 * the output buffer.  If we can't enlarge the buffer enough to
@@ -1485,9 +1477,10 @@
 		 * protocol 2.0 case.)
 		 */
 		if ((conn->outBufSize - conn->outCount - 5) < nbytes)
-		{
+		{			
 			if (pqFlush(conn) < 0)
 				return -1;
+						
 			if (pqCheckOutBufferSpace(conn->outCount + 5 + nbytes, conn))
 				return pqIsnonblocking(conn) ? 0 : -1;
 		}
@@ -1506,6 +1499,21 @@
 				pqPutMsgEnd(conn) < 0)
 				return -1;
 		}
+		
+		/* if some data was sent to the server */
+		if (oc > conn->outCount)
+		{
+			/*
+			 * Check for NOTICE messages coming back from the server.  Since the
+			 * server might generate multiple notices during the COPY, we have to
+			 * consume those in a reasonably prompt fashion to prevent the comm
+			 * buffers from filling up and possibly blocking the server.
+			 */			
+			if (!PQconsumeInput(conn))
+				return -1;
+			parseInput(conn);
+		}
+		
 	}
 	return 1;
 }

Luke Lonergan

llonergan@greenplum.com

about 20 years ago

In reply to: Alon Goldshuv (#1)

Re: Libpq COPY optimization

A note on this we see a huge performance benefit from this change on
Solaris, so much so that it should be mandatory for that platform. Solaris¹
error handling is deeper, and so the repeated redundant interrupts that this
patch avoids causes 60% of the CPU to be consumed during COPY. After this
patch, this work disappears from the profile and the COPY speed is much
improved.

There are also big speed improvements on Linux, on the order of 40% at high
speeds. Using Bizgres MPP, this change allowed us to increase from 47 MB/s
loading to 70+ MB/s loading speed (250GB/hour).

- Luke

On 1/8/06 6:00 AM, "Alon Goldshuv" <agoldshuv@greenplum.com> wrote:

Show quoted text

The following is a suggestion for optimizing the libpq COPY FROM call for
better performance. I submitted a similar suggestion awhile ago, but it
wasn't safe enough. This one is better. It shows a pretty significant
improvement while maintaining deadlock prevention.

The change is localized to PQputCopyData, not requiring
an alternate version of pqPutMsgEnd.

The change is that before returning, PQputCopyData would process inbound
NOTICEs *only if* the buffer was flushed. (and drop the other
PQconsumeInput/parseInput calls in PQputCopyData)

At the end of the copy operation, libpq's caller will call
PQputCopyEnd followed by PQgetResult. PQputCopyEnd flushes
the buffer, but I don't think we need to add any PQconsumeInput/
parseInput there, because any remaining inbound NOTICEs will be
handled immediately afterward when libpq's caller calls PQgetResult.

Alon.

Tom Lane

tgl@sss.pgh.pa.us

about 20 years ago

In reply to: Alon Goldshuv (#1)

Re: Libpq COPY optimization

"Alon Goldshuv" <agoldshuv@greenplum.com> writes:

The following is a suggestion for optimizing the libpq COPY FROM call for
better performance. I submitted a similar suggestion awhile ago, but it
wasn't safe enough. This one is better.

It's not enough better, because it will still deadlock given a
sufficiently large message-to-send. I don't think you can postpone
the clearing-input action until after all the data is sent.

regards, tom lane

Alon Goldshuv

agoldshuv@greenplum.com

about 20 years ago

In reply to: Tom Lane (#3)

Re: Libpq COPY optimization

Tom,

It's not enough better, because it will still deadlock given a
sufficiently large message-to-send. I don't think you can postpone
the clearing-input action until after all the data is sent.

regards, tom lane

Please help me understand this better. It appears to me that when the
client->backend pipe fills up, pqSendSome() consumes any incoming
NOTICE/WARNING messages before waiting, which should prevent deadlock. I'll
look at the code again, maybe I missed something.

Thx,
Alon.

Tom Lane

tgl@sss.pgh.pa.us

almost 20 years ago

In reply to: Alon Goldshuv (#4)

Re: Libpq COPY optimization

"Alon Goldshuv" <agoldshuv@greenplum.com> writes:

Please help me understand this better. It appears to me that when the
client->backend pipe fills up, pqSendSome() consumes any incoming
NOTICE/WARNING messages before waiting, which should prevent deadlock.

Hm, I had forgotten that the low-level pqSendSome routine does that.
That makes the PQconsumeInput call in PQputCopyData redundant (or
almost; see below). The parseInput call is still needed, because it's
there to pull NOTICE messages out of the input buffer and get rid of
them, rather than possibly having the input buffer grow to exceed
memory. But when there's nothing for it to do, parseInput is cheap
enough that there's no real need to bypass it.

In short, if you just remove the PQconsumeInput call I think you'll find
that it does what you want.

The only case where it's helpful to have it there is if there's a
incomplete message in the input buffer, as parseInput isn't quite so
fast if it has to determine that the message is incomplete. Without
the PQconsumeInput call, the incomplete-message state could persist
for a long time, and you'd pay the parseInput overhead each time through
PQputCopyData. However, that's certainly not the normal situation,
so I think we could leave that case slightly pessimal. It's certainly
true that that path in parseInput is a lot faster than a kernel call,
so it'd still be better than it is now.

regards, tom lane

Bruce Momjian

pgman@candle.pha.pa.us

almost 20 years ago

In reply to: Tom Lane (#5)

Re: Libpq COPY optimization

Can I get an updated patch for this?

---------------------------------------------------------------------------

Tom Lane wrote:

"Alon Goldshuv" <agoldshuv@greenplum.com> writes:

Please help me understand this better. It appears to me that when the
client->backend pipe fills up, pqSendSome() consumes any incoming
NOTICE/WARNING messages before waiting, which should prevent deadlock.

Hm, I had forgotten that the low-level pqSendSome routine does that.
That makes the PQconsumeInput call in PQputCopyData redundant (or
almost; see below). The parseInput call is still needed, because it's
there to pull NOTICE messages out of the input buffer and get rid of
them, rather than possibly having the input buffer grow to exceed
memory. But when there's nothing for it to do, parseInput is cheap
enough that there's no real need to bypass it.

In short, if you just remove the PQconsumeInput call I think you'll find
that it does what you want.

The only case where it's helpful to have it there is if there's a
incomplete message in the input buffer, as parseInput isn't quite so
fast if it has to determine that the message is incomplete. Without
the PQconsumeInput call, the incomplete-message state could persist
for a long time, and you'd pay the parseInput overhead each time through
PQputCopyData. However, that's certainly not the normal situation,
so I think we could leave that case slightly pessimal. It's certainly
true that that path in parseInput is a lot faster than a kernel call,
so it'd still be better than it is now.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Alon Goldshuv

agoldshuv@greenplum.com

almost 20 years ago

In reply to: Bruce Momjian (#6)

Re: Libpq COPY optimization

I'll send it to -patches shortly

Alon.

On 1/23/06 10:20 PM, "Bruce Momjian" <pgman@candle.pha.pa.us> wrote:

Show quoted text

Can I get an updated patch for this?

---------------------------------------------------------------------------

Tom Lane wrote:

"Alon Goldshuv" <agoldshuv@greenplum.com> writes:

Please help me understand this better. It appears to me that when the
client->backend pipe fills up, pqSendSome() consumes any incoming
NOTICE/WARNING messages before waiting, which should prevent deadlock.

Hm, I had forgotten that the low-level pqSendSome routine does that.
That makes the PQconsumeInput call in PQputCopyData redundant (or
almost; see below). The parseInput call is still needed, because it's
there to pull NOTICE messages out of the input buffer and get rid of
them, rather than possibly having the input buffer grow to exceed
memory. But when there's nothing for it to do, parseInput is cheap
enough that there's no real need to bypass it.

In short, if you just remove the PQconsumeInput call I think you'll find
that it does what you want.

The only case where it's helpful to have it there is if there's a
incomplete message in the input buffer, as parseInput isn't quite so
fast if it has to determine that the message is incomplete. Without
the PQconsumeInput call, the incomplete-message state could persist
for a long time, and you'd pay the parseInput overhead each time through
PQputCopyData. However, that's certainly not the normal situation,
so I think we could leave that case slightly pessimal. It's certainly
true that that path in parseInput is a lot faster than a kernel call,
so it'd still be better than it is now.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match