Improvements in Copy From

Started by vignesh Cover 5 years ago16 messages

vignesh21@gmail.com

over 5 years ago

1 attachment(s)

Hi,

While reviewing copy from I identified few improvements for copy from
that can be done :
a) copy from stdin copies lesser amount of data to buffer even though
space is available in buffer because minread was passed as 1 to
CopyGetData, Hence it only reads until the data read from libpq is
less than minread. This can be fixed by passing the actual space
available in buffer, this reduces the unnecessary frequent calls to
CopyGetData.
b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
that is not being used, it can be removed.
c) Copy from reads header line and do nothing for the header line, we
need not clear EOL & need not convert to server encoding for the
header line.

Attached patch has the changes for the same.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachments:

0001-Improvements-in-copy-from.patchtext/x-patch; charset=US-ASCII; name=0001-Improvements-in-copy-from.patchDownload

From b86edd9bf4f4350598a0518996e64f40f13aea21 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Wed, 1 Jul 2020 17:51:51 +0530
Subject: [PATCH] Improvements in copy from.

There are 3 improvements for copy from in this patch which is detailed below:
a) copy from stdin copies lesser amount of data to buffer even though space is
available in buffer because minread was passed as 1 to CopyGetData, fixed it by
passing the actual space available in buffer, this reduces the frequent call to
CopyGetData.
b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter, this is not
being used, it has been removed.
c) Copy from reads header line and does nothing for the read line, we need not
clear EOL & need not convert to server encoding for the header line.
---
 src/backend/commands/copy.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3e199bd..dfb7d92 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -793,6 +793,7 @@ CopyLoadRawBuf(CopyState cstate)
 {
 	int			nbytes;
 	int			inbytes;
+	int         minread = 1;
 
 	if (cstate->raw_buf_index < cstate->raw_buf_len)
 	{
@@ -804,8 +805,11 @@ CopyLoadRawBuf(CopyState cstate)
 	else
 		nbytes = 0;				/* no data need be saved */
 
+	if (cstate->copy_dest == COPY_NEW_FE)
+		minread = RAW_BUF_SIZE - nbytes;
+
 	inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
-						  1, RAW_BUF_SIZE - nbytes);
+						  minread, RAW_BUF_SIZE - nbytes);
 	nbytes += inbytes;
 	cstate->raw_buf[nbytes] = '\0';
 	cstate->raw_buf_index = 0;
@@ -2603,8 +2607,7 @@ CopyMultiInsertInfoCleanup(CopyMultiInsertInfo *miinfo)
  * Callers must ensure that the buffer is not full.
  */
 static inline TupleTableSlot *
-CopyMultiInsertInfoNextFreeSlot(CopyMultiInsertInfo *miinfo,
-								ResultRelInfo *rri)
+CopyMultiInsertInfoNextFreeSlot(ResultRelInfo *rri)
 {
 	CopyMultiInsertBuffer *buffer = rri->ri_CopyMultiInsertBuffer;
 	int			nused = buffer->nused;
@@ -2969,8 +2972,7 @@ CopyFrom(CopyState cstate)
 			Assert(resultRelInfo == target_resultRelInfo);
 			Assert(insertMethod == CIM_MULTI);
 
-			myslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-													 resultRelInfo);
+			myslot = CopyMultiInsertInfoNextFreeSlot(resultRelInfo);
 		}
 
 		/*
@@ -3117,8 +3119,7 @@ CopyFrom(CopyState cstate)
 				/* no other path available for partitioned table */
 				Assert(insertMethod == CIM_MULTI_CONDITIONAL);
 
-				batchslot = CopyMultiInsertInfoNextFreeSlot(&multiInsertInfo,
-															resultRelInfo);
+				batchslot = CopyMultiInsertInfoNextFreeSlot(resultRelInfo);
 
 				if (map != NULL)
 					myslot = execute_attr_map_slot(map->attrMap, myslot,
@@ -3856,7 +3857,7 @@ CopyReadLine(CopyState cstate)
 			} while (CopyLoadRawBuf(cstate));
 		}
 	}
-	else
+	else if (!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		/*
 		 * If we didn't hit EOF, then we must have transferred the EOL marker
@@ -3890,8 +3891,9 @@ CopyReadLine(CopyState cstate)
 		}
 	}
 
-	/* Done reading the line.  Convert it to server encoding. */
-	if (cstate->need_transcoding)
+	/* Done reading the line.  Convert it to server encoding if not header. */
+	if (cstate->need_transcoding &&
+		!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		char	   *cvt;
 
-- 
1.8.3.1

vignesh C

vignesh21@gmail.com

over 5 years ago

In reply to: vignesh C (#1)

Re: Improvements in Copy From

On Wed, Jul 1, 2020 at 6:16 PM vignesh C <vignesh21@gmail.com> wrote:

Attached patch has the changes for the same.
Thoughts?

Added a commitfest entry for this:
https://commitfest.postgresql.org/29/2642/

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

David Rowley

dgrowleyml@gmail.com

over 5 years ago

In reply to: vignesh C (#1)

Re: Improvements in Copy From

On Thu, 2 Jul 2020 at 00:46, vignesh C <vignesh21@gmail.com> wrote:

b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
that is not being used, it can be removed.

This was raised in [1]/messages/by-id/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ@mail.gmail.com. We decided not to remove it.

David

[1]: /messages/by-id/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ@mail.gmail.com

David Rowley

dgrowleyml@gmail.com

over 5 years ago

In reply to: David Rowley (#3)

Re: Improvements in Copy From

On Tue, 14 Jul 2020 at 17:22, David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 2 Jul 2020 at 00:46, vignesh C <vignesh21@gmail.com> wrote:

b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
that is not being used, it can be removed.

This was raised in [1]. We decided not to remove it.

I just added a comment to the function to mention why we want to keep
the parameter. I hope that will save any wasted time proposing its
removal in the future.

FWIW, the function is inlined. Removing it will gain us nothing
performance-wise anyway.

David

Show quoted text

[1] /messages/by-id/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ@mail.gmail.com

vignesh C

vignesh21@gmail.com

over 5 years ago

In reply to: David Rowley (#4)

Re: Improvements in Copy From

On Tue, Jul 14, 2020 at 11:13 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 14 Jul 2020 at 17:22, David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 2 Jul 2020 at 00:46, vignesh C <vignesh21@gmail.com> wrote:

b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
that is not being used, it can be removed.

This was raised in [1]. We decided not to remove it.

I just added a comment to the function to mention why we want to keep
the parameter. I hope that will save any wasted time proposing its
removal in the future.

FWIW, the function is inlined. Removing it will gain us nothing
performance-wise anyway.

David

[1] /messages/by-id/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ@mail.gmail.com

Thanks David for pointing it out, as this has been discussed and
concluded no point in discussing the same thing again. This patch has
a couple of other improvements which can still be taken forward. I
will remove this change and post a new patch to retain the other
issues that were fixed.

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

vignesh C

vignesh21@gmail.com

over 5 years ago

In reply to: vignesh C (#5)

1 attachment(s)

Re: Improvements in Copy From

On Tue, Jul 14, 2020 at 12:17 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Jul 14, 2020 at 11:13 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 14 Jul 2020 at 17:22, David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 2 Jul 2020 at 00:46, vignesh C <vignesh21@gmail.com> wrote:

b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
that is not being used, it can be removed.

This was raised in [1]. We decided not to remove it.

I just added a comment to the function to mention why we want to keep
the parameter. I hope that will save any wasted time proposing its
removal in the future.

FWIW, the function is inlined. Removing it will gain us nothing
performance-wise anyway.

David

[1] /messages/by-id/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ@mail.gmail.com

Thanks David for pointing it out, as this has been discussed and
concluded no point in discussing the same thing again. This patch has
a couple of other improvements which can still be taken forward. I
will remove this change and post a new patch to retain the other
issues that were fixed.

I have removed the changes that david had pointed out and retained the
remaining changes. Attaching the patch for the same.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

Attachments:

0001-Improvements-in-copy-from.patchtext/x-patch; charset=US-ASCII; name=0001-Improvements-in-copy-from.patchDownload

From fbafa5eaaa84028b3bbfb7cde0cbcc3963fd033a Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Tue, 14 Jul 2020 12:21:37 +0530
Subject: [PATCH] Improvements in copy from.

There are couple of improvements for copy from in this patch which is detailed
below:
a) copy from stdin copies lesser amount of data to buffer even though space is
available in buffer because minread was passed as 1 to CopyGetData, fixed it by
passing the actual space available in buffer, this reduces the frequent call to
CopyGetData.
b) Copy from reads header line and does nothing for the read line, we need not
clear EOL & need not convert to server encoding for the header line.
---
 src/backend/commands/copy.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 44da71c..bc27dfc 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -796,6 +796,7 @@ CopyLoadRawBuf(CopyState cstate)
 {
 	int			nbytes;
 	int			inbytes;
+	int			minread = 1;
 
 	if (cstate->raw_buf_index < cstate->raw_buf_len)
 	{
@@ -807,8 +808,11 @@ CopyLoadRawBuf(CopyState cstate)
 	else
 		nbytes = 0;				/* no data need be saved */
 
+	if (cstate->copy_dest == COPY_NEW_FE)
+		minread = RAW_BUF_SIZE - nbytes;
+
 	inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
-						  1, RAW_BUF_SIZE - nbytes);
+						  minread, RAW_BUF_SIZE - nbytes);
 	nbytes += inbytes;
 	cstate->raw_buf[nbytes] = '\0';
 	cstate->raw_buf_index = 0;
@@ -3869,7 +3873,7 @@ CopyReadLine(CopyState cstate)
 			} while (CopyLoadRawBuf(cstate));
 		}
 	}
-	else
+	else if (!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		/*
 		 * If we didn't hit EOF, then we must have transferred the EOL marker
@@ -3903,8 +3907,9 @@ CopyReadLine(CopyState cstate)
 		}
 	}
 
-	/* Done reading the line.  Convert it to server encoding. */
-	if (cstate->need_transcoding)
+	/* Done reading the line.  Convert it to server encoding if not header. */
+	if (cstate->need_transcoding &&
+		!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		char	   *cvt;
 
-- 
1.8.3.1

Peter Smith

smithpb2250@gmail.com

over 5 years ago

In reply to: vignesh C (#6)

Re: Improvements in Copy From

Hello.

FYI - that patch has conflicts when applied.

Kind Regards
Peter Smith
Fujitsu Australia.

Show quoted text

On Thu, Aug 27, 2020 at 3:11 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Jul 14, 2020 at 12:17 PM vignesh C <vignesh21@gmail.com> wrote:

On Tue, Jul 14, 2020 at 11:13 AM David Rowley <dgrowleyml@gmail.com> wrote:

On Tue, 14 Jul 2020 at 17:22, David Rowley <dgrowleyml@gmail.com> wrote:

On Thu, 2 Jul 2020 at 00:46, vignesh C <vignesh21@gmail.com> wrote:

b) CopyMultiInsertInfoNextFreeSlot had an unused function parameter
that is not being used, it can be removed.

This was raised in [1]. We decided not to remove it.

I just added a comment to the function to mention why we want to keep
the parameter. I hope that will save any wasted time proposing its
removal in the future.

FWIW, the function is inlined. Removing it will gain us nothing
performance-wise anyway.

David

[1] /messages/by-id/CAKJS1f-A5aYvPHe10Wy9LjC4RzLsBrya8b2gfuQHFabhwZT_NQ@mail.gmail.com

Thanks David for pointing it out, as this has been discussed and
concluded no point in discussing the same thing again. This patch has
a couple of other improvements which can still be taken forward. I
will remove this change and post a new patch to retain the other
issues that were fixed.

I have removed the changes that david had pointed out and retained the
remaining changes. Attaching the patch for the same.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

vignesh C

vignesh21@gmail.com

over 5 years ago

In reply to: Peter Smith (#7)

1 attachment(s)

Re: Improvements in Copy From

On Thu, Aug 27, 2020 at 11:02 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hello.

FYI - that patch has conflicts when applied.

Thanks for letting me know. Attached new patch which is rebased on top of head.

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com

Attachments:

v2-0001-Improvements-in-copy-from.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Improvements-in-copy-from.patchDownload

From a343fe1f8fdf4293d2ef6841e243390b99f29e28 Mon Sep 17 00:00:00 2001
From: Vignesh C <vignesh21@gmail.com>
Date: Sun, 30 Aug 2020 12:31:12 +0530
Subject: [PATCH v2] Improvements in copy from.

There are couple of improvements for copy from in this patch which is detailed
below:
a) copy from stdin copies lesser amount of data to buffer even though space is
available in buffer because minread was passed as 1 to CopyGetData, fixed it by
passing the actual space available in buffer, this reduces the frequent call to
CopyGetData.
b) Copy from reads header line and does nothing for the read line, we need not
clear EOL & need not convert to server encoding for the header line.
---
 src/backend/commands/copy.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index db7d24a..c688baa 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -801,14 +801,18 @@ CopyLoadRawBuf(CopyState cstate)
 {
 	int			nbytes = RAW_BUF_BYTES(cstate);
 	int			inbytes;
+	int			minread = 1;
 
 	/* Copy down the unprocessed data if any. */
 	if (nbytes > 0)
 		memmove(cstate->raw_buf, cstate->raw_buf + cstate->raw_buf_index,
 				nbytes);
 
+	if (cstate->copy_dest == COPY_NEW_FE)
+		minread = RAW_BUF_SIZE - nbytes;
+
 	inbytes = CopyGetData(cstate, cstate->raw_buf + nbytes,
-						  1, RAW_BUF_SIZE - nbytes);
+						  minread, RAW_BUF_SIZE - nbytes);
 	nbytes += inbytes;
 	cstate->raw_buf[nbytes] = '\0';
 	cstate->raw_buf_index = 0;
@@ -3917,7 +3921,7 @@ CopyReadLine(CopyState cstate)
 			} while (CopyLoadRawBuf(cstate));
 		}
 	}
-	else
+	else if (!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		/*
 		 * If we didn't hit EOF, then we must have transferred the EOL marker
@@ -3951,8 +3955,9 @@ CopyReadLine(CopyState cstate)
 		}
 	}
 
-	/* Done reading the line.  Convert it to server encoding. */
-	if (cstate->need_transcoding)
+	/* Done reading the line.  Convert it to server encoding if not header. */
+	if (cstate->need_transcoding &&
+		!(cstate->cur_lineno == 0 && cstate->header_line))
 	{
 		char	   *cvt;
 
-- 
1.8.3.1

Surafel Temesgen

surafel3000@gmail.com

over 5 years ago

In reply to: vignesh C (#1)

Re: Improvements in Copy From

Hi Vignesh

On Wed, Jul 1, 2020 at 3:46 PM vignesh C <vignesh21@gmail.com> wrote:

Hi,

While reviewing copy from I identified few improvements for copy from
that can be done :
a) copy from stdin copies lesser amount of data to buffer even though
space is available in buffer because minread was passed as 1 to
CopyGetData, Hence it only reads until the data read from libpq is
less than minread. This can be fixed by passing the actual space
available in buffer, this reduces the unnecessary frequent calls to
CopyGetData.

why not applying the same optimization on file read ?

c) Copy from reads header line and do nothing for the header line, we
need not clear EOL & need not convert to server encoding for the
header line.

We have a patch for column matching feature [1]. /messages/by-id/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com that may need a header line
to be further processed. Even without that I think it is preferable to
process the header line for nothing than adding those checks to the loop,
performance-wise.

[1]: . /messages/by-id/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com
/messages/by-id/CAF1-J-0PtCWMeLtswwGV2M70U26n4g33gpe1rcKQqe6wVQDrFA@mail.gmail.com

regards

Surafel

#10

Peter Smith

smithpb2250@gmail.com

over 5 years ago

In reply to: vignesh C (#8)

Re: Improvements in Copy From

My basic understanding of first part of your patch is that by
adjusting the "minread" it now allows it to loop multiple times
internally within the CopyGetData rather than calling CopyLoadRawBuf
for every N lines. There doesn't seem to be much change to what other
code gets executed so the saving is essentially whatever is the cost
of making 2 x function calls (CopyLoadRawBuff + CopyGetData) x N. Is
that understanding correct?

But with that change there seems to be opportunity for yet another
tiny saving possible. IIUC, now you are processing a lot more data
within the CopyGetData so it is now very likely that you will also
gobble the COPY_NEW_FE's 'c' marker. So cstate->reached_eof will be
set. So this means the calling code of CopyReadLineText may no longer
need to call the CopyLoadRawBuf one last time just to discover there
are no more bytes to read - something that it already knows if
cstate->reached_eof == true.

For example, with your change can't you also modify CopyReadLineText like below:

BEFORE
if (!CopyLoadRawBuf(cstate))
hit_eof = true;

AFTER
if (cstate->reached_eof)
{
cstate->raw_buf[0] = '\0';
cstate->raw_buf_index = cstate->raw_buf_len = 0;
hit_eof = true;
}
else if (!CopyLoadRawBuf(cstate))
{
hit_eof = true;
}

Whether such a micro-optimisation is worth doing is another question.

---

Kind Regards,
Peter Smith.
Fujitsu Australia

Show quoted text

On Sun, Aug 30, 2020 at 5:25 PM vignesh C <vignesh21@gmail.com> wrote:

On Thu, Aug 27, 2020 at 11:02 AM Peter Smith <smithpb2250@gmail.com> wrote:

Hello.

FYI - that patch has conflicts when applied.

Thanks for letting me know. Attached new patch which is rebased on top of head.

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com

#11

vignesh C

vignesh21@gmail.com

over 5 years ago

In reply to: Surafel Temesgen (#9)

Re: Improvements in Copy From

On Mon, Sep 7, 2020 at 1:19 PM Surafel Temesgen <surafel3000@gmail.com> wrote:

Hi Vignesh

On Wed, Jul 1, 2020 at 3:46 PM vignesh C <vignesh21@gmail.com> wrote:

Hi,

While reviewing copy from I identified few improvements for copy from
that can be done :
a) copy from stdin copies lesser amount of data to buffer even though
space is available in buffer because minread was passed as 1 to
CopyGetData, Hence it only reads until the data read from libpq is
less than minread. This can be fixed by passing the actual space
available in buffer, this reduces the unnecessary frequent calls to
CopyGetData.

why not applying the same optimization on file read ?

For file read this is already taken care as you can see from below code:
bytesread = fread(databuf, 1, maxread, cstate->copy_file);
if (ferror(cstate->copy_file))
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not read from COPY file: %m")));
if (bytesread == 0)
cstate->reached_eof = true;
break;

We do not have any condition to break unlike the case of stdin, we
read 1 * maxread size of data, So no need to do anything for it.

c) Copy from reads header line and do nothing for the header line, we
need not clear EOL & need not convert to server encoding for the
header line.

We have a patch for column matching feature [1] that may need a header line to be further processed. Even without that I think it is preferable to process the header line for nothing than adding those checks to the loop, performance-wise.

I had seen that patch, I feel that change to match the header if the
header is specified can be addressed in this patch if that patch gets
committed first or vice versa. We are doing a lot of processing for
the data which we need not do anything. Shouldn't this be skipped if
not required. Similar check is present in NextCopyFromRawFields also
to skip header.

Thoughts?

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com

#12

vignesh C

vignesh21@gmail.com

over 5 years ago

In reply to: Peter Smith (#10)

Re: Improvements in Copy From

On Wed, Sep 9, 2020 at 12:24 PM Peter Smith <smithpb2250@gmail.com> wrote:

My basic understanding of first part of your patch is that by
adjusting the "minread" it now allows it to loop multiple times
internally within the CopyGetData rather than calling CopyLoadRawBuf
for every N lines. There doesn't seem to be much change to what other
code gets executed so the saving is essentially whatever is the cost
of making 2 x function calls (CopyLoadRawBuff + CopyGetData) x N. Is
that understanding correct?

Yes you are right, we will avoid the function calls and try to get as
many records as possible from the buffer & insert it to the relation.

But with that change there seems to be opportunity for yet another
tiny saving possible. IIUC, now you are processing a lot more data
within the CopyGetData so it is now very likely that you will also
gobble the COPY_NEW_FE's 'c' marker. So cstate->reached_eof will be
set. So this means the calling code of CopyReadLineText may no longer
need to call the CopyLoadRawBuf one last time just to discover there
are no more bytes to read - something that it already knows if
cstate->reached_eof == true.

For example, with your change can't you also modify CopyReadLineText like below:

BEFORE
if (!CopyLoadRawBuf(cstate))
hit_eof = true;

AFTER
if (cstate->reached_eof)
{
cstate->raw_buf[0] = '\0';
cstate->raw_buf_index = cstate->raw_buf_len = 0;
hit_eof = true;
}
else if (!CopyLoadRawBuf(cstate))
{
hit_eof = true;
}

Whether such a micro-optimisation is worth doing is another question.

Yes, what you suggested can also be done, but even I have the same
question as you. Because we will reduce just one function call, the
eof check is present immediately in the function, Should we include
this or not?

Regards,
VIgnesh
EnterpriseDB: http://www.enterprisedb.com

#13

Surafel Temesgen

surafel3000@gmail.com

over 5 years ago

In reply to: vignesh C (#11)

Re: Improvements in Copy From

On Thu, Sep 10, 2020 at 1:17 PM vignesh C <vignesh21@gmail.com> wrote:

We have a patch for column matching feature [1] that may need a header

line to be further processed. Even without that I think it is preferable to
process the header line for nothing than adding those checks to the loop,
performance-wise.

I had seen that patch, I feel that change to match the header if the
header is specified can be addressed in this patch if that patch gets
committed first or vice versa. We are doing a lot of processing for
the data which we need not do anything. Shouldn't this be skipped if
not required. Similar check is present in NextCopyFromRawFields also
to skip header.

The existing check is unavoidable but we can live better without the checks
added by the patch. For very large files the loop may iterate millions of
times if it is not in billion and I am sure doing the check that many times
will incur noticeable performance degradation than further processing a
single line.

regards

Surafel

#14

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Surafel Temesgen (#13)

Re: Improvements in Copy From

At Thu, 10 Sep 2020 21:55:27 +0300, Surafel Temesgen <surafel3000@gmail.com> wrote in

On Thu, Sep 10, 2020 at 1:17 PM vignesh C <vignesh21@gmail.com> wrote:

We have a patch for column matching feature [1] that may need a header

line to be further processed. Even without that I think it is preferable to
process the header line for nothing than adding those checks to the loop,
performance-wise.

I had seen that patch, I feel that change to match the header if the
header is specified can be addressed in this patch if that patch gets
committed first or vice versa. We are doing a lot of processing for
the data which we need not do anything. Shouldn't this be skipped if
not required. Similar check is present in NextCopyFromRawFields also
to skip header.

The existing check is unavoidable but we can live better without the checks
added by the patch. For very large files the loop may iterate millions of
times if it is not in billion and I am sure doing the check that many times
will incur noticeable performance degradation than further processing a
single line.

FWIW, I thought the same thing seeing the additional if-conditions. It
gives more loss than gain.

For the first part, the patch reveals COPY_NEW_FE, which I don't think
to be a knowledge for the function, to CopyGetData. Considering that
that doesn't seem to offer noticeable performance gain, I don't think
we should do that. On the contrary, if incoming data were
intermittently delayed for some reasons (heavy load of client or
in-between network), this patch would make things worse by waiting for
delayed bits before processing already received bits.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#15

Peter Smith

smithpb2250@gmail.com

over 5 years ago

In reply to: vignesh C (#12)

Re: Improvements in Copy From

On Thu, Sep 10, 2020 at 9:21 PM vignesh C <vignesh21@gmail.com> wrote:

Whether such a micro-optimisation is worth doing is another question.

Yes, what you suggested can also be done, but even I have the same
question as you. Because we will reduce just one function call, the
eof check is present immediately in the function, Should we include
this or not?

I expect the difference from my suggestion is too small to be measured.

Probably it is not worth changing the already complicated code unless
those changes can achieve something observable.

FYI, I ran a few performance tests BEFORE/AFTER applying your patch.

Perf results for \COPY 5GB CSV file to UNLOGGED table.

perf -a –g <pid>
psql -d test -c "\copy tbl from '/my/path/data_5GB.csv' with (format csv);”
perf report –g

BEFORE
#1 CopyReadLineText = 12.70%, CopyLoadRawBuf = 0.81%
#2 CopyReadLineText = 12.54%, CopyLoadRawBuf = 0.81%
#3 CopyReadLineText = 12.52%, CopyLoadRawBuf = 0.81%

AFTER
#1 CopyReadLineText = 12.55%, CopyLoadRawBuf = 1.20%
#2 CopyReadLineText = 12.15%, CopyLoadRawBuf = 1.10%
#3 CopyReadLineText = 13.11%, CopyLoadRawBuf = 1.24%
#4 CopyReadLineText = 12.86%, CopyLoadRawBuf = 1.18%

I didn't quite know how to interpret those results. It was opposite
what I expected. Perhaps the original excessive CopyLoadRawBuf calls
were so brief they could often avoid being sampled? Anyway, I hope you
have a better understanding of perf than I do and can explain it.

I then repeated/times same tests but without perf

BEFORE
#1 4min.36s
#2 4min.45s
#3 4min.43s
#4 4min.34s

AFTER
#1 4min.41s
#2 4min.37s
#3 4min.34s

As you can see, unfortunately, the patch gave no observable benefit
for my test case.

Kind Regards,
Peter Smith.
Fujitsu Australia

#16

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 5 years ago

In reply to: Peter Smith (#10)

Re: Improvements in Copy From

At Fri, 11 Sep 2020 18:44:13 +1000, Peter Smith <smithpb2250@gmail.com> wrote in

On Thu, Sep 10, 2020 at 9:21 PM vignesh C <vignesh21@gmail.com> wrote:

Whether such a micro-optimisation is worth doing is another question.

Yes, what you suggested can also be done, but even I have the same
question as you. Because we will reduce just one function call, the
eof check is present immediately in the function, Should we include
this or not?

I expect the difference from my suggestion is too small to be measured.

Probably it is not worth changing the already complicated code unless
those changes can achieve something observable.

~~

FYI, I ran a few performance tests BEFORE/AFTER applying your patch.

Perf results for \COPY 5GB CSV file to UNLOGGED table.

perf -a –g <pid>
psql -d test -c "\copy tbl from '/my/path/data_5GB.csv' with (format csv);”
perf report –g

BEFORE
#1 CopyReadLineText = 12.70%, CopyLoadRawBuf = 0.81%
#2 CopyReadLineText = 12.54%, CopyLoadRawBuf = 0.81%
#3 CopyReadLineText = 12.52%, CopyLoadRawBuf = 0.81%

AFTER
#1 CopyReadLineText = 12.55%, CopyLoadRawBuf = 1.20%
#2 CopyReadLineText = 12.15%, CopyLoadRawBuf = 1.10%
#3 CopyReadLineText = 13.11%, CopyLoadRawBuf = 1.24%
#4 CopyReadLineText = 12.86%, CopyLoadRawBuf = 1.18%

I didn't quite know how to interpret those results. It was opposite
what I expected. Perhaps the original excessive CopyLoadRawBuf calls
were so brief they could often avoid being sampled? Anyway, I hope you
have a better understanding of perf than I do and can explain it.

I then repeated/times same tests but without perf

BEFORE
#1 4min.36s
#2 4min.45s
#3 4min.43s
#4 4min.34s

AFTER
#1 4min.41s
#2 4min.37s
#3 4min.34s

As you can see, unfortunately, the patch gave no observable benefit
for my test case.

That observation agrees with my assumption.

At Fri, 11 Sep 2020 15:58:04 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
me> we should do that. On the contrary, if incoming data were
me> intermittently delayed for some reasons (heavy load of client or
me> in-between network), this patch would make things worse by waiting for
me> delayed bits before processing already received bits.

It seems that a slow network is enough to cause that behavior even
without any trouble,

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Import Notes

Reply to msg id not found: CAHut+Pv63PUZKDbxRbWnmM2GcQ+7UiJjuE5bsy_cvJ_f1gj3vw@mail.gmail.com20200911.155804.359271394064499501.horikyota.ntt@gmail.com