[PATCH] Remove Extra palloc Of raw_buf For Binary Format In COPY FROM
Hi Hackers,
There seems to be an extra palloc of 64KB of raw_buf for binary format
files which is not required
as copy logic for binary files don't use raw_buf, instead, attribute_buf
is used in CopyReadBinaryAttribute.
Attached is a patch, which places a check to avoid this unnecessary 64KB palloc.
Request the community to take this patch, if it is useful.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v1-0001-Remove-Extra-palloc-Of-raw_buf-For-Binary-Format-.patchapplication/octet-stream; name=v1-0001-Remove-Extra-palloc-Of-raw_buf-For-Binary-Format-.patchDownload
From 889225cdc868139f5cf889497964b1a1ae0d5681 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Fri, 26 Jun 2020 15:06:24 +0530
Subject: [PATCH v1] Remove Extra palloc Of raw_buf For Binary Format In COPY
FROM
For binary files raw_buf is not used, instead, attribute_buf
is used in CopyReadBinaryAttribute. Hence, don't palloc raw_buf.
---
src/backend/commands/copy.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d4cc..0cdac16eac 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3367,7 +3367,15 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+
+ /* For binary files raw_buf is not used,
+ * instead, attribute_buf is used in
+ * CopyReadBinaryAttribute. Hence, don't palloc
+ * raw_buf.
+ */
+ if (!cstate->binary)
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
--
2.25.1
On Fri, Jun 26, 2020 at 3:16 PM Bharath Rupireddy <
bharath.rupireddyforpostgres@gmail.com> wrote:
Hi Hackers,
There seems to be an extra palloc of 64KB of raw_buf for binary format
files which is not required
as copy logic for binary files don't use raw_buf, instead, attribute_buf
is used in CopyReadBinaryAttribute.
+1
I looked at the patch and the changes looked good. Couple of comments;
1)
+
+ /* For binary files raw_buf is not used,
+ * instead, attribute_buf is used in
+ * CopyReadBinaryAttribute. Hence, don't palloc
+ * raw_buf.
+ */
Not a PG style of commenting.
2) In non-binary mode, should assign NULL the raw_buf.
Attaching patch with those changes.
Attached is a patch, which places a check to avoid this unnecessary 64KB
palloc.Request the community to take this patch, if it is useful.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Thanks,
Rushabh Lathia
www.EnterpriseDB.com
Attachments:
v2-0001-Remove-Extra-palloc-Of-raw_buf-For-Binary-Format-.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Remove-Extra-palloc-Of-raw_buf-For-Binary-Format-.patchDownload
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc4..97170d3 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3368,7 +3368,17 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+
+ /*
+ * For binary files raw_buf is not used and instead attribute_buf
+ * is used in CopyReadBinaryAttribute. Hence, don't palloc raw_buf
+ * for binary files.
+ */
+ if (!cstate->binary)
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ else
+ cstate->raw_buf = NULL;
+
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
On Fri, Jun 26, 2020 at 6:15 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
On Fri, Jun 26, 2020 at 3:16 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
Hi Hackers,
There seems to be an extra palloc of 64KB of raw_buf for binary format
files which is not required
as copy logic for binary files don't use raw_buf, instead, attribute_buf
is used in CopyReadBinaryAttribute.+1
I looked at the patch and the changes looked good. Couple of comments;
1)
+ + /* For binary files raw_buf is not used, + * instead, attribute_buf is used in + * CopyReadBinaryAttribute. Hence, don't palloc + * raw_buf. + */Not a PG style of commenting.
2) In non-binary mode, should assign NULL the raw_buf.
Attaching patch with those changes.
+1 for the patch.
One comment:
We could change below code:
+ */
+ if (!cstate->binary)
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ else
+ cstate->raw_buf = NULL;
to:
cstate->raw_buf = (cstate->binary) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Thanks Rushabh and Vignesh for the comments.
One comment: We could change below code: + */ + if (!cstate->binary) + cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1); + else + cstate->raw_buf = NULL; to: cstate->raw_buf = (cstate->binary) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
Attached the patch with the above changes.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v3-0001-Remove-Extra-palloc-Of-raw_buf-For-Binary-Format-.patchapplication/octet-stream; name=v3-0001-Remove-Extra-palloc-Of-raw_buf-For-Binary-Format-.patchDownload
From ae8479e3abfa4e19dafcea5bc30638fdf8c4355b Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sat, 27 Jun 2020 09:18:22 +0530
Subject: [PATCH v3] Remove Extra palloc Of raw_buf For Binary Format In COPY
FROM
For binary files raw_buf is not used and instead attribute_buf
is used in CopyReadBinaryAttribute. Hence, don't palloc raw_buf
for binary files.
---
src/backend/commands/copy.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6b1fd6d4cc..6ddf3bee99 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -3367,7 +3367,14 @@ BeginCopyFrom(ParseState *pstate,
initStringInfo(&cstate->attribute_buf);
initStringInfo(&cstate->line_buf);
cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+
+ /*
+ * For binary files raw_buf is not used and instead attribute_buf
+ * is used in CopyReadBinaryAttribute. Hence, don't palloc raw_buf
+ * for binary files.
+ */
+ cstate->raw_buf = (cstate->binary) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);
+
cstate->raw_buf_index = cstate->raw_buf_len = 0;
/* Assign range table, we'll need it in CopyFrom. */
--
2.25.1
On Sat, Jun 27, 2020 at 9:23 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Thanks Rushabh and Vignesh for the comments.
One comment: We could change below code: + */ + if (!cstate->binary) + cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1); + else + cstate->raw_buf = NULL; to: cstate->raw_buf = (cstate->binary) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);Attached the patch with the above changes.
Changes look fine to me.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Thanks Vignesh and Rushabh for reviewing this.
I've added this patch to commitfest - https://commitfest.postgresql.org/28/.
Request community take this patch further if there are no further issues.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Show quoted text
On Sat, Jun 27, 2020 at 6:30 PM vignesh C <vignesh21@gmail.com> wrote:
On Sat, Jun 27, 2020 at 9:23 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:Thanks Rushabh and Vignesh for the comments.
One comment: We could change below code: + */ + if (!cstate->binary) + cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1); + else + cstate->raw_buf = NULL; to: cstate->raw_buf = (cstate->binary) ? NULL : (char *) palloc(RAW_BUF_SIZE + 1);Attached the patch with the above changes.
Changes look fine to me.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jun 30, 2020 at 2:41 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
Thanks Vignesh and Rushabh for reviewing this.
I've added this patch to commitfest - https://commitfest.postgresql.org/28/.
I felt this patch is ready for committer, changing the status to ready
for committer.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
vignesh C <vignesh21@gmail.com> writes:
On Tue, Jun 30, 2020 at 2:41 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:I've added this patch to commitfest - https://commitfest.postgresql.org/28/.
I felt this patch is ready for committer, changing the status to ready
for committer.
Pushed with some fiddling. Mainly, if we're going to the trouble of
checking for binary mode here, we might as well skip allocating the
line_buf too.
regards, tom lane
On Sat, Jul 11, 2020 at 11:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
vignesh C <vignesh21@gmail.com> writes:
On Tue, Jun 30, 2020 at 2:41 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:I've added this patch to commitfest - https://commitfest.postgresql.org/28/.
I felt this patch is ready for committer, changing the status to ready
for committer.Pushed with some fiddling. Mainly, if we're going to the trouble of
checking for binary mode here, we might as well skip allocating the
line_buf too.
Hi Tom,
Isn't it good if we backpatch this to versions 13, 12, 11 and so on?
As we can save good amount of memory with this patch for non-binary
copy.
Attaching the patch which applies on versions 13, 12, 11.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachments:
v1-0001-Avoid-useless-buffer-allocations-during-binary-CO.patchapplication/octet-stream; name=v1-0001-Avoid-useless-buffer-allocations-during-binary-CO.patchDownload
From b4278a09b60c9832fdab4bc11089e48d2ba8b933 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy <bharath.rupireddy@enterprisedb.com>
Date: Sat, 18 Jul 2020 10:03:50 +0530
Subject: [PATCH v1] Avoid useless buffer allocations during binary COPY FROM.
The raw_buf and line_buf buffers aren't used when reading binary format,
so skip allocating them. raw_buf is 64K so that seems like a worthwhile
savings. An unused line_buf only wastes 1K, but as long as we're checking
it's free to avoid allocating that too.
---
src/backend/commands/copy.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 6d53dc463c..72057af21f 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -193,6 +193,9 @@ typedef struct CopyStateData
* the current line. The CopyReadAttributes functions return arrays of
* pointers into this buffer. We avoid palloc/pfree overhead by re-using
* the buffer on each cycle.
+ *
+ * (In binary COPY FROM, attribute_buf holds the binary data for the
+ * current field, while the other variables are not used.)
*/
StringInfoData attribute_buf;
@@ -3364,12 +3367,19 @@ BeginCopyFrom(ParseState *pstate,
cstate->cur_attname = NULL;
cstate->cur_attval = NULL;
- /* Set up variables to avoid per-attribute overhead. */
+ /*
+ * Set up variables to avoid per-attribute overhead. attribute_buf is
+ * used in both text and binary modes, but we use line_buf and raw_buf
+ * only in text mode.
+ */
initStringInfo(&cstate->attribute_buf);
- initStringInfo(&cstate->line_buf);
- cstate->line_buf_converted = false;
- cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
- cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ if (!cstate->binary)
+ {
+ initStringInfo(&cstate->line_buf);
+ cstate->line_buf_converted = false;
+ cstate->raw_buf = (char *) palloc(RAW_BUF_SIZE + 1);
+ cstate->raw_buf_index = cstate->raw_buf_len = 0;
+ }
/* Assign range table, we'll need it in CopyFrom. */
if (pstate)
--
2.25.1
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
On Sat, Jul 11, 2020 at 11:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pushed with some fiddling. Mainly, if we're going to the trouble of
checking for binary mode here, we might as well skip allocating the
line_buf too.
Isn't it good if we backpatch this to versions 13, 12, 11 and so on?
Given the lack of complaints, I wasn't excited about it. Transient
consumption of 64K is not a huge deal these days. (And yes, I've
worked on machines where that was the entire address space. But that
was a very long time ago.)
regards, tom lane