[PATCH] pg_convert improvement

Started by Yurii Rashkovskiiabout 2 years ago8 messages
#1Yurii Rashkovskii
yrashk@gmail.com
1 attachment(s)

Hi,

I propose a patch that ensures `pg_convert` doesn't allocate and copy data
when no conversion is done. It is an unnecessary overhead, especially when
such conversions are done frequently and for large values.

I've tried measuring the performance impact, and the patched version has a
small but non-zero gain.

The patch builds against `master` and `make check` succeeds.

Happy to hear any feedback!

--
Y.

Attachments:

v1-0001-Don-t-copy-bytea-text-in-convert_from-to-unless-conv.patchapplication/octet-stream; name=v1-0001-Don-t-copy-bytea-text-in-convert_from-to-unless-conv.patchDownload
From afca3ec646aa940e2bd2c244b463d4aeac2344c3 Mon Sep 17 00:00:00 2001
From: Yurii Rashkovskii <yrashk@gmail.com>
Date: Fri, 24 Nov 2023 05:56:57 -0800
Subject: [PATCH] Don't copy bytea/text in convert_from/to unless converted.

`pg_convert` allocates a new bytea whether actual conversion occurred or
not, followed by copying the result which (by definition) is the same as
the original value.

In the case where conversion has not occurred, it will now simply return
the original value, this avoiding unnecessary allocation and copying.
---
 src/backend/utils/mb/mbutils.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 67a1ab2ab2..8f89400aec 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -585,19 +585,23 @@ pg_convert(PG_FUNCTION_ARGS)
 												  src_encoding,
 												  dest_encoding);
 
-	/* update len if conversion actually happened */
-	if (dest_str != src_str)
-		len = strlen(dest_str);
 
-	/*
-	 * build bytea data type structure.
-	 */
+	/* if no actual conversion happened, return the original string */
+	/* (we are checking pointers to strings instead of encodings because
+	   `pg_do_encoding_conversion` above covers more cases than just
+	   encoding equality) */
+	if (dest_str == src_str) {
+		PG_RETURN_BYTEA_P(string);
+	}
+
+	/* if conversion happened, build a new bytea of a correct length */
+	len = strlen(dest_str);
 	retval = (bytea *) palloc(len + VARHDRSZ);
 	SET_VARSIZE(retval, len + VARHDRSZ);
 	memcpy(VARDATA(retval), dest_str, len);
 
-	if (dest_str != src_str)
-		pfree(dest_str);
+	/* free the original result of conversion */
+	pfree(dest_str);
 
 	/* free memory if allocated by the toaster */
 	PG_FREE_IF_COPY(string, 0);
-- 
2.33.0

#2Drouvot, Bertrand
bertranddrouvot.pg@gmail.com
In reply to: Yurii Rashkovskii (#1)
Re: [PATCH] pg_convert improvement

Hi,

On 11/24/23 3:05 PM, Yurii Rashkovskii wrote:

Hi,

I propose a patch that ensures `pg_convert` doesn't allocate and copy data when no conversion is done. It is an unnecessary overhead, especially when such conversions are done frequently and for large values.

+1 for the patch, I think the less is done the better.

Happy to hear any feedback!

The patch is pretty straightforward, I just have one remark:

+       /* if no actual conversion happened, return the original string */
+       /* (we are checking pointers to strings instead of encodings because
+          `pg_do_encoding_conversion` above covers more cases than just
+          encoding equality) */

I think this could be done in one single comment and follow the preferred style
for multi-line comment, see [1]https://www.postgresql.org/docs/current/source-format.html.

[1]: https://www.postgresql.org/docs/current/source-format.html

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#3Yurii Rashkovskii
yrashk@gmail.com
In reply to: Drouvot, Bertrand (#2)
1 attachment(s)
Re: [PATCH] pg_convert improvement

Hi Bertrand,

On Fri, Nov 24, 2023 at 6:26 AM Drouvot, Bertrand <
bertranddrouvot.pg@gmail.com> wrote:

The patch is pretty straightforward, I just have one remark:

+       /* if no actual conversion happened, return the original string */
+       /* (we are checking pointers to strings instead of encodings
because
+          `pg_do_encoding_conversion` above covers more cases than just
+          encoding equality) */

I think this could be done in one single comment and follow the preferred
style
for multi-line comment, see [1].

Thank you for your feedback. I've attached a revised patch.

--
Y.

Attachments:

v2-0001-Don-t-copy-bytea-text-in-convert_from-to-unless-conv.patchapplication/octet-stream; name=v2-0001-Don-t-copy-bytea-text-in-convert_from-to-unless-conv.patchDownload
From 344bbad9c752608d55cf9ab83f18336ca7b9aca4 Mon Sep 17 00:00:00 2001
From: Yurii Rashkovskii <yrashk@gmail.com>
Date: Fri, 24 Nov 2023 06:32:05 -0800
Subject: [PATCH] Don't copy bytea/text in convert_from/to unless converted.

`pg_convert` allocates a new bytea whether actual conversion occurred or
not, followed by copying the result which (by definition) is the same as
the original value.

In the case where conversion has not occurred, it will now simply return
the original value, this avoiding unnecessary allocation and copying.
---
 src/backend/utils/mb/mbutils.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 67a1ab2ab2..5a0e49e70b 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -585,19 +585,25 @@ pg_convert(PG_FUNCTION_ARGS)
 												  src_encoding,
 												  dest_encoding);
 
-	/* update len if conversion actually happened */
-	if (dest_str != src_str)
-		len = strlen(dest_str);
 
 	/*
-	 * build bytea data type structure.
+	 * if no actual conversion happened, return the original string
+	 * (we are checking pointers to strings instead of encodings because
+	 * `pg_do_encoding_conversion` above covers more cases than just
+	 * encoding equality)
 	 */
+	if (dest_str == src_str) {
+		PG_RETURN_BYTEA_P(string);
+	}
+
+	/* if conversion happened, build a new bytea of a correct length */
+	len = strlen(dest_str);
 	retval = (bytea *) palloc(len + VARHDRSZ);
 	SET_VARSIZE(retval, len + VARHDRSZ);
 	memcpy(VARDATA(retval), dest_str, len);
 
-	if (dest_str != src_str)
-		pfree(dest_str);
+	/* free the original result of conversion */
+	pfree(dest_str);
 
 	/* free memory if allocated by the toaster */
 	PG_FREE_IF_COPY(string, 0);
-- 
2.33.0

#4Drouvot, Bertrand
bertranddrouvot.pg@gmail.com
In reply to: Yurii Rashkovskii (#3)
1 attachment(s)
Re: [PATCH] pg_convert improvement

Hi,

On 11/24/23 3:32 PM, Yurii Rashkovskii wrote:

Hi Bertrand,

On Fri, Nov 24, 2023 at 6:26 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com <mailto:bertranddrouvot.pg@gmail.com>> wrote:

The patch is pretty straightforward, I just have one remark:

+       /* if no actual conversion happened, return the original string */
+       /* (we are checking pointers to strings instead of encodings because
+          `pg_do_encoding_conversion` above covers more cases than just
+          encoding equality) */

I think this could be done in one single comment and follow the preferred style
for multi-line comment, see [1].

Thank you for your feedback. I've attached a revised patch.

Did some minor changes in the attached:

- Started the multi-line comment with an upper case and finished
it with a "." and re-worded a bit.
- Ran pgindent

What do you think about the attached?

Also, might be good to create a CF entry [1]https://commitfest.postgresql.org/46/ so that the patch proposal does not get lost
and gets visibility.

[1]: https://commitfest.postgresql.org/46/

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

v3-0001-Don-t-copy-bytea-text-in-convert_from-to-unless-c.patchtext/plain; charset=UTF-8; name=v3-0001-Don-t-copy-bytea-text-in-convert_from-to-unless-c.patchDownload
From 5eb1f5d9323b8d270411bf156ec065fe7c69b7b8 Mon Sep 17 00:00:00 2001
From: Yurii Rashkovskii <yrashk@gmail.com>
Date: Fri, 24 Nov 2023 06:32:05 -0800
Subject: [PATCH v3] Don't copy bytea/text in convert_from/to unless converted.

`pg_convert` allocates a new bytea whether actual conversion occurred or
not, followed by copying the result which (by definition) is the same as
the original value.

In the case where conversion has not occurred, it will now simply return
the original value, this avoiding unnecessary allocation and copying.
---
 src/backend/utils/mb/mbutils.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)
 100.0% src/backend/utils/mb/

diff --git a/src/backend/utils/mb/mbutils.c b/src/backend/utils/mb/mbutils.c
index 67a1ab2ab2..7666d166a5 100644
--- a/src/backend/utils/mb/mbutils.c
+++ b/src/backend/utils/mb/mbutils.c
@@ -585,19 +585,26 @@ pg_convert(PG_FUNCTION_ARGS)
 												  src_encoding,
 												  dest_encoding);
 
-	/* update len if conversion actually happened */
-	if (dest_str != src_str)
-		len = strlen(dest_str);
 
 	/*
-	 * build bytea data type structure.
+	 * If no actual conversion happened, return the original string (we are
+	 * checking pointers to strings instead of encodings because
+	 * pg_do_encoding_conversion() above covers more cases than just encoding
+	 * equality).
 	 */
+	if (dest_str == src_str)
+	{
+		PG_RETURN_BYTEA_P(string);
+	}
+
+	/* if conversion happened, build a new bytea of a correct length */
+	len = strlen(dest_str);
 	retval = (bytea *) palloc(len + VARHDRSZ);
 	SET_VARSIZE(retval, len + VARHDRSZ);
 	memcpy(VARDATA(retval), dest_str, len);
 
-	if (dest_str != src_str)
-		pfree(dest_str);
+	/* free the original result of conversion */
+	pfree(dest_str);
 
 	/* free memory if allocated by the toaster */
 	PG_FREE_IF_COPY(string, 0);
-- 
2.34.1

#5Yurii Rashkovskii
yrashk@gmail.com
In reply to: Drouvot, Bertrand (#4)
Re: [PATCH] pg_convert improvement

Hi Bertrand,

Did some minor changes in the attached:

- Started the multi-line comment with an upper case and finished
it with a "." and re-worded a bit.
- Ran pgindent

What do you think about the attached?

It looks great!

Also, might be good to create a CF entry [1] so that the patch proposal
does not get lost
and gets visibility.

Just submitted it to SF. Thank you for the review!

--
Y.

#6Drouvot, Bertrand
bertranddrouvot.pg@gmail.com
In reply to: Yurii Rashkovskii (#5)
Re: [PATCH] pg_convert improvement

Hi,

On 11/28/23 2:16 AM, Yurii Rashkovskii wrote:

 Hi Bertrand,

Did some minor changes in the attached:

- Started the multi-line comment with an upper case and finished
it with a "." and re-worded a bit.
- Ran pgindent

What do you think about the attached?

It looks great!

Also, might be good to create a CF entry [1] so that the patch proposal does not get lost
and gets visibility.

Just submitted it to SF. Thank you for the review!

Thanks! Just marked it as "Ready for Committer".

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#7Nathan Bossart
nathandbossart@gmail.com
In reply to: Drouvot, Bertrand (#4)
Re: [PATCH] pg_convert improvement

On Mon, Nov 27, 2023 at 08:11:06AM +0100, Drouvot, Bertrand wrote:

+ PG_RETURN_BYTEA_P(string);

I looked around to see whether there was some sort of project policy about
returning arguments without copying them, but the only strict rule I see is
to avoid scribbling on argument data without first copying it. However, I
do see functions that return unmodified arguments both with and without
copying. For example, unaccent_dict() is careful to copy the argument
before returning it:

PG_RETURN_TEXT_P(PG_GETARG_TEXT_P_COPY(strArg));

But replace_text() is not:

/* Return unmodified source string if empty source or pattern */
if (src_text_len < 1 || from_sub_text_len < 1)
{
PG_RETURN_TEXT_P(src_text);
}

I don't have any specific concerns about doing this, though. Otherwise,
the patch looks pretty good to me, so I will plan on committing it shortly.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

#8Nathan Bossart
nathandbossart@gmail.com
In reply to: Nathan Bossart (#7)
Re: [PATCH] pg_convert improvement

Committed.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com