PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

Started by Hans Buschmannover 4 years ago13 messages

buschmann@nidsa.net

over 4 years ago

1 attachment(s)

During some development on encoding-related parts of postgres I stumbled over the new length-checking-code in common/hex.c/pg_hex_encode.

Differently when comparing it to all versions before the output-buffer-length is checked during encoding of every individual source byte.

This may impose quite a regression when using pg_dump on databases with many/big bytea or lo columns.

Because all criteria to check buffer-length are known in advance of encoding (srclen and destlen) I propose doing the check only once before starting the while-loop.

Please find the attached patch for pg14 and master, older versions did not have this behavior.

Tested on pg14-beta3, but applies also on master.

PS: This is my very first patch, I am in no way an experienced C-developer and don't have a smoothly running development environment or experience yet. (originally mostly dealing with postgres on Windows).

If it seems useful somebody could enter it as an open item / resolved item for pg14 after beta 3.

Thanks for looking!

Hans Buschmann

Attachments:

hex_encode_length_check_outside_loop.patchapplication/octet-stream; name=hex_encode_length_check_outside_loop.patchDownload

diff --git a/postgresql-14beta3/src/common/hex.c b/postgresql-14beta3_hb/src/common/hex.c
index 3b1bc8f..fa23564 100644
--- a/postgresql-14beta3/src/common/hex.c
+++ b/postgresql-14beta3_hb/src/common/hex.c
@@ -78,22 +78,22 @@ pg_hex_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
 
 	p = dst;
 
-	while (src < end)
+	/*
+	 * Leave if there is an overflow in the area allocated for the encoded
+	 * string.
+	 */
+	if (dstlen < srclen << 1)
 	{
-		/*
-		 * Leave if there is an overflow in the area allocated for the encoded
-		 * string.
-		 */
-		if ((p - dst + 2) > dstlen)
-		{
 #ifdef FRONTEND
-			pg_log_fatal("overflow of destination buffer in hex encoding");
-			exit(EXIT_FAILURE);
+		pg_log_fatal("overflow of destination buffer in hex encoding");
+		exit(EXIT_FAILURE);
 #else
-			elog(ERROR, "overflow of destination buffer in hex encoding");
+		elog(ERROR, "overflow of destination buffer in hex encoding");
 #endif
-		}
+	}
 
+	while (src < end)
+	{
 		*p++ = hextbl[(*src >> 4) & 0xF];
 		*p++ = hextbl[*src & 0xF];
 		src++;

Ranier Vilela

ranier.vf@gmail.com

over 4 years ago

In reply to: Hans Buschmann (#1)

1 attachment(s)

Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

Welcome.
Em seg., 16 de ago. de 2021 às 05:46, Hans Buschmann <buschmann@nidsa.net>
escreveu:

During some development on encoding-related parts of postgres I stumbled
over the new length-checking-code in common/hex.c/pg_hex_encode.

Differently when comparing it to all versions before the
output-buffer-length is checked during encoding of every individual source
byte.

Is always good to remove immutables from loops, if safe and secure.

This may impose quite a regression when using pg_dump on databases with
many/big bytea or lo columns.

Decode has regression too.

Because all criteria to check buffer-length are known in advance of
encoding (srclen and destlen) I propose doing the check only once before
starting the while-loop.

Good.

Please find the attached patch for pg14 and master, older versions did not
have this behavior.

I think that we can take one more step here.
pg_hex_decode can be improved too.
We can remove limitations from all functions that use hex functions.
byteaout from (varlena.c) has an artificial limitation to handle only
MaxAllocSize length, but
nothing prevents her from being prepared for that limitation to be removed
one day.

Tested on pg14-beta3, but applies also on master.

PS: This is my very first patch, I am in no way an experienced C-developer
and don't have a smoothly running development environment or experience
yet. (originally mostly dealing with postgres on Windows).

It seems good to me.

Please, take a look at the new version attached.
If possible can you review the tests if there is an overflow at
pg_hex_encode and pg_hex_decode functions?

regards,
Ranier Vilela

Attachments:

0001-improve-hex-functions-handling.patchapplication/octet-stream; name=0001-improve-hex-functions-handling.patchDownload

diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 8882444025..1a3f06b3fe 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -150,7 +150,7 @@ AddFileToBackupManifest(backup_manifest_info *manifest, const char *spcoid,
 	}
 	else
 	{
-		uint64		dstlen = pg_hex_enc_len(pathlen);
+		size_t		dstlen = pg_hex_enc_len(pathlen);
 
 		appendStringInfoString(&buf, "{ \"Encoded-Path\": \"");
 		enlargeStringInfo(&buf, dstlen);
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index acb8741734..5989eee651 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -305,7 +305,7 @@ byteain(PG_FUNCTION_ARGS)
 	if (inputText[0] == '\\' && inputText[1] == 'x')
 	{
 		size_t		len = strlen(inputText);
-		uint64		dstlen = pg_hex_dec_len(len - 2);
+		size_t		dstlen = pg_hex_dec_len(len - 2);
 
 		bc = dstlen + VARHDRSZ; /* maximum possible length */
 		result = palloc(bc);
@@ -399,7 +399,7 @@ byteaout(PG_FUNCTION_ARGS)
 
 	if (bytea_output == BYTEA_OUTPUT_HEX)
 	{
-		uint64		dstlen = pg_hex_enc_len(VARSIZE_ANY_EXHDR(vlena));
+		size_t		dstlen = pg_hex_enc_len(VARSIZE_ANY_EXHDR(vlena));
 
 		/* Print hex format */
 		rp = result = palloc(dstlen + 2 + 1);
@@ -413,8 +413,8 @@ byteaout(PG_FUNCTION_ARGS)
 	{
 		/* Print traditional escaped format */
 		char	   *vp;
-		uint64		len;
-		int			i;
+		size_t		len;
+		size_t		i;
 
 		len = 1;				/* empty string has 1 char */
 		vp = VARDATA_ANY(vlena);
diff --git a/src/common/hex.c b/src/common/hex.c
index 3b1bc8fa75..0aae74fc10 100644
--- a/src/common/hex.c
+++ b/src/common/hex.c
@@ -70,30 +70,30 @@ get_hex(const char *cp)
  * Encode into hex the given string.  Returns the length of the encoded
  * string.
  */
-uint64
+size_t
 pg_hex_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
 {
-	const char *end = src + srclen;
+	const char *end;
 	char	   *p;
 
-	p = dst;
-
-	while (src < end)
+	/*
+	 * Leave if there is an overflow in the area allocated for the encoded
+	 * string.
+	 */
+	if (dstlen < pg_hex_enc_len(srclen))
 	{
-		/*
-		 * Leave if there is an overflow in the area allocated for the encoded
-		 * string.
-		 */
-		if ((p - dst + 2) > dstlen)
-		{
 #ifdef FRONTEND
-			pg_log_fatal("overflow of destination buffer in hex encoding");
-			exit(EXIT_FAILURE);
+		pg_log_fatal("overflow of destination buffer in hex encoding");
+		exit(EXIT_FAILURE);
 #else
-			elog(ERROR, "overflow of destination buffer in hex encoding");
+		elog(ERROR, "overflow of destination buffer in hex encoding");
 #endif
-		}
+	}
 
+	end = src + srclen;
+	p = dst;
+	while (src < end)
+	{
 		*p++ = hextbl[(*src >> 4) & 0xF];
 		*p++ = hextbl[*src & 0xF];
 		src++;
@@ -108,7 +108,7 @@ pg_hex_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
  *
  * Decode the given hex string.  Returns the length of the decoded string.
  */
-uint64
+size_t
 pg_hex_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 {
 	const char *s,
@@ -117,6 +117,20 @@ pg_hex_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 				v2,
 			   *p;
 
+	/*
+	 * Leave if there is an overflow in the area allocated for the encoded
+	 * string.
+	 */
+	if (dstlen < pg_hex_dec_len(srclen))
+	{
+#ifdef FRONTEND
+		pg_log_fatal("overflow of destination buffer in hex encoding");
+		exit(EXIT_FAILURE);
+#else
+		elog(ERROR, "overflow of destination buffer in hex encoding");
+#endif
+	}
+
 	srcend = src + srclen;
 	s = src;
 	p = dst;
@@ -130,7 +144,7 @@ pg_hex_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 		v1 = get_hex(s) << 4;
 		s++;
 
-		if (s >= srcend)
+		if (s == srcend)
 		{
 #ifdef FRONTEND
 			pg_log_fatal("invalid hexadecimal data: odd number of digits");
@@ -138,24 +152,12 @@ pg_hex_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 #else
 			ereport(ERROR,
 					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("invalid hexadecimal data: odd number of digits")));
+					errmsg("invalid hexadecimal data: odd number of digits")));
 #endif
 		}
-
 		v2 = get_hex(s);
 		s++;
 
-		/* overflow check */
-		if ((p - dst + 1) > dstlen)
-		{
-#ifdef FRONTEND
-			pg_log_fatal("overflow of destination buffer in hex decoding");
-			exit(EXIT_FAILURE);
-#else
-			elog(ERROR, "overflow of destination buffer in hex decoding");
-#endif
-		}
-
 		*p++ = v1 | v2;
 	}
 
@@ -171,10 +173,10 @@ pg_hex_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
  * how large a buffer allocation needs to be done before doing the actual
  * encoding.
  */
-uint64
+size_t
 pg_hex_enc_len(size_t srclen)
 {
-	return (uint64) srclen << 1;
+	return srclen << 1;
 }
 
 /*
@@ -185,8 +187,8 @@ pg_hex_enc_len(size_t srclen)
  * estimate how large a buffer allocation needs to be done before doing
  * the actual decoding.
  */
-uint64
+size_t
 pg_hex_dec_len(size_t srclen)
 {
-	return (uint64) srclen >> 1;
+	return srclen >> 1;
 }
diff --git a/src/include/common/hex.h b/src/include/common/hex.h
index 150771a14d..e515f58b8f 100644
--- a/src/include/common/hex.h
+++ b/src/include/common/hex.h
@@ -15,11 +15,11 @@
 #ifndef COMMON_HEX_H
 #define COMMON_HEX_H
 
-extern uint64 pg_hex_decode(const char *src, size_t srclen,
+extern size_t pg_hex_decode(const char *src, size_t srclen,
 							char *dst, size_t dstlen);
-extern uint64 pg_hex_encode(const char *src, size_t srclen,
+extern size_t pg_hex_encode(const char *src, size_t srclen,
 							char *dst, size_t dstlen);
-extern uint64 pg_hex_enc_len(size_t srclen);
-extern uint64 pg_hex_dec_len(size_t srclen);
+extern size_t pg_hex_enc_len(size_t srclen);
+extern size_t pg_hex_dec_len(size_t srclen);
 
 #endif							/* COMMON_HEX_H */

Hans Buschmann

buschmann@nidsa.net

over 4 years ago

In reply to: Ranier Vilela (#2)

AW: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

________________________________
Von: Ranier Vilela <ranier.vf@gmail.com>
Gesendet: Montag, 16. August 2021 17:04
An: Hans Buschmann
Cc: pgsql-hackers@postgresql.org
Betreff: Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

Hello Ranier,

Thank you for your quick response.

Is always good to remove immutables from loops, if safe and secure.

I think it's worse because a loop changed variable is involved in the test, so it seems to be not immutable.

Decode has regression too.

good catch, I overlooked it.

I think that we can take one more step here.
pg_hex_decode can be improved too.

By looking a bit more precisely I detected the same checks in common/base64.c also involving loop-changed variables. Could you please make the same changes to base64.c?

We can remove limitations from all functions that use hex functions.
byteaout from (varlena.c) has an artificial limitation to handle only MaxAllocSize length, but
nothing prevents her from being prepared for that limitation to be removed one day.

+1, but I don't know all implications of the type change to size_t

Please, take a look at the new version attached.

Two remarks for decoding (by eyeball inspection of patch file only because of still struggling with git etc.):

1. The odd-number-of-digits-check for decoding is still part of the loop. It should be before the loop and called only once (by testing for an even number of srclen)
So we can eliminate the block if (s == srcend)? {} inside the loop!

2. I noticed that the error messages for hex_decode should be changed as

s/encoding/decoding/

(as in pg_log_fatal("overflow of destination buffer in hex encoding");)

If possible can you review the tests if there is an overflow at
pg_hex_encode and pg_hex_decode functions?

I would prefer to wait for another patch version from you (my development troubles), perhaps also dealing with base64.c

I don't know how and where any call to the encoding/decoding functions creates an overflow situation in normal operation.

I will nonceless try the tests but I don't have any experience with it.

BTW the root cause for my work is an attempt to vastly accelerate these functions (hex and base64 encode/decode), but this is left for another day (not finished yet) and certainly only on master/new development.

Lateron support on this topic would be highly appreciated...

Best regards,
Hans Buschmann

Ranier Vilela

ranier.vf@gmail.com

over 4 years ago

In reply to: Hans Buschmann (#3)

1 attachment(s)

Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

Em seg., 16 de ago. de 2021 às 13:19, Hans Buschmann <buschmann@nidsa.net>
escreveu:

------------------------------
*Von:* Ranier Vilela <ranier.vf@gmail.com>
*Gesendet:* Montag, 16. August 2021 17:04
*An:* Hans Buschmann
*Cc:* pgsql-hackers@postgresql.org
*Betreff:* Re: PG14: Avoid checking output-buffer-length for every
encoded byte during pg_hex_encode

Hello Ranier,

Thank you for your quick response.

Is always good to remove immutables from loops, if safe and secure.

I think it's worse because a loop changed variable is involved in the
test, so it seems to be not immutable.

Decode has regression too.

good catch, I overlooked it.

I think that we can take one more step here.
pg_hex_decode can be improved too.

+1

By looking a bit more precisely I detected the same checks in
common/base64.c also involving loop-changed variables. Could you please
make the same changes to base64.c?

I will take a look.

We can remove limitations from all functions that use hex functions.
byteaout from (varlena.c) has an artificial limitation to handle only

MaxAllocSize length, but

nothing prevents her from being prepared for that limitation to be

removed one day.

+1, but I don't know all implications of the type change to size_t

uint64 and size_t in 64 bits are equivalents.
uint64 and size_t in 32 bits can be weird, but anyway size_t at 32 bits can
handle 1GB.

Please, take a look at the new version attached.

Two remarks for decoding (by eyeball inspection of patch file only
because of still struggling with git etc.):

1. The odd-number-of-digits-check for decoding is still part of the loop.
It should be before the loop and called only once (by testing for an even
number of srclen)
So we can eliminate the block if (s == srcend) {} inside the loop!

I'm afraid that is not possible.
I tested some variations for this test and regress fails at strings tests.
Anyway for test purposes, I changed it temporarily in this version, but I'm
afraid we'll have to go back.

2. I noticed that the error messages for hex_decode should be changed as

s/encoding/decoding/

(as in pg_log_fatal("overflow of destination buffer in hex encoding");)

Ok. Changed.

If possible can you review the tests if there is an overflow at
pg_hex_encode and pg_hex_decode functions?

I would prefer to wait for another patch version from you (my development
troubles), perhaps also dealing with base64.c

base64.c can be made in another patch.

I don't know how and where any call to the encoding/decoding functions
creates an overflow situation in normal operation.

I will nonceless try the tests but I don't have any experience with it.

Thanks.

BTW the root cause for my work is an attempt to vastly accelerate these
functions (hex and base64 encode/decode), but this is left for another day
(not finished yet) and certainly only on master/new development.

I think this can speed up a little.

Lateron support on this topic would be highly appreciated...

If I can help, count on me.

regards,
Ranier Vilela

Attachments:

0001-improve-hex-functions-handling-v1.patchapplication/octet-stream; name=0001-improve-hex-functions-handling-v1.patchDownload

diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 8882444025..1a3f06b3fe 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -150,7 +150,7 @@ AddFileToBackupManifest(backup_manifest_info *manifest, const char *spcoid,
 	}
 	else
 	{
-		uint64		dstlen = pg_hex_enc_len(pathlen);
+		size_t		dstlen = pg_hex_enc_len(pathlen);
 
 		appendStringInfoString(&buf, "{ \"Encoded-Path\": \"");
 		enlargeStringInfo(&buf, dstlen);
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index acb8741734..5989eee651 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -305,7 +305,7 @@ byteain(PG_FUNCTION_ARGS)
 	if (inputText[0] == '\\' && inputText[1] == 'x')
 	{
 		size_t		len = strlen(inputText);
-		uint64		dstlen = pg_hex_dec_len(len - 2);
+		size_t		dstlen = pg_hex_dec_len(len - 2);
 
 		bc = dstlen + VARHDRSZ; /* maximum possible length */
 		result = palloc(bc);
@@ -399,7 +399,7 @@ byteaout(PG_FUNCTION_ARGS)
 
 	if (bytea_output == BYTEA_OUTPUT_HEX)
 	{
-		uint64		dstlen = pg_hex_enc_len(VARSIZE_ANY_EXHDR(vlena));
+		size_t		dstlen = pg_hex_enc_len(VARSIZE_ANY_EXHDR(vlena));
 
 		/* Print hex format */
 		rp = result = palloc(dstlen + 2 + 1);
@@ -413,8 +413,8 @@ byteaout(PG_FUNCTION_ARGS)
 	{
 		/* Print traditional escaped format */
 		char	   *vp;
-		uint64		len;
-		int			i;
+		size_t		len;
+		size_t		i;
 
 		len = 1;				/* empty string has 1 char */
 		vp = VARDATA_ANY(vlena);
diff --git a/src/common/hex.c b/src/common/hex.c
index 3b1bc8fa75..9fb9132228 100644
--- a/src/common/hex.c
+++ b/src/common/hex.c
@@ -70,30 +70,30 @@ get_hex(const char *cp)
  * Encode into hex the given string.  Returns the length of the encoded
  * string.
  */
-uint64
+size_t
 pg_hex_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
 {
-	const char *end = src + srclen;
+	const char *end;
 	char	   *p;
 
-	p = dst;
-
-	while (src < end)
+	/*
+	 * Leave if there is an overflow in the area allocated for the encoded
+	 * string.
+	 */
+	if (dstlen < pg_hex_enc_len(srclen))
 	{
-		/*
-		 * Leave if there is an overflow in the area allocated for the encoded
-		 * string.
-		 */
-		if ((p - dst + 2) > dstlen)
-		{
 #ifdef FRONTEND
-			pg_log_fatal("overflow of destination buffer in hex encoding");
-			exit(EXIT_FAILURE);
+		pg_log_fatal("overflow of destination buffer in hex encoding");
+		exit(EXIT_FAILURE);
 #else
-			elog(ERROR, "overflow of destination buffer in hex encoding");
+		elog(ERROR, "overflow of destination buffer in hex encoding");
 #endif
-		}
+	}
 
+	end = src + srclen;
+	p = dst;
+	while (src < end)
+	{
 		*p++ = hextbl[(*src >> 4) & 0xF];
 		*p++ = hextbl[*src & 0xF];
 		src++;
@@ -108,7 +108,7 @@ pg_hex_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
  *
  * Decode the given hex string.  Returns the length of the decoded string.
  */
-uint64
+size_t
 pg_hex_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 {
 	const char *s,
@@ -117,6 +117,35 @@ pg_hex_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 				v2,
 			   *p;
 
+	/*
+	 * Leave if there is an overflow in the area allocated for the encoded
+	 * string.
+	 */
+	if (dstlen < pg_hex_dec_len(srclen))
+	{
+#ifdef FRONTEND
+		pg_log_fatal("overflow of destination buffer in hex decoding");
+		exit(EXIT_FAILURE);
+#else
+		elog(ERROR, "overflow of destination buffer in hex decoding");
+#endif
+	}
+
+	/*
+	 * Leave if there is an input length are odd size.
+	 */
+	if ((srclen % 2) != 0)
+	{
+#ifdef FRONTEND
+		pg_log_fatal("invalid hexadecimal data: odd number of digits");
+		exit(EXIT_FAILURE);
+#else
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				errmsg("invalid hexadecimal data: odd number of digits")));
+#endif
+	}
+
 	srcend = src + srclen;
 	s = src;
 	p = dst;
@@ -130,32 +159,9 @@ pg_hex_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 		v1 = get_hex(s) << 4;
 		s++;
 
-		if (s >= srcend)
-		{
-#ifdef FRONTEND
-			pg_log_fatal("invalid hexadecimal data: odd number of digits");
-			exit(EXIT_FAILURE);
-#else
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("invalid hexadecimal data: odd number of digits")));
-#endif
-		}
-
 		v2 = get_hex(s);
 		s++;
 
-		/* overflow check */
-		if ((p - dst + 1) > dstlen)
-		{
-#ifdef FRONTEND
-			pg_log_fatal("overflow of destination buffer in hex decoding");
-			exit(EXIT_FAILURE);
-#else
-			elog(ERROR, "overflow of destination buffer in hex decoding");
-#endif
-		}
-
 		*p++ = v1 | v2;
 	}
 
@@ -171,10 +177,10 @@ pg_hex_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
  * how large a buffer allocation needs to be done before doing the actual
  * encoding.
  */
-uint64
+size_t
 pg_hex_enc_len(size_t srclen)
 {
-	return (uint64) srclen << 1;
+	return srclen << 1;
 }
 
 /*
@@ -185,8 +191,8 @@ pg_hex_enc_len(size_t srclen)
  * estimate how large a buffer allocation needs to be done before doing
  * the actual decoding.
  */
-uint64
+size_t
 pg_hex_dec_len(size_t srclen)
 {
-	return (uint64) srclen >> 1;
+	return srclen >> 1;
 }
diff --git a/src/include/common/hex.h b/src/include/common/hex.h
index 150771a14d..e515f58b8f 100644
--- a/src/include/common/hex.h
+++ b/src/include/common/hex.h
@@ -15,11 +15,11 @@
 #ifndef COMMON_HEX_H
 #define COMMON_HEX_H
 
-extern uint64 pg_hex_decode(const char *src, size_t srclen,
+extern size_t pg_hex_decode(const char *src, size_t srclen,
 							char *dst, size_t dstlen);
-extern uint64 pg_hex_encode(const char *src, size_t srclen,
+extern size_t pg_hex_encode(const char *src, size_t srclen,
 							char *dst, size_t dstlen);
-extern uint64 pg_hex_enc_len(size_t srclen);
-extern uint64 pg_hex_dec_len(size_t srclen);
+extern size_t pg_hex_enc_len(size_t srclen);
+extern size_t pg_hex_dec_len(size_t srclen);
 
 #endif							/* COMMON_HEX_H */

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Hans Buschmann (#1)

1 attachment(s)

Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

On Sun, Aug 15, 2021 at 02:58:59PM +0000, Hans Buschmann wrote:

If it seems useful somebody could enter it as an open item /
resolved item for pg14 after beta 3.

Hmm. Using SQLs like the following to stress pg_hex_encode(), I can
see a measurable difference easily, so no need of pg_dump or an
instance with many LOs:
set work_mem = '1GB';
with hex_values as materialized
(select repeat(i::text, N)::bytea as i
from generate_series(1, M) t(i))
SELECT count(encode(i, 'hex')) from hex_values;

With N = 1000, M = 100000, perf shows a 34.88% use of pg_hex_encode().
On REL_13_STABLE, I get down ~27% for hex_encode(), the same backend
routine.

The runtime numbers are more interesting, HEAD gets up to 1600ms.
With the latest version of the patch, we get down to 1550ms. Now, a
simple revert of ccf4e277 and aef8948 brings me down to the same
runtimes as ~13, closer to 1450ms. There seem to be some noise in the
tests, but the difference is clear.

Considering that commit aef8948 also involved a cleanup of
src/common/hex_decode.c, I think that it would be saner to also revert
c3826f83, so as we have a clean state of the code to work on should
the work on the data encryption patch set move on in the future. It
is not decided that this would actually use any of the hex
decode/encode code, either.

Honestly, I don't like much messing with this stuff after beta3, and
one of the motives of moving the hex handling code to src/common/ was
for the data encryption code whose state is not known as of late.
This does not prevent working on such refactorings in the future, of
course, keeping the performance impact in mind.

In short, I am planning to address this regression with the attached,
for a combined revert of 0d70d30, 5c33ba5 and 92436a7.
--
Michael

Attachments:

0001-Revert-refactoring-of-hex-code-to-src-common.patchtext/x-diff; charset=us-asciiDownload

From 55ca0af0d5e40a57b468dac21c0b1ad294a96df5 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 17 Aug 2021 10:57:02 +0900
Subject: [PATCH] Revert refactoring of hex code to src/common/

This is a combined revert of aef8948, ccf4e27, c3826f8.
---
 src/include/common/hex.h                  |  25 ---
 src/include/common/sha2.h                 |   4 +
 src/include/utils/builtins.h              |   4 +
 src/backend/replication/backup_manifest.c |  30 ++--
 src/backend/utils/adt/encode.c            | 158 +++++++++++-------
 src/backend/utils/adt/varlena.c           |  15 +-
 src/common/Makefile                       |   1 -
 src/common/hex.c                          | 192 ----------------------
 src/tools/msvc/Mkvcbuild.pm               |   2 +-
 9 files changed, 127 insertions(+), 304 deletions(-)
 delete mode 100644 src/include/common/hex.h
 delete mode 100644 src/common/hex.c

diff --git a/src/include/common/hex.h b/src/include/common/hex.h
deleted file mode 100644
index 150771a14d..0000000000
--- a/src/include/common/hex.h
+++ /dev/null
@@ -1,25 +0,0 @@
-/*------------------------------------------------------------------------
- *
- *	hex.h
- *	  Encoding and decoding routines for hex strings.
- *
- *	Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
- *	Portions Copyright (c) 1994, Regents of the University of California
- *
- * IDENTIFICATION
- *		  src/include/common/hex.h
- *
- *------------------------------------------------------------------------
- */
-
-#ifndef COMMON_HEX_H
-#define COMMON_HEX_H
-
-extern uint64 pg_hex_decode(const char *src, size_t srclen,
-							char *dst, size_t dstlen);
-extern uint64 pg_hex_encode(const char *src, size_t srclen,
-							char *dst, size_t dstlen);
-extern uint64 pg_hex_enc_len(size_t srclen);
-extern uint64 pg_hex_dec_len(size_t srclen);
-
-#endif							/* COMMON_HEX_H */
diff --git a/src/include/common/sha2.h b/src/include/common/sha2.h
index dfeee6bceb..f4bae35af1 100644
--- a/src/include/common/sha2.h
+++ b/src/include/common/sha2.h
@@ -18,11 +18,15 @@
 /*** SHA224/256/384/512 Various Length Definitions ***********************/
 #define PG_SHA224_BLOCK_LENGTH			64
 #define PG_SHA224_DIGEST_LENGTH			28
+#define PG_SHA224_DIGEST_STRING_LENGTH	(PG_SHA224_DIGEST_LENGTH * 2 + 1)
 #define PG_SHA256_BLOCK_LENGTH			64
 #define PG_SHA256_DIGEST_LENGTH			32
+#define PG_SHA256_DIGEST_STRING_LENGTH	(PG_SHA256_DIGEST_LENGTH * 2 + 1)
 #define PG_SHA384_BLOCK_LENGTH			128
 #define PG_SHA384_DIGEST_LENGTH			48
+#define PG_SHA384_DIGEST_STRING_LENGTH	(PG_SHA384_DIGEST_LENGTH * 2 + 1)
 #define PG_SHA512_BLOCK_LENGTH			128
 #define PG_SHA512_DIGEST_LENGTH			64
+#define PG_SHA512_DIGEST_STRING_LENGTH	(PG_SHA512_DIGEST_LENGTH * 2 + 1)
 
 #endif							/* _PG_SHA2_H_ */
diff --git a/src/include/utils/builtins.h b/src/include/utils/builtins.h
index f3ce4fb173..40fcb0ab6d 100644
--- a/src/include/utils/builtins.h
+++ b/src/include/utils/builtins.h
@@ -31,6 +31,10 @@ extern void domain_check(Datum value, bool isnull, Oid domainType,
 extern int	errdatatype(Oid datatypeOid);
 extern int	errdomainconstraint(Oid datatypeOid, const char *conname);
 
+/* encode.c */
+extern uint64 hex_encode(const char *src, size_t len, char *dst);
+extern uint64 hex_decode(const char *src, size_t len, char *dst);
+
 /* int.c */
 extern int2vector *buildint2vector(const int16 *int2s, int n);
 
diff --git a/src/backend/replication/backup_manifest.c b/src/backend/replication/backup_manifest.c
index 8882444025..921de47a6c 100644
--- a/src/backend/replication/backup_manifest.c
+++ b/src/backend/replication/backup_manifest.c
@@ -13,11 +13,11 @@
 #include "postgres.h"
 
 #include "access/timeline.h"
-#include "common/hex.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
 #include "mb/pg_wchar.h"
 #include "replication/backup_manifest.h"
+#include "utils/builtins.h"
 #include "utils/json.h"
 
 static void AppendStringToManifest(backup_manifest_info *manifest, char *s);
@@ -150,12 +150,10 @@ AddFileToBackupManifest(backup_manifest_info *manifest, const char *spcoid,
 	}
 	else
 	{
-		uint64		dstlen = pg_hex_enc_len(pathlen);
-
 		appendStringInfoString(&buf, "{ \"Encoded-Path\": \"");
-		enlargeStringInfo(&buf, dstlen);
-		buf.len += pg_hex_encode(pathname, pathlen,
-								 &buf.data[buf.len], dstlen);
+		enlargeStringInfo(&buf, 2 * pathlen);
+		buf.len += hex_encode(pathname, pathlen,
+							  &buf.data[buf.len]);
 		appendStringInfoString(&buf, "\", ");
 	}
 
@@ -178,7 +176,6 @@ AddFileToBackupManifest(backup_manifest_info *manifest, const char *spcoid,
 	{
 		uint8		checksumbuf[PG_CHECKSUM_MAX_LENGTH];
 		int			checksumlen;
-		uint64		dstlen;
 
 		checksumlen = pg_checksum_final(checksum_ctx, checksumbuf);
 		if (checksumlen < 0)
@@ -188,10 +185,9 @@ AddFileToBackupManifest(backup_manifest_info *manifest, const char *spcoid,
 		appendStringInfo(&buf,
 						 ", \"Checksum-Algorithm\": \"%s\", \"Checksum\": \"",
 						 pg_checksum_type_name(checksum_ctx->type));
-		dstlen = pg_hex_enc_len(checksumlen);
-		enlargeStringInfo(&buf, dstlen);
-		buf.len += pg_hex_encode((char *) checksumbuf, checksumlen,
-								 &buf.data[buf.len], dstlen);
+		enlargeStringInfo(&buf, 2 * checksumlen);
+		buf.len += hex_encode((char *) checksumbuf, checksumlen,
+							  &buf.data[buf.len]);
 		appendStringInfoChar(&buf, '"');
 	}
 
@@ -311,9 +307,8 @@ SendBackupManifest(backup_manifest_info *manifest)
 {
 	StringInfoData protobuf;
 	uint8		checksumbuf[PG_SHA256_DIGEST_LENGTH];
-	char	   *checksumstringbuf;
+	char		checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH];
 	size_t		manifest_bytes_done = 0;
-	uint64		dstlen;
 
 	if (!IsManifestEnabled(manifest))
 		return;
@@ -334,11 +329,10 @@ SendBackupManifest(backup_manifest_info *manifest)
 							sizeof(checksumbuf)) < 0)
 		elog(ERROR, "failed to finalize checksum of backup manifest");
 	AppendStringToManifest(manifest, "\"Manifest-Checksum\": \"");
-	dstlen = pg_hex_enc_len(sizeof(checksumbuf));
-	checksumstringbuf = palloc0(dstlen + 1);	/* includes \0 */
-	pg_hex_encode((char *) checksumbuf, sizeof(checksumbuf),
-				  checksumstringbuf, dstlen);
-	checksumstringbuf[dstlen] = '\0';
+
+	hex_encode((char *) checksumbuf, sizeof checksumbuf, checksumstringbuf);
+	checksumstringbuf[PG_SHA256_DIGEST_STRING_LENGTH - 1] = '\0';
+
 	AppendStringToManifest(manifest, checksumstringbuf);
 	AppendStringToManifest(manifest, "\"}\n");
 
diff --git a/src/backend/utils/adt/encode.c b/src/backend/utils/adt/encode.c
index 8449aaac56..6dd93f9a32 100644
--- a/src/backend/utils/adt/encode.c
+++ b/src/backend/utils/adt/encode.c
@@ -15,7 +15,6 @@
 
 #include <ctype.h>
 
-#include "common/hex.h"
 #include "mb/pg_wchar.h"
 #include "utils/builtins.h"
 #include "utils/memutils.h"
@@ -32,12 +31,10 @@
  */
 struct pg_encoding
 {
-	uint64		(*encode_len) (const char *src, size_t srclen);
-	uint64		(*decode_len) (const char *src, size_t srclen);
-	uint64		(*encode) (const char *src, size_t srclen,
-						   char *dst, size_t dstlen);
-	uint64		(*decode) (const char *src, size_t srclen,
-						   char *dst, size_t dstlen);
+	uint64		(*encode_len) (const char *data, size_t dlen);
+	uint64		(*decode_len) (const char *data, size_t dlen);
+	uint64		(*encode) (const char *data, size_t dlen, char *res);
+	uint64		(*decode) (const char *data, size_t dlen, char *res);
 };
 
 static const struct pg_encoding *pg_find_encoding(const char *name);
@@ -83,7 +80,11 @@ binary_encode(PG_FUNCTION_ARGS)
 
 	result = palloc(VARHDRSZ + resultlen);
 
-	res = enc->encode(dataptr, datalen, VARDATA(result), resultlen);
+	res = enc->encode(dataptr, datalen, VARDATA(result));
+
+	/* Make this FATAL 'cause we've trodden on memory ... */
+	if (res > resultlen)
+		elog(FATAL, "overflow - encode estimate too small");
 
 	SET_VARSIZE(result, VARHDRSZ + res);
 
@@ -127,7 +128,11 @@ binary_decode(PG_FUNCTION_ARGS)
 
 	result = palloc(VARHDRSZ + resultlen);
 
-	res = enc->decode(dataptr, datalen, VARDATA(result), resultlen);
+	res = enc->decode(dataptr, datalen, VARDATA(result));
+
+	/* Make this FATAL 'cause we've trodden on memory ... */
+	if (res > resultlen)
+		elog(FATAL, "overflow - decode estimate too small");
 
 	SET_VARSIZE(result, VARHDRSZ + res);
 
@@ -139,20 +144,95 @@ binary_decode(PG_FUNCTION_ARGS)
  * HEX
  */
 
-/*
- * Those two wrappers are still needed to match with the layer of
- * src/common/.
- */
+static const char hextbl[] = "0123456789abcdef";
+
+static const int8 hexlookup[128] = {
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
+};
+
+uint64
+hex_encode(const char *src, size_t len, char *dst)
+{
+	const char *end = src + len;
+
+	while (src < end)
+	{
+		*dst++ = hextbl[(*src >> 4) & 0xF];
+		*dst++ = hextbl[*src & 0xF];
+		src++;
+	}
+	return (uint64) len * 2;
+}
+
+static inline char
+get_hex(const char *cp)
+{
+	unsigned char c = (unsigned char) *cp;
+	int			res = -1;
+
+	if (c < 127)
+		res = hexlookup[c];
+
+	if (res < 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("invalid hexadecimal digit: \"%.*s\"",
+						pg_mblen(cp), cp)));
+
+	return (char) res;
+}
+
+uint64
+hex_decode(const char *src, size_t len, char *dst)
+{
+	const char *s,
+			   *srcend;
+	char		v1,
+				v2,
+			   *p;
+
+	srcend = src + len;
+	s = src;
+	p = dst;
+	while (s < srcend)
+	{
+		if (*s == ' ' || *s == '\n' || *s == '\t' || *s == '\r')
+		{
+			s++;
+			continue;
+		}
+		v1 = get_hex(s) << 4;
+		s++;
+		if (s >= srcend)
+			ereport(ERROR,
+					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+					 errmsg("invalid hexadecimal data: odd number of digits")));
+
+		v2 = get_hex(s);
+		s++;
+		*p++ = v1 | v2;
+	}
+
+	return p - dst;
+}
+
 static uint64
 hex_enc_len(const char *src, size_t srclen)
 {
-	return pg_hex_enc_len(srclen);
+	return (uint64) srclen << 1;
 }
 
 static uint64
 hex_dec_len(const char *src, size_t srclen)
 {
-	return pg_hex_dec_len(srclen);
+	return (uint64) srclen >> 1;
 }
 
 /*
@@ -174,12 +254,12 @@ static const int8 b64lookup[128] = {
 };
 
 static uint64
-pg_base64_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
+pg_base64_encode(const char *src, size_t len, char *dst)
 {
 	char	   *p,
 			   *lend = dst + 76;
 	const char *s,
-			   *end = src + srclen;
+			   *end = src + len;
 	int			pos = 2;
 	uint32		buf = 0;
 
@@ -195,8 +275,6 @@ pg_base64_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
 		/* write it out */
 		if (pos < 0)
 		{
-			if ((p - dst + 4) > dstlen)
-				elog(ERROR, "overflow of destination buffer in base64 encoding");
 			*p++ = _base64[(buf >> 18) & 0x3f];
 			*p++ = _base64[(buf >> 12) & 0x3f];
 			*p++ = _base64[(buf >> 6) & 0x3f];
@@ -207,30 +285,25 @@ pg_base64_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
 		}
 		if (p >= lend)
 		{
-			if ((p - dst + 1) > dstlen)
-				elog(ERROR, "overflow of destination buffer in base64 encoding");
 			*p++ = '\n';
 			lend = p + 76;
 		}
 	}
 	if (pos != 2)
 	{
-		if ((p - dst + 4) > dstlen)
-			elog(ERROR, "overflow of destination buffer in base64 encoding");
 		*p++ = _base64[(buf >> 18) & 0x3f];
 		*p++ = _base64[(buf >> 12) & 0x3f];
 		*p++ = (pos == 0) ? _base64[(buf >> 6) & 0x3f] : '=';
 		*p++ = '=';
 	}
 
-	Assert((p - dst) <= dstlen);
 	return p - dst;
 }
 
 static uint64
-pg_base64_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
+pg_base64_decode(const char *src, size_t len, char *dst)
 {
-	const char *srcend = src + srclen,
+	const char *srcend = src + len,
 			   *s = src;
 	char	   *p = dst;
 	char		c;
@@ -278,21 +351,11 @@ pg_base64_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 		pos++;
 		if (pos == 4)
 		{
-			if ((p - dst + 1) > dstlen)
-				elog(ERROR, "overflow of destination buffer in base64 decoding");
 			*p++ = (buf >> 16) & 255;
 			if (end == 0 || end > 1)
-			{
-				if ((p - dst + 1) > dstlen)
-					elog(ERROR, "overflow of destination buffer in base64 decoding");
 				*p++ = (buf >> 8) & 255;
-			}
 			if (end == 0 || end > 2)
-			{
-				if ((p - dst + 1) > dstlen)
-					elog(ERROR, "overflow of destination buffer in base64 decoding");
 				*p++ = buf & 255;
-			}
 			buf = 0;
 			pos = 0;
 		}
@@ -304,7 +367,6 @@ pg_base64_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 				 errmsg("invalid base64 end sequence"),
 				 errhint("Input data is missing padding, is truncated, or is otherwise corrupted.")));
 
-	Assert((p - dst) <= dstlen);
 	return p - dst;
 }
 
@@ -340,7 +402,7 @@ pg_base64_dec_len(const char *src, size_t srclen)
 #define DIG(VAL)		((VAL) + '0')
 
 static uint64
-esc_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
+esc_encode(const char *src, size_t srclen, char *dst)
 {
 	const char *end = src + srclen;
 	char	   *rp = dst;
@@ -352,8 +414,6 @@ esc_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
 
 		if (c == '\0' || IS_HIGHBIT_SET(c))
 		{
-			if ((rp - dst + 4) > dstlen)
-				elog(ERROR, "overflow of destination buffer in escape encoding");
 			rp[0] = '\\';
 			rp[1] = DIG(c >> 6);
 			rp[2] = DIG((c >> 3) & 7);
@@ -363,8 +423,6 @@ esc_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
 		}
 		else if (c == '\\')
 		{
-			if ((rp - dst + 2) > dstlen)
-				elog(ERROR, "overflow of destination buffer in escape encoding");
 			rp[0] = '\\';
 			rp[1] = '\\';
 			rp += 2;
@@ -372,8 +430,6 @@ esc_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
 		}
 		else
 		{
-			if ((rp - dst + 1) > dstlen)
-				elog(ERROR, "overflow of destination buffer in escape encoding");
 			*rp++ = c;
 			len++;
 		}
@@ -381,12 +437,11 @@ esc_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
 		src++;
 	}
 
-	Assert((rp - dst) <= dstlen);
 	return len;
 }
 
 static uint64
-esc_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
+esc_decode(const char *src, size_t srclen, char *dst)
 {
 	const char *end = src + srclen;
 	char	   *rp = dst;
@@ -395,11 +450,7 @@ esc_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 	while (src < end)
 	{
 		if (src[0] != '\\')
-		{
-			if ((rp - dst + 1) > dstlen)
-				elog(ERROR, "overflow of destination buffer in escape decoding");
 			*rp++ = *src++;
-		}
 		else if (src + 3 < end &&
 				 (src[1] >= '0' && src[1] <= '3') &&
 				 (src[2] >= '0' && src[2] <= '7') &&
@@ -411,16 +462,12 @@ esc_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 			val <<= 3;
 			val += VAL(src[2]);
 			val <<= 3;
-			if ((rp - dst + 1) > dstlen)
-				elog(ERROR, "overflow of destination buffer in escape decoding");
 			*rp++ = val + VAL(src[3]);
 			src += 4;
 		}
 		else if (src + 1 < end &&
 				 (src[1] == '\\'))
 		{
-			if ((rp - dst + 1) > dstlen)
-				elog(ERROR, "overflow of destination buffer in escape decoding");
 			*rp++ = '\\';
 			src += 2;
 		}
@@ -438,7 +485,6 @@ esc_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
 		len++;
 	}
 
-	Assert((rp - dst) <= dstlen);
 	return len;
 }
 
@@ -520,7 +566,7 @@ static const struct
 	{
 		"hex",
 		{
-			hex_enc_len, hex_dec_len, pg_hex_encode, pg_hex_decode
+			hex_enc_len, hex_dec_len, hex_encode, hex_decode
 		}
 	},
 	{
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index acb8741734..bd3091bbfb 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -22,7 +22,6 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_type.h"
 #include "common/hashfn.h"
-#include "common/hex.h"
 #include "common/int.h"
 #include "common/unicode_norm.h"
 #include "lib/hyperloglog.h"
@@ -305,12 +304,10 @@ byteain(PG_FUNCTION_ARGS)
 	if (inputText[0] == '\\' && inputText[1] == 'x')
 	{
 		size_t		len = strlen(inputText);
-		uint64		dstlen = pg_hex_dec_len(len - 2);
 
-		bc = dstlen + VARHDRSZ; /* maximum possible length */
+		bc = (len - 2) / 2 + VARHDRSZ;	/* maximum possible length */
 		result = palloc(bc);
-
-		bc = pg_hex_decode(inputText + 2, len - 2, VARDATA(result), dstlen);
+		bc = hex_decode(inputText + 2, len - 2, VARDATA(result));
 		SET_VARSIZE(result, bc + VARHDRSZ); /* actual length */
 
 		PG_RETURN_BYTEA_P(result);
@@ -399,15 +396,11 @@ byteaout(PG_FUNCTION_ARGS)
 
 	if (bytea_output == BYTEA_OUTPUT_HEX)
 	{
-		uint64		dstlen = pg_hex_enc_len(VARSIZE_ANY_EXHDR(vlena));
-
 		/* Print hex format */
-		rp = result = palloc(dstlen + 2 + 1);
+		rp = result = palloc(VARSIZE_ANY_EXHDR(vlena) * 2 + 2 + 1);
 		*rp++ = '\\';
 		*rp++ = 'x';
-
-		rp += pg_hex_encode(VARDATA_ANY(vlena), VARSIZE_ANY_EXHDR(vlena), rp,
-							dstlen);
+		rp += hex_encode(VARDATA_ANY(vlena), VARSIZE_ANY_EXHDR(vlena), rp);
 	}
 	else if (bytea_output == BYTEA_OUTPUT_ESCAPE)
 	{
diff --git a/src/common/Makefile b/src/common/Makefile
index 38a8599337..880722fcf5 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -58,7 +58,6 @@ OBJS_COMMON = \
 	file_perm.o \
 	file_utils.o \
 	hashfn.o \
-	hex.o \
 	ip.o \
 	jsonapi.o \
 	keywords.o \
diff --git a/src/common/hex.c b/src/common/hex.c
deleted file mode 100644
index 3b1bc8fa75..0000000000
--- a/src/common/hex.c
+++ /dev/null
@@ -1,192 +0,0 @@
-/*-------------------------------------------------------------------------
- *
- * hex.c
- *	  Encoding and decoding routines for hex.
- *
- * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
- * Portions Copyright (c) 1994, Regents of the University of California
- *
- * IDENTIFICATION
- *	  src/common/hex.c
- *
- *-------------------------------------------------------------------------
- */
-
-
-#ifndef FRONTEND
-#include "postgres.h"
-#else
-#include "postgres_fe.h"
-#endif
-
-#include "common/hex.h"
-#ifdef FRONTEND
-#include "common/logging.h"
-#endif
-#include "mb/pg_wchar.h"
-
-
-static const int8 hexlookup[128] = {
-	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-	0, 1, 2, 3, 4, 5, 6, 7, 8, 9, -1, -1, -1, -1, -1, -1,
-	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-	-1, 10, 11, 12, 13, 14, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-	-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-};
-
-static const char hextbl[] = "0123456789abcdef";
-
-static inline char
-get_hex(const char *cp)
-{
-	unsigned char c = (unsigned char) *cp;
-	int			res = -1;
-
-	if (c < 127)
-		res = hexlookup[c];
-
-	if (res < 0)
-	{
-#ifdef FRONTEND
-		pg_log_fatal("invalid hexadecimal digit");
-		exit(EXIT_FAILURE);
-#else
-		ereport(ERROR,
-				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-				 errmsg("invalid hexadecimal digit: \"%.*s\"",
-						pg_mblen(cp), cp)));
-#endif
-	}
-
-	return (char) res;
-}
-
-/*
- * pg_hex_encode
- *
- * Encode into hex the given string.  Returns the length of the encoded
- * string.
- */
-uint64
-pg_hex_encode(const char *src, size_t srclen, char *dst, size_t dstlen)
-{
-	const char *end = src + srclen;
-	char	   *p;
-
-	p = dst;
-
-	while (src < end)
-	{
-		/*
-		 * Leave if there is an overflow in the area allocated for the encoded
-		 * string.
-		 */
-		if ((p - dst + 2) > dstlen)
-		{
-#ifdef FRONTEND
-			pg_log_fatal("overflow of destination buffer in hex encoding");
-			exit(EXIT_FAILURE);
-#else
-			elog(ERROR, "overflow of destination buffer in hex encoding");
-#endif
-		}
-
-		*p++ = hextbl[(*src >> 4) & 0xF];
-		*p++ = hextbl[*src & 0xF];
-		src++;
-	}
-
-	Assert((p - dst) <= dstlen);
-	return p - dst;
-}
-
-/*
- * pg_hex_decode
- *
- * Decode the given hex string.  Returns the length of the decoded string.
- */
-uint64
-pg_hex_decode(const char *src, size_t srclen, char *dst, size_t dstlen)
-{
-	const char *s,
-			   *srcend;
-	char		v1,
-				v2,
-			   *p;
-
-	srcend = src + srclen;
-	s = src;
-	p = dst;
-	while (s < srcend)
-	{
-		if (*s == ' ' || *s == '\n' || *s == '\t' || *s == '\r')
-		{
-			s++;
-			continue;
-		}
-		v1 = get_hex(s) << 4;
-		s++;
-
-		if (s >= srcend)
-		{
-#ifdef FRONTEND
-			pg_log_fatal("invalid hexadecimal data: odd number of digits");
-			exit(EXIT_FAILURE);
-#else
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
-					 errmsg("invalid hexadecimal data: odd number of digits")));
-#endif
-		}
-
-		v2 = get_hex(s);
-		s++;
-
-		/* overflow check */
-		if ((p - dst + 1) > dstlen)
-		{
-#ifdef FRONTEND
-			pg_log_fatal("overflow of destination buffer in hex decoding");
-			exit(EXIT_FAILURE);
-#else
-			elog(ERROR, "overflow of destination buffer in hex decoding");
-#endif
-		}
-
-		*p++ = v1 | v2;
-	}
-
-	Assert((p - dst) <= dstlen);
-	return p - dst;
-}
-
-/*
- * pg_hex_enc_len
- *
- * Returns to caller the length of the string if it were encoded with
- * hex based on the length provided by caller.  This is useful to estimate
- * how large a buffer allocation needs to be done before doing the actual
- * encoding.
- */
-uint64
-pg_hex_enc_len(size_t srclen)
-{
-	return (uint64) srclen << 1;
-}
-
-/*
- * pg_hex_dec_len
- *
- * Returns to caller the length of the string if it were to be decoded
- * with hex, based on the length given by caller.  This is useful to
- * estimate how large a buffer allocation needs to be done before doing
- * the actual decoding.
- */
-uint64
-pg_hex_dec_len(size_t srclen)
-{
-	return (uint64) srclen >> 1;
-}
diff --git a/src/tools/msvc/Mkvcbuild.pm b/src/tools/msvc/Mkvcbuild.pm
index 3cb46832ab..4362bd44fd 100644
--- a/src/tools/msvc/Mkvcbuild.pm
+++ b/src/tools/msvc/Mkvcbuild.pm
@@ -125,7 +125,7 @@ sub mkvcbuild
 	our @pgcommonallfiles = qw(
 	  archive.c base64.c checksum_helper.c
 	  config_info.c controldata_utils.c d2s.c encnames.c exec.c
-	  f2s.c file_perm.c file_utils.c hashfn.c hex.c ip.c jsonapi.c
+	  f2s.c file_perm.c file_utils.c hashfn.c ip.c jsonapi.c
 	  keywords.c kwlookup.c link-canary.c md5_common.c
 	  pg_get_line.c pg_lzcompress.c pgfnames.c psprintf.c relpath.c rmtree.c
 	  saslprep.c scram-common.c string.c stringinfo.c unicode_norm.c username.c
-- 
2.32.0

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Ranier Vilela (#4)

Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

On Mon, Aug 16, 2021 at 02:06:31PM -0300, Ranier Vilela wrote:

uint64 and size_t in 64 bits are equivalents.
uint64 and size_t in 32 bits can be weird, but anyway size_t at 32 bits can
handle 1GB.

There is usually a reason why things are done as they are, so before
suggesting changing something I would advise to read the threads that
created those changes more carefully because they could be mentioned.
In this case, we are talking about aef8948, and this part of its
related thread:
/messages/by-id/20210105034739.GG7432@momjian.us

base64.c can be made in another patch.

This file is a set of code paths used by authentication and SCRAM, it
will never get as hot as the code paths that Hans has reported as
these are one-time operations. Please note as well cfc40d3 for the
reasons why things are handled this way. We absolutely have to use
safe routines here.
--
Michael

Ranier Vilela

ranier.vf@gmail.com

over 4 years ago

In reply to: Michael Paquier (#6)

Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

Em ter., 17 de ago. de 2021 às 00:43, Michael Paquier <michael@paquier.xyz>
escreveu:

On Mon, Aug 16, 2021 at 02:06:31PM -0300, Ranier Vilela wrote:

uint64 and size_t in 64 bits are equivalents.
uint64 and size_t in 32 bits can be weird, but anyway size_t at 32 bits

can

handle 1GB.

There is usually a reason why things are done as they are, so before
suggesting changing something I would advise to read the threads that
created those changes more carefully because they could be mentioned.
In this case, we are talking about aef8948, and this part of its
related thread:
/messages/by-id/20210105034739.GG7432@momjian.us

Because things have always been done a certain way doesn't mean it's right.
Using int to address strings is an mistake.
Even with an artificial limitation to only deal with 1GB, the correct one
to use would be size_t.

base64.c can be made in another patch.

This file is a set of code paths used by authentication and SCRAM, it
will never get as hot as the code paths that Hans has reported as
these are one-time operations. Please note as well cfc40d3 for the
reasons why things are handled this way. We absolutely have to use
safe routines here.

I have no plans to touch base64.c

regards,
Ranier Vilela

Tom Lane

tgl@sss.pgh.pa.us

over 4 years ago

In reply to: Michael Paquier (#5)

Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

Michael Paquier <michael@paquier.xyz> writes:

In short, I am planning to address this regression with the attached,
for a combined revert of 0d70d30, 5c33ba5 and 92436a7.

OK, but the commit message should explain why they're getting reverted.

regards, tom lane

Bruce Momjian

bruce@momjian.us

over 4 years ago

In reply to: Tom Lane (#8)

Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

On Tue, Aug 17, 2021 at 09:39:30AM -0400, Tom Lane wrote:

Michael Paquier <michael@paquier.xyz> writes:

In short, I am planning to address this regression with the attached,
for a combined revert of 0d70d30, 5c33ba5 and 92436a7.

OK, but the commit message should explain why they're getting reverted.

Uh, I don't see those commits, e.g.:

$ git diff 0d70d30
fatal: ambiguous argument '0d70d30': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

$ git diff 5c33ba5
fatal: ambiguous argument '5c33ba5': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

$ git diff 92436a7
fatal: ambiguous argument '92436a7': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

#10

Julien Rouhaud

rjuju123@gmail.com

over 4 years ago

In reply to: Bruce Momjian (#9)

Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

On Tue, Aug 17, 2021 at 11:26 PM Bruce Momjian <bruce@momjian.us> wrote:

On Tue, Aug 17, 2021 at 09:39:30AM -0400, Tom Lane wrote:

Michael Paquier <michael@paquier.xyz> writes:

In short, I am planning to address this regression with the attached,
for a combined revert of 0d70d30, 5c33ba5 and 92436a7.

OK, but the commit message should explain why they're getting reverted.

Uh, I don't see those commits, e.g.:

$ git diff 0d70d30
fatal: ambiguous argument '0d70d30': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

$ git diff 5c33ba5
fatal: ambiguous argument '5c33ba5': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

$ git diff 92436a7
fatal: ambiguous argument '92436a7': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

Same here. I'm assuming that the real commits are the one mentioned
in the patch, which are c3826f8, aef8948 and ccf4e27.

#11

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Julien Rouhaud (#10)

Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

On Wed, Aug 18, 2021 at 12:34:45AM +0800, Julien Rouhaud wrote:

On Tue, Aug 17, 2021 at 11:26 PM Bruce Momjian <bruce@momjian.us> wrote:

Uh, I don't see those commits, e.g.:

$ git diff 0d70d30
fatal: ambiguous argument '0d70d30': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

$ git diff 5c33ba5
fatal: ambiguous argument '5c33ba5': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

$ git diff 92436a7
fatal: ambiguous argument '92436a7': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

Same here. I'm assuming that the real commits are the one mentioned
in the patch, which are c3826f8, aef8948 and ccf4e27.

Oops, incorrect copy-paste from my side. Yes, the commits impacted
here are aef8948, ccf4e27 and c3826f8. The small-ish commit message
of the patch got thet right.
--
Michael

#12

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Tom Lane (#8)

Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

On Tue, Aug 17, 2021 at 09:39:30AM -0400, Tom Lane wrote:

OK, but the commit message should explain why they're getting reverted.

Sure. aef8948 gets down because of the performance impact. ccf4e27
was a cleanup following up aef8948, that loses its meaning. And
c3826f8 cannot be let alone because of the reasons why aef8948 was
introduced, as it has no safety net for out-of-bound handling in the
result buffer allocated.
--
Michael

#13

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Michael Paquier (#12)

Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode

On Wed, Aug 18, 2021 at 09:14:14AM +0900, Michael Paquier wrote:

Sure. aef8948 gets down because of the performance impact. ccf4e27
was a cleanup following up aef8948, that loses its meaning. And
c3826f8 cannot be let alone because of the reasons why aef8948 was
introduced, as it has no safety net for out-of-bound handling in the
result buffer allocated.

And done, with such a description in the commit log.
--
Michael