[PATCH] Add get_bytes() and set_bytes() functions

Started by Aleksander Alekseevover 1 year ago34 messages
#1Aleksander Alekseev
aleksander@timescale.com
1 attachment(s)

Hi,

While discussing another patch [1]/messages/by-id/CAJ7c6TNMTGnqnG=yXXUQh9E88JDckmR45H2Q+=ucaCLMOW1QQw@mail.gmail.com it was discovered that we don't
have a convenient way of casting a bytea to an integer / bigint and
vice versa, extracting integers larger than one byte from byteas, etc.

For instance, casting '\x11223344' to 0x11223344 may look like this:

```
WITH vals AS (
SELECT '\x11223344'::bytea AS x
)
SELECT
(get_byte(x, 0) :: bigint << 24) |
(get_byte(x, 1) << 16) |
(get_byte(x, 2) << 8) |
get_byte(x, 3)
FROM vals;
```

There seems to be a demand for this functionality [2]https://stackoverflow.com/questions/32944267/postgresql-converting-bytea-to-bigint[3]/messages/by-id/AANLkTikip9xs8iXc8e+Mgz1T1701i8Xk6QtbVB3KJQzX@mail.gmail.com -- Best regards, Aleksander Alekseev and it costs
us nothing to maintain it, so I propose adding it.

The proposed patch adds get_bytes() and set_bytes() functions. The
semantics is similar to get_byte() and set_byte() we already have but
the functions operate with bigints rather than bytes and the user can
specify the size of the integer. This allows working with int2s,
int4s, int8s or even int5s if needed.

Examples:

```
SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
?column?
----------
t

SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
set_bytes
--------------------
\x11aabb4455667788
```

Thoughts?

[1]: /messages/by-id/CAJ7c6TNMTGnqnG=yXXUQh9E88JDckmR45H2Q+=ucaCLMOW1QQw@mail.gmail.com
[2]: https://stackoverflow.com/questions/32944267/postgresql-converting-bytea-to-bigint
[3]: /messages/by-id/AANLkTikip9xs8iXc8e+Mgz1T1701i8Xk6QtbVB3KJQzX@mail.gmail.com -- Best regards, Aleksander Alekseev
--
Best regards,
Aleksander Alekseev

Attachments:

v1-0001-Add-get_bytes-and-set_bytes-functions.patchapplication/octet-stream; name=v1-0001-Add-get_bytes-and-set_bytes-functions.patchDownload
From 0b12f247752930f4873555045a7885ea16bc76a2 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 14 Aug 2024 12:11:47 +0300
Subject: [PATCH v1] Add get_bytes() and set_bytes() functions.

The new functions provide a convenient way of converting between integer
types and bytea. Previously there were only get_byte() and set_byte() which
operate with a single byte.

Aleksander Alekseev, reviewed by TODO FIXME
Discussion: TODO FIXME

BUMP CATVERSION
---
 doc/src/sgml/func.sgml                | 46 ++++++++++++++-
 src/backend/utils/adt/varlena.c       | 80 +++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |  6 ++
 src/test/regress/expected/strings.out | 36 ++++++++++++
 src/test/regress/sql/strings.sql      | 10 ++++
 5 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index cdde647513..4034cc6040 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4562,6 +4562,27 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>get_bytes</primary>
+        </indexterm>
+        <function>get_bytes</function> ( <parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>integer</type>,
+        <parameter>size</parameter> <type>integer</type> )
+        <returnvalue>bigint</returnvalue>
+       </para>
+       <para>
+        Extracts <parameter>size</parameter> bytes at a given
+        <link linkend="functions-zerobased-note">offset</link>
+        from binary string.
+       </para>
+       <para>
+        <literal>get_bytes('\x0123456789ABCDEF'::bytea, 5, 2)</literal>
+        <returnvalue>43981</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4662,6 +4683,28 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>set_bytes</primary>
+        </indexterm>
+        <function>set_bytes</function> ( <parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>integer</type>,
+        <parameter>size</parameter> <type>integer</type>,
+        <parameter>newvalue</parameter> <type>bigint</type> )
+        <returnvalue>bytea</returnvalue>
+       </para>
+       <para>
+        Sets <parameter>size</parameter> bytes at a given
+        <link linkend="functions-zerobased-note">offset</link>
+        in binary string to <parameter>newvalue</parameter>.
+       </para>
+       <para>
+        <literal>set_bytes('\x0123456789abcdef'::bytea, 5, 2, 0x1122)</literal>
+        <returnvalue>\x01234567891122ef</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4761,7 +4804,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
   </table>
 
   <para id="functions-zerobased-note">
-   Functions <function>get_byte</function> and <function>set_byte</function>
+   Functions <function>get_byte</function>, <function>set_byte</function>,
+   <function>get_bytes</function> and <function>set_bytes</function>
    number the first byte of a binary string as byte 0.
    Functions <function>get_bit</function> and <function>set_bit</function>
    number bits from the right within each byte; for example bit 0 is the least
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 4f9a676c93..971aab4a45 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -3212,6 +3212,46 @@ byteaGetByte(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(byte);
 }
 
+/*-------------------------------------------------------------
+ * byteaGetBytes
+ *
+ * this routine treats "bytea" as an array of bytes.
+ * It returns the N bytes at a given offset as a bigint value.
+ *-------------------------------------------------------------
+ */
+Datum
+byteaGetBytes(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int32		offset = PG_GETARG_INT32(1);
+	int32		size = PG_GETARG_INT32(2);
+	int64		result = 0;
+	int			len;
+
+	if (size < 1 || size > 8)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("size %d out of valid range, 1..8",
+						size)));
+
+	len = VARSIZE_ANY_EXHDR(v);
+
+	if (offset < 0 || offset > (len-size))
+		ereport(ERROR,
+				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+				 errmsg("index %d out of valid range, 0..%d",
+						offset, len - size)));
+
+	while(size--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	PG_RETURN_INT64(result);
+}
+
 /*-------------------------------------------------------------
  * byteaGetBit
  *
@@ -3282,6 +3322,46 @@ byteaSetByte(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(res);
 }
 
+/*-------------------------------------------------------------
+ * byteaSetBytes
+ *
+ * Given an instance of type 'bytea' creates a new one with
+ * the N bytes at a given offset set to the provided bigint value.
+ *
+ *-------------------------------------------------------------
+ */
+Datum
+byteaSetBytes(PG_FUNCTION_ARGS)
+{
+	bytea	   *res = PG_GETARG_BYTEA_P_COPY(0);
+	int32		offset = PG_GETARG_INT32(1);
+	int32		size = PG_GETARG_INT32(2);
+	int64		newValue = PG_GETARG_INT64(3);
+	int			len;
+
+	if (size < 1 || size > 8)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("size %d out of valid range, 1..8",
+						size)));
+
+	len = VARSIZE_ANY_EXHDR(res);
+
+	if (offset < 0 || offset > (len-size))
+		ereport(ERROR,
+				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+				 errmsg("index %d out of valid range, 0..%d",
+						offset, len - size)));
+	while(size)
+	{
+		((unsigned char*) VARDATA_ANY(res))[offset+size-1] = newValue & 0xFF;
+		newValue = newValue >> 8;
+		size--;
+	}
+
+	PG_RETURN_BYTEA_P(res);
+}
+
 /*-------------------------------------------------------------
  * byteaSetBit
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4abc6d9526..5daa1132c7 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1441,6 +1441,12 @@
 { oid => '722', descr => 'set byte',
   proname => 'set_byte', prorettype => 'bytea',
   proargtypes => 'bytea int4 int4', prosrc => 'byteaSetByte' },
+{ oid => '8573', descr => 'get bytes',
+  proname => 'get_bytes', prorettype => 'int8', proargtypes => 'bytea int4 int4',
+  prosrc => 'byteaGetBytes' },
+{ oid => '8574', descr => 'set bytes',
+  proname => 'set_bytes', prorettype => 'bytea',
+  proargtypes => 'bytea int4 int4 int8', prosrc => 'byteaSetBytes' },
 { oid => '723', descr => 'get bit',
   proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int8',
   prosrc => 'byteaGetBit' },
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index b65bb2d536..9fd52a1e11 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2358,6 +2358,42 @@ SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
 ERROR:  index 99 out of valid range, 0..8
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 8) = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 0); -- error
+ERROR:  size 0 out of valid range, 1..8
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 9); -- error
+ERROR:  size 9 out of valid range, 1..8
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 8); -- error
+ERROR:  index 1 out of valid range, 0..0
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 8, 0x1122334455667788);
+     set_bytes      
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
+     set_bytes      
+--------------------
+ \x11aabb4455667788
+(1 row)
+
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 0, 123); -- error
+ERROR:  size 0 out of valid range, 1..8
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 9, 123); -- error
+ERROR:  size 9 out of valid range, 1..8
+SELECT set_bytes('\x0123456789abcdef'::bytea, 1, 8, 123); -- error
+ERROR:  index 1 out of valid range, 0..0
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
 --
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 8e0f3a0e75..d0aa44ee2c 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -750,6 +750,16 @@ SELECT get_byte('\x1234567890abcdef00'::bytea, 3);
 SELECT get_byte('\x1234567890abcdef00'::bytea, 99);  -- error
 SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 8) = 0x1122334455667788;
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 0); -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 9); -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 8); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 8, 0x1122334455667788);
+SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 0, 123); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 9, 123); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 1, 8, 123); -- error
 
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
-- 
2.46.0

#2Joel Jacobson
joel@compiler.org
In reply to: Aleksander Alekseev (#1)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Wed, Aug 14, 2024, at 13:01, Aleksander Alekseev wrote:

The proposed patch adds get_bytes() and set_bytes() functions. The
semantics is similar to get_byte() and set_byte() we already have but
the functions operate with bigints rather than bytes and the user can
specify the size of the integer. This allows working with int2s,
int4s, int8s or even int5s if needed.

+1

I wanted this myself many times.

I wonder if get_bytes() and set_bytes() will behave differently
on little-endian vs big-endian systems?

If so, then I think it would be nice to enforce a consistent byte order
(e.g., big-endian), to ensure consistent behavior across platforms.

Regards,
Joel

#3Aleksander Alekseev
aleksander@timescale.com
In reply to: Joel Jacobson (#2)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi,

+1

I wanted this myself many times.

I wonder if get_bytes() and set_bytes() will behave differently
on little-endian vs big-endian systems?

If so, then I think it would be nice to enforce a consistent byte order
(e.g., big-endian), to ensure consistent behavior across platforms.

No, the returned value will not depend on the CPU endiness. Current
implementation uses big-endian / network order which in my humble
opinion is what most users would expect.

I believe we also need reverse(bytea) and repeat(bytea, integer)
functions e.g. for those who want little-endian. However I want to
propose them separately when we are done with this patch.

--
Best regards,
Aleksander Alekseev

#4Joel Jacobson
joel@compiler.org
In reply to: Aleksander Alekseev (#3)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Wed, Aug 14, 2024, at 13:31, Aleksander Alekseev wrote:

I wonder if get_bytes() and set_bytes() will behave differently
on little-endian vs big-endian systems?

No, the returned value will not depend on the CPU endiness. Current
implementation uses big-endian / network order which in my humble
opinion is what most users would expect.

Nice.

I've reviewed and tested the patch.
It looks straight-forward to me.
I don't see any potential problems.
I've marked it Ready for Committer.

I believe we also need reverse(bytea) and repeat(bytea, integer)
functions e.g. for those who want little-endian. However I want to
propose them separately when we are done with this patch.

I agree those functions would be nice too.

I also think it would be nice to provide these convenience functions:
to_bytes(bigint) -> bytea
from_bytes(bytea) -> bigint

Since if not having a current bytea value,
and just wanting to convert a bigint to bytea,
then one would need to construct an zeroed bytea
of the proper size first, to then use set_bytes().

And if just wanting to convert the entire bytea to a bigint,
then one would need to pass 0 as offset and the length
of the bytea as size.

Regards,

Joel

#5David Fetter
david@fetter.org
In reply to: Joel Jacobson (#4)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Wed, Aug 14, 2024 at 02:34:06PM +0200, Joel Jacobson wrote:

On Wed, Aug 14, 2024, at 13:31, Aleksander Alekseev wrote:

I wonder if get_bytes() and set_bytes() will behave differently
on little-endian vs big-endian systems?

No, the returned value will not depend on the CPU endiness. Current
implementation uses big-endian / network order which in my humble
opinion is what most users would expect.

Nice.

Indeed!

I've reviewed and tested the patch.
It looks straight-forward to me.
I don't see any potential problems.
I've marked it Ready for Committer.

I believe we also need reverse(bytea) and repeat(bytea, integer)
functions e.g. for those who want little-endian. However I want to
propose them separately when we are done with this patch.

I agree those functions would be nice too.

I also think it would be nice to provide these convenience functions:
to_bytes(bigint) -> bytea
from_bytes(bytea) -> bigint

Along with these, would it make sense to have other forms of these
that won't choke at 63 bits, e.g. NUMERIC or TEXT?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

#6Joel Jacobson
joel@compiler.org
In reply to: David Fetter (#5)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Wed, Aug 14, 2024, at 16:43, David Fetter wrote:

I also think it would be nice to provide these convenience functions:
to_bytes(bigint) -> bytea
from_bytes(bytea) -> bigint

Along with these, would it make sense to have other forms of these
that won't choke at 63 bits, e.g. NUMERIC or TEXT?

I wonder what would be good names for such functions though?

Since NUMERIC can have decimal digits, and since BYTEA can already be casted to
TEXT, which will just be \x followed by the hex digits, maybe such names should
include "integer" or some other word, to indicate what is being returned?

It's already quite easy to convert to NUMERIC though,
for users who are aware of tricks like this:
SELECT ('0x'||encode('\xCAFEBABEDEADBEEF'::bytea,'hex'))::numeric;
numeric
----------------------
14627333968688430831
(1 row)

But, I think it would be better to provide functions,
since many users probably have to google+stackoverflow or gpt
to learn such tricks, which are not in the official documentation.

Regards,
Joel

#7David Fetter
david@fetter.org
In reply to: Joel Jacobson (#6)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Wed, Aug 14, 2024 at 05:39:32PM +0200, Joel Jacobson wrote:

On Wed, Aug 14, 2024, at 16:43, David Fetter wrote:

I also think it would be nice to provide these convenience functions:
to_bytes(bigint) -> bytea
from_bytes(bytea) -> bigint

Along with these, would it make sense to have other forms of these
that won't choke at 63 bits, e.g. NUMERIC or TEXT?

I wonder what would be good names for such functions though?

decimal_to_bytes(numeric), decimal_to_bytes(text) on one side
decimal_from_bytes(bytea, typeoid)

Naming Things™ is one of the hard problems in computer science. Bad
joke that includes cache coherency and off-by-one included by
reference.

Since NUMERIC can have decimal digits, and since BYTEA can already be casted to
TEXT, which will just be \x followed by the hex digits, maybe such names should
include "integer" or some other word, to indicate what is being returned?

It's already quite easy to convert to NUMERIC though,
for users who are aware of tricks like this:
SELECT ('0x'||encode('\xCAFEBABEDEADBEEF'::bytea,'hex'))::numeric;
numeric
----------------------
14627333968688430831
(1 row)

But, I think it would be better to provide functions,
since many users probably have to google+stackoverflow or gpt
to learn such tricks, which are not in the official documentation.

As usual, I see "official documentation lacks helpful and/or
non-obvious examples" as a problem best approached by making good the
lack. I am aware that my ideas about pedagogy, documentation, etc. are
not shared universally, but they're widely shared by people whose main
interaction with documents is trying to get help from them.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

#8Joel Jacobson
joel@compiler.org
In reply to: David Fetter (#7)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Wed, Aug 14, 2024, at 18:31, David Fetter wrote:

On Wed, Aug 14, 2024 at 05:39:32PM +0200, Joel Jacobson wrote:

On Wed, Aug 14, 2024, at 16:43, David Fetter wrote:

I also think it would be nice to provide these convenience functions:
to_bytes(bigint) -> bytea
from_bytes(bytea) -> bigint

Along with these, would it make sense to have other forms of these
that won't choke at 63 bits, e.g. NUMERIC or TEXT?

I wonder what would be good names for such functions though?

decimal_to_bytes(numeric), decimal_to_bytes(text) on one side
decimal_from_bytes(bytea, typeoid)

I assume decimal_to_bytes() will only accept integer numerics,
that is, that don't have a decimal digits part?
Hmm, it's perhaps then a bit counter intuitive that the name
contains "decimal", since some people might associate the word
"decimal" stronger with "decimal digits" rather than the radix/base 10.

What do we want to happen if passing a numeric with decimal digits,
to decimal_to_bytes()? It must be an error, right?

Example: SELECT decimal_to_bytes(1.23);

Naming Things™ is one of the hard problems in computer science. Bad
joke that includes cache coherency and off-by-one included by
reference.

So true :)

As usual, I see "official documentation lacks helpful and/or
non-obvious examples" as a problem best approached by making good the
lack. I am aware that my ideas about pedagogy, documentation, etc. are
not shared universally, but they're widely shared by people whose main
interaction with documents is trying to get help from them.

Well spoken.

Regards,
Joel

#9Joel Jacobson
joel@compiler.org
In reply to: Joel Jacobson (#8)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Wed, Aug 14, 2024, at 19:25, Joel Jacobson wrote:

What do we want to happen if passing a numeric with decimal digits,
to decimal_to_bytes()? It must be an error, right?

Example: SELECT decimal_to_bytes(1.23);

Hmm, an error feels quite ugly on second thought.
Would be nicer if all numerics could be represented,
in a bytea representation that is meaningful to other systems.

I think we need a tuple somehow.

Example:

SELECT numeric_to_bytes(223195403574957);
(\xcafebabedeadbeef,0,false)

SELECT numeric_to_bytes(-223195403574957);
(\xcafebabedeadbeef,0,true)

SELECT numeric_to_bytes(2231954035749.57);
(\xcafebabedeadbeef,2,false)

SELECT numeric_from_bytes('\xcafebabedeadbeef'::bytea,0,false);
223195403574957

SELECT numeric_from_bytes('\xcafebabedeadbeef'::bytea,0,true);
-223195403574957

SELECT numeric_from_bytes('\xcafebabedeadbeef'::bytea,2,false);
2231954035749.57

But then what about Inf,-Inf,NaN?

Regards,
Joel

#10Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Joel Jacobson (#9)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Thu, 15 Aug 2024 at 05:20, Joel Jacobson <joel@compiler.org> wrote:

On Wed, Aug 14, 2024, at 19:25, Joel Jacobson wrote:

What do we want to happen if passing a numeric with decimal digits,
to decimal_to_bytes()? It must be an error, right?

Example: SELECT decimal_to_bytes(1.23);

Hmm, an error feels quite ugly on second thought.
Would be nicer if all numerics could be represented,

But then what about Inf,-Inf,NaN?

Perhaps we should also add casts between bytea and the integer/numeric
types. That might be easier to use than functions in some
circumstances.

When casting a numeric to an integer, the result is rounded to the
nearest integer, and NaN/Inf generate errors, so we should probably do
the same here.

Regards,
Dean

#11Aleksander Alekseev
aleksander@timescale.com
In reply to: Dean Rasheed (#10)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi,

Perhaps we should also add casts between bytea and the integer/numeric
types. That might be easier to use than functions in some
circumstances.

When casting a numeric to an integer, the result is rounded to the
nearest integer, and NaN/Inf generate errors, so we should probably do
the same here.

Yes, I was also thinking about adding NUMERIC versions of get_bytes()
/ set_bytes(). This would allow converting more than 8 bytes to/from
an integer. I dropped this idea because I thought there would be not
much practical use for it. On the flip side you never know who uses
Postgres and for what purpose.

I will add corresponding casts unless the idea will get a push-back
from the community. IMO the existence of these casts will at least not
make things worse.

--
Best regards,
Aleksander Alekseev

#12Yugo Nagata
nagata@sraoss.co.jp
In reply to: Aleksander Alekseev (#11)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Thu, 15 Aug 2024 13:58:03 +0300
Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi,

Perhaps we should also add casts between bytea and the integer/numeric
types. That might be easier to use than functions in some
circumstances.

When casting a numeric to an integer, the result is rounded to the
nearest integer, and NaN/Inf generate errors, so we should probably do
the same here.

Yes, I was also thinking about adding NUMERIC versions of get_bytes()
/ set_bytes(). This would allow converting more than 8 bytes to/from
an integer. I dropped this idea because I thought there would be not
much practical use for it. On the flip side you never know who uses
Postgres and for what purpose.

I will add corresponding casts unless the idea will get a push-back
from the community. IMO the existence of these casts will at least not
make things worse.

When we add such casts between bytea and the integer/numeric types,
one of the problems mentioned the first of the thread, that is,
"we don't have a convenient way of casting a bytea to an integer / bigint
and vice versa", would seem be resolved.

On the other hand, I suppose get_bytes() and set_bytes() are still useful
for extracting bytes from byteas, etc. If casting is no longer the main
purpose of these functions, are variations that get_bytes returns bytea
instead of bigint, and set_bytes receives bytea as the newvalue argument
useful? I wonder it would eliminate the restrict that size cannot be larger
than 8.

Here are my very trivial comments on the patch.

+ * this routine treats "bytea" as an array of bytes.

Maybe, the sentence should start with "This ... ".

+ while(size)
+ {

I wonder inserting a space after "while" is the standard style.

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

#13Aleksander Alekseev
aleksander@timescale.com
In reply to: Yugo Nagata (#12)
1 attachment(s)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi,

When we add such casts between bytea and the integer/numeric types,
one of the problems mentioned the first of the thread, that is,
"we don't have a convenient way of casting a bytea to an integer / bigint
and vice versa", would seem be resolved.

On the other hand, I suppose get_bytes() and set_bytes() are still useful
for extracting bytes from byteas, etc. If casting is no longer the main
purpose of these functions, are variations that get_bytes returns bytea
instead of bigint, and set_bytes receives bytea as the newvalue argument
useful? I wonder it would eliminate the restrict that size cannot be larger
than 8.

No, casting between bytea and numeric will not replace get_bytes() /
set_bytes() for performance reasons.

Consider the case when you want to extract an int4 from a bytea.
get_bytes() is going to be very fast while substr() -> ::numeric ->
::integer chain will require unnecessary copying and conversions.
Casting between bytea and numeric is only useful when one has to deal
with integers larger than 8 bytes. Whether this happens often is a
debatable question.

Here are my very trivial comments on the patch.

+ * this routine treats "bytea" as an array of bytes.

Maybe, the sentence should start with "This ... ".

+ while(size)
+ {

I wonder inserting a space after "while" is the standard style.

Thanks, fixed.

--
Best regards,
Aleksander Alekseev

Attachments:

v2-0001-Add-get_bytes-and-set_bytes-functions.patchapplication/octet-stream; name=v2-0001-Add-get_bytes-and-set_bytes-functions.patchDownload
From 010dd3aa8611ed943abf3e6464b2f1d6403662dc Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 14 Aug 2024 12:11:47 +0300
Subject: [PATCH v2] Add get_bytes() and set_bytes() functions.

The new functions provide a convenient way of converting between integer
types and bytea. Previously there were only get_byte() and set_byte() which
operate with a single byte.

Aleksander Alekseev, reviewed by Joel Jacobson, Yugo Nagata
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com

BUMP CATVERSION
---
 doc/src/sgml/func.sgml                | 46 ++++++++++++++-
 src/backend/utils/adt/varlena.c       | 80 +++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |  6 ++
 src/test/regress/expected/strings.out | 36 ++++++++++++
 src/test/regress/sql/strings.sql      | 10 ++++
 5 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5dd95d73a1..f794eed8e6 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4562,6 +4562,27 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>get_bytes</primary>
+        </indexterm>
+        <function>get_bytes</function> ( <parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>integer</type>,
+        <parameter>size</parameter> <type>integer</type> )
+        <returnvalue>bigint</returnvalue>
+       </para>
+       <para>
+        Extracts <parameter>size</parameter> bytes at a given
+        <link linkend="functions-zerobased-note">offset</link>
+        from binary string.
+       </para>
+       <para>
+        <literal>get_bytes('\x0123456789ABCDEF'::bytea, 5, 2)</literal>
+        <returnvalue>43981</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4662,6 +4683,28 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>set_bytes</primary>
+        </indexterm>
+        <function>set_bytes</function> ( <parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>integer</type>,
+        <parameter>size</parameter> <type>integer</type>,
+        <parameter>newvalue</parameter> <type>bigint</type> )
+        <returnvalue>bytea</returnvalue>
+       </para>
+       <para>
+        Sets <parameter>size</parameter> bytes at a given
+        <link linkend="functions-zerobased-note">offset</link>
+        in binary string to <parameter>newvalue</parameter>.
+       </para>
+       <para>
+        <literal>set_bytes('\x0123456789abcdef'::bytea, 5, 2, 0x1122)</literal>
+        <returnvalue>\x01234567891122ef</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4761,7 +4804,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
   </table>
 
   <para id="functions-zerobased-note">
-   Functions <function>get_byte</function> and <function>set_byte</function>
+   Functions <function>get_byte</function>, <function>set_byte</function>,
+   <function>get_bytes</function> and <function>set_bytes</function>
    number the first byte of a binary string as byte 0.
    Functions <function>get_bit</function> and <function>set_bit</function>
    number bits from the right within each byte; for example bit 0 is the least
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 4f9a676c93..ffc22f1107 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -3212,6 +3212,46 @@ byteaGetByte(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(byte);
 }
 
+/*-------------------------------------------------------------
+ * byteaGetBytes
+ *
+ * This routine treats "bytea" as an array of bytes.
+ * It returns the N bytes at a given offset as a bigint value.
+ *-------------------------------------------------------------
+ */
+Datum
+byteaGetBytes(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int32		offset = PG_GETARG_INT32(1);
+	int32		size = PG_GETARG_INT32(2);
+	int64		result = 0;
+	int			len;
+
+	if (size < 1 || size > 8)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("size %d out of valid range, 1..8",
+						size)));
+
+	len = VARSIZE_ANY_EXHDR(v);
+
+	if (offset < 0 || offset > (len-size))
+		ereport(ERROR,
+				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+				 errmsg("index %d out of valid range, 0..%d",
+						offset, len - size)));
+
+	while (size--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	PG_RETURN_INT64(result);
+}
+
 /*-------------------------------------------------------------
  * byteaGetBit
  *
@@ -3282,6 +3322,46 @@ byteaSetByte(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(res);
 }
 
+/*-------------------------------------------------------------
+ * byteaSetBytes
+ *
+ * Given an instance of type 'bytea' creates a new one with
+ * the N bytes at a given offset set to the provided bigint value.
+ *
+ *-------------------------------------------------------------
+ */
+Datum
+byteaSetBytes(PG_FUNCTION_ARGS)
+{
+	bytea	   *res = PG_GETARG_BYTEA_P_COPY(0);
+	int32		offset = PG_GETARG_INT32(1);
+	int32		size = PG_GETARG_INT32(2);
+	int64		newValue = PG_GETARG_INT64(3);
+	int			len;
+
+	if (size < 1 || size > 8)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("size %d out of valid range, 1..8",
+						size)));
+
+	len = VARSIZE_ANY_EXHDR(res);
+
+	if (offset < 0 || offset > (len-size))
+		ereport(ERROR,
+				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+				 errmsg("index %d out of valid range, 0..%d",
+						offset, len - size)));
+	while (size)
+	{
+		((unsigned char*) VARDATA_ANY(res))[offset+size-1] = newValue & 0xFF;
+		newValue = newValue >> 8;
+		size--;
+	}
+
+	PG_RETURN_BYTEA_P(res);
+}
+
 /*-------------------------------------------------------------
  * byteaSetBit
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4abc6d9526..5daa1132c7 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1441,6 +1441,12 @@
 { oid => '722', descr => 'set byte',
   proname => 'set_byte', prorettype => 'bytea',
   proargtypes => 'bytea int4 int4', prosrc => 'byteaSetByte' },
+{ oid => '8573', descr => 'get bytes',
+  proname => 'get_bytes', prorettype => 'int8', proargtypes => 'bytea int4 int4',
+  prosrc => 'byteaGetBytes' },
+{ oid => '8574', descr => 'set bytes',
+  proname => 'set_bytes', prorettype => 'bytea',
+  proargtypes => 'bytea int4 int4 int8', prosrc => 'byteaSetBytes' },
 { oid => '723', descr => 'get bit',
   proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int8',
   prosrc => 'byteaGetBit' },
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index b65bb2d536..9fd52a1e11 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2358,6 +2358,42 @@ SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
 ERROR:  index 99 out of valid range, 0..8
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 8) = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 0); -- error
+ERROR:  size 0 out of valid range, 1..8
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 9); -- error
+ERROR:  size 9 out of valid range, 1..8
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 8); -- error
+ERROR:  index 1 out of valid range, 0..0
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 8, 0x1122334455667788);
+     set_bytes      
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
+     set_bytes      
+--------------------
+ \x11aabb4455667788
+(1 row)
+
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 0, 123); -- error
+ERROR:  size 0 out of valid range, 1..8
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 9, 123); -- error
+ERROR:  size 9 out of valid range, 1..8
+SELECT set_bytes('\x0123456789abcdef'::bytea, 1, 8, 123); -- error
+ERROR:  index 1 out of valid range, 0..0
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
 --
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 8e0f3a0e75..d0aa44ee2c 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -750,6 +750,16 @@ SELECT get_byte('\x1234567890abcdef00'::bytea, 3);
 SELECT get_byte('\x1234567890abcdef00'::bytea, 99);  -- error
 SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 8) = 0x1122334455667788;
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 0); -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 9); -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 8); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 8, 0x1122334455667788);
+SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 0, 123); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 9, 123); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 1, 8, 123); -- error
 
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
-- 
2.46.0

#14Peter Eisentraut
peter@eisentraut.org
In reply to: Aleksander Alekseev (#1)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On 14.08.24 13:01, Aleksander Alekseev wrote:

The proposed patch adds get_bytes() and set_bytes() functions. The
semantics is similar to get_byte() and set_byte() we already have but
the functions operate with bigints rather than bytes and the user can
specify the size of the integer. This allows working with int2s,
int4s, int8s or even int5s if needed.

Examples:

```
SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
?column?
----------
t

SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
set_bytes
--------------------
\x11aabb4455667788
```

I think these functions do about three things at once, and I don't think
they address the originally requested purpose very well.

Converting between integers and byte arrays of matching size seems like
reasonable functionality. (You can already do one half of that by
calling int2send(), int4send(), and int8send(), but the other direction
(intXrecv()) is not user-callable).

The other things are extracting that byte array from a larger byte array
and sticking it back into a larger byte array; those seem like separate
operations. There is already substr() for bytea for the first part, and
there might be another string-like operationg for the second part, or
maybe we could add one.

#15Yugo Nagata
nagata@sraoss.co.jp
In reply to: Aleksander Alekseev (#13)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Fri, 16 Aug 2024 11:41:37 +0300
Aleksander Alekseev <aleksander@timescale.com> wrote:

Hi,

When we add such casts between bytea and the integer/numeric types,
one of the problems mentioned the first of the thread, that is,
"we don't have a convenient way of casting a bytea to an integer / bigint
and vice versa", would seem be resolved.

On the other hand, I suppose get_bytes() and set_bytes() are still useful
for extracting bytes from byteas, etc. If casting is no longer the main
purpose of these functions, are variations that get_bytes returns bytea
instead of bigint, and set_bytes receives bytea as the newvalue argument
useful? I wonder it would eliminate the restrict that size cannot be larger
than 8.

No, casting between bytea and numeric will not replace get_bytes() /
set_bytes() for performance reasons.

Consider the case when you want to extract an int4 from a bytea.
get_bytes() is going to be very fast while substr() -> ::numeric ->
::integer chain will require unnecessary copying and conversions.
Casting between bytea and numeric is only useful when one has to deal
with integers larger than 8 bytes. Whether this happens often is a
debatable question.

Thank you for explanation. I understood the performance drawback.

I supposed interfaces similar to lo_get, lo_put, loread, lowrite of
large objects since they might be useful to access or modify a part of
bytea like a binary file read by pg_read_binary_file.

Here are my very trivial comments on the patch.

+ * this routine treats "bytea" as an array of bytes.

Maybe, the sentence should start with "This ... ".

+ while(size)
+ {

I wonder inserting a space after "while" is the standard style.

Thanks, fixed.

Should we fix the comment on byteaGetByte in passing, too?

Regards,
Yugo Nagata

--
Yugo Nagata <nagata@sraoss.co.jp>

#16Aleksander Alekseev
aleksander@timescale.com
In reply to: Peter Eisentraut (#14)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi Peter,

Thanks for the feedback.

I think these functions do about three things at once, and I don't think
they address the originally requested purpose very well.

The amount of things that the function does is a matter of
interpretation. I can claim that it does one thing ("extracts an
integer from a bytea"), or as many things as there are lines of code.
IMO the actual question is whether this is a good user interface or
not. Since we already have get_byte() / set_byte() the interface is
arguably OK.

Converting between integers and byte arrays of matching size seems like
reasonable functionality. (You can already do one half of that by
calling int2send(), int4send(), and int8send(), but the other direction
(intXrecv()) is not user-callable).

The other things are extracting that byte array from a larger byte array
and sticking it back into a larger byte array; those seem like separate
operations. There is already substr() for bytea for the first part, and
there might be another string-like operationg for the second part, or
maybe we could add one.

If I understand correctly, you propose doing (1):

```
SELECT substr('\x1122334455667788'::bytea, 2, 2) :: int2;
```

... instead of:

```
SELECT get_bytes('\x1122334455667788'::bytea, 1, 2)
```

... and (2):

```
WITH vals AS (
SELECT '\x1122334455667788'::bytea AS x
) SELECT substr(x, 1, 1) || int2send(1234::int2) || substr(x, 4, 5) FROM vals;
```

... instead of:

```
SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
```

There is nothing to do for query (2), it already works. It's not much
better than the query from my first email though.

To clarify, supporting bytea<->integer (and/or bytea<->numeric) casts
doesn't strike me as a terrible idea but it doesn't address the issue
I'm proposing to solve.

--
Best regards,
Aleksander Alekseev

#17Peter Eisentraut
peter@eisentraut.org
In reply to: Aleksander Alekseev (#16)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On 19.08.24 16:10, Aleksander Alekseev wrote:

To clarify, supporting bytea<->integer (and/or bytea<->numeric) casts
doesn't strike me as a terrible idea but it doesn't address the issue
I'm proposing to solve.

What is the issue you are proposing to solve?

You linked to a couple of threads and stackoverflow pages, and you
concluded from that that we should add get_bytes() and set_bytes()
functions. It's not obvious how you get from the former to the latter,
and I don't think the functions you propose are well-designed in isolation.

#18Aleksander Alekseev
aleksander@timescale.com
In reply to: Peter Eisentraut (#17)
2 attachment(s)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi,

On 19.08.24 16:10, Aleksander Alekseev wrote:

To clarify, supporting bytea<->integer (and/or bytea<->numeric) casts
doesn't strike me as a terrible idea but it doesn't address the issue
I'm proposing to solve.

What is the issue you are proposing to solve?

You linked to a couple of threads and stackoverflow pages, and you
concluded from that that we should add get_bytes() and set_bytes()
functions. It's not obvious how you get from the former to the latter,
and I don't think the functions you propose are well-designed in isolation.

I guess there are in fact two problems, not one.

1. Converting between bytea and integer types
2. Multibyte versions of get_byte() / set_byte()

As you rightly pointed out, for (1) we just need to add missing casts.
Here is the corresponding patch, v3-0001. Note that I couldn't re-use
int{2,4,8}recv because its first argument is StringInfo, so I ended up
implementing my own bytea->int{2,4,8} functions.

I think there may be value in (2) as well. It's implemented in v3-0002
and I did my best to clarify the commit message. On the flip side the
situation when one wants something like extracting int4 from a
bytea(or vice versa) and is not happy with convenience and/or
performance of substr()+casts is arguably rare. I'll be fine with
whatever consensus the community reaches about this patch.

--
Best regards,
Aleksander Alekseev

Attachments:

v3-0001-Allow-casting-between-bytea-and-integer-types.patchapplication/octet-stream; name=v3-0001-Allow-casting-between-bytea-and-integer-types.patchDownload
From 8881aec49da43b6512f8804f6d64a604220df92c Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 26 Aug 2024 12:09:59 +0300
Subject: [PATCH v3 1/2] Allow casting between bytea and integer types.

For instance:

SELECT '\x12345678'::bytea::integer;
SELECT 0x12345678::bytea;

This works with int2's, int4's and int8's.

Aleksander Alekseev, reviewed by TODO FIXME
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com

BUMP CATVERSION
---
 src/backend/utils/adt/int.c              | 55 ++++++++++++++++
 src/include/catalog/pg_cast.dat          | 14 ++++
 src/include/catalog/pg_proc.dat          | 18 ++++++
 src/test/regress/expected/opr_sanity.out | 15 ++++-
 src/test/regress/expected/strings.out    | 82 ++++++++++++++++++++++++
 src/test/regress/sql/strings.sql         | 24 +++++++
 6 files changed, 205 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 234f20796b..a9ab582283 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -330,6 +330,61 @@ int4send(PG_FUNCTION_ARGS)
 }
 
 
+/*
+ *		Common code for bytea_int2, bytea_int4 and bytea_int8
+ */
+static int64
+bytea_integer(bytea* v, int max_size)
+{
+	int 	len = VARSIZE_ANY_EXHDR(v);
+	int 	offset = 0;
+	int64 	result = 0;
+
+	if (len > max_size)
+		ereport(ERROR,
+			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			errmsg("bytea size %d out of valid range, 0..%d",
+					len, max_size)));
+	while (len--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	return result;
+}
+
+/*
+ *		bytea_int2			- converts bytea to int2
+ */
+Datum
+bytea_int2(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT16((int16)bytea_integer(v, sizeof(int16)));
+}
+
+/*
+ *		bytea_int4			- converts bytea to int4
+ */
+Datum
+bytea_int4(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT32((int32)bytea_integer(v, sizeof(int32)));
+}
+
+/*
+ *		bytea_int8			- converts bytea to int8
+ */
+Datum
+bytea_int8(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT64(bytea_integer(v, sizeof(int64)));
+}
+
 /*
  *		===================
  *		CONVERSION ROUTINES
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index ca7b6d7191..c0350e7f09 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -320,6 +320,20 @@
 { castsource => 'varchar', casttarget => 'name', castfunc => 'name(varchar)',
   castcontext => 'i', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and integer types
+{ castsource => 'int2', casttarget => 'bytea', castfunc => 'bytea(int2)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int4', casttarget => 'bytea', castfunc => 'bytea(int4)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int8', casttarget => 'bytea', castfunc => 'bytea(int8)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int2', castfunc => 'int2(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int4', castfunc => 'int4(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 4abc6d9526..b8640218b0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1164,6 +1164,24 @@
 { oid => '409', descr => 'convert char(n) to name',
   proname => 'name', proleakproof => 't', prorettype => 'name',
   proargtypes => 'bpchar', prosrc => 'bpchar_name' },
+{ oid => '8575', descr => 'convert int2 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int2', prosrc => 'int2send' },
+{ oid => '8576', descr => 'convert int4 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int4', prosrc => 'int4send' },
+{ oid => '8577', descr => 'convert int8 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int8', prosrc => 'int8send' },
+{ oid => '8578', descr => 'convert bytea to int2',
+  proname => 'int2', proleakproof => 't', prorettype => 'int2',
+  proargtypes => 'bytea', prosrc => 'bytea_int2' },
+{ oid => '8579', descr => 'convert bytea to int4',
+  proname => 'int4', proleakproof => 't', prorettype => 'int4',
+  proargtypes => 'bytea', prosrc => 'bytea_int4' },
+{ oid => '8580', descr => 'convert bytea to int8',
+  proname => 'int8', proleakproof => 't', prorettype => 'int8',
+  proargtypes => 'bytea', prosrc => 'bytea_int8' },
 
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 0d734169f1..2a80b3ced2 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,12 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname  | oid  | proname 
+------+----------+------+---------
+ 2405 | int2send | 8575 | bytea
+ 2407 | int4send | 8576 | bytea
+ 2409 | int8send | 8577 | bytea
+(3 rows)
 
 -- Look for uses of different type OIDs in the argument/result type fields
 -- for different aliases of the same built-in function.
@@ -876,6 +879,12 @@ uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
 crc32(bytea)
 crc32c(bytea)
+bytea(smallint)
+bytea(integer)
+bytea(bigint)
+int2(bytea)
+int4(bytea)
+int8(bytea)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index b65bb2d536..709f58ad7f 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2690,3 +2690,85 @@ ERROR:  invalid Unicode code point: 2FFFFF
 SELECT unistr('wrong: \xyz');
 ERROR:  invalid Unicode escape
 HINT:  Unicode escapes must be \XXXX, \+XXXXXX, \uXXXX, or \UXXXXXXXX.
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+SELECT 0x1234::int2::bytea;
+ bytea  
+--------
+ \x1234
+(1 row)
+
+SELECT 0x12345678::int4::bytea;
+   bytea    
+------------
+ \x12345678
+(1 row)
+
+SELECT 0x1122334455667788::int8::bytea;
+       bytea        
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT ''::bytea::int2 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int2 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1234'::bytea::int2 = 0x1234;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456'::bytea::int2; -- error
+ERROR:  bytea size 3 out of valid range, 0..2
+SELECT ''::bytea::int4 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int4 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456789A'::bytea::int4; -- error
+ERROR:  bytea size 5 out of valid range, 0..4
+SELECT ''::bytea::int8 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int8 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x112233445566778899'::bytea::int8; -- error
+ERROR:  bytea size 9 out of valid range, 0..8
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 8e0f3a0e75..b8858c9aea 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -848,3 +848,27 @@ SELECT unistr('wrong: \udb99\u0061');
 SELECT unistr('wrong: \U0000db99\U00000061');
 SELECT unistr('wrong: \U002FFFFF');
 SELECT unistr('wrong: \xyz');
+
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+
+SELECT 0x1234::int2::bytea;
+SELECT 0x12345678::int4::bytea;
+SELECT 0x1122334455667788::int8::bytea;
+
+SELECT ''::bytea::int2 = 0;
+SELECT '\x12'::bytea::int2 = 0x12;
+SELECT '\x1234'::bytea::int2 = 0x1234;
+SELECT '\x123456'::bytea::int2; -- error
+
+SELECT ''::bytea::int4 = 0;
+SELECT '\x12'::bytea::int4 = 0x12;
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+SELECT '\x123456789A'::bytea::int4; -- error
+
+SELECT ''::bytea::int8 = 0;
+SELECT '\x12'::bytea::int8 = 0x12;
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+SELECT '\x112233445566778899'::bytea::int8; -- error
-- 
2.46.0

v3-0002-Add-get_bytes-and-set_bytes-functions.patchapplication/octet-stream; name=v3-0002-Add-get_bytes-and-set_bytes-functions.patchDownload
From 20631f194e9820ba6629892b566f5d0d7a401560 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 14 Aug 2024 12:11:47 +0300
Subject: [PATCH v3 2/2] Add get_bytes() and set_bytes() functions.

The new functions provide a convenient way of extracting integers from
bytea's and putting integer values into bytea's. Previously there were only
get_byte() and set_byte() which operate with a single byte.

Aleksander Alekseev, reviewed by Joel Jacobson, Yugo Nagata
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com

BUMP CATVERSION
---
 doc/src/sgml/func.sgml                | 46 ++++++++++++++-
 src/backend/utils/adt/varlena.c       | 80 +++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |  6 ++
 src/test/regress/expected/strings.out | 36 ++++++++++++
 src/test/regress/sql/strings.sql      | 10 ++++
 5 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 461fc3f437..8f7f65b954 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4562,6 +4562,27 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>get_bytes</primary>
+        </indexterm>
+        <function>get_bytes</function> ( <parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>integer</type>,
+        <parameter>size</parameter> <type>integer</type> )
+        <returnvalue>bigint</returnvalue>
+       </para>
+       <para>
+        Extracts <parameter>size</parameter> bytes at a given
+        <link linkend="functions-zerobased-note">offset</link>
+        from binary string.
+       </para>
+       <para>
+        <literal>get_bytes('\x0123456789ABCDEF'::bytea, 5, 2)</literal>
+        <returnvalue>43981</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4662,6 +4683,28 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>set_bytes</primary>
+        </indexterm>
+        <function>set_bytes</function> ( <parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>integer</type>,
+        <parameter>size</parameter> <type>integer</type>,
+        <parameter>newvalue</parameter> <type>bigint</type> )
+        <returnvalue>bytea</returnvalue>
+       </para>
+       <para>
+        Sets <parameter>size</parameter> bytes at a given
+        <link linkend="functions-zerobased-note">offset</link>
+        in binary string to <parameter>newvalue</parameter>.
+       </para>
+       <para>
+        <literal>set_bytes('\x0123456789abcdef'::bytea, 5, 2, 0x1122)</literal>
+        <returnvalue>\x01234567891122ef</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4761,7 +4804,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
   </table>
 
   <para id="functions-zerobased-note">
-   Functions <function>get_byte</function> and <function>set_byte</function>
+   Functions <function>get_byte</function>, <function>set_byte</function>,
+   <function>get_bytes</function> and <function>set_bytes</function>
    number the first byte of a binary string as byte 0.
    Functions <function>get_bit</function> and <function>set_bit</function>
    number bits from the right within each byte; for example bit 0 is the least
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 7c6391a276..ff26591f63 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -3206,6 +3206,46 @@ byteaGetByte(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(byte);
 }
 
+/*-------------------------------------------------------------
+ * byteaGetBytes
+ *
+ * This routine treats "bytea" as an array of bytes.
+ * It returns the N bytes at a given offset as a bigint value.
+ *-------------------------------------------------------------
+ */
+Datum
+byteaGetBytes(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int32		offset = PG_GETARG_INT32(1);
+	int32		size = PG_GETARG_INT32(2);
+	int64		result = 0;
+	int			len;
+
+	if (size < 1 || size > 8)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("size %d out of valid range, 1..8",
+						size)));
+
+	len = VARSIZE_ANY_EXHDR(v);
+
+	if (offset < 0 || offset > (len-size))
+		ereport(ERROR,
+				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+				 errmsg("index %d out of valid range, 0..%d",
+						offset, len - size)));
+
+	while (size--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	PG_RETURN_INT64(result);
+}
+
 /*-------------------------------------------------------------
  * byteaGetBit
  *
@@ -3276,6 +3316,46 @@ byteaSetByte(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(res);
 }
 
+/*-------------------------------------------------------------
+ * byteaSetBytes
+ *
+ * Given an instance of type 'bytea' creates a new one with
+ * the N bytes at a given offset set to the provided bigint value.
+ *
+ *-------------------------------------------------------------
+ */
+Datum
+byteaSetBytes(PG_FUNCTION_ARGS)
+{
+	bytea	   *res = PG_GETARG_BYTEA_P_COPY(0);
+	int32		offset = PG_GETARG_INT32(1);
+	int32		size = PG_GETARG_INT32(2);
+	int64		newValue = PG_GETARG_INT64(3);
+	int			len;
+
+	if (size < 1 || size > 8)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("size %d out of valid range, 1..8",
+						size)));
+
+	len = VARSIZE_ANY_EXHDR(res);
+
+	if (offset < 0 || offset > (len-size))
+		ereport(ERROR,
+				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+				 errmsg("index %d out of valid range, 0..%d",
+						offset, len - size)));
+	while (size)
+	{
+		((unsigned char*) VARDATA_ANY(res))[offset+size-1] = newValue & 0xFF;
+		newValue = newValue >> 8;
+		size--;
+	}
+
+	PG_RETURN_BYTEA_P(res);
+}
+
 /*-------------------------------------------------------------
  * byteaSetBit
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b8640218b0..3424c903e0 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1459,6 +1459,12 @@
 { oid => '722', descr => 'set byte',
   proname => 'set_byte', prorettype => 'bytea',
   proargtypes => 'bytea int4 int4', prosrc => 'byteaSetByte' },
+{ oid => '8573', descr => 'get bytes',
+  proname => 'get_bytes', prorettype => 'int8', proargtypes => 'bytea int4 int4',
+  prosrc => 'byteaGetBytes' },
+{ oid => '8574', descr => 'set bytes',
+  proname => 'set_bytes', prorettype => 'bytea',
+  proargtypes => 'bytea int4 int4 int8', prosrc => 'byteaSetBytes' },
 { oid => '723', descr => 'get bit',
   proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int8',
   prosrc => 'byteaGetBit' },
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 709f58ad7f..c58fc9b918 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2358,6 +2358,42 @@ SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
 ERROR:  index 99 out of valid range, 0..8
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 8) = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 0); -- error
+ERROR:  size 0 out of valid range, 1..8
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 9); -- error
+ERROR:  size 9 out of valid range, 1..8
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 8); -- error
+ERROR:  index 1 out of valid range, 0..0
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 8, 0x1122334455667788);
+     set_bytes      
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
+     set_bytes      
+--------------------
+ \x11aabb4455667788
+(1 row)
+
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 0, 123); -- error
+ERROR:  size 0 out of valid range, 1..8
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 9, 123); -- error
+ERROR:  size 9 out of valid range, 1..8
+SELECT set_bytes('\x0123456789abcdef'::bytea, 1, 8, 123); -- error
+ERROR:  index 1 out of valid range, 0..0
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
 --
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index b8858c9aea..82d677562a 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -750,6 +750,16 @@ SELECT get_byte('\x1234567890abcdef00'::bytea, 3);
 SELECT get_byte('\x1234567890abcdef00'::bytea, 99);  -- error
 SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 8) = 0x1122334455667788;
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 0); -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 9); -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 8); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 8, 0x1122334455667788);
+SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 0, 123); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 9, 123); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 1, 8, 123); -- error
 
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
-- 
2.46.0

#19Aleksander Alekseev
aleksander@timescale.com
In reply to: Aleksander Alekseev (#18)
2 attachment(s)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi,

I guess there are in fact two problems, not one.

1. Converting between bytea and integer types
2. Multibyte versions of get_byte() / set_byte()

As you rightly pointed out, for (1) we just need to add missing casts.
Here is the corresponding patch, v3-0001. Note that I couldn't re-use
int{2,4,8}recv because its first argument is StringInfo, so I ended up
implementing my own bytea->int{2,4,8} functions.

I think there may be value in (2) as well. It's implemented in v3-0002
and I did my best to clarify the commit message. On the flip side the
situation when one wants something like extracting int4 from a
bytea(or vice versa) and is not happy with convenience and/or
performance of substr()+casts is arguably rare. I'll be fine with
whatever consensus the community reaches about this patch.

PFA the rebased patch set v4.

--
Best regards,
Aleksander Alekseev

Attachments:

v4-0001-Allow-casting-between-bytea-and-integer-types.patchapplication/octet-stream; name=v4-0001-Allow-casting-between-bytea-and-integer-types.patchDownload
From d31bd5bd525566f6dee815b4f47d4fe64ac304c2 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 26 Aug 2024 12:09:59 +0300
Subject: [PATCH v4 1/2] Allow casting between bytea and integer types.

For instance:

SELECT '\x12345678'::bytea::integer;
SELECT 0x12345678::bytea;

This works with int2's, int4's and int8's.

Aleksander Alekseev, reviewed by TODO FIXME
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com

BUMP CATVERSION
---
 src/backend/utils/adt/int.c              | 55 ++++++++++++++++
 src/include/catalog/pg_cast.dat          | 14 ++++
 src/include/catalog/pg_proc.dat          | 18 ++++++
 src/test/regress/expected/opr_sanity.out | 15 ++++-
 src/test/regress/expected/strings.out    | 82 ++++++++++++++++++++++++
 src/test/regress/sql/strings.sql         | 24 +++++++
 6 files changed, 205 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 234f20796b..a9ab582283 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -330,6 +330,61 @@ int4send(PG_FUNCTION_ARGS)
 }
 
 
+/*
+ *		Common code for bytea_int2, bytea_int4 and bytea_int8
+ */
+static int64
+bytea_integer(bytea* v, int max_size)
+{
+	int 	len = VARSIZE_ANY_EXHDR(v);
+	int 	offset = 0;
+	int64 	result = 0;
+
+	if (len > max_size)
+		ereport(ERROR,
+			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			errmsg("bytea size %d out of valid range, 0..%d",
+					len, max_size)));
+	while (len--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	return result;
+}
+
+/*
+ *		bytea_int2			- converts bytea to int2
+ */
+Datum
+bytea_int2(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT16((int16)bytea_integer(v, sizeof(int16)));
+}
+
+/*
+ *		bytea_int4			- converts bytea to int4
+ */
+Datum
+bytea_int4(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT32((int32)bytea_integer(v, sizeof(int32)));
+}
+
+/*
+ *		bytea_int8			- converts bytea to int8
+ */
+Datum
+bytea_int8(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT64(bytea_integer(v, sizeof(int64)));
+}
+
 /*
  *		===================
  *		CONVERSION ROUTINES
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index ca7b6d7191..c0350e7f09 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -320,6 +320,20 @@
 { castsource => 'varchar', casttarget => 'name', castfunc => 'name(varchar)',
   castcontext => 'i', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and integer types
+{ castsource => 'int2', casttarget => 'bytea', castfunc => 'bytea(int2)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int4', casttarget => 'bytea', castfunc => 'bytea(int4)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int8', casttarget => 'bytea', castfunc => 'bytea(int8)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int2', castfunc => 'int2(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int4', castfunc => 'int4(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 77f54a79e6..fb4e53ddaa 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1164,6 +1164,24 @@
 { oid => '409', descr => 'convert char(n) to name',
   proname => 'name', proleakproof => 't', prorettype => 'name',
   proargtypes => 'bpchar', prosrc => 'bpchar_name' },
+{ oid => '8577', descr => 'convert int2 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int2', prosrc => 'int2send' },
+{ oid => '8578', descr => 'convert int4 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int4', prosrc => 'int4send' },
+{ oid => '8579', descr => 'convert int8 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int8', prosrc => 'int8send' },
+{ oid => '8580', descr => 'convert bytea to int2',
+  proname => 'int2', proleakproof => 't', prorettype => 'int2',
+  proargtypes => 'bytea', prosrc => 'bytea_int2' },
+{ oid => '8581', descr => 'convert bytea to int4',
+  proname => 'int4', proleakproof => 't', prorettype => 'int4',
+  proargtypes => 'bytea', prosrc => 'bytea_int4' },
+{ oid => '8582', descr => 'convert bytea to int8',
+  proname => 'int8', proleakproof => 't', prorettype => 'int8',
+  proargtypes => 'bytea', prosrc => 'bytea_int8' },
 
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 0d734169f1..16135cb765 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,12 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname  | oid  | proname 
+------+----------+------+---------
+ 2405 | int2send | 8577 | bytea
+ 2407 | int4send | 8578 | bytea
+ 2409 | int8send | 8579 | bytea
+(3 rows)
 
 -- Look for uses of different type OIDs in the argument/result type fields
 -- for different aliases of the same built-in function.
@@ -876,6 +879,12 @@ uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
 crc32(bytea)
 crc32c(bytea)
+bytea(smallint)
+bytea(integer)
+bytea(bigint)
+int2(bytea)
+int4(bytea)
+int8(bytea)
 -- restore normal output mode
 \a\t
 -- List of functions used by libpq's fe-lobj.c
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index b65bb2d536..709f58ad7f 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2690,3 +2690,85 @@ ERROR:  invalid Unicode code point: 2FFFFF
 SELECT unistr('wrong: \xyz');
 ERROR:  invalid Unicode escape
 HINT:  Unicode escapes must be \XXXX, \+XXXXXX, \uXXXX, or \UXXXXXXXX.
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+SELECT 0x1234::int2::bytea;
+ bytea  
+--------
+ \x1234
+(1 row)
+
+SELECT 0x12345678::int4::bytea;
+   bytea    
+------------
+ \x12345678
+(1 row)
+
+SELECT 0x1122334455667788::int8::bytea;
+       bytea        
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT ''::bytea::int2 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int2 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1234'::bytea::int2 = 0x1234;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456'::bytea::int2; -- error
+ERROR:  bytea size 3 out of valid range, 0..2
+SELECT ''::bytea::int4 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int4 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456789A'::bytea::int4; -- error
+ERROR:  bytea size 5 out of valid range, 0..4
+SELECT ''::bytea::int8 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int8 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x112233445566778899'::bytea::int8; -- error
+ERROR:  bytea size 9 out of valid range, 0..8
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 8e0f3a0e75..b8858c9aea 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -848,3 +848,27 @@ SELECT unistr('wrong: \udb99\u0061');
 SELECT unistr('wrong: \U0000db99\U00000061');
 SELECT unistr('wrong: \U002FFFFF');
 SELECT unistr('wrong: \xyz');
+
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+
+SELECT 0x1234::int2::bytea;
+SELECT 0x12345678::int4::bytea;
+SELECT 0x1122334455667788::int8::bytea;
+
+SELECT ''::bytea::int2 = 0;
+SELECT '\x12'::bytea::int2 = 0x12;
+SELECT '\x1234'::bytea::int2 = 0x1234;
+SELECT '\x123456'::bytea::int2; -- error
+
+SELECT ''::bytea::int4 = 0;
+SELECT '\x12'::bytea::int4 = 0x12;
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+SELECT '\x123456789A'::bytea::int4; -- error
+
+SELECT ''::bytea::int8 = 0;
+SELECT '\x12'::bytea::int8 = 0x12;
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+SELECT '\x112233445566778899'::bytea::int8; -- error
-- 
2.46.0

v4-0002-Add-get_bytes-and-set_bytes-functions.patchapplication/octet-stream; name=v4-0002-Add-get_bytes-and-set_bytes-functions.patchDownload
From 8bff868366462aa8c0edc273d05eb30ad0214512 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 14 Aug 2024 12:11:47 +0300
Subject: [PATCH v4 2/2] Add get_bytes() and set_bytes() functions.

The new functions provide a convenient way of extracting integers from
bytea's and putting integer values into bytea's. Previously there were only
get_byte() and set_byte() which operate with a single byte.

Aleksander Alekseev, reviewed by Joel Jacobson, Yugo Nagata
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com

BUMP CATVERSION
---
 doc/src/sgml/func.sgml                | 46 ++++++++++++++-
 src/backend/utils/adt/varlena.c       | 80 +++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |  6 ++
 src/test/regress/expected/strings.out | 36 ++++++++++++
 src/test/regress/sql/strings.sql      | 10 ++++
 5 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 7b4fbb5047..c1ef05de7d 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4562,6 +4562,27 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>get_bytes</primary>
+        </indexterm>
+        <function>get_bytes</function> ( <parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>integer</type>,
+        <parameter>size</parameter> <type>integer</type> )
+        <returnvalue>bigint</returnvalue>
+       </para>
+       <para>
+        Extracts <parameter>size</parameter> bytes at a given
+        <link linkend="functions-zerobased-note">offset</link>
+        from binary string.
+       </para>
+       <para>
+        <literal>get_bytes('\x0123456789ABCDEF'::bytea, 5, 2)</literal>
+        <returnvalue>43981</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4662,6 +4683,28 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>set_bytes</primary>
+        </indexterm>
+        <function>set_bytes</function> ( <parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>integer</type>,
+        <parameter>size</parameter> <type>integer</type>,
+        <parameter>newvalue</parameter> <type>bigint</type> )
+        <returnvalue>bytea</returnvalue>
+       </para>
+       <para>
+        Sets <parameter>size</parameter> bytes at a given
+        <link linkend="functions-zerobased-note">offset</link>
+        in binary string to <parameter>newvalue</parameter>.
+       </para>
+       <para>
+        <literal>set_bytes('\x0123456789abcdef'::bytea, 5, 2, 0x1122)</literal>
+        <returnvalue>\x01234567891122ef</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4761,7 +4804,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
   </table>
 
   <para id="functions-zerobased-note">
-   Functions <function>get_byte</function> and <function>set_byte</function>
+   Functions <function>get_byte</function>, <function>set_byte</function>,
+   <function>get_bytes</function> and <function>set_bytes</function>
    number the first byte of a binary string as byte 0.
    Functions <function>get_bit</function> and <function>set_bit</function>
    number bits from the right within each byte; for example bit 0 is the least
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index d46ed3ccf9..e53c1a9898 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -3201,6 +3201,46 @@ byteaGetByte(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(byte);
 }
 
+/*-------------------------------------------------------------
+ * byteaGetBytes
+ *
+ * This routine treats "bytea" as an array of bytes.
+ * It returns the N bytes at a given offset as a bigint value.
+ *-------------------------------------------------------------
+ */
+Datum
+byteaGetBytes(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int32		offset = PG_GETARG_INT32(1);
+	int32		size = PG_GETARG_INT32(2);
+	int64		result = 0;
+	int			len;
+
+	if (size < 1 || size > 8)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("size %d out of valid range, 1..8",
+						size)));
+
+	len = VARSIZE_ANY_EXHDR(v);
+
+	if (offset < 0 || offset > (len-size))
+		ereport(ERROR,
+				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+				 errmsg("index %d out of valid range, 0..%d",
+						offset, len - size)));
+
+	while (size--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	PG_RETURN_INT64(result);
+}
+
 /*-------------------------------------------------------------
  * byteaGetBit
  *
@@ -3271,6 +3311,46 @@ byteaSetByte(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(res);
 }
 
+/*-------------------------------------------------------------
+ * byteaSetBytes
+ *
+ * Given an instance of type 'bytea' creates a new one with
+ * the N bytes at a given offset set to the provided bigint value.
+ *
+ *-------------------------------------------------------------
+ */
+Datum
+byteaSetBytes(PG_FUNCTION_ARGS)
+{
+	bytea	   *res = PG_GETARG_BYTEA_P_COPY(0);
+	int32		offset = PG_GETARG_INT32(1);
+	int32		size = PG_GETARG_INT32(2);
+	int64		newValue = PG_GETARG_INT64(3);
+	int			len;
+
+	if (size < 1 || size > 8)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("size %d out of valid range, 1..8",
+						size)));
+
+	len = VARSIZE_ANY_EXHDR(res);
+
+	if (offset < 0 || offset > (len-size))
+		ereport(ERROR,
+				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+				 errmsg("index %d out of valid range, 0..%d",
+						offset, len - size)));
+	while (size)
+	{
+		((unsigned char*) VARDATA_ANY(res))[offset+size-1] = newValue & 0xFF;
+		newValue = newValue >> 8;
+		size--;
+	}
+
+	PG_RETURN_BYTEA_P(res);
+}
+
 /*-------------------------------------------------------------
  * byteaSetBit
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index fb4e53ddaa..7de3d0cf71 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1495,6 +1495,12 @@
 { oid => '722', descr => 'set byte',
   proname => 'set_byte', prorettype => 'bytea',
   proargtypes => 'bytea int4 int4', prosrc => 'byteaSetByte' },
+{ oid => '8575', descr => 'get bytes',
+  proname => 'get_bytes', prorettype => 'int8', proargtypes => 'bytea int4 int4',
+  prosrc => 'byteaGetBytes' },
+{ oid => '8576', descr => 'set bytes',
+  proname => 'set_bytes', prorettype => 'bytea',
+  proargtypes => 'bytea int4 int4 int8', prosrc => 'byteaSetBytes' },
 { oid => '723', descr => 'get bit',
   proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int8',
   prosrc => 'byteaGetBit' },
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 709f58ad7f..c58fc9b918 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2358,6 +2358,42 @@ SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
 ERROR:  index 99 out of valid range, 0..8
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 8) = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 0); -- error
+ERROR:  size 0 out of valid range, 1..8
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 9); -- error
+ERROR:  size 9 out of valid range, 1..8
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 8); -- error
+ERROR:  index 1 out of valid range, 0..0
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 8, 0x1122334455667788);
+     set_bytes      
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
+     set_bytes      
+--------------------
+ \x11aabb4455667788
+(1 row)
+
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 0, 123); -- error
+ERROR:  size 0 out of valid range, 1..8
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 9, 123); -- error
+ERROR:  size 9 out of valid range, 1..8
+SELECT set_bytes('\x0123456789abcdef'::bytea, 1, 8, 123); -- error
+ERROR:  index 1 out of valid range, 0..0
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
 --
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index b8858c9aea..82d677562a 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -750,6 +750,16 @@ SELECT get_byte('\x1234567890abcdef00'::bytea, 3);
 SELECT get_byte('\x1234567890abcdef00'::bytea, 99);  -- error
 SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 8) = 0x1122334455667788;
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 0); -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 9); -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 8); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 8, 0x1122334455667788);
+SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 0, 123); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 9, 123); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 1, 8, 123); -- error
 
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
-- 
2.46.0

#20Aleksander Alekseev
aleksander@timescale.com
In reply to: Aleksander Alekseev (#19)
2 attachment(s)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi,

I guess there are in fact two problems, not one.

1. Converting between bytea and integer types
2. Multibyte versions of get_byte() / set_byte()

As you rightly pointed out, for (1) we just need to add missing casts.
Here is the corresponding patch, v3-0001. Note that I couldn't re-use
int{2,4,8}recv because its first argument is StringInfo, so I ended up
implementing my own bytea->int{2,4,8} functions.

I think there may be value in (2) as well. It's implemented in v3-0002
and I did my best to clarify the commit message. On the flip side the
situation when one wants something like extracting int4 from a
bytea(or vice versa) and is not happy with convenience and/or
performance of substr()+casts is arguably rare. I'll be fine with
whatever consensus the community reaches about this patch.

PFA the rebased patch set v4.

Rebased, v5.

--
Best regards,
Aleksander Alekseev

Attachments:

v5-0001-Allow-casting-between-bytea-and-integer-types.patchapplication/octet-stream; name=v5-0001-Allow-casting-between-bytea-and-integer-types.patchDownload
From b7cf59a809539cfaa5207cb23aca578af2b04977 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 26 Aug 2024 12:09:59 +0300
Subject: [PATCH v5 1/2] Allow casting between bytea and integer types.

For instance:

SELECT '\x12345678'::bytea::integer;
SELECT 0x12345678::bytea;

This works with int2's, int4's and int8's.

Aleksander Alekseev, reviewed by TODO FIXME
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com

BUMP CATVERSION
---
 src/backend/utils/adt/int.c              | 55 ++++++++++++++++
 src/include/catalog/pg_cast.dat          | 14 ++++
 src/include/catalog/pg_proc.dat          | 18 ++++++
 src/test/regress/expected/opr_sanity.out | 15 ++++-
 src/test/regress/expected/strings.out    | 82 ++++++++++++++++++++++++
 src/test/regress/sql/strings.sql         | 24 +++++++
 6 files changed, 205 insertions(+), 3 deletions(-)

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index 234f20796b..a9ab582283 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -330,6 +330,61 @@ int4send(PG_FUNCTION_ARGS)
 }
 
 
+/*
+ *		Common code for bytea_int2, bytea_int4 and bytea_int8
+ */
+static int64
+bytea_integer(bytea* v, int max_size)
+{
+	int 	len = VARSIZE_ANY_EXHDR(v);
+	int 	offset = 0;
+	int64 	result = 0;
+
+	if (len > max_size)
+		ereport(ERROR,
+			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			errmsg("bytea size %d out of valid range, 0..%d",
+					len, max_size)));
+	while (len--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	return result;
+}
+
+/*
+ *		bytea_int2			- converts bytea to int2
+ */
+Datum
+bytea_int2(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT16((int16)bytea_integer(v, sizeof(int16)));
+}
+
+/*
+ *		bytea_int4			- converts bytea to int4
+ */
+Datum
+bytea_int4(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT32((int32)bytea_integer(v, sizeof(int32)));
+}
+
+/*
+ *		bytea_int8			- converts bytea to int8
+ */
+Datum
+bytea_int8(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT64(bytea_integer(v, sizeof(int64)));
+}
+
 /*
  *		===================
  *		CONVERSION ROUTINES
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index ca7b6d7191..c0350e7f09 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -320,6 +320,20 @@
 { castsource => 'varchar', casttarget => 'name', castfunc => 'name(varchar)',
   castcontext => 'i', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and integer types
+{ castsource => 'int2', casttarget => 'bytea', castfunc => 'bytea(int2)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int4', casttarget => 'bytea', castfunc => 'bytea(int4)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int8', casttarget => 'bytea', castfunc => 'bytea(int8)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int2', castfunc => 'int2(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int4', castfunc => 'int4(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 7c0b74fe05..0336110b0d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1164,6 +1164,24 @@
 { oid => '409', descr => 'convert char(n) to name',
   proname => 'name', proleakproof => 't', prorettype => 'name',
   proargtypes => 'bpchar', prosrc => 'bpchar_name' },
+{ oid => '8577', descr => 'convert int2 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int2', prosrc => 'int2send' },
+{ oid => '8578', descr => 'convert int4 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int4', prosrc => 'int4send' },
+{ oid => '8579', descr => 'convert int8 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int8', prosrc => 'int8send' },
+{ oid => '8580', descr => 'convert bytea to int2',
+  proname => 'int2', proleakproof => 't', prorettype => 'int2',
+  proargtypes => 'bytea', prosrc => 'bytea_int2' },
+{ oid => '8581', descr => 'convert bytea to int4',
+  proname => 'int4', proleakproof => 't', prorettype => 'int4',
+  proargtypes => 'bytea', prosrc => 'bytea_int4' },
+{ oid => '8582', descr => 'convert bytea to int8',
+  proname => 'int8', proleakproof => 't', prorettype => 'int8',
+  proargtypes => 'bytea', prosrc => 'bytea_int8' },
 
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 34a32bd11d..4b1ebef8fa 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,12 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname  | oid  | proname 
+------+----------+------+---------
+ 2405 | int2send | 8577 | bytea
+ 2407 | int4send | 8578 | bytea
+ 2409 | int8send | 8579 | bytea
+(3 rows)
 
 -- Look for uses of different type OIDs in the argument/result type fields
 -- for different aliases of the same built-in function.
@@ -876,6 +879,12 @@ uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
 crc32(bytea)
 crc32c(bytea)
+bytea(smallint)
+bytea(integer)
+bytea(bigint)
+int2(bytea)
+int4(bytea)
+int8(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
 -- restore normal output mode
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index b65bb2d536..709f58ad7f 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2690,3 +2690,85 @@ ERROR:  invalid Unicode code point: 2FFFFF
 SELECT unistr('wrong: \xyz');
 ERROR:  invalid Unicode escape
 HINT:  Unicode escapes must be \XXXX, \+XXXXXX, \uXXXX, or \UXXXXXXXX.
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+SELECT 0x1234::int2::bytea;
+ bytea  
+--------
+ \x1234
+(1 row)
+
+SELECT 0x12345678::int4::bytea;
+   bytea    
+------------
+ \x12345678
+(1 row)
+
+SELECT 0x1122334455667788::int8::bytea;
+       bytea        
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT ''::bytea::int2 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int2 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1234'::bytea::int2 = 0x1234;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456'::bytea::int2; -- error
+ERROR:  bytea size 3 out of valid range, 0..2
+SELECT ''::bytea::int4 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int4 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456789A'::bytea::int4; -- error
+ERROR:  bytea size 5 out of valid range, 0..4
+SELECT ''::bytea::int8 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int8 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x112233445566778899'::bytea::int8; -- error
+ERROR:  bytea size 9 out of valid range, 0..8
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 8e0f3a0e75..b8858c9aea 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -848,3 +848,27 @@ SELECT unistr('wrong: \udb99\u0061');
 SELECT unistr('wrong: \U0000db99\U00000061');
 SELECT unistr('wrong: \U002FFFFF');
 SELECT unistr('wrong: \xyz');
+
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+
+SELECT 0x1234::int2::bytea;
+SELECT 0x12345678::int4::bytea;
+SELECT 0x1122334455667788::int8::bytea;
+
+SELECT ''::bytea::int2 = 0;
+SELECT '\x12'::bytea::int2 = 0x12;
+SELECT '\x1234'::bytea::int2 = 0x1234;
+SELECT '\x123456'::bytea::int2; -- error
+
+SELECT ''::bytea::int4 = 0;
+SELECT '\x12'::bytea::int4 = 0x12;
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+SELECT '\x123456789A'::bytea::int4; -- error
+
+SELECT ''::bytea::int8 = 0;
+SELECT '\x12'::bytea::int8 = 0x12;
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+SELECT '\x112233445566778899'::bytea::int8; -- error
-- 
2.47.0

v5-0002-Add-get_bytes-and-set_bytes-functions.patchapplication/octet-stream; name=v5-0002-Add-get_bytes-and-set_bytes-functions.patchDownload
From 6fd52c99926bfdac215a94afc00f2b5a8afb3690 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Wed, 14 Aug 2024 12:11:47 +0300
Subject: [PATCH v5 2/2] Add get_bytes() and set_bytes() functions.

The new functions provide a convenient way of extracting integers from
bytea's and putting integer values into bytea's. Previously there were only
get_byte() and set_byte() which operate with a single byte.

Aleksander Alekseev, reviewed by Joel Jacobson, Yugo Nagata
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com

BUMP CATVERSION
---
 doc/src/sgml/func.sgml                | 46 ++++++++++++++-
 src/backend/utils/adt/varlena.c       | 80 +++++++++++++++++++++++++++
 src/include/catalog/pg_proc.dat       |  6 ++
 src/test/regress/expected/strings.out | 36 ++++++++++++
 src/test/regress/sql/strings.sql      | 10 ++++
 5 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f8a0d76d12..5d907c9e44 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4562,6 +4562,27 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>get_bytes</primary>
+        </indexterm>
+        <function>get_bytes</function> ( <parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>integer</type>,
+        <parameter>size</parameter> <type>integer</type> )
+        <returnvalue>bigint</returnvalue>
+       </para>
+       <para>
+        Extracts <parameter>size</parameter> bytes at a given
+        <link linkend="functions-zerobased-note">offset</link>
+        from binary string.
+       </para>
+       <para>
+        <literal>get_bytes('\x0123456789ABCDEF'::bytea, 5, 2)</literal>
+        <returnvalue>43981</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4662,6 +4683,28 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
        </para></entry>
       </row>
 
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>set_bytes</primary>
+        </indexterm>
+        <function>set_bytes</function> ( <parameter>bytes</parameter> <type>bytea</type>,
+        <parameter>offset</parameter> <type>integer</type>,
+        <parameter>size</parameter> <type>integer</type>,
+        <parameter>newvalue</parameter> <type>bigint</type> )
+        <returnvalue>bytea</returnvalue>
+       </para>
+       <para>
+        Sets <parameter>size</parameter> bytes at a given
+        <link linkend="functions-zerobased-note">offset</link>
+        in binary string to <parameter>newvalue</parameter>.
+       </para>
+       <para>
+        <literal>set_bytes('\x0123456789abcdef'::bytea, 5, 2, 0x1122)</literal>
+        <returnvalue>\x01234567891122ef</returnvalue>
+       </para></entry>
+      </row>
+
       <row>
        <entry role="func_table_entry"><para role="func_signature">
         <indexterm>
@@ -4761,7 +4804,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
   </table>
 
   <para id="functions-zerobased-note">
-   Functions <function>get_byte</function> and <function>set_byte</function>
+   Functions <function>get_byte</function>, <function>set_byte</function>,
+   <function>get_bytes</function> and <function>set_bytes</function>
    number the first byte of a binary string as byte 0.
    Functions <function>get_bit</function> and <function>set_bit</function>
    number bits from the right within each byte; for example bit 0 is the least
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 533bebc1c7..0a1560556d 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -3201,6 +3201,46 @@ byteaGetByte(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(byte);
 }
 
+/*-------------------------------------------------------------
+ * byteaGetBytes
+ *
+ * This routine treats "bytea" as an array of bytes.
+ * It returns the N bytes at a given offset as a bigint value.
+ *-------------------------------------------------------------
+ */
+Datum
+byteaGetBytes(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int32		offset = PG_GETARG_INT32(1);
+	int32		size = PG_GETARG_INT32(2);
+	int64		result = 0;
+	int			len;
+
+	if (size < 1 || size > 8)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("size %d out of valid range, 1..8",
+						size)));
+
+	len = VARSIZE_ANY_EXHDR(v);
+
+	if (offset < 0 || offset > (len-size))
+		ereport(ERROR,
+				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+				 errmsg("index %d out of valid range, 0..%d",
+						offset, len - size)));
+
+	while (size--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	PG_RETURN_INT64(result);
+}
+
 /*-------------------------------------------------------------
  * byteaGetBit
  *
@@ -3271,6 +3311,46 @@ byteaSetByte(PG_FUNCTION_ARGS)
 	PG_RETURN_BYTEA_P(res);
 }
 
+/*-------------------------------------------------------------
+ * byteaSetBytes
+ *
+ * Given an instance of type 'bytea' creates a new one with
+ * the N bytes at a given offset set to the provided bigint value.
+ *
+ *-------------------------------------------------------------
+ */
+Datum
+byteaSetBytes(PG_FUNCTION_ARGS)
+{
+	bytea	   *res = PG_GETARG_BYTEA_P_COPY(0);
+	int32		offset = PG_GETARG_INT32(1);
+	int32		size = PG_GETARG_INT32(2);
+	int64		newValue = PG_GETARG_INT64(3);
+	int			len;
+
+	if (size < 1 || size > 8)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("size %d out of valid range, 1..8",
+						size)));
+
+	len = VARSIZE_ANY_EXHDR(res);
+
+	if (offset < 0 || offset > (len-size))
+		ereport(ERROR,
+				(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+				 errmsg("index %d out of valid range, 0..%d",
+						offset, len - size)));
+	while (size)
+	{
+		((unsigned char*) VARDATA_ANY(res))[offset+size-1] = newValue & 0xFF;
+		newValue = newValue >> 8;
+		size--;
+	}
+
+	PG_RETURN_BYTEA_P(res);
+}
+
 /*-------------------------------------------------------------
  * byteaSetBit
  *
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0336110b0d..8ff28a4139 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1502,6 +1502,12 @@
 { oid => '722', descr => 'set byte',
   proname => 'set_byte', prorettype => 'bytea',
   proargtypes => 'bytea int4 int4', prosrc => 'byteaSetByte' },
+{ oid => '8575', descr => 'get bytes',
+  proname => 'get_bytes', prorettype => 'int8', proargtypes => 'bytea int4 int4',
+  prosrc => 'byteaGetBytes' },
+{ oid => '8576', descr => 'set bytes',
+  proname => 'set_bytes', prorettype => 'bytea',
+  proargtypes => 'bytea int4 int4 int8', prosrc => 'byteaSetBytes' },
 { oid => '723', descr => 'get bit',
   proname => 'get_bit', prorettype => 'int4', proargtypes => 'bytea int8',
   prosrc => 'byteaGetBit' },
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index 709f58ad7f..c58fc9b918 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2358,6 +2358,42 @@ SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
 ERROR:  index 99 out of valid range, 0..8
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 8) = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 0); -- error
+ERROR:  size 0 out of valid range, 1..8
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 9); -- error
+ERROR:  size 9 out of valid range, 1..8
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 8); -- error
+ERROR:  index 1 out of valid range, 0..0
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 8, 0x1122334455667788);
+     set_bytes      
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
+     set_bytes      
+--------------------
+ \x11aabb4455667788
+(1 row)
+
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 0, 123); -- error
+ERROR:  size 0 out of valid range, 1..8
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 9, 123); -- error
+ERROR:  size 9 out of valid range, 1..8
+SELECT set_bytes('\x0123456789abcdef'::bytea, 1, 8, 123); -- error
+ERROR:  index 1 out of valid range, 0..0
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
 --
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index b8858c9aea..82d677562a 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -750,6 +750,16 @@ SELECT get_byte('\x1234567890abcdef00'::bytea, 3);
 SELECT get_byte('\x1234567890abcdef00'::bytea, 99);  -- error
 SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 8) = 0x1122334455667788;
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 2) = 0x2233;
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 0); -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 0, 9); -- error
+SELECT get_bytes('\x1122334455667788'::bytea, 1, 8); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 8, 0x1122334455667788);
+SELECT set_bytes('\x1122334455667788'::bytea, 1, 2, 0xAABB);
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 0, 123); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 0, 9, 123); -- error
+SELECT set_bytes('\x0123456789abcdef'::bytea, 1, 8, 123); -- error
 
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
-- 
2.47.0

#21Michael Paquier
michael@paquier.xyz
In reply to: Aleksander Alekseev (#20)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Fri, Oct 18, 2024 at 05:20:42PM +0300, Aleksander Alekseev wrote:

Rebased, v5.

v5-0001 includes the following output:

--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,12 @@ WHERE p1.oid < p2.oid AND
      p1.proretset != p2.proretset OR
      p1.provolatile != p2.provolatile OR
      p1.pronargs != p2.pronargs);
- oid | proname | oid | proname 
------+---------+-----+---------
-(0 rows)
+ oid  | proname  | oid  | proname 
+------+----------+------+---------
+ 2405 | int2send | 8577 | bytea
+ 2407 | int4send | 8578 | bytea
+ 2409 | int8send | 8579 | bytea
+(3 rows)

This should not happen, as you are using multiple times the same
function.
--
Michael

#22Aleksander Alekseev
aleksander@timescale.com
In reply to: Michael Paquier (#21)
1 attachment(s)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi Michael,

v5-0001 includes the following output:

--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -126,9 +126,12 @@ WHERE p1.oid < p2.oid AND
p1.proretset != p2.proretset OR
p1.provolatile != p2.provolatile OR
p1.pronargs != p2.pronargs);
- oid | proname | oid | proname
------+---------+-----+---------
-(0 rows)
+ oid  | proname  | oid  | proname
+------+----------+------+---------
+ 2405 | int2send | 8577 | bytea
+ 2407 | int4send | 8578 | bytea
+ 2409 | int8send | 8579 | bytea
+(3 rows)

This should not happen, as you are using multiple times the same
function.

Thanks. Here is the corrected patch.

Besides fixing opr_sanity test I corrected error messages:

-ERROR:  bytea size 3 out of valid range, 0..2
+ERROR:  bytea out of valid range, ''..'\xFFFF'

... and also included tests for min/max integer values.

I discarded the 0002 patch that implemented get_bytes() / set_bytes().
This part doesn't seem to get much support, so let's focus on casting.

--
Best regards,
Aleksander Alekseev

Attachments:

v6-0001-Allow-casting-between-bytea-and-integer-types.patchapplication/x-patch; name=v6-0001-Allow-casting-between-bytea-and-integer-types.patchDownload
From 4b25ab46f10c8d284b2b2f8f3a7998a7b00c12d0 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 26 Aug 2024 12:09:59 +0300
Subject: [PATCH v6] Allow casting between bytea and integer types.

For instance:

SELECT '\x12345678'::bytea::integer;
SELECT 0x12345678::bytea;

This works with int2's, int4's and int8's.

Aleksander Alekseev, reviewed by Peter Eisentraut, Michael Paquier

Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com

BUMP CATVERSION
---
 src/backend/utils/adt/int.c              |  68 +++++++++++++
 src/include/catalog/pg_cast.dat          |  14 +++
 src/include/catalog/pg_proc.dat          |  18 ++++
 src/test/regress/expected/opr_sanity.out |   6 ++
 src/test/regress/expected/strings.out    | 120 +++++++++++++++++++++++
 src/test/regress/sql/strings.sql         |  34 +++++++
 6 files changed, 260 insertions(+)

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index b5781989a64..a240f647a48 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -336,6 +336,74 @@ int4send(PG_FUNCTION_ARGS)
  *		===================
  */
 
+/* Common code for bytea_int2, bytea_int4 and bytea_int8 */
+static int64
+bytea_integer(bytea* v, int max_size, const char* max_value)
+{
+	int 	len = VARSIZE_ANY_EXHDR(v);
+	int 	offset = 0;
+	int64 	result = 0;
+
+	if (len > max_size)
+		ereport(ERROR,
+			(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+			errmsg("bytea out of valid range, ''..'\\x%s'",
+					max_value)));
+	while (len--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	return result;
+}
+
+/* Cast bytea -> int2 */
+Datum
+bytea_int2(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT16((int16)bytea_integer(v, sizeof(int16), "FFFF"));
+}
+
+/* Cast bytea -> int4 */
+Datum
+bytea_int4(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT32((int32)bytea_integer(v, sizeof(int32), "FFFFFFFF"));
+}
+
+/* Cast bytea -> int8 */
+Datum
+bytea_int8(PG_FUNCTION_ARGS)
+{
+	bytea	*v = PG_GETARG_BYTEA_PP(0);
+	PG_RETURN_INT64(bytea_integer(v, sizeof(int64), "FFFFFFFFFFFFFFFF"));
+}
+
+/* Cast int2 -> bytea; currently just a wrapper for int2send() */
+Datum
+int2_bytea(PG_FUNCTION_ARGS)
+{
+	return int2send(fcinfo);
+}
+
+/* Cast int4 -> bytea; currently just a wrapper for int4send() */
+Datum
+int4_bytea(PG_FUNCTION_ARGS)
+{
+	return int4send(fcinfo);
+}
+
+/* Cast int8 -> bytea; currently just a wrapper for int8send() */
+Datum
+int8_bytea(PG_FUNCTION_ARGS)
+{
+	return int8send(fcinfo);
+}
+
 Datum
 i2toi4(PG_FUNCTION_ARGS)
 {
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index a26ba34e869..ab46be606f0 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -320,6 +320,20 @@
 { castsource => 'varchar', casttarget => 'name', castfunc => 'name(varchar)',
   castcontext => 'i', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and integer types
+{ castsource => 'int2', casttarget => 'bytea', castfunc => 'bytea(int2)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int4', casttarget => 'bytea', castfunc => 'bytea(int4)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int8', casttarget => 'bytea', castfunc => 'bytea(int8)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int2', castfunc => 'int2(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int4', castfunc => 'int4(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index b37e8a6f882..c9d9b91e282 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1164,6 +1164,24 @@
 { oid => '409', descr => 'convert char(n) to name',
   proname => 'name', proleakproof => 't', prorettype => 'name',
   proargtypes => 'bpchar', prosrc => 'bpchar_name' },
+{ oid => '8577', descr => 'convert int2 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int2', prosrc => 'int2_bytea' },
+{ oid => '8578', descr => 'convert int4 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int4', prosrc => 'int4_bytea' },
+{ oid => '8579', descr => 'convert int8 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int8', prosrc => 'int8_bytea' },
+{ oid => '8580', descr => 'convert bytea to int2',
+  proname => 'int2', proleakproof => 't', prorettype => 'int2',
+  proargtypes => 'bytea', prosrc => 'bytea_int2' },
+{ oid => '8581', descr => 'convert bytea to int4',
+  proname => 'int4', proleakproof => 't', prorettype => 'int4',
+  proargtypes => 'bytea', prosrc => 'bytea_int4' },
+{ oid => '8582', descr => 'convert bytea to int8',
+  proname => 'int8', proleakproof => 't', prorettype => 'int8',
+  proargtypes => 'bytea', prosrc => 'bytea_int8' },
 
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index b673642ad1d..462c56b7887 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -875,6 +875,12 @@ uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
 crc32(bytea)
 crc32c(bytea)
+bytea(smallint)
+bytea(integer)
+bytea(bigint)
+int2(bytea)
+int4(bytea)
+int8(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
 -- Check that functions without argument are not marked as leakproof.
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index b65bb2d5368..e35ae5d29c5 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2690,3 +2690,123 @@ ERROR:  invalid Unicode code point: 2FFFFF
 SELECT unistr('wrong: \xyz');
 ERROR:  invalid Unicode escape
 HINT:  Unicode escapes must be \XXXX, \+XXXXXX, \uXXXX, or \UXXXXXXXX.
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+SELECT 0x1234::int2::bytea;
+ bytea  
+--------
+ \x1234
+(1 row)
+
+SELECT 0x12345678::int4::bytea;
+   bytea    
+------------
+ \x12345678
+(1 row)
+
+SELECT 0x1122334455667788::int8::bytea;
+       bytea        
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT ''::bytea::int2 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int2 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1234'::bytea::int2 = 0x1234;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456'::bytea::int2; -- error
+ERROR:  bytea out of valid range, ''..'\xFFFF'
+SELECT ''::bytea::int4 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int4 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456789A'::bytea::int4; -- error
+ERROR:  bytea out of valid range, ''..'\xFFFFFFFF'
+SELECT ''::bytea::int8 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int8 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x112233445566778899'::bytea::int8; -- error
+ERROR:  bytea out of valid range, ''..'\xFFFFFFFFFFFFFFFF'
+-- max integer values
+SELECT '\x7FFF'::bytea::int2 = 0x7FFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x7FFFFFFF'::bytea::int4 = 0x7FFFFFFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x7FFFFFFFFFFFFFFF'::bytea::int8 = 0x7FFFFFFFFFFFFFFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- min integer values
+SELECT '\x8000'::bytea::int2 = -0x8000;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x80000000'::bytea::int4 = -0x80000000;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x8000000000000000'::bytea::int8 = -0x8000000000000000;
+ ?column? 
+----------
+ t
+(1 row)
+
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 8e0f3a0e75f..16cceb5c746 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -848,3 +848,37 @@ SELECT unistr('wrong: \udb99\u0061');
 SELECT unistr('wrong: \U0000db99\U00000061');
 SELECT unistr('wrong: \U002FFFFF');
 SELECT unistr('wrong: \xyz');
+
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+
+SELECT 0x1234::int2::bytea;
+SELECT 0x12345678::int4::bytea;
+SELECT 0x1122334455667788::int8::bytea;
+
+SELECT ''::bytea::int2 = 0;
+SELECT '\x12'::bytea::int2 = 0x12;
+SELECT '\x1234'::bytea::int2 = 0x1234;
+SELECT '\x123456'::bytea::int2; -- error
+
+SELECT ''::bytea::int4 = 0;
+SELECT '\x12'::bytea::int4 = 0x12;
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+SELECT '\x123456789A'::bytea::int4; -- error
+
+SELECT ''::bytea::int8 = 0;
+SELECT '\x12'::bytea::int8 = 0x12;
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+SELECT '\x112233445566778899'::bytea::int8; -- error
+
+-- max integer values
+SELECT '\x7FFF'::bytea::int2 = 0x7FFF;
+SELECT '\x7FFFFFFF'::bytea::int4 = 0x7FFFFFFF;
+SELECT '\x7FFFFFFFFFFFFFFF'::bytea::int8 = 0x7FFFFFFFFFFFFFFF;
+
+-- min integer values
+SELECT '\x8000'::bytea::int2 = -0x8000;
+SELECT '\x80000000'::bytea::int4 = -0x80000000;
+SELECT '\x8000000000000000'::bytea::int8 = -0x8000000000000000;
-- 
2.47.1

#23Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Aleksander Alekseev (#22)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Mon, 13 Jan 2025 at 17:36, Aleksander Alekseev
<aleksander@timescale.com> wrote:

Besides fixing opr_sanity test I corrected error messages:

-ERROR:  bytea size 3 out of valid range, 0..2
+ERROR:  bytea out of valid range, ''..'\xFFFF'

"smallint out of range", "integer out of range" and "bigint out of
range" would be more consistent with existing error messages.

Regards,
Dean

#24Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Dean Rasheed (#23)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On 2025-Jan-13, Dean Rasheed wrote:

On Mon, 13 Jan 2025 at 17:36, Aleksander Alekseev
<aleksander@timescale.com> wrote:

Besides fixing opr_sanity test I corrected error messages:

-ERROR:  bytea size 3 out of valid range, 0..2
+ERROR:  bytea out of valid range, ''..'\xFFFF'

"smallint out of range", "integer out of range" and "bigint out of
range" would be more consistent with existing error messages.

But these don't show the acceptable range. We have these that do:

#: utils/adt/varbit.c:1824 utils/adt/varbit.c:1882
#, c-format
msgid "bit index %d out of valid range (0..%d)"

#: utils/adt/varlena.c:3218 utils/adt/varlena.c:3285
#, c-format
msgid "index %d out of valid range, 0..%d"

#: utils/adt/varlena.c:3249 utils/adt/varlena.c:3321
#, c-format
msgid "index %lld out of valid range, 0..%lld"

#: utils/misc/guc.c:3130
#, c-format
msgid "%d%s%s is outside the valid range for parameter \"%s\" (%d .. %d)"

The quoting in Aleksander's proposal is not great.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"Update: super-fast reaction on the Postgres bugs mailing list. The report
was acknowledged [...], and a fix is under discussion.
The wonders of open-source !"
https://twitter.com/gunnarmorling/status/1596080409259003906

#25Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Alvaro Herrera (#24)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Mon, 13 Jan 2025 at 19:23, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

But these don't show the acceptable range. We have these that do:

#: utils/adt/varbit.c:1824 utils/adt/varbit.c:1882
#, c-format
msgid "bit index %d out of valid range (0..%d)"

#: utils/adt/varlena.c:3218 utils/adt/varlena.c:3285
#, c-format
msgid "index %d out of valid range, 0..%d"

#: utils/adt/varlena.c:3249 utils/adt/varlena.c:3321
#, c-format
msgid "index %lld out of valid range, 0..%lld"

#: utils/misc/guc.c:3130
#, c-format
msgid "%d%s%s is outside the valid range for parameter \"%s\" (%d .. %d)"

Those are all instances of a value that's outside a specific range
that you might not otherwise know, rather than being out of range of
the type itself. For that, we generally don't say what the range of
the type is. For example, we currently do:

select repeat('1', 50)::bit(50)::int;
ERROR: integer out of range

Regards,
Dean

#26Aleksander Alekseev
aleksander@timescale.com
In reply to: Dean Rasheed (#25)
1 attachment(s)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi Dean,

Those are all instances of a value that's outside a specific range
that you might not otherwise know, rather than being out of range of
the type itself. For that, we generally don't say what the range of
the type is. For example, we currently do:

select repeat('1', 50)::bit(50)::int;
ERROR: integer out of range

Thanks. I agree that the proposed error messages look nicer than the
one I used in v6. Here is the corrected patch.

--
Best regards,
Aleksander Alekseev

Attachments:

v7-0001-Allow-casting-between-bytea-and-integer-types.patchapplication/x-patch; name=v7-0001-Allow-casting-between-bytea-and-integer-types.patchDownload
From b9b5e358d00945bc0c4bb4a1b6e52497a6014690 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 26 Aug 2024 12:09:59 +0300
Subject: [PATCH v7] Allow casting between bytea and integer types.

For instance:

SELECT '\x12345678'::bytea::integer;
SELECT 0x12345678::bytea;

This works with int2's, int4's and int8's.

Author: Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Michael Paquier, Dean Rasheed
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com

BUMP CATVERSION
---
 src/backend/utils/adt/int.c              |  84 ++++++++++++++++
 src/include/catalog/pg_cast.dat          |  14 +++
 src/include/catalog/pg_proc.dat          |  18 ++++
 src/test/regress/expected/opr_sanity.out |   6 ++
 src/test/regress/expected/strings.out    | 120 +++++++++++++++++++++++
 src/test/regress/sql/strings.sql         |  34 +++++++
 6 files changed, 276 insertions(+)

diff --git a/src/backend/utils/adt/int.c b/src/backend/utils/adt/int.c
index b5781989a64..0be739bae64 100644
--- a/src/backend/utils/adt/int.c
+++ b/src/backend/utils/adt/int.c
@@ -336,6 +336,90 @@ int4send(PG_FUNCTION_ARGS)
  *		===================
  */
 
+/* Common code for bytea_int2, bytea_int4 and bytea_int8 */
+static int64
+bytea_integer(const bytea *v, int len)
+{
+	int			offset = 0;
+	int64		result = 0;
+
+	while (len--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	return result;
+}
+
+/* Cast bytea -> int2 */
+Datum
+bytea_int2(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+
+	if (len > sizeof(int16))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("smallint out of range")));
+
+	PG_RETURN_INT16((int16) bytea_integer(v, len));
+}
+
+/* Cast bytea -> int4 */
+Datum
+bytea_int4(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+
+	if (len > sizeof(int32))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("integer out of range")));
+
+	PG_RETURN_INT32((int32) bytea_integer(v, len));
+}
+
+/* Cast bytea -> int8 */
+Datum
+bytea_int8(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+
+	if (len > sizeof(int64))
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+				 errmsg("bigint out of range")));
+
+
+	PG_RETURN_INT64(bytea_integer(v, len));
+}
+
+/* Cast int2 -> bytea; currently just a wrapper for int2send() */
+Datum
+int2_bytea(PG_FUNCTION_ARGS)
+{
+	return int2send(fcinfo);
+}
+
+/* Cast int4 -> bytea; currently just a wrapper for int4send() */
+Datum
+int4_bytea(PG_FUNCTION_ARGS)
+{
+	return int4send(fcinfo);
+}
+
+/* Cast int8 -> bytea; currently just a wrapper for int8send() */
+Datum
+int8_bytea(PG_FUNCTION_ARGS)
+{
+	return int8send(fcinfo);
+}
+
 Datum
 i2toi4(PG_FUNCTION_ARGS)
 {
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index a26ba34e869..ab46be606f0 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -320,6 +320,20 @@
 { castsource => 'varchar', casttarget => 'name', castfunc => 'name(varchar)',
   castcontext => 'i', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and integer types
+{ castsource => 'int2', casttarget => 'bytea', castfunc => 'bytea(int2)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int4', casttarget => 'bytea', castfunc => 'bytea(int4)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int8', casttarget => 'bytea', castfunc => 'bytea(int8)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int2', castfunc => 'int2(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int4', castfunc => 'int4(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 872cd6e01a3..18187f9375a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1164,6 +1164,24 @@
 { oid => '409', descr => 'convert char(n) to name',
   proname => 'name', proleakproof => 't', prorettype => 'name',
   proargtypes => 'bpchar', prosrc => 'bpchar_name' },
+{ oid => '8577', descr => 'convert int2 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int2', prosrc => 'int2_bytea' },
+{ oid => '8578', descr => 'convert int4 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int4', prosrc => 'int4_bytea' },
+{ oid => '8579', descr => 'convert int8 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int8', prosrc => 'int8_bytea' },
+{ oid => '8580', descr => 'convert bytea to int2',
+  proname => 'int2', proleakproof => 't', prorettype => 'int2',
+  proargtypes => 'bytea', prosrc => 'bytea_int2' },
+{ oid => '8581', descr => 'convert bytea to int4',
+  proname => 'int4', proleakproof => 't', prorettype => 'int4',
+  proargtypes => 'bytea', prosrc => 'bytea_int4' },
+{ oid => '8582', descr => 'convert bytea to int8',
+  proname => 'int8', proleakproof => 't', prorettype => 'int8',
+  proargtypes => 'bytea', prosrc => 'bytea_int8' },
 
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index b673642ad1d..462c56b7887 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -875,6 +875,12 @@ uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
 crc32(bytea)
 crc32c(bytea)
+bytea(smallint)
+bytea(integer)
+bytea(bigint)
+int2(bytea)
+int4(bytea)
+int8(bytea)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
 -- Check that functions without argument are not marked as leakproof.
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index b65bb2d5368..69f1e66aa98 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2690,3 +2690,123 @@ ERROR:  invalid Unicode code point: 2FFFFF
 SELECT unistr('wrong: \xyz');
 ERROR:  invalid Unicode escape
 HINT:  Unicode escapes must be \XXXX, \+XXXXXX, \uXXXX, or \UXXXXXXXX.
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+SELECT 0x1234::int2::bytea;
+ bytea  
+--------
+ \x1234
+(1 row)
+
+SELECT 0x12345678::int4::bytea;
+   bytea    
+------------
+ \x12345678
+(1 row)
+
+SELECT 0x1122334455667788::int8::bytea;
+       bytea        
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT ''::bytea::int2 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int2 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1234'::bytea::int2 = 0x1234;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456'::bytea::int2; -- error
+ERROR:  smallint out of range
+SELECT ''::bytea::int4 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int4 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456789A'::bytea::int4; -- error
+ERROR:  integer out of range
+SELECT ''::bytea::int8 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int8 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x112233445566778899'::bytea::int8; -- error
+ERROR:  bigint out of range
+-- max integer values
+SELECT '\x7FFF'::bytea::int2 = 0x7FFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x7FFFFFFF'::bytea::int4 = 0x7FFFFFFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x7FFFFFFFFFFFFFFF'::bytea::int8 = 0x7FFFFFFFFFFFFFFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- min integer values
+SELECT '\x8000'::bytea::int2 = -0x8000;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x80000000'::bytea::int4 = -0x80000000;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x8000000000000000'::bytea::int8 = -0x8000000000000000;
+ ?column? 
+----------
+ t
+(1 row)
+
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 8e0f3a0e75f..16cceb5c746 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -848,3 +848,37 @@ SELECT unistr('wrong: \udb99\u0061');
 SELECT unistr('wrong: \U0000db99\U00000061');
 SELECT unistr('wrong: \U002FFFFF');
 SELECT unistr('wrong: \xyz');
+
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+
+SELECT 0x1234::int2::bytea;
+SELECT 0x12345678::int4::bytea;
+SELECT 0x1122334455667788::int8::bytea;
+
+SELECT ''::bytea::int2 = 0;
+SELECT '\x12'::bytea::int2 = 0x12;
+SELECT '\x1234'::bytea::int2 = 0x1234;
+SELECT '\x123456'::bytea::int2; -- error
+
+SELECT ''::bytea::int4 = 0;
+SELECT '\x12'::bytea::int4 = 0x12;
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+SELECT '\x123456789A'::bytea::int4; -- error
+
+SELECT ''::bytea::int8 = 0;
+SELECT '\x12'::bytea::int8 = 0x12;
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+SELECT '\x112233445566778899'::bytea::int8; -- error
+
+-- max integer values
+SELECT '\x7FFF'::bytea::int2 = 0x7FFF;
+SELECT '\x7FFFFFFF'::bytea::int4 = 0x7FFFFFFF;
+SELECT '\x7FFFFFFFFFFFFFFF'::bytea::int8 = 0x7FFFFFFFFFFFFFFF;
+
+-- min integer values
+SELECT '\x8000'::bytea::int2 = -0x8000;
+SELECT '\x80000000'::bytea::int4 = -0x80000000;
+SELECT '\x8000000000000000'::bytea::int8 = -0x8000000000000000;
-- 
2.47.1

#27Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Aleksander Alekseev (#26)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Tue, 14 Jan 2025 at 13:25, Aleksander Alekseev
<aleksander@timescale.com> wrote:

Thanks. I agree that the proposed error messages look nicer than the
one I used in v6. Here is the corrected patch.

This should use ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE, rather than
ERRCODE_INVALID_PARAMETER_VALUE, for consistency with other similar
errors.

The bytea -> int[248] cast functions should not be marked as leakproof
-- see the docs on the CREATE FUNCTION page: functions that raise
errors for some input values but not others, are not leakproof. This
is why, for example, the int -> bigint cast is leakproof, but the
bigint -> int cast is not.

Functions working with int8 values should normally go in
utils/adt/int8.c, not utils/adt/int.c. However, I think that
utils/adt/varlena.c would be a better place for all these functions,
because they have more to do with bytea than integer types, and this
would allow them to be kept together, similar to how all the bit <->
integer cast functions are in utils/adt/varbit.c.

There's no documentation for these new casts. The obvious place to put
it would be in section 9.5 "Binary String Functions and Operators",
which would be consistent with the idea that these are being regarded
primarily as bytea operations, rather than integer operations (just as
the bit <-> integer casts are documented in 9.6 "Bit String Functions
and Operators").

Regards,
Dean

#28Aleksander Alekseev
aleksander@timescale.com
In reply to: Dean Rasheed (#27)
1 attachment(s)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi Dean,

This should use ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE, rather than
ERRCODE_INVALID_PARAMETER_VALUE, for consistency with other similar
errors.

The bytea -> int[248] cast functions should not be marked as leakproof
-- see the docs on the CREATE FUNCTION page: functions that raise
errors for some input values but not others, are not leakproof. This
is why, for example, the int -> bigint cast is leakproof, but the
bigint -> int cast is not.

Functions working with int8 values should normally go in
utils/adt/int8.c, not utils/adt/int.c. However, I think that
utils/adt/varlena.c would be a better place for all these functions,
because they have more to do with bytea than integer types, and this
would allow them to be kept together, similar to how all the bit <->
integer cast functions are in utils/adt/varbit.c.

There's no documentation for these new casts. The obvious place to put
it would be in section 9.5 "Binary String Functions and Operators",
which would be consistent with the idea that these are being regarded
primarily as bytea operations, rather than integer operations (just as
the bit <-> integer casts are documented in 9.6 "Bit String Functions
and Operators").

Many thanks for your great feedback. Here is the corrected patch.

On Fri, Jan 17, 2025 at 7:29 PM Dean Rasheed <dean.a.rasheed@gmail.com>
wrote:

On Tue, 14 Jan 2025 at 13:25, Aleksander Alekseev
<aleksander@timescale.com> wrote:

Thanks. I agree that the proposed error messages look nicer than the
one I used in v6. Here is the corrected patch.

This should use ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE, rather than
ERRCODE_INVALID_PARAMETER_VALUE, for consistency with other similar
errors.

The bytea -> int[248] cast functions should not be marked as leakproof
-- see the docs on the CREATE FUNCTION page: functions that raise
errors for some input values but not others, are not leakproof. This
is why, for example, the int -> bigint cast is leakproof, but the
bigint -> int cast is not.

Functions working with int8 values should normally go in
utils/adt/int8.c, not utils/adt/int.c. However, I think that
utils/adt/varlena.c would be a better place for all these functions,
because they have more to do with bytea than integer types, and this
would allow them to be kept together, similar to how all the bit <->
integer cast functions are in utils/adt/varbit.c.

There's no documentation for these new casts. The obvious place to put
it would be in section 9.5 "Binary String Functions and Operators",
which would be consistent with the idea that these are being regarded
primarily as bytea operations, rather than integer operations (just as
the bit <-> integer casts are documented in 9.6 "Bit String Functions
and Operators").

Regards,
Dean

--
Best regards,
Aleksander Alekseev

Attachments:

v8-0001-Allow-casting-between-bytea-and-integer-types.patchapplication/octet-stream; name=v8-0001-Allow-casting-between-bytea-and-integer-types.patchDownload
From 0c8e3055c0ff10abd10db5d460424a9c7cebb625 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 26 Aug 2024 12:09:59 +0300
Subject: [PATCH v8] Allow casting between bytea and integer types.

For instance:

SELECT '\x12345678'::bytea::integer;
SELECT 0x12345678::bytea;

This works with int2's, int4's and int8's.

Author: Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Michael Paquier, Dean Rasheed
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com

BUMP CATVERSION
---
 doc/src/sgml/func.sgml                   |  15 +++
 src/backend/utils/adt/varlena.c          |  84 ++++++++++++++++
 src/include/catalog/pg_cast.dat          |  14 +++
 src/include/catalog/pg_proc.dat          |  18 ++++
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/strings.out    | 120 +++++++++++++++++++++++
 src/test/regress/sql/strings.sql         |  34 +++++++
 7 files changed, 288 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 47370e581ae..d95fa83406b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4997,6 +4997,21 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
    </variablelist>
   </para>
 
+   <para>
+    In addition, it is possible to cast integral values to and from type
+    <type>bytea</type>.
+<programlisting>
+''::bytea::int4                <lineannotation>0</lineannotation>
+0x1234::int2::bytea            <lineannotation>\x1234</lineannotation>
+'\x12'::bytea::int4            <lineannotation>0x12</lineannotation>
+'\x8000'::bytea::int2          <lineannotation>-32768</lineannotation>
+'\x123456'::bytea::int2        <lineannotation>ERROR</lineannotation>
+</programlisting>
+    Note that casting <type>bytea</type> to an integer type that is too small
+    to represent given value produces an error. Casting integer types to
+    <type>bytea</type> never fails.
+   </para>
+
   <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/> and the large object functions
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 34796f2e27c..eceecb25124 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -3985,6 +3985,90 @@ bytea_sortsupport(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/* Common code for bytea_int2, bytea_int4 and bytea_int8 */
+static int64
+bytea_integer(const bytea *v, int len)
+{
+	int			offset = 0;
+	int64		result = 0;
+
+	while (len--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	return result;
+}
+
+/* Cast bytea -> int2 */
+Datum
+bytea_int2(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+
+	if (len > sizeof(int16))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("smallint out of range")));
+
+	PG_RETURN_INT16((int16) bytea_integer(v, len));
+}
+
+/* Cast bytea -> int4 */
+Datum
+bytea_int4(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+
+	if (len > sizeof(int32))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("integer out of range")));
+
+	PG_RETURN_INT32((int32) bytea_integer(v, len));
+}
+
+/* Cast bytea -> int8 */
+Datum
+bytea_int8(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+
+	if (len > sizeof(int64))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("bigint out of range")));
+
+
+	PG_RETURN_INT64(bytea_integer(v, len));
+}
+
+/* Cast int2 -> bytea; currently just a wrapper for int2send() */
+Datum
+int2_bytea(PG_FUNCTION_ARGS)
+{
+	return int2send(fcinfo);
+}
+
+/* Cast int4 -> bytea; currently just a wrapper for int4send() */
+Datum
+int4_bytea(PG_FUNCTION_ARGS)
+{
+	return int4send(fcinfo);
+}
+
+/* Cast int8 -> bytea; currently just a wrapper for int8send() */
+Datum
+int8_bytea(PG_FUNCTION_ARGS)
+{
+	return int8send(fcinfo);
+}
+
 /*
  * appendStringInfoText
  *
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index a26ba34e869..ab46be606f0 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -320,6 +320,20 @@
 { castsource => 'varchar', casttarget => 'name', castfunc => 'name(varchar)',
   castcontext => 'i', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and integer types
+{ castsource => 'int2', casttarget => 'bytea', castfunc => 'bytea(int2)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int4', casttarget => 'bytea', castfunc => 'bytea(int4)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int8', casttarget => 'bytea', castfunc => 'bytea(int8)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int2', castfunc => 'int2(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int4', castfunc => 'int4(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 18560755d26..ce3b979d676 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1164,6 +1164,24 @@
 { oid => '409', descr => 'convert char(n) to name',
   proname => 'name', proleakproof => 't', prorettype => 'name',
   proargtypes => 'bpchar', prosrc => 'bpchar_name' },
+{ oid => '8577', descr => 'convert int2 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int2', prosrc => 'int2_bytea' },
+{ oid => '8578', descr => 'convert int4 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int4', prosrc => 'int4_bytea' },
+{ oid => '8579', descr => 'convert int8 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int8', prosrc => 'int8_bytea' },
+{ oid => '8580', descr => 'convert bytea to int2',
+  proname => 'int2', proleakproof => 'f', prorettype => 'int2',
+  proargtypes => 'bytea', prosrc => 'bytea_int2' },
+{ oid => '8581', descr => 'convert bytea to int4',
+  proname => 'int4', proleakproof => 'f', prorettype => 'int4',
+  proargtypes => 'bytea', prosrc => 'bytea_int4' },
+{ oid => '8582', descr => 'convert bytea to int8',
+  proname => 'int8', proleakproof => 'f', prorettype => 'int8',
+  proargtypes => 'bytea', prosrc => 'bytea_int8' },
 
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index b673642ad1d..20bf9ea9cdf 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -875,6 +875,9 @@ uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
 crc32(bytea)
 crc32c(bytea)
+bytea(smallint)
+bytea(integer)
+bytea(bigint)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
 -- Check that functions without argument are not marked as leakproof.
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index b65bb2d5368..69f1e66aa98 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2690,3 +2690,123 @@ ERROR:  invalid Unicode code point: 2FFFFF
 SELECT unistr('wrong: \xyz');
 ERROR:  invalid Unicode escape
 HINT:  Unicode escapes must be \XXXX, \+XXXXXX, \uXXXX, or \UXXXXXXXX.
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+SELECT 0x1234::int2::bytea;
+ bytea  
+--------
+ \x1234
+(1 row)
+
+SELECT 0x12345678::int4::bytea;
+   bytea    
+------------
+ \x12345678
+(1 row)
+
+SELECT 0x1122334455667788::int8::bytea;
+       bytea        
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT ''::bytea::int2 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int2 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1234'::bytea::int2 = 0x1234;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456'::bytea::int2; -- error
+ERROR:  smallint out of range
+SELECT ''::bytea::int4 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int4 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456789A'::bytea::int4; -- error
+ERROR:  integer out of range
+SELECT ''::bytea::int8 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int8 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x112233445566778899'::bytea::int8; -- error
+ERROR:  bigint out of range
+-- max integer values
+SELECT '\x7FFF'::bytea::int2 = 0x7FFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x7FFFFFFF'::bytea::int4 = 0x7FFFFFFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x7FFFFFFFFFFFFFFF'::bytea::int8 = 0x7FFFFFFFFFFFFFFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- min integer values
+SELECT '\x8000'::bytea::int2 = -0x8000;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x80000000'::bytea::int4 = -0x80000000;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x8000000000000000'::bytea::int8 = -0x8000000000000000;
+ ?column? 
+----------
+ t
+(1 row)
+
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 8e0f3a0e75f..16cceb5c746 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -848,3 +848,37 @@ SELECT unistr('wrong: \udb99\u0061');
 SELECT unistr('wrong: \U0000db99\U00000061');
 SELECT unistr('wrong: \U002FFFFF');
 SELECT unistr('wrong: \xyz');
+
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+
+SELECT 0x1234::int2::bytea;
+SELECT 0x12345678::int4::bytea;
+SELECT 0x1122334455667788::int8::bytea;
+
+SELECT ''::bytea::int2 = 0;
+SELECT '\x12'::bytea::int2 = 0x12;
+SELECT '\x1234'::bytea::int2 = 0x1234;
+SELECT '\x123456'::bytea::int2; -- error
+
+SELECT ''::bytea::int4 = 0;
+SELECT '\x12'::bytea::int4 = 0x12;
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+SELECT '\x123456789A'::bytea::int4; -- error
+
+SELECT ''::bytea::int8 = 0;
+SELECT '\x12'::bytea::int8 = 0x12;
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+SELECT '\x112233445566778899'::bytea::int8; -- error
+
+-- max integer values
+SELECT '\x7FFF'::bytea::int2 = 0x7FFF;
+SELECT '\x7FFFFFFF'::bytea::int4 = 0x7FFFFFFF;
+SELECT '\x7FFFFFFFFFFFFFFF'::bytea::int8 = 0x7FFFFFFFFFFFFFFF;
+
+-- min integer values
+SELECT '\x8000'::bytea::int2 = -0x8000;
+SELECT '\x80000000'::bytea::int4 = -0x80000000;
+SELECT '\x8000000000000000'::bytea::int8 = -0x8000000000000000;
-- 
2.48.1

#29Peter Eisentraut
peter@eisentraut.org
In reply to: Aleksander Alekseev (#28)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On 20.01.25 15:01, Aleksander Alekseev wrote:

This should use ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE, rather than
ERRCODE_INVALID_PARAMETER_VALUE, for consistency with other similar
errors.

The bytea -> int[248] cast functions should not be marked as leakproof
-- see the docs on the CREATE FUNCTION page: functions that raise
errors for some input values but not others, are not leakproof. This
is why, for example, the int -> bigint cast is leakproof, but the
bigint -> int cast is not.

Functions working with int8 values should normally go in
utils/adt/int8.c, not utils/adt/int.c. However, I think that
utils/adt/varlena.c would be a better place for all these functions,
because they have more to do with bytea than integer types, and this
would allow them to be kept together, similar to how all the bit <->
integer cast functions are in utils/adt/varbit.c.

There's no documentation for these new casts. The obvious place to put
it would be in section 9.5 "Binary String Functions and Operators",
which would be consistent with the idea that these are being regarded
primarily as bytea operations, rather than integer operations (just as
the bit <-> integer casts are documented in 9.6 "Bit String Functions
and Operators").

Many thanks for your great feedback. Here is the corrected patch.

These casts appear to use a particular endianness, but they don't
document which one, and there is no explanation anywhere why that one is
the right one.

#30Aleksander Alekseev
aleksander@timescale.com
In reply to: Peter Eisentraut (#29)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi Peter,

These casts appear to use a particular endianness, but they don't
document which one, and there is no explanation anywhere why that one is
the right one.

Right, I choose network order / big-endian. I agree that it would make
sense to emphasise this fact in the documentation below the examples.
Unfortunately I have no good reason for this particular design choice
other than "well this is how it is custom to represent numbers in a
consistent manner between different platforms but other than that the
choice was rather arbitrary". Should I just rephrase it a bit and add
to the documentation?

--
Best regards,
Aleksander Alekseev

#31Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Aleksander Alekseev (#30)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Thu, 23 Jan 2025 at 14:30, Aleksander Alekseev
<aleksander@timescale.com> wrote:

Hi Peter,

These casts appear to use a particular endianness, but they don't
document which one, and there is no explanation anywhere why that one is
the right one.

Right, I choose network order / big-endian. I agree that it would make
sense to emphasise this fact in the documentation below the examples.
Unfortunately I have no good reason for this particular design choice
other than "well this is how it is custom to represent numbers in a
consistent manner between different platforms but other than that the
choice was rather arbitrary". Should I just rephrase it a bit and add
to the documentation?

IMO big-endian is the most convenient byte-ordering to use here,
because then the string representation of the bytea is consistent with
the hex representation of the integer. It's also consistent with the
integer-to-bit casts, which output the most significant bits first,
starting with the sign bit.

As far as the docs go, it's important to document precisely what
format is used, but I don't think it needs to explain why that choice
was made. It should also mention the size of the result and that it's
the two's complement representation, since there are other possible
representations of integers. So I think it would be sufficient for the
initial paragraph to say something like "Casting an integer to a bytea
produces 2, 4, or 8 bytes, depending on the width of the integer type.
The result is the two's complement representation of the integer, with
the most significant byte first.", and then list the examples to
demonstrate that.

Regards,
Dean

#32Aleksander Alekseev
aleksander@timescale.com
In reply to: Dean Rasheed (#31)
1 attachment(s)
Re: [PATCH] Add get_bytes() and set_bytes() functions

Hi Dean,

IMO big-endian is the most convenient byte-ordering to use here,
because then the string representation of the bytea is consistent with
the hex representation of the integer. It's also consistent with the
integer-to-bit casts, which output the most significant bits first,
starting with the sign bit.

As far as the docs go, it's important to document precisely what
format is used, but I don't think it needs to explain why that choice
was made. It should also mention the size of the result and that it's
the two's complement representation, since there are other possible
representations of integers. So I think it would be sufficient for the
initial paragraph to say something like "Casting an integer to a bytea
produces 2, 4, or 8 bytes, depending on the width of the integer type.
The result is the two's complement representation of the integer, with
the most significant byte first.", and then list the examples to
demonstrate that.

Thank you. Here is the corrected patch.

--
Best regards,
Aleksander Alekseev

Attachments:

v9-0001-Allow-casting-between-bytea-and-integer-types.patchapplication/x-patch; name=v9-0001-Allow-casting-between-bytea-and-integer-types.patchDownload
From 731dadc5f87d4e62d4f064c7d0ce91bdbafd1888 Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev <aleksander@timescale.com>
Date: Mon, 26 Aug 2024 12:09:59 +0300
Subject: [PATCH v9] Allow casting between bytea and integer types.

For instance:

SELECT '\x12345678'::bytea::integer;
SELECT 0x12345678::bytea;

This works with int2's, int4's and int8's.

Author: Aleksander Alekseev
Reviewed-by: Peter Eisentraut, Michael Paquier, Dean Rasheed
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com

BUMP CATVERSION
---
 doc/src/sgml/func.sgml                   |  18 ++++
 src/backend/utils/adt/varlena.c          |  84 ++++++++++++++++
 src/include/catalog/pg_cast.dat          |  14 +++
 src/include/catalog/pg_proc.dat          |  18 ++++
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/strings.out    | 120 +++++++++++++++++++++++
 src/test/regress/sql/strings.sql         |  34 +++++++
 7 files changed, 291 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5678e7621a5..0c1790a5e46 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -4997,6 +4997,24 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
    </variablelist>
   </para>
 
+   <para>
+    In addition, it is possible to cast integral values to and from type
+    <type>bytea</type>. Casting an integer to a <type>bytea</type> produces
+    2, 4, or 8 bytes, depending on the width of the integer type. The result
+    is the two's complement representation of the integer, with the most
+    significant byte first. Examples:
+<programlisting>
+''::bytea::int4                <lineannotation>0</lineannotation>
+0x1234::int2::bytea            <lineannotation>\x1234</lineannotation>
+'\x12'::bytea::int4            <lineannotation>0x12</lineannotation>
+'\x8000'::bytea::int2          <lineannotation>-32768</lineannotation>
+'\x123456'::bytea::int2        <lineannotation>ERROR</lineannotation>
+</programlisting>
+    Note that casting <type>bytea</type> to an integer type that is too small
+    to represent given value produces an error. Casting integer types to
+    <type>bytea</type> never fails.
+   </para>
+
   <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/> and the large object functions
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 34796f2e27c..eceecb25124 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -3985,6 +3985,90 @@ bytea_sortsupport(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/* Common code for bytea_int2, bytea_int4 and bytea_int8 */
+static int64
+bytea_integer(const bytea *v, int len)
+{
+	int			offset = 0;
+	int64		result = 0;
+
+	while (len--)
+	{
+		result = result << 8;
+		result |= ((unsigned char *) VARDATA_ANY(v))[offset];
+		offset++;
+	}
+
+	return result;
+}
+
+/* Cast bytea -> int2 */
+Datum
+bytea_int2(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+
+	if (len > sizeof(int16))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("smallint out of range")));
+
+	PG_RETURN_INT16((int16) bytea_integer(v, len));
+}
+
+/* Cast bytea -> int4 */
+Datum
+bytea_int4(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+
+	if (len > sizeof(int32))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("integer out of range")));
+
+	PG_RETURN_INT32((int32) bytea_integer(v, len));
+}
+
+/* Cast bytea -> int8 */
+Datum
+bytea_int8(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+
+	if (len > sizeof(int64))
+		ereport(ERROR,
+				(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				 errmsg("bigint out of range")));
+
+
+	PG_RETURN_INT64(bytea_integer(v, len));
+}
+
+/* Cast int2 -> bytea; currently just a wrapper for int2send() */
+Datum
+int2_bytea(PG_FUNCTION_ARGS)
+{
+	return int2send(fcinfo);
+}
+
+/* Cast int4 -> bytea; currently just a wrapper for int4send() */
+Datum
+int4_bytea(PG_FUNCTION_ARGS)
+{
+	return int4send(fcinfo);
+}
+
+/* Cast int8 -> bytea; currently just a wrapper for int8send() */
+Datum
+int8_bytea(PG_FUNCTION_ARGS)
+{
+	return int8send(fcinfo);
+}
+
 /*
  * appendStringInfoText
  *
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index a26ba34e869..ab46be606f0 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -320,6 +320,20 @@
 { castsource => 'varchar', casttarget => 'name', castfunc => 'name(varchar)',
   castcontext => 'i', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and integer types
+{ castsource => 'int2', casttarget => 'bytea', castfunc => 'bytea(int2)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int4', casttarget => 'bytea', castfunc => 'bytea(int4)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int8', casttarget => 'bytea', castfunc => 'bytea(int8)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int2', castfunc => 'int2(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int4', castfunc => 'int4(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 18560755d26..ce3b979d676 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1164,6 +1164,24 @@
 { oid => '409', descr => 'convert char(n) to name',
   proname => 'name', proleakproof => 't', prorettype => 'name',
   proargtypes => 'bpchar', prosrc => 'bpchar_name' },
+{ oid => '8577', descr => 'convert int2 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int2', prosrc => 'int2_bytea' },
+{ oid => '8578', descr => 'convert int4 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int4', prosrc => 'int4_bytea' },
+{ oid => '8579', descr => 'convert int8 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int8', prosrc => 'int8_bytea' },
+{ oid => '8580', descr => 'convert bytea to int2',
+  proname => 'int2', proleakproof => 'f', prorettype => 'int2',
+  proargtypes => 'bytea', prosrc => 'bytea_int2' },
+{ oid => '8581', descr => 'convert bytea to int4',
+  proname => 'int4', proleakproof => 'f', prorettype => 'int4',
+  proargtypes => 'bytea', prosrc => 'bytea_int4' },
+{ oid => '8582', descr => 'convert bytea to int8',
+  proname => 'int8', proleakproof => 'f', prorettype => 'int8',
+  proargtypes => 'bytea', prosrc => 'bytea_int8' },
 
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index b673642ad1d..20bf9ea9cdf 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -875,6 +875,9 @@ uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
 crc32(bytea)
 crc32c(bytea)
+bytea(smallint)
+bytea(integer)
+bytea(bigint)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
 -- Check that functions without argument are not marked as leakproof.
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index b65bb2d5368..69f1e66aa98 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2690,3 +2690,123 @@ ERROR:  invalid Unicode code point: 2FFFFF
 SELECT unistr('wrong: \xyz');
 ERROR:  invalid Unicode escape
 HINT:  Unicode escapes must be \XXXX, \+XXXXXX, \uXXXX, or \UXXXXXXXX.
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+SELECT 0x1234::int2::bytea;
+ bytea  
+--------
+ \x1234
+(1 row)
+
+SELECT 0x12345678::int4::bytea;
+   bytea    
+------------
+ \x12345678
+(1 row)
+
+SELECT 0x1122334455667788::int8::bytea;
+       bytea        
+--------------------
+ \x1122334455667788
+(1 row)
+
+SELECT ''::bytea::int2 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int2 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1234'::bytea::int2 = 0x1234;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456'::bytea::int2; -- error
+ERROR:  smallint out of range
+SELECT ''::bytea::int4 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int4 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x123456789A'::bytea::int4; -- error
+ERROR:  integer out of range
+SELECT ''::bytea::int8 = 0;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x12'::bytea::int8 = 0x12;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x112233445566778899'::bytea::int8; -- error
+ERROR:  bigint out of range
+-- max integer values
+SELECT '\x7FFF'::bytea::int2 = 0x7FFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x7FFFFFFF'::bytea::int4 = 0x7FFFFFFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x7FFFFFFFFFFFFFFF'::bytea::int8 = 0x7FFFFFFFFFFFFFFF;
+ ?column? 
+----------
+ t
+(1 row)
+
+-- min integer values
+SELECT '\x8000'::bytea::int2 = -0x8000;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x80000000'::bytea::int4 = -0x80000000;
+ ?column? 
+----------
+ t
+(1 row)
+
+SELECT '\x8000000000000000'::bytea::int8 = -0x8000000000000000;
+ ?column? 
+----------
+ t
+(1 row)
+
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 8e0f3a0e75f..16cceb5c746 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -848,3 +848,37 @@ SELECT unistr('wrong: \udb99\u0061');
 SELECT unistr('wrong: \U0000db99\U00000061');
 SELECT unistr('wrong: \U002FFFFF');
 SELECT unistr('wrong: \xyz');
+
+--
+-- Test coercions between bytea and integer types
+--
+SET bytea_output TO hex;
+
+SELECT 0x1234::int2::bytea;
+SELECT 0x12345678::int4::bytea;
+SELECT 0x1122334455667788::int8::bytea;
+
+SELECT ''::bytea::int2 = 0;
+SELECT '\x12'::bytea::int2 = 0x12;
+SELECT '\x1234'::bytea::int2 = 0x1234;
+SELECT '\x123456'::bytea::int2; -- error
+
+SELECT ''::bytea::int4 = 0;
+SELECT '\x12'::bytea::int4 = 0x12;
+SELECT '\x12345678'::bytea::int4 = 0x12345678;
+SELECT '\x123456789A'::bytea::int4; -- error
+
+SELECT ''::bytea::int8 = 0;
+SELECT '\x12'::bytea::int8 = 0x12;
+SELECT '\x1122334455667788'::bytea::int8 = 0x1122334455667788;
+SELECT '\x112233445566778899'::bytea::int8; -- error
+
+-- max integer values
+SELECT '\x7FFF'::bytea::int2 = 0x7FFF;
+SELECT '\x7FFFFFFF'::bytea::int4 = 0x7FFFFFFF;
+SELECT '\x7FFFFFFFFFFFFFFF'::bytea::int8 = 0x7FFFFFFFFFFFFFFF;
+
+-- min integer values
+SELECT '\x8000'::bytea::int2 = -0x8000;
+SELECT '\x80000000'::bytea::int4 = -0x80000000;
+SELECT '\x8000000000000000'::bytea::int8 = -0x8000000000000000;
-- 
2.48.1

#33Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Aleksander Alekseev (#32)
1 attachment(s)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Fri, 24 Jan 2025 at 13:00, Aleksander Alekseev
<aleksander@timescale.com> wrote:

Thank you. Here is the corrected patch.

This looks pretty good to me. I have a just a couple of minor comments:

* The loop in bytea_integer() can be written more simply as a "for"
loop. Given that it's only a few lines of code, it might as well just
be coded directly in each cast function, which avoids the need to go
via a 64-bit integer for every case. In addition, it should use the
BITS_PER_BYTE macro rather than "8". Doing that leads to code that's
consistent with bittoint4() and bittoint8().

* In pg_proc.dat, it's not necessary to write "proleakproof => 'f'",
because that's the default, and no other function does that.

* I think it's worth using slightly more non-trivial doc examples,
including positive and negative cases, to make the behaviour more
obvious.

* I also tweaked the regression tests a bit, and copied the existing
test style which displays both the expected and actual results from
each test.

With those updates, I think this is ready for commit, which I'll do in
a day or two, if there are no further comments.

Regards,
Dean

Attachments:

v10-0001-Allow-casting-between-bytea-and-integer-types.patchtext/x-patch; charset=US-ASCII; name=v10-0001-Allow-casting-between-bytea-and-integer-types.patchDownload
From 5c0572a0930067f7244606384542d670cd1e4aa6 Mon Sep 17 00:00:00 2001
From: Dean Rasheed <dean.a.rasheed@gmail.com>
Date: Sat, 1 Mar 2025 10:49:24 +0000
Subject: [PATCH v10] Allow casting between bytea and integer types.

This allows smallint, integer, and bigint values to be cast to and
from bytea. The bytea value is the two's complement representation of
the integer, with the most significant byte first. For example:

  1234::bytea -> \x000004d2
  (-1234)::bytea -> \xfffffb2e

Author: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Joel Jacobson <joel@compiler.org>
Reviewed-by: Yugo Nagata <nagata@sraoss.co.jp>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAJ7c6TPtOp6%2BkFX5QX3fH1SVr7v65uHr-7yEJ%3DGMGQi5uhGtcA%40mail.gmail.com
---
 doc/src/sgml/func.sgml                   |  17 ++++
 src/backend/utils/adt/varlena.c          |  90 ++++++++++++++++++++
 src/include/catalog/pg_cast.dat          |  14 ++++
 src/include/catalog/pg_proc.dat          |  19 +++++
 src/test/regress/expected/opr_sanity.out |   3 +
 src/test/regress/expected/strings.out    | 102 +++++++++++++++++++++++
 src/test/regress/sql/strings.sql         |  29 +++++++
 7 files changed, 274 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 0e6c534965..6b15f167d0 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -5035,6 +5035,23 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three');
    </variablelist>
   </para>
 
+  <para>
+   In addition, it is possible to cast integral values to and from type
+   <type>bytea</type>. Casting an integer to <type>bytea</type> produces
+   2, 4, or 8 bytes, depending on the width of the integer type. The result
+   is the two's complement representation of the integer, with the most
+   significant byte first. Some examples:
+<programlisting>
+1234::smallint::bytea          <lineannotation>\x04d2</lineannotation>
+cast(1234 as bytea)            <lineannotation>\x000004d2</lineannotation>
+cast(-1234 as bytea)           <lineannotation>\xfffffb2e</lineannotation>
+'\x8000'::bytea::smallint      <lineannotation>-32768</lineannotation>
+'\x8000'::bytea::integer       <lineannotation>32768</lineannotation>
+</programlisting>
+   Casting a <type>bytea</type> to an integer will raise an error if the
+   length of the <type>bytea</type> exceeds the width of the integer type.
+  </para>
+
   <para>
    See also the aggregate function <function>string_agg</function> in
    <xref linkend="functions-aggregate"/> and the large object functions
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index e455657170..d22648a7e4 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4057,6 +4057,96 @@ bytea_sortsupport(PG_FUNCTION_ARGS)
 	PG_RETURN_VOID();
 }
 
+/* Cast bytea -> int2 */
+Datum
+bytea_int2(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+	uint16		result;
+
+	if (len > sizeof(result))
+		ereport(ERROR,
+				errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				errmsg("smallint out of range"));
+
+	result = 0;
+	for (int i = 0; i < len; i++)
+	{
+		result <<= BITS_PER_BYTE;
+		result |= ((unsigned char *) VARDATA_ANY(v))[i];
+	}
+
+	PG_RETURN_INT16(result);
+}
+
+/* Cast bytea -> int4 */
+Datum
+bytea_int4(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+	uint32		result;
+
+	if (len > sizeof(result))
+		ereport(ERROR,
+				errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				errmsg("integer out of range"));
+
+	result = 0;
+	for (int i = 0; i < len; i++)
+	{
+		result <<= BITS_PER_BYTE;
+		result |= ((unsigned char *) VARDATA_ANY(v))[i];
+	}
+
+	PG_RETURN_INT32(result);
+}
+
+/* Cast bytea -> int8 */
+Datum
+bytea_int8(PG_FUNCTION_ARGS)
+{
+	bytea	   *v = PG_GETARG_BYTEA_PP(0);
+	int			len = VARSIZE_ANY_EXHDR(v);
+	uint64		result;
+
+	if (len > sizeof(result))
+		ereport(ERROR,
+				errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
+				errmsg("bigint out of range"));
+
+	result = 0;
+	for (int i = 0; i < len; i++)
+	{
+		result <<= BITS_PER_BYTE;
+		result |= ((unsigned char *) VARDATA_ANY(v))[i];
+	}
+
+	PG_RETURN_INT64(result);
+}
+
+/* Cast int2 -> bytea; can just use int2send() */
+Datum
+int2_bytea(PG_FUNCTION_ARGS)
+{
+	return int2send(fcinfo);
+}
+
+/* Cast int4 -> bytea; can just use int4send() */
+Datum
+int4_bytea(PG_FUNCTION_ARGS)
+{
+	return int4send(fcinfo);
+}
+
+/* Cast int8 -> bytea; can just use int8send() */
+Datum
+int8_bytea(PG_FUNCTION_ARGS)
+{
+	return int8send(fcinfo);
+}
+
 /*
  * appendStringInfoText
  *
diff --git a/src/include/catalog/pg_cast.dat b/src/include/catalog/pg_cast.dat
index a26ba34e86..ab46be606f 100644
--- a/src/include/catalog/pg_cast.dat
+++ b/src/include/catalog/pg_cast.dat
@@ -320,6 +320,20 @@
 { castsource => 'varchar', casttarget => 'name', castfunc => 'name(varchar)',
   castcontext => 'i', castmethod => 'f' },
 
+# Allow explicit coercions between bytea and integer types
+{ castsource => 'int2', casttarget => 'bytea', castfunc => 'bytea(int2)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int4', casttarget => 'bytea', castfunc => 'bytea(int4)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'int8', casttarget => 'bytea', castfunc => 'bytea(int8)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int2', castfunc => 'int2(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int4', castfunc => 'int4(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+{ castsource => 'bytea', casttarget => 'int8', castfunc => 'int8(bytea)',
+  castcontext => 'e', castmethod => 'f' },
+
 # Allow explicit coercions between int4 and "char"
 { castsource => 'char', casttarget => 'int4', castfunc => 'int4(char)',
   castcontext => 'e', castmethod => 'f' },
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cd9422d0ba..e25dd0bb0a 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -1165,6 +1165,25 @@
   proname => 'name', proleakproof => 't', prorettype => 'name',
   proargtypes => 'bpchar', prosrc => 'bpchar_name' },
 
+{ oid => '8577', descr => 'convert int2 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int2', prosrc => 'int2_bytea' },
+{ oid => '8578', descr => 'convert int4 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int4', prosrc => 'int4_bytea' },
+{ oid => '8579', descr => 'convert int8 to bytea',
+  proname => 'bytea', proleakproof => 't', prorettype => 'bytea',
+  proargtypes => 'int8', prosrc => 'int8_bytea' },
+{ oid => '8580', descr => 'convert bytea to int2',
+  proname => 'int2', prorettype => 'int2',
+  proargtypes => 'bytea', prosrc => 'bytea_int2' },
+{ oid => '8581', descr => 'convert bytea to int4',
+  proname => 'int4', prorettype => 'int4',
+  proargtypes => 'bytea', prosrc => 'bytea_int4' },
+{ oid => '8582', descr => 'convert bytea to int8',
+  proname => 'int8', prorettype => 'int8',
+  proargtypes => 'bytea', prosrc => 'bytea_int8' },
+
 { oid => '449', descr => 'hash',
   proname => 'hashint2', prorettype => 'int4', proargtypes => 'int2',
   prosrc => 'hashint2' },
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index b673642ad1..20bf9ea9cd 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -875,6 +875,9 @@ uuid_extract_timestamp(uuid)
 uuid_extract_version(uuid)
 crc32(bytea)
 crc32c(bytea)
+bytea(smallint)
+bytea(integer)
+bytea(bigint)
 bytea_larger(bytea,bytea)
 bytea_smaller(bytea,bytea)
 -- Check that functions without argument are not marked as leakproof.
diff --git a/src/test/regress/expected/strings.out b/src/test/regress/expected/strings.out
index b65bb2d536..f8cba9f5b2 100644
--- a/src/test/regress/expected/strings.out
+++ b/src/test/regress/expected/strings.out
@@ -2358,6 +2358,108 @@ SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
 ERROR:  index 99 out of valid range, 0..8
+--
+-- conversions between bytea and integer types
+--
+SELECT 0x1234::int2::bytea AS "\x1234", (-0x1234)::int2::bytea AS "\xedcc";
+ \x1234 | \xedcc 
+--------+--------
+ \x1234 | \xedcc
+(1 row)
+
+SELECT 0x12345678::int4::bytea AS "\x12345678", (-0x12345678)::int4::bytea AS "\xedcba988";
+ \x12345678 | \xedcba988 
+------------+------------
+ \x12345678 | \xedcba988
+(1 row)
+
+SELECT 0x1122334455667788::int8::bytea AS "\x1122334455667788",
+       (-0x1122334455667788)::int8::bytea AS "\xeeddccbbaa998878";
+ \x1122334455667788 | \xeeddccbbaa998878 
+--------------------+--------------------
+ \x1122334455667788 | \xeeddccbbaa998878
+(1 row)
+
+SELECT ''::bytea::int2 AS "0";
+ 0 
+---
+ 0
+(1 row)
+
+SELECT '\x12'::bytea::int2 AS "18";
+ 18 
+----
+ 18
+(1 row)
+
+SELECT '\x1234'::bytea::int2 AS "4460";
+ 4460 
+------
+ 4660
+(1 row)
+
+SELECT '\x123456'::bytea::int2; -- error
+ERROR:  smallint out of range
+SELECT ''::bytea::int4 AS "0";
+ 0 
+---
+ 0
+(1 row)
+
+SELECT '\x12'::bytea::int4 AS "18";
+ 18 
+----
+ 18
+(1 row)
+
+SELECT '\x12345678'::bytea::int4 AS "305419896";
+ 305419896 
+-----------
+ 305419896
+(1 row)
+
+SELECT '\x123456789A'::bytea::int4; -- error
+ERROR:  integer out of range
+SELECT ''::bytea::int8 AS "0";
+ 0 
+---
+ 0
+(1 row)
+
+SELECT '\x12'::bytea::int8 AS "18";
+ 18 
+----
+ 18
+(1 row)
+
+SELECT '\x1122334455667788'::bytea::int8 AS "1234605616436508552";
+ 1234605616436508552 
+---------------------
+ 1234605616436508552
+(1 row)
+
+SELECT '\x112233445566778899'::bytea::int8; -- error
+ERROR:  bigint out of range
+-- min/max integer values
+SELECT '\x8000'::bytea::int2 AS "-32768", '\x7FFF'::bytea::int2 AS "32767";
+ -32768 | 32767 
+--------+-------
+ -32768 | 32767
+(1 row)
+
+SELECT '\x80000000'::bytea::int4 AS "-2147483648", '\x7FFFFFFF'::bytea::int4 AS "2147483647";
+ -2147483648 | 2147483647 
+-------------+------------
+ -2147483648 | 2147483647
+(1 row)
+
+SELECT '\x8000000000000000'::bytea::int8 AS "-9223372036854775808",
+       '\x7FFFFFFFFFFFFFFF'::bytea::int8 AS "9223372036854775807";
+ -9223372036854775808 | 9223372036854775807 
+----------------------+---------------------
+ -9223372036854775808 | 9223372036854775807
+(1 row)
+
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
 --
diff --git a/src/test/regress/sql/strings.sql b/src/test/regress/sql/strings.sql
index 8e0f3a0e75..4deb0683d5 100644
--- a/src/test/regress/sql/strings.sql
+++ b/src/test/regress/sql/strings.sql
@@ -751,6 +751,35 @@ SELECT get_byte('\x1234567890abcdef00'::bytea, 99);  -- error
 SELECT set_byte('\x1234567890abcdef00'::bytea, 7, 11);
 SELECT set_byte('\x1234567890abcdef00'::bytea, 99, 11);  -- error
 
+--
+-- conversions between bytea and integer types
+--
+SELECT 0x1234::int2::bytea AS "\x1234", (-0x1234)::int2::bytea AS "\xedcc";
+SELECT 0x12345678::int4::bytea AS "\x12345678", (-0x12345678)::int4::bytea AS "\xedcba988";
+SELECT 0x1122334455667788::int8::bytea AS "\x1122334455667788",
+       (-0x1122334455667788)::int8::bytea AS "\xeeddccbbaa998878";
+
+SELECT ''::bytea::int2 AS "0";
+SELECT '\x12'::bytea::int2 AS "18";
+SELECT '\x1234'::bytea::int2 AS "4460";
+SELECT '\x123456'::bytea::int2; -- error
+
+SELECT ''::bytea::int4 AS "0";
+SELECT '\x12'::bytea::int4 AS "18";
+SELECT '\x12345678'::bytea::int4 AS "305419896";
+SELECT '\x123456789A'::bytea::int4; -- error
+
+SELECT ''::bytea::int8 AS "0";
+SELECT '\x12'::bytea::int8 AS "18";
+SELECT '\x1122334455667788'::bytea::int8 AS "1234605616436508552";
+SELECT '\x112233445566778899'::bytea::int8; -- error
+
+-- min/max integer values
+SELECT '\x8000'::bytea::int2 AS "-32768", '\x7FFF'::bytea::int2 AS "32767";
+SELECT '\x80000000'::bytea::int4 AS "-2147483648", '\x7FFFFFFF'::bytea::int4 AS "2147483647";
+SELECT '\x8000000000000000'::bytea::int8 AS "-9223372036854775808",
+       '\x7FFFFFFFFFFFFFFF'::bytea::int8 AS "9223372036854775807";
+
 --
 -- test behavior of escape_string_warning and standard_conforming_strings options
 --
-- 
2.43.0

#34Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Dean Rasheed (#33)
Re: [PATCH] Add get_bytes() and set_bytes() functions

On Sat, 1 Mar 2025 at 11:30, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:

With those updates, I think this is ready for commit, which I'll do in
a day or two, if there are no further comments.

Committed.

Regards,
Dean