JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

Started by Shay Rojansky10 months ago8 messages
#1Shay Rojansky
roji@roji.org

Greetings hackers,

The de-facto standard for storing binary data in JSON documents seems to be
base64-encoded strings, so I was expecting JSON_VALUE's RETURNING bytea to
do base64 decoding. However, that does not seem to be the case:

SELECT decode('AQID', 'base64'); -- 0x010203
SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected
0x010203, got AQID

This forces an additional explicit decode() function call:

SELECT decode(JSON_VALUE(jsonb '"AQID"', '$'), 'base64'); -- 0x010203

Is the above behavior intentional?

Shay

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: Shay Rojansky (#1)
Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

On Wednesday, March 5, 2025, Shay Rojansky <roji@roji.org> wrote:

SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected
0x010203, got AQID

I get \x41514944 which is precisely what I would expect since it what this
query results in as well:

select 'AQID'::bytea;

David J.

#3Greg Sabino Mullane
htamfids@gmail.com
In reply to: Shay Rojansky (#1)
Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

It looks like your bytea_output is set to 'escape', which would explain
what you are seeing. Try adding this in first:

SET bytea_output = hex;
SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea);

That (hex) is the default value, so you must be setting it to escape
somewhere. You can see where by running:

select * from pg_settings where name = 'bytea_output';

Examine the source, sourcefile, and sourceline columns

I personally prefer hex, but perhaps it's set to octet in your system for a
reason. If not, maybe change it globally?

Cheers,
Greg

--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support

#4Shay Rojansky
roji@roji.org
In reply to: David G. Johnston (#2)
Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected
0x010203, got AQID

I get \x41514944 which is precisely what I would expect since it what this
query results in as well:

select 'AQID'::bytea;

If the behavior of RETURNING is meant to be identical to that of simply
applying a cast, is there any actual advantage in using JSON_VALUE with
RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"',
'$')::bytea instead of using RETURNING? I thought the point was precisely
for RETURNING to be able to perform JSON-specific conversions (e.g. take
into account that the base64 is being converted from a *JSON* string, and
therefore apply base64 decoding to it).

#5David G. Johnston
david.g.johnston@gmail.com
In reply to: Shay Rojansky (#4)
Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

On Wednesday, March 5, 2025, Shay Rojansky <roji@roji.org> wrote:

SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected
0x010203, got AQID

I get \x41514944 which is precisely what I would expect since it what
this query results in as well:

select 'AQID'::bytea;

If the behavior of RETURNING is meant to be identical to that of simply
applying a cast, is there any actual advantage in using JSON_VALUE with
RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"',
'$')::bytea instead of using RETURNING? I thought the point was precisely
for RETURNING to be able to perform JSON-specific conversions (e.g. take
into account that the base64 is being converted from a *JSON* string, and
therefore apply base64 decoding to it).

Not really…it does seem to just be syntactic sugar. Not that we’d be
likely to assume the contents of a JSON string are a base64 encoding as it
is just, as you claim, a de-facto standard. Unless we have some standard
(namely the one defining json_value) telling us that the contents are
indeed always base64 encoded data we’ll just assume it’s plain text and act
accordingly - in this case passing it into bytea’s input function.

David J.

#6Shay Rojansky
roji@roji.org
In reply to: David G. Johnston (#5)
Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

If the behavior of RETURNING is meant to be identical to that of simply

applying a cast, is there any actual advantage in using JSON_VALUE with
RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"',
'$')::bytea instead of using RETURNING? I thought the point was precisely
for RETURNING to be able to perform JSON-specific conversions (e.g. take
into account that the base64 is being converted from a *JSON* string, and
therefore apply base64 decoding to it).

Not really…it does seem to just be syntactic sugar. Not that we’d be
likely to assume the contents of a JSON string are a base64 encoding as it
is just, as you claim, a de-facto standard. Unless we have some standard
(namely the one defining json_value) telling us that the contents are
indeed always base64 encoded data we’ll just assume it’s plain text and act
accordingly - in this case passing it into bytea’s input function.

OK. For whatever it's worth, I'll note that SQL Server's OPENJSON does do
this (so when a JSON string property is extracted as a binary type, base64
encoding is assumed). Other databases also have very specific documented
conversion rules for JSON_VALUE RETURNING (Oracle
<https://docs.oracle.com/en/database/oracle/oracle-database/19/adjsn/clauses-used-in-functions-and-conditions-for-json.html#GUID-DE9F29D3-1C23-4271-9DCD-E585866576D2&gt;,
DB2
<https://www.ibm.com/docs/en/i/7.3?topic=functions-json-table#rbafzscajsontable__json_result&gt;
(table 1)). I'm basically trying to show that RETURNING definitely isn't a
simple cast-from-string in other databases, but is a distinct conversion
mechanism that takes into account the fact the the origin data comes from
JSON.

JSON is of course a very light on formal/official standards, but some very
strong de-facto standards have established themselves (e.g. ISO8601 for
timestamps), and even beyond JSON, base64 seems to be the de-facto standard
for encoding binary data as a string (which is what this is about). I'll
also point out again that if the user really is looking only to get a
string out and apply regular PG convert-from-string casting, they can do
just that (i.e. omit RETURNING and apply regular PG casting); to me that
points to RETURNING doing something beyond that, adding JSON-specific
usefulness.

#7Peter Eisentraut
peter@eisentraut.org
In reply to: Shay Rojansky (#6)
Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

On 06.03.25 11:55, Shay Rojansky wrote:

For whatever it's worth, I'll note that SQL Server's OPENJSON does do
this (so when a JSON string property is extracted as a binary type,
base64 encoding is assumed). Other databases also have very specific
documented conversion rules for JSON_VALUE RETURNING (Oracle <https://
docs.oracle.com/en/database/oracle/oracle-database/19/adjsn/clauses-
used-in-functions-and-conditions-for-json.html#GUID-
DE9F29D3-1C23-4271-9DCD-E585866576D2>, DB2 <https://www.ibm.com/docs/en/
i/7.3?topic=functions-json-table#rbafzscajsontable__json_result> (table
1)). I'm basically trying to show that RETURNING definitely isn't a
simple cast-from-string in other databases, but is a distinct conversion
mechanism that takes into account the fact the the origin data comes
from JSON.

According to the SQL standard, once you account for various special
cases (non-scalar values, null values), it comes down to a cast.

#8Shay Rojansky
roji@roji.org
In reply to: Peter Eisentraut (#7)
Re: JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)

For whatever it's worth, I'll note that SQL Server's OPENJSON does do
this (so when a JSON string property is extracted as a binary type,
base64 encoding is assumed). Other databases also have very specific
documented conversion rules for JSON_VALUE RETURNING (Oracle <https://
docs.oracle.com/en/database/oracle/oracle-database/19/adjsn/clauses-
used-in-functions-and-conditions-for-json.html#GUID-
DE9F29D3-1C23-4271-9DCD-E585866576D2>, DB2 <https://www.ibm.com/docs/en/
i/7.3?topic=functions-json-table#rbafzscajsontable__json_result> (table
1)). I'm basically trying to show that RETURNING definitely isn't a
simple cast-from-string in other databases, but is a distinct conversion
mechanism that takes into account the fact the the origin data comes
from JSON.

According to the SQL standard, once you account for various special
cases (non-scalar values, null values), it comes down to a cast.

OK. I don't have the SQL standard here, but I'll just note that this
doesn't seem to be what most/all other databases are doing - there's maybe
room for interpretation there (but again, I have no idea). Applying certain
transformations where needed certainly seems like the more useful thing to
do, like in this case.