JSON_VALUE() behavior when RETURNING bytea (expected base64 decoding)
Greetings hackers,
The de-facto standard for storing binary data in JSON documents seems to be
base64-encoded strings, so I was expecting JSON_VALUE's RETURNING bytea to
do base64 decoding. However, that does not seem to be the case:
SELECT decode('AQID', 'base64'); -- 0x010203
SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected
0x010203, got AQID
This forces an additional explicit decode() function call:
SELECT decode(JSON_VALUE(jsonb '"AQID"', '$'), 'base64'); -- 0x010203
Is the above behavior intentional?
Shay
On Wednesday, March 5, 2025, Shay Rojansky <roji@roji.org> wrote:
SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected
0x010203, got AQID
I get \x41514944 which is precisely what I would expect since it what this
query results in as well:
select 'AQID'::bytea;
David J.
It looks like your bytea_output is set to 'escape', which would explain
what you are seeing. Try adding this in first:
SET bytea_output = hex;
SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea);
That (hex) is the default value, so you must be setting it to escape
somewhere. You can see where by running:
select * from pg_settings where name = 'bytea_output';
Examine the source, sourcefile, and sourceline columns
I personally prefer hex, but perhaps it's set to octet in your system for a
reason. If not, maybe change it globally?
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected
0x010203, got AQIDI get \x41514944 which is precisely what I would expect since it what this
query results in as well:select 'AQID'::bytea;
If the behavior of RETURNING is meant to be identical to that of simply
applying a cast, is there any actual advantage in using JSON_VALUE with
RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"',
'$')::bytea instead of using RETURNING? I thought the point was precisely
for RETURNING to be able to perform JSON-specific conversions (e.g. take
into account that the base64 is being converted from a *JSON* string, and
therefore apply base64 decoding to it).
On Wednesday, March 5, 2025, Shay Rojansky <roji@roji.org> wrote:
SELECT JSON_VALUE(jsonb '"AQID"', '$' RETURNING bytea); -- Expected
0x010203, got AQIDI get \x41514944 which is precisely what I would expect since it what
this query results in as well:select 'AQID'::bytea;
If the behavior of RETURNING is meant to be identical to that of simply
applying a cast, is there any actual advantage in using JSON_VALUE with
RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"',
'$')::bytea instead of using RETURNING? I thought the point was precisely
for RETURNING to be able to perform JSON-specific conversions (e.g. take
into account that the base64 is being converted from a *JSON* string, and
therefore apply base64 decoding to it).
Not really…it does seem to just be syntactic sugar. Not that we’d be
likely to assume the contents of a JSON string are a base64 encoding as it
is just, as you claim, a de-facto standard. Unless we have some standard
(namely the one defining json_value) telling us that the contents are
indeed always base64 encoded data we’ll just assume it’s plain text and act
accordingly - in this case passing it into bytea’s input function.
David J.
If the behavior of RETURNING is meant to be identical to that of simply
applying a cast, is there any actual advantage in using JSON_VALUE with
RETURNING? In other words, why not just do JSON_VALUE(json '"AQID"',
'$')::bytea instead of using RETURNING? I thought the point was precisely
for RETURNING to be able to perform JSON-specific conversions (e.g. take
into account that the base64 is being converted from a *JSON* string, and
therefore apply base64 decoding to it).Not really…it does seem to just be syntactic sugar. Not that we’d be
likely to assume the contents of a JSON string are a base64 encoding as it
is just, as you claim, a de-facto standard. Unless we have some standard
(namely the one defining json_value) telling us that the contents are
indeed always base64 encoded data we’ll just assume it’s plain text and act
accordingly - in this case passing it into bytea’s input function.
OK. For whatever it's worth, I'll note that SQL Server's OPENJSON does do
this (so when a JSON string property is extracted as a binary type, base64
encoding is assumed). Other databases also have very specific documented
conversion rules for JSON_VALUE RETURNING (Oracle
<https://docs.oracle.com/en/database/oracle/oracle-database/19/adjsn/clauses-used-in-functions-and-conditions-for-json.html#GUID-DE9F29D3-1C23-4271-9DCD-E585866576D2>,
DB2
<https://www.ibm.com/docs/en/i/7.3?topic=functions-json-table#rbafzscajsontable__json_result>
(table 1)). I'm basically trying to show that RETURNING definitely isn't a
simple cast-from-string in other databases, but is a distinct conversion
mechanism that takes into account the fact the the origin data comes from
JSON.
JSON is of course a very light on formal/official standards, but some very
strong de-facto standards have established themselves (e.g. ISO8601 for
timestamps), and even beyond JSON, base64 seems to be the de-facto standard
for encoding binary data as a string (which is what this is about). I'll
also point out again that if the user really is looking only to get a
string out and apply regular PG convert-from-string casting, they can do
just that (i.e. omit RETURNING and apply regular PG casting); to me that
points to RETURNING doing something beyond that, adding JSON-specific
usefulness.
On 06.03.25 11:55, Shay Rojansky wrote:
For whatever it's worth, I'll note that SQL Server's OPENJSON does do
this (so when a JSON string property is extracted as a binary type,
base64 encoding is assumed). Other databases also have very specific
documented conversion rules for JSON_VALUE RETURNING (Oracle <https://
docs.oracle.com/en/database/oracle/oracle-database/19/adjsn/clauses-
used-in-functions-and-conditions-for-json.html#GUID-
DE9F29D3-1C23-4271-9DCD-E585866576D2>, DB2 <https://www.ibm.com/docs/en/
i/7.3?topic=functions-json-table#rbafzscajsontable__json_result> (table
1)). I'm basically trying to show that RETURNING definitely isn't a
simple cast-from-string in other databases, but is a distinct conversion
mechanism that takes into account the fact the the origin data comes
from JSON.
According to the SQL standard, once you account for various special
cases (non-scalar values, null values), it comes down to a cast.
For whatever it's worth, I'll note that SQL Server's OPENJSON does do
this (so when a JSON string property is extracted as a binary type,
base64 encoding is assumed). Other databases also have very specific
documented conversion rules for JSON_VALUE RETURNING (Oracle <https://
docs.oracle.com/en/database/oracle/oracle-database/19/adjsn/clauses-
used-in-functions-and-conditions-for-json.html#GUID-
DE9F29D3-1C23-4271-9DCD-E585866576D2>, DB2 <https://www.ibm.com/docs/en/
i/7.3?topic=functions-json-table#rbafzscajsontable__json_result> (table
1)). I'm basically trying to show that RETURNING definitely isn't a
simple cast-from-string in other databases, but is a distinct conversion
mechanism that takes into account the fact the the origin data comes
from JSON.According to the SQL standard, once you account for various special
cases (non-scalar values, null values), it comes down to a cast.
OK. I don't have the SQL standard here, but I'll just note that this
doesn't seem to be what most/all other databases are doing - there's maybe
room for interpretation there (but again, I have no idea). Applying certain
transformations where needed certainly seems like the more useful thing to
do, like in this case.