PATCH: Implement value_to_json for single-datum conversion

Started by Craig Ringer over 13 years ago (8 messages)
#1Craig Ringer
ringerc@ringerc.id.au
1 attachment(s)

Hi all

Whenever I try to work with the new json types I trip over the lack of a
function to escape text to json. The attached patch against master
provides one by exposing the existing datum_to_json function to SQL.
I've used the name value_to_json, but I'm not sure it's necessarily the
right choice of name.

Please consider this for the 9.2 branch as well as HEAD, as IMO it's
very important for basic usability of the json functionality. It applies
to 9.2 fine and passes "make check". I know it's late in the game, but
it's also a very small change, and it's very hard to build up JSON data
structures other than simple rows or arrays without at minimum a way of
escaping `text' to json strings.

This feels basic enough that I'm wondering if there's a reason it wasn't
included from the start, but I don't see any comments in json.c talking
about anything like this, nor did I find any -hackers discussion about
it. I suspect it's just an oversight.

As value_to_json directly wraps datum_to_json it actually accepts record
and array types too. I didn't see any reason to prevent that and force
the user to instead use row_to_json or array_to_json for those cases. If
you don't want to accept this, I can provide a wrapper for escape_json
that only accepts a text argument instead, but I think *some* way to
escape text to JSON is vital to have in 9.2.
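What "escaping `text' to json strings" amounts to can be sketched outside
PostgreSQL. This is a rough Python illustration of the quoting rules RFC 4627
requires for JSON strings; the function name is illustrative only and is not
part of the patch:

```python
import json

def escape_to_json_string(s: str) -> str:
    # Wrap the value in double quotes and backslash-escape quotes,
    # backslashes, and control characters, per RFC 4627's string grammar.
    out = ['"']
    for ch in s:
        if ch == '"':
            out.append('\\"')
        elif ch == '\\':
            out.append('\\\\')
        elif ord(ch) < 0x20:
            out.append('\\u%04x' % ord(ch))  # generic control-char escape
        else:
            out.append(ch)
    out.append('"')
    return ''.join(out)

# Agrees with the stdlib encoder on quote/backslash handling:
assert escape_to_json_string('a"b\\c') == json.dumps('a"b\\c')
```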

A docs patch will follow shortly if you're happy that this patch is
reasonable.

--
Craig Ringer

Attachments:

0001-Implement-value_to_json-exposing-the-existing-datum_.patch (text/x-patch)
#2Craig Ringer
ringerc@ringerc.id.au
In reply to: Craig Ringer (#1)
Re: PATCH: Implement value_to_json for single-datum conversion

Whoops. It actually looks like the posted patch muffed up opr_sanity
checks. I'm totally new to pg_proc.h wrangling so I'm not sure why yet,
looking.

Sorry, not sure how I missed that. I'll follow up shortly.

--
Craig Ringer

#3Craig Ringer
ringerc@ringerc.id.au
In reply to: Craig Ringer (#2)
1 attachment(s)
Re: PATCH: Implement value_to_json for single-datum conversion

OK, opr_sanity was failing because I added the value_to_json(text) alias
to ensure that:

value_to_json('some_literal')

worked, following the same approach as quote_literal(anyelement) and
quote_literal(text). That should be reasonable, right? The comments on
the affected check in opr_sanity say that it's not necessarily wrong so
long as the called function is prepared to handle the different
arguments itself - which it is, since it already accepts anyelement.

The test comment reads:

Note that the expected output of this part of the test will
need to be modified whenever new pairs of types are made
binary-equivalent,
or when new polymorphic built-in functions are added

so that seems reasonable.

postgres=# \df quote_literal
                               List of functions
   Schema   |     Name      | Result data type | Argument data types |  Type
------------+---------------+------------------+---------------------+--------
 pg_catalog | quote_literal | text             | anyelement          | normal
 pg_catalog | quote_literal | text             | text                | normal
(2 rows)

postgres=# \df value_to_json
                               List of functions
   Schema   |     Name      | Result data type | Argument data types |  Type
------------+---------------+------------------+---------------------+--------
 pg_catalog | value_to_json | json             | anyelement          | normal
 pg_catalog | value_to_json | json             | text                | normal
(2 rows)

Revised patch that tweaks the expected result of opr_sanity attached.

--
Craig Ringer

Attachments:

0001-Implement-value_to_json-exposing-the-existing-datum_.patch (text/x-patch)
[PATCH] Implement value_to_json, exposing the existing datum_to_json
for use from SQL.

A generic json quoting function needs to be available from SQL to permit
the building of any JSON structures other than those produced from arrays
or rowtypes.
---
 src/backend/utils/adt/json.c             | 28 ++++++++++
 src/include/catalog/pg_proc.h            |  4 ++
 src/include/utils/json.h                 |  1 +
 src/test/regress/expected/json.out       | 91 ++++++++++++++++++++++++++++++++
 src/test/regress/expected/opr_sanity.out |  3 +-
 src/test/regress/sql/json.sql            | 39 ++++++++++++++
 6 files changed, 165 insertions(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
new file mode 100644
index 0425ac6..11babd3
*** a/src/backend/utils/adt/json.c
--- b/src/backend/utils/adt/json.c
*************** row_to_json_pretty(PG_FUNCTION_ARGS)
*** 1121,1126 ****
--- 1121,1154 ----
  }
  
  /*
+  * SQL function value_to_json(value)
+  *
+  * Wraps datum_to_json for use from SQL, so any element may be
+  * converted to json.
+  */
+ extern Datum
+ value_to_json(PG_FUNCTION_ARGS)
+ {
+ 	Datum		arg0 = PG_GETARG_DATUM(0);
+ 	Oid			arg0type = get_fn_expr_argtype(fcinfo->flinfo, 0);
+ 	StringInfo	result;
+ 	TYPCATEGORY tcategory;
+ 	Oid			typoutput;
+ 	bool		typisvarlena;
+ 
+ 	if (arg0type == JSONOID)
+ 		tcategory = TYPCATEGORY_JSON;
+ 	else
+ 		tcategory = TypeCategory(arg0type);
+ 	
+ 	getTypeOutputInfo(arg0type, &typoutput, &typisvarlena);
+ 
+ 	result = makeStringInfo();
+ 	datum_to_json(arg0, PG_ARGISNULL(0), result, tcategory, typoutput);
+ 	PG_RETURN_TEXT_P(cstring_to_text(result->data));
+ }
+ 
+ /*
   * Produce a JSON string literal, properly escaping characters in the text.
   */
  void
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
new file mode 100644
index 665918f..31e7772
*** a/src/include/catalog/pg_proc.h
--- b/src/include/catalog/pg_proc.h
*************** DATA(insert OID = 3155 (  row_to_json
*** 4091,4096 ****
--- 4091,4100 ----
  DESCR("map row to json");
  DATA(insert OID = 3156 (  row_to_json	   PGNSP PGUID 12 1 0 0 0 f f f f t f s 2 0 114 "2249 16" _null_ _null_ _null_ _null_ row_to_json_pretty _null_ _null_ _null_ ));
  DESCR("map row to json with optional pretty printing");
+ DATA(insert OID = 3164 (  value_to_json	   PGNSP PGUID 12 1 0 0 0 f f f f f f s 1 0 114 "2283" _null_ _null_ _null_ _null_ value_to_json _null_ _null_ _null_ ));
+ DESCR("Convert a simple value to json, quoting and escaping if necessary");
+ DATA(insert OID = 3169 (  value_to_json	   PGNSP PGUID 12 1 0 0 0 f f f f f f s 1 0 114 "25" _null_ _null_ _null_ _null_ value_to_json _null_ _null_ _null_ ));
+ DESCR("Convert a simple value to json, quoting and escaping if necessary");
  
  /* uuid */
  DATA(insert OID = 2952 (  uuid_in		   PGNSP PGUID 12 1 0 0 0 f f f f t f i 1 0 2950 "2275" _null_ _null_ _null_ _null_ uuid_in _null_ _null_ _null_ ));
diff --git a/src/include/utils/json.h b/src/include/utils/json.h
new file mode 100644
index 0f38147..432ad24
*** a/src/include/utils/json.h
--- b/src/include/utils/json.h
*************** extern Datum array_to_json(PG_FUNCTION_A
*** 25,30 ****
--- 25,31 ----
  extern Datum array_to_json_pretty(PG_FUNCTION_ARGS);
  extern Datum row_to_json(PG_FUNCTION_ARGS);
  extern Datum row_to_json_pretty(PG_FUNCTION_ARGS);
+ extern Datum value_to_json(PG_FUNCTION_ARGS);
  extern void escape_json(StringInfo buf, const char *str);
  
  #endif   /* JSON_H */
diff --git a/src/test/regress/expected/json.out b/src/test/regress/expected/json.out
new file mode 100644
index 2dfe7bb..41e011d
*** a/src/test/regress/expected/json.out
--- b/src/test/regress/expected/json.out
*************** FROM (SELECT '{"a":1,"b": [2,3,4,"d","e"
*** 433,435 ****
--- 433,526 ----
   {"jsonfield":{"a":1,"b": [2,3,4,"d","e","f"],"c":{"p":1,"q":2}}}
  (1 row)
  
+ -- value_to_json(json)
+ SELECT value_to_json('{"a":1}'::json);
+  value_to_json 
+ ---------------
+  {"a":1}
+ (1 row)
+ 
+ -- value_to_json(text)
+ SELECT value_to_json('{"a":1}'::text);
+  value_to_json 
+ ---------------
+  "{\"a\":1}"
+ (1 row)
+ 
+ -- value_to_json(unspecified-text)
+ SELECT value_to_json('{"a":1}');
+  value_to_json 
+ ---------------
+  "{\"a\":1}"
+ (1 row)
+ 
+ -- value_to_json(integer)
+ SELECT value_to_json(24);
+  value_to_json 
+ ---------------
+  24
+ (1 row)
+ 
+ -- value_to_json(float)
+ SELECT value_to_json('24.9'::float8);
+  value_to_json 
+ ---------------
+  24.9
+ (1 row)
+ 
+ -- value_to_json(NaN)
+ SELECT value_to_json('NaN'::float8);
+  value_to_json 
+ ---------------
+  "NaN"
+ (1 row)
+ 
+ -- value_to_json(Infinity)
+ SELECT value_to_json('Infinity'::float8);
+  value_to_json 
+ ---------------
+  "Infinity"
+ (1 row)
+ 
+ -- value_to_json(-Infinity)
+ SELECT value_to_json('-Infinity'::float8);
+  value_to_json 
+ ---------------
+  "-Infinity"
+ (1 row)
+ 
+ -- value_to_json(json[])
+ SELECT value_to_json(ARRAY[ '1'::json, '"4"'::json, '{"a":1}'::json ]::json[]);
+   value_to_json  
+ -----------------
+  [1,"4",{"a":1}]
+ (1 row)
+ 
+ -- value_to_json(text[])
+ SELECT value_to_json(ARRAY[ '1', '"4"', '{"a":1}' ]::text[]);
+        value_to_json       
+ ---------------------------
+  ["1","\"4\"","{\"a\":1}"]
+ (1 row)
+ 
+ -- value_to_json(int[])
+ SELECT value_to_json(ARRAY[1,2,3,4,-1]);
+  value_to_json 
+ ---------------
+  [1,2,3,4,-1]
+ (1 row)
+ 
+ -- value_to_json(row)
+ SELECT value_to_json(row(1,'foo'));
+  value_to_json 
+ ---------------
+  "(1,foo)"
+ (1 row)
+ 
+ -- value_to_json(emptyrow)
+ SELECT value_to_json(row());
+  value_to_json 
+ ---------------
+  "()"
+ (1 row)
+ 
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
new file mode 100644
index 110ea41..a7a3576
*** a/src/test/regress/expected/opr_sanity.out
--- b/src/test/regress/expected/opr_sanity.out
*************** ORDER BY 1, 2;
*** 171,180 ****
  -------------+-------------
            25 |        1042
            25 |        1043
          1114 |        1184
          1560 |        1562
          2277 |        2283
! (5 rows)
  
  SELECT DISTINCT p1.proargtypes[1], p2.proargtypes[1]
  FROM pg_proc AS p1, pg_proc AS p2
--- 171,181 ----
  -------------+-------------
            25 |        1042
            25 |        1043
+           25 |        2283
          1114 |        1184
          1560 |        1562
          2277 |        2283
! (6 rows)
  
  SELECT DISTINCT p1.proargtypes[1], p2.proargtypes[1]
  FROM pg_proc AS p1, pg_proc AS p2
diff --git a/src/test/regress/sql/json.sql b/src/test/regress/sql/json.sql
new file mode 100644
index 52be0cf..8a0c3a5
*** a/src/test/regress/sql/json.sql
--- b/src/test/regress/sql/json.sql
*************** FROM (SELECT '-Infinity'::float8 AS "flo
*** 113,115 ****
--- 113,154 ----
  -- json input
  SELECT row_to_json(q)
  FROM (SELECT '{"a":1,"b": [2,3,4,"d","e","f"],"c":{"p":1,"q":2}}'::json AS "jsonfield") q;
+ 
+ -- value_to_json(json)
+ SELECT value_to_json('{"a":1}'::json);
+ 
+ -- value_to_json(text)
+ SELECT value_to_json('{"a":1}'::text);
+ 
+ -- value_to_json(unspecified-text)
+ SELECT value_to_json('{"a":1}');
+ 
+ -- value_to_json(integer)
+ SELECT value_to_json(24);
+ 
+ -- value_to_json(float)
+ SELECT value_to_json('24.9'::float8);
+ 
+ -- value_to_json(NaN)
+ SELECT value_to_json('NaN'::float8);
+ 
+ -- value_to_json(Infinity)
+ SELECT value_to_json('Infinity'::float8);
+ 
+ -- value_to_json(-Infinity)
+ SELECT value_to_json('-Infinity'::float8);
+ 
+ -- value_to_json(json[])
+ SELECT value_to_json(ARRAY[ '1'::json, '"4"'::json, '{"a":1}'::json ]::json[]);
+ 
+ -- value_to_json(text[])
+ SELECT value_to_json(ARRAY[ '1', '"4"', '{"a":1}' ]::text[]);
+ 
+ -- value_to_json(int[])
+ SELECT value_to_json(ARRAY[1,2,3,4,-1]);
+ 
+ -- value_to_json(row)
+ SELECT value_to_json(row(1,'foo'));
+ 
+ -- value_to_json(emptyrow)
+ SELECT value_to_json(row());
-- 
1.7.11.2

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#3)
Re: PATCH: Implement value_to_json for single-datum conversion

Craig Ringer <ringerc@ringerc.id.au> writes:

OK, opr_sanity was failing because I added the value_to_json(text) alias
to ensure that:

value_to_json('some_literal')

worked, following the same approach as quote_literal(anyelement) and
quote_literal(text). That should be reasonable, right?

No, it isn't. What you're proposing is to let opr_sanity think that
text and anyelement are interchangeable to C functions, which is so
far from reality as to be ludicrous. That would be seriously damaging
to its ability to detect errors.

But more to the point, your analogy to quote_literal is faulty anyway.
If you looked at that, what you'd find is that only quote_literal(text)
is a C function. The other one is a SQL wrapper around a coercion to
text followed by the C function. I rather imagine that the definition
as you have it would crash on, say, value_to_json(42).

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#1)
Re: PATCH: Implement value_to_json for single-datum conversion

Craig Ringer <ringerc@ringerc.id.au> writes:

Whenever I try to work with the new json types I trip over the lack of a
function to escape text to json. The attached patch against master
provides one by exposing the existing datum_to_json function to SQL.
...
This feels basic enough that I'm wondering if there's a reason it wasn't
included from the start,

There actually was a previous thread about this:
http://archives.postgresql.org/pgsql-hackers/2012-05/msg00001.php
Note in particular Andrew's comment:

Second, RFC 4627 is absolutely clear: a valid JSON value can
only be an object or an array, so this thing about converting
arbitrary datum values to JSON is a fantasy. If anything, we
should adjust the JSON input routines to disallow anything else,
rather than start to output what is not valid JSON.

It's possible he's misread the spec, but I think we ought to tread very
carefully before adding "obvious" conversions we might regret later.

Please consider this for the 9.2 branch as well as to HEAD, as IMO it's
very important for basic usability of the json functionality. It applies
to 9.2 fine and passes "make check". I know it's late in the game,

It's several months too late for feature additions to 9.2, especially
ones that would require an initdb to install. We can look at this for
9.3, but I'm still worried about the spec-compliance question.

regards, tom lane

#6Craig Ringer
ringerc@ringerc.id.au
In reply to: Tom Lane (#5)
Re: PATCH: Implement value_to_json for single-datum conversion

On 08/13/2012 12:48 PM, Tom Lane wrote:

There actually was a previous thread about this:
http://archives.postgresql.org/pgsql-hackers/2012-05/msg00001.php
Note in particular Andrew's comment:

Second, RFC 4627 is absolutely clear: a valid JSON value can
only be an object or an array, so this thing about converting
arbitrary datum values to JSON is a fantasy. If anything, we
should adjust the JSON input routines to disallow anything else,
rather than start to output what is not valid JSON.

Thanks for taking a look. That makes sense. I guess these are similar
issues to those the XML type faces, where working with fragments is a
problem. The spec requires a single root element, but you don't always
have that when you're *building* XML, hence the addition of `IS DOCUMENT'.

I was hoping to find a low-impact way to allow SQL-level construction of
more complex JSON objects with correct text escaping, but it sounds like
this isn't the right route. I don't currently see any way to achieve the
kind of on-the-fly building you can do with XML's xmlelement(),
xmlconcat(), xmlforest() etc., nor any equivalent to hstore's
hstore(text[], text[]), and I was hoping to improve that.

I have a half-finished JSON object constructor
json_object_from_arrays(text[], json[]) in the same style as
hstore(text[],text[]) . It won't work without the notion of json-typed
scalars, though, as the values of keys could then only be arrays or
objects, which isn't very useful. I can't usefully accept `anyarray' as
a values argument since arrays are of homogeneous type. Accepting text[]
would be a bug-magnet even if there was some kind of `text
json_escape(text)' function.
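As an illustration of the idea, here is a rough Python sketch, under the
assumption that values arrive as already-valid JSON fragments; the name
json_object_from_arrays matches the proposal above, but nothing else here is
from the actual half-finished code:

```python
import json

def json_object_from_arrays(keys, json_values):
    # Pair a keys array with a values array of JSON fragments and emit one
    # object, in the style of hstore(text[], text[]). Keys get escaped as
    # JSON strings; values are trusted to be valid JSON already.
    if len(keys) != len(json_values):
        raise ValueError("mismatched array lengths")
    parts = ('%s:%s' % (json.dumps(k), v) for k, v in zip(keys, json_values))
    return '{' + ','.join(parts) + '}'

print(json_object_from_arrays(['a', 'n'], ['"x"', '42']))  # {"a":"x","n":42}
```

Without a json-typed scalar, the values array could only hold objects and
arrays, which is what makes the constructor nearly useless as things stand.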

Would it be reasonable to add a separate json_element type, one that's
binary-equivalent to `json' but not constrained by the requirement to be
an array or object/dict? Or a `jsobject' ?

As for the value_to_json crashing, works for me:

postgres=# SELECT value_to_json(42);
value_to_json
---------------
42
(1 row)

... since datum_to_json is happy to accept anything you throw at it
using output function lookups, and value_to_json itself doesn't care
about the argument type at all. That was all covered in the regression tests.

Purely so I understand what the correct handling of the anyelement+text
overload would've been: In light of your comments on opr_sanity would
the right approach be to add a second C function like text_to_json that
only accepts 'text' to avoid confusing the sanity check? So the SQL
"value_to_json(anyelement)" would point to the C "value_to_json" and the
SQL "value_to_json(text)" would point to the C "text_to_json" ?

Anyway, clearly the value_to_json approach is out.

--
Craig Ringer

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#6)
Re: PATCH: Implement value_to_json for single-datum conversion

Craig Ringer <ringerc@ringerc.id.au> writes:

As for the value_to_json crashing, works for me:

postgres=# SELECT value_to_json(42);
value_to_json
---------------
42
(1 row)

Oh, right, because there actually is support for anyelement in the
underlying C function. There is not in the quote_literal case.

Purely so I understand what the correct handling of the anyelement+text
overload would've been: In light of your comments on opr_sanity would
the right approach be to add a second C function like text_to_json that
only accepts 'text' to avoid confusing the sanity check?

Actually, given the above, what did you need value_to_json(text) for at
all? Wouldn't value_to_json(anyelement) have covered it?

But yeah, the general approach to suppressing complaints from that
opr_sanity test is to make more C entry points. The point of it,
in some sense, is that if you want to make an assumption that two
types are binary-equivalent then it's better to have that assumption
in C code than embedded in the pg_proc entries. The cases that we
let pass via the "expected" outputs are only ones where binary
equivalence seems pretty well assured, like text vs varchar.

regards, tom lane

#8Craig Ringer
ringerc@ringerc.id.au
In reply to: Tom Lane (#7)
Re: PATCH: Implement value_to_json for single-datum conversion

On 08/13/2012 01:55 PM, Tom Lane wrote:

Actually, given the above, what did you need value_to_json(text) for at
all? Wouldn't value_to_json(anyelement) have covered it?

Usability. Without the version accepting text, an explicit cast to text
is required to disambiguate a literal argument like
value_to_json('something').

But yeah, the general approach to suppressing complaints from that
opr_sanity test is to make more C entry points. The point of it,
in some sense, is that if you want to make an assumption that two
types are binary-equivalent then it's better to have that assumption
in C code than embedded in the pg_proc entries. The cases that we
let pass via the "expected" outputs are only ones where binary
equivalence seems pretty well assured, like text vs varchar.

Thanks. I appreciate the explanation, and sorry for the newbie error.

On the JSON stuff, I can see it's not as simple as adding an escape
function. For my needs during the 9.2 timeframe I'll bundle up an
extension with the required functionality and deal with porting to
whatever 9.3 includes. Hopefully a "json_value" or "javascript_value"
or similar can be introduced for 9.3, given comments like:

http://archives.postgresql.org/pgsql-hackers/2012-05/msg00030.php
http://archives.postgresql.org/pgsql-hackers/2012-05/msg00065.php

... expressing not only the need for json scalars, but also the fact that
they're already commonplace in pretty much everything else.

Given this:

http://archives.postgresql.org/pgsql-hackers/2012-05/msg00040.php

it does seem that `json' should be a whole document, not a fragment, but
IMO a way to work with individual JSON values is going to be *necessary*
to get the most out of the json support - and to stop people who use
JSON in the real world from complaining that Pg's JSON support is broken
because it follows the standard rather than real-world practice.
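For what it's worth, widely deployed parsers already accept the scalar
fragments that RFC 4627 forbids at top level. Python's stdlib parser, shown
here purely as an outside-PostgreSQL illustration, is one example:

```python
import json

# RFC 4627 says a JSON text must be an object or an array, yet real-world
# parsers routinely accept bare scalars at the top level:
assert json.loads('42') == 42
assert json.loads('"foo"') == 'foo'
assert json.loads('true') is True
```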

Personally, the lack of json scalars has prevented me from using the
json support in two different places already, though it's proving very
useful in many others.

--
Craig Ringer