json accessors

Started by Andrew Dunstan, over 13 years ago · 40 messages · pgsql-hackers
#1Andrew Dunstan
andrew@dunslane.net

This is a proposal to create some basic functions to extract values from
json. The simple functions I envision would be:

* json_object_keys(json) => setof text
returns the set of dequoted, unescaped keys of the object,
errors if it's not an object
* json_get(json, key text) => json
returns the json value corresponding to the key text in the json object,
null if not found, error if it's not an object
* json_get(json, index int) => json
returns the json value of the indexth element in the json array,
null if the index is outside the array bounds, errors if it's not an
array
* json_get_as_text(json, key text or index int) => text
same as json_get() except that it returns dequoted, unescaped text
for a quoted leaf field

I also propose to map the json_get functions to the operator '->' and
json_get_as_text to '->>', so that given x has this json value:

{"a":[{"b":"c","d":"e"},{"f":true,"g":1}]}

the expression x->'a'->0->>'d' will yield 'e', x->'a'->0->'f' will yield
'true' and x->'a'->0 will yield '{"b":"c","d":"e"}'. The operators would
make using these a whole lot nicer :-)
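For illustration, the same accesses spelled out with the function names instead of the operators (a sketch against the proposed signatures; none of these functions exists yet):

```sql
-- x holds '{"a":[{"b":"c","d":"e"},{"f":true,"g":1}]}'
SELECT json_get_as_text(json_get(json_get(x, 'a'), 0), 'd');  -- 'e'
SELECT json_get(json_get(json_get(x, 'a'), 0), 'f');          -- 'true'
SELECT json_get(x, 'a');   -- '[{"b":"c","d":"e"},{"f":true,"g":1}]'
SELECT json_object_keys(x);  -- one row: 'a'
```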

Various people have suggested putting json_path or something similar
into the core. I'm not sure we want to do that, partly because there are
several competing entries in this field, and partly because I don't want
to get into the business of evaluating json predicate tests, which I
think any tolerably complete gadget would need to do.

Regarding implementation, the way I propose to do this is to modify the
json parser a bit to turn it into a recursive descent parser, with hooks
for various operations. NULL hooks would leave us with the validating
parser we have now with no side effects. The hook functions themselves
will be very small. This would also allow us to do other things very
simply at a later stage, for example a json to xml transformation
function would be very easy to construct using this infrastructure, and
without disturbing any existing functionality.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#1)
Re: json accessors

On Wed, Nov 28, 2012 at 11:04 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

This is a proposal to create some basic functions to extract values from
json. The simple functions I envision would be:

* json_object_keys(json) => setof text
returns the set of dequoted, unescaped keys of the object,
errors if it's not an object
* json_get(json, key text) => json
returns the json value corresponding to the key text in the json object,
null if not found, error if it's not an object
* json_get(json, index int) => json
returns the json value of the indexth element in the json array,
null if the index is outside the array bounds, errors if it's not an
array
* json_get_as_text(json, key text or index int) => text
same as json_get() except that it returns dequoted, unescaped text
for a quoted leaf field

Comments (this is awesome btw):

*) ISTM your keytext operators are a reasonable replacement for a
hypothetical json_path. That said you're basically forcing json->sql
mapping through a highly iterative API, which I don't like. At the
very least, I think json_get should return setof json and return all
matching constructions. I won't miss predicate tests: we can do all
that in SQL.

Non-trivial json productions in postgres require the creation of
special composite types that structure the data that we (I?) rig up in
SQL before routing to json. What about having functions that work in
the opposite direction:

*) can you access both arrays and records with numeric positional
syntax (hopefully, yes?), for example:

x->0->0

*) json_object_keys(json) seems too special-case to me. how about:

json_each(json) which returns a set of key/value pairs and would work on
arrays or objects (for arrays the key could be invented from the
index).
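
To make the suggestion concrete, a hypothetical json_each might behave like this (the function and its signature are inventions here, matching the description above):

```sql
-- over an object: key/value pairs
SELECT * FROM json_each('{"a":1,"b":[2,3]}');
--  key | value
-- -----+-------
--  a   | 1
--  b   | [2,3]

-- over an array: keys invented from the index
SELECT * FROM json_each('[10,20]');
--  key | value
-- -----+-------
--  0   | 10
--  1   | 20
```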

*) json_get_as_text(json, keytext or indexint) => text

prefer json_to_text() naming. also json_to_hstore(), etc.

*) have you considered something like
anyelement from_json(anyelement, json)
or
select <json>::some_type; (this may or may not be possible given our
casting mechanics; i don't know).

My reasoning here is that for non-trivial json productions we (I?)
typically use composite types to rigidly control the structure of the
output document. For 'restful' type protocols I might want to use the
same trick: there would be a set of nested composite type/arrays (or
even, in trivial cases, a table) that would cleanly map to the
document. The parsing here can and should be automatic; this would
give nice symmetry with your xxx_to_json functions. Obviously
conversion here would be best effort but when it works, it would be
wonderful:

WITH json_data AS
(
SELECT from_json(null::foo[], <input_doc>)
),
i1 as (INSERT INTO bar SELECT ... FROM json_data),
i2 as (INSERT INTO baz SELECT ... FROM json_data)

where "..." would be some combination of unnest() and composite type
access syntax.

Now, some documents in json won't translate cleanly to composite types
because json allows for heterogeneous arrays. But if we're in control
of both sides of the protocol that shouldn't matter.

merlin


#3Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#2)
Re: json accessors

On 11/28/2012 02:08 PM, Merlin Moncure wrote:

On Wed, Nov 28, 2012 at 11:04 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

This is a proposal to create some basic functions to extract values from
json. The simple functions I envision would be:

* json_object_keys(json) => setof text
returns the set of dequoted, unescaped keys of the object,
errors if it's not an object
* json_get(json, key text) => json
returns the json value corresponding to the key text in the json object,
null if not found, error if it's not an object
* json_get(json, index int) => json
returns the json value of the indexth element in the json array,
null if the index is outside the array bounds, errors if it's not an
array
* json_get_as_text(json, key text or index int) => text
same as json_get() except that it returns dequoted, unescaped text
for a quoted leaf field

Comments (this is awesome btw):

Thanks for the input.

*) ISTM your keytext operators are a reasonable replacement for a
hypothetical json_path. That said you're basically forcing json->sql
mapping through a highly iterative API, which I don't like. At the
very least, I think json_get should return setof json and return all
matching constructions. I won't miss predicate tests: we can do all
that in SQL.

Yes, it's iterative. And for deeply nested json it might be somewhat
inefficient, although the parser is pretty fast AFAICT. But it's a start.

Non-trivial json productions in postgres require the creation of
special composite types that structure the data that we (I?) rig up in
SQL before routing to json. What about having functions that work in
the opposite direction:

*) can you access both arrays and records with numeric positional
syntax (hopefully, yes?), for example:

x->0->0

You can't do that in JS, so I'm not clear why we should allow it.

*) json_object_keys(json) seems too special-case to me. how about:

json_each(json) which returns a set of key/value pairs and would work on
arrays or objects (for arrays the key could be invented from the
index).

Again, I don't think we should conflate the processing for arrays and
objects. But I could see doing each(json) => setof (text, json) (and
maybe a similar function returning setof (text, text), which would
dequote leaf nodes as json_get_as_text() does).

And similarly a couple of functions to unnest arrays.
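
Sketched out, with hypothetical names for the variants mentioned above (none of these signatures is settled):

```sql
-- each(json) => setof (text, json)
SELECT * FROM each('{"a":1,"b":"x"}');      -- ('a','1'), ('b','"x"')
-- the text-returning variant would dequote leaf nodes:
SELECT * FROM each_as_text('{"b":"x"}');    -- ('b','x')
-- and an array-unnesting function (name invented):
SELECT * FROM json_array_unnest('[1,"two",{"c":3}]');  -- 1, "two", {"c":3}
```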

*) json_get_as_text(json, keytext or indexint) => text

prefer json_to_text() naming. also json_to_hstore(), etc.

json_to_text seems rather misleading as a name here. Maybe we could
remove the "_as" from the name if that's bothering you.

As for json_to_hstore, as I mentioned, the design is intended to enable
the easy construction of such transformations, although for hstore
anything except a trivial json structure (i.e. an unnested object)
might have unappealing results. But in any case, the important thing to
do first is to get the infrastructure in place. Time is very short and I
don't want to extend this very much.

*) have you considered something like
anyelement from_json(anyelement, json)
or
select <json>::some_type; (this may or may not be possible given our
casting mechanics; i don't know).

I have no idea what the semantics of this would be.

cheers

andrew


#4Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#3)
Re: json accessors

On 11/28/2012 03:44 PM, Andrew Dunstan wrote:

As for json_to_hstore, as I mentioned, the design is intended to
enable the easy construction of such transformations, although for
hstore anything except a trivial json structure (i.e. an unnested
object) might have unappealing results. But in any case, the
important thing to do first is to get the infrastructure in place.
Time is very short and I don't want to extend this very much.

The other thing about doing json_to_hstore() is that, since hstore is
not itself a core type, we couldn't do that in the core json module, and
therefore we'd either need to expose an API to the JSON parser or
replicate it in the hstore module. Exposing it is probably the better
way to go. Then people could write extensions that process json just by
supplying the hook functions.

cheers

andrew


#5Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#3)
Re: json accessors

On Wed, Nov 28, 2012 at 2:44 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 11/28/2012 02:08 PM, Merlin Moncure wrote:

*) ISTM your keytext operators are a reasonable replacement for a
hypothetical json_path. That said you're basically forcing json->sql
mapping through a highly iterative API, which I don't like. At the
very least, I think json_get should return setof json and return all
matching constructions. I won't miss predicate tests: we can do all
that in SQL.

Yes, it's iterative. And for deeply nested json it might be somewhat
inefficient, although the parser is pretty fast AFAICT. But it's a start.

not completely buying that: see comments below. not supporting xpath
style decompositions seems wrong to me. IOW, json_get should be set
returning (perhaps via wild cards in the keytext) or we need
json_each.

Non-trivial json productions in postgres require the creation of
special composite types that structure the data that we (I?) rig up in
SQL before routing to json. What about having functions that work in
the opposite direction:

*) can you access both arrays and records with numeric positional
syntax (hopefully, yes?), for example:

x->0->0

You can't do that in JS, so I'm not clear why we should allow it.

agreed -- withdrawn.

*) json_object_keys(json) seems too special-case to me. how about:

json_each(json) which returns a set of key/value pairs and would work on
arrays or objects (for arrays the key could be invented from the
index).

Again, I don't think we should conflate the processing for arrays and
objects. But I could see doing each(json) => setof (text, json) (and maybe a
similar function returning setof (text, text), which would dequote leaf
nodes as json_get_as_text() does).

And similarly a couple of functions to unnest arrays.

Yeah. Although, I *do* think you need 'json_each' (or a set returning
json_get) and they should be conflated...exactly as jquery does:
http://api.jquery.com/jQuery.each/. json objects are associative
arrays, right?

So if the *value* that gets returned by json_each is itself a
collection, we can cast back to json and recurse. at the very least,
we ought to decompose large documents into arbitrary smaller chunks
(as xpath does) without iterating.

In most of the code I'd write, I would decompose to a json object
using your stuff then route to something like:

insert into foo select (r).* from populate_record(null::foo,
json_to_hstore(x)) r,
json_each('path->to->record_containing_array', json_document) x;

assuming the json was deliberately constructed to marshal cleanly into
the database, which is perfectly reasonable.

*) json_get_as_text(json, keytext or indexint) => text

prefer json_to_text() naming. also json_to_hstore(), etc.

json_to_text seems rather misleading as a name here. Maybe we could remove
the "_as" from the name if that's bothering you.

hm, I think you're right here -- I see the distinction.

As for json_to_hstore, as I mentioned, the design is intended to enable the
easy construction of such transformations, although for hstore anything
except a trivial json structure (i.e. an unnested object) might have
unappealing results. But in any case, the important thing to do first is to
get the infrastructure in place. Time is very short and I don't want to
extend this very much.

yeah, understood.

*) have you considered something like
anyelement from_json(anyelement, json)
or
select <json>::some_type; (this may or may not be possible given our
casting mechanics; i don't know).

I have no idea what the semantics of this would be.

Yeah, there's a lot of nuance there. Don't have to tackle everything
at once I suppose, but spiritually I'm hoping it would serve as a
replacement for textual record_in, array_in, etc. It's just wrong to
have to specify each and every field during parsing when the
receiving structure is well defined in the database.

merlin


#6Hannu Krosing
hannu@tm.ee
In reply to: Merlin Moncure (#5)
Re: json accessors

On 11/29/2012 01:10 AM, Merlin Moncure wrote:

On Wed, Nov 28, 2012 at 2:44 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

...

*) have you considered something like
anyelement from_json(anyelement, json)
or
select <json>::some_type; (this may or may not be possible given our
casting mechanics; i don't know).

I have no idea what the semantics of this would be.

Yeah, there's a lot of nuance there.

One way to tackle it would be to give the argument element as a template;
the result would be the same template filled in from the json:

create table tab1(id serial primary key, ts timestamp default now(), data text);

insert into tab1 select from_json(row(null,null,null)::tab1, '{"data":"the data"}');
insert into tab1 select from_json(row(null,null,null)::tab1, '{"id":-1, "ts":null, "data":""}');
insert into tab1 select from_json(t.*,'{"data":"more data"}') from tab1 t where id = -1;

hannu=# select row_to_json(t.*) from tab1 t;
row_to_json
---------------------------------------------------------------
{"id":1,"ts":"2012-11-29 02:01:48.379172","data":"the data"}
{"id":-1,"ts":null, "data":""}
{"id":2,"ts":"2012-11-29 02:02:34.600164","data":"more data"}
(3 rows)

If extracting the defaults from the table definition proves too tricky
for a first iteration, then just set the missing fields to NULL, or
better yet, carry over the values from the template.

------------------------------
Hannu

PS: good work so far :)

Hannu


#7Hannu Krosing
hannu@tm.ee
In reply to: Hannu Krosing (#6)
Re: json accessors

On 11/29/2012 02:07 AM, Hannu Krosing wrote:

On 11/29/2012 01:10 AM, Merlin Moncure wrote:

On Wed, Nov 28, 2012 at 2:44 PM, Andrew Dunstan <andrew@dunslane.net>
wrote:

...

*) have you considered something like
anyelement from_json(anyelement, json)
or
select <json>::some_type; (this may or may not be possible given our
casting mechanics; i don't know).

I have no idea what the semantics of this would be.

Yeah, there's a lot of nuance there.

One way to tackle it would be to give the argument element as a template;
the result would be the same template filled in from the json:

create table tab1(id serial primary key, ts timestamp default now(),
data text);

insert into tab1 select from_json(row(null,null,null)::tab1,
'{"data":"the data"}');
insert into tab1 select from_json(row(null,null,null)::tab1,
'{"id":-1, "ts":null, "data":""}');
insert into tab1 select from_json(t.*,'{"data":"more data"}') from
tab1 t where id = -1;

hannu=# select row_to_json(t.*) from tab1 t;
row_to_json
---------------------------------------------------------------
{"id":1,"ts":"2012-11-29 02:01:48.379172","data":"the data"}
{"id":-1,"ts":null, "data":""}
{"id":2,"ts":"2012-11-29 02:02:34.600164","data":"more data"}
(3 rows)

If extracting the defaults from the table definition proves too tricky
for a first iteration, then just set the missing fields to NULL, or
better yet, carry over the values from the template.

You could even do a template-less row_from_json which returns a record
with all fields converted to JSON-encodable types, and hope that the
subsequent conversions will be done by PostgreSQL as needed.

insert into tab1 select row_from_json('{"id":100, "ts":"2012-12-21",
"data":"End of Everything"}');

insert into tab1
select * from row_from_json(
'[{"id":101, "ts":"2012-12-22", "data":"1st day after End of Everything"},
{"id":102, "ts":"2012-12-22", "data":"2nd day after End of Everything"}
]');

Hannu

------------------------------
Hannu

PS: good work so far :)

Hannu


#8Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#7)
Re: json accessors

On 11/28/2012 08:16 PM, Hannu Krosing wrote:

On 11/29/2012 02:07 AM, Hannu Krosing wrote:

On 11/29/2012 01:10 AM, Merlin Moncure wrote:

On Wed, Nov 28, 2012 at 2:44 PM, Andrew Dunstan
<andrew@dunslane.net> wrote:

...

*) have you considered something like
anyelement from_json(anyelement, json)
or
select <json>::some_type; (this may or may not be possible given our
casting mechanics; i don't know).

I have no idea what the semantics of this would be.

Yeah, there's a lot of nuance there.

One way to tackle it would be to give the argument element as a template;
the result would be the same template filled in from the json:

create table tab1(id serial primary key, ts timestamp default now(),
data text);

insert into tab1 select from_json(row(null,null,null)::tab1,
'{"data":"the data"}');
insert into tab1 select from_json(row(null,null,null)::tab1,
'{"id":-1, "ts":null, "data":""}');
insert into tab1 select from_json(t.*,'{"data":"more data"}') from
tab1 t where id = -1;

hannu=# select row_to_json(t.*) from tab1 t;
row_to_json
---------------------------------------------------------------
{"id":1,"ts":"2012-11-29 02:01:48.379172","data":"the data"}
{"id":-1,"ts":null, "data":""}
{"id":2,"ts":"2012-11-29 02:02:34.600164","data":"more data"}
(3 rows)

If extracting the defaults from the table definition proves too tricky
for a first iteration, then just set the missing fields to NULL, or
better yet, carry over the values from the template.

You could even do a template-less row_from_json which returns a record
with all fields converted to JSON-encodable types, and hope that the
subsequent conversions will be done by PostgreSQL as needed.

insert into tab1 select row_from_json('{"id":100, "ts":"2012-12-21",
"data":"End of Everything"}');

insert into tab1
select * from row_from_json(
'[{"id":101, "ts":"2012-12-22", "data":"1st day after End of Everything"},
{"id":102, "ts":"2012-12-22", "data":"2nd day after End of Everything"}
]');

The real problem here is that for any irregularly shaped json it's
likely to be a bust, and could only possibly work sanely for nested json
at all if the target type had corresponding array and composite fields.
hstore's populate_record works fairly well precisely because hstore is a
flat structure, unlike json.
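
To illustrate the point: a flat document maps onto a record the way hstore's does, but a nested one only works if the target type mirrors the nesting (the types below are invented purely for the example):

```sql
-- flat json: {"a":1,"b":"x"} maps cleanly onto a two-column record
CREATE TYPE flat_t AS (a int, b text);

-- nested json: {"a":1,"b":{"c":[2,3]}} could only populate a record
-- sanely if the target had corresponding composite/array fields:
CREATE TYPE inner_t AS (c int[]);
CREATE TYPE nested_t AS (a int, b inner_t);
-- and an irregularly shaped document has no such fixed target at all
```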

In any case, I think this sort of suggestion highlights the possible
benefits of what I suggested upthread, namely to expose an API that will
allow easy construction of json transformation functions as extensions.

PS: good work so far :)

Hannu

Thanks.

cheers

andrew


#9Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#8)
Re: json accessors

On Thu, Nov 29, 2012 at 7:58 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 11/28/2012 08:16 PM, Hannu Krosing wrote:

You could even do a template-less row_from_json which returns a record
with all fields converted to JSON-encodable types, and hope that the
subsequent conversions will be done by PostgreSQL as needed.

insert into tab1 select row_from_json('{"id":100, "ts":"2012-12-21",
"data":"End of Everything"}');

insert into tab1
select * from row_from_json(
'[{"id":101, "ts":"2012-12-22", "data":"1st day after End of Everything"},
{"id":102, "ts":"2012-12-22", "data":"2nd day after End of Everything"}
]');

The real problem here is that for any irregularly shaped json it's likely to
be a bust, and could only possibly work sanely for nested json at all if the
target type had corresponding array and composite fields.

again, that's a fairly typical case -- crafting json documents
specifically for consumption in postgres. defining backend types
allows you to skip the intermediate iterative marshaling step.

hstore's
populate_record works fairly well precisely because hstore is a flat
structure, unlike json.

agreed. not trying to drag you into the weeds here. the above is neat
functionality but doesn't cover all the cases so specific accessor
functions in the vein of your proposal are still needed and the hstore
workaround should work pretty well -- sugaring up the syntax for 'all
in wonder' type translations of complicated structures can be done
later if you want to keep things simple in the short term.

so, just hashing out your proposal and making sure it's a reasonably
analogous implementation of xpath. Sleeping on it, I say mostly, but
not quite. how about some changes for json_get:

1) return setof (key, value) in the style of jquery each().
2) we need some way of indicating in the keytext path that we want to
unnest the collecton pointed to by keytext or to just return it. for
example, ->* as indicator?
3) use double quotes, and make them optional (as hstore)
4) speaking of hstore, prefer => vs ->?

if you do at least #1 and #2, json_get I think can cover all the bases
for parsing json, meaning you could reproduce the behaviors of each
of your four proposed functions just as xpath does for xml. (you may still
want to add them for posterity or performance though). so no need for
json_each or json_array_unnest etc.

merlin


#10Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#9)
Re: json accessors

On 11/29/2012 01:06 PM, Merlin Moncure wrote:

so, just hashing out your proposal and making sure it's reasonable
analogous implementation of xpath. Sleeping on it, I say mostly, but
not quite. how about some changes for json_get:

1) return setof (key, value) in the style of jquery each().
2) we need some way of indicating in the keytext path that we want to
unnest the collection pointed to by keytext or to just return it. for
example, ->* as indicator?
3) use double quotes, and make them optional (as hstore)
4) speaking of hstore, prefer => vs ->?

So I don't think your modifications are well thought out.

if you do at least #1 and #2, json_get I think can cover all the bases
for parsing json, meaning you could reproduce the behaviors of each
of your four proposed functions just as xpath does for xml. (you may still
want to add them for posterity or performance though). so no need for
json_each or json_array_unnest etc.

json_get is designed to return a single thing. What is more, returning a
(key, value) pair seems quite silly when you're passing the key in as an
argument. It's not designed to be json_path or json_query, and it's not
designed either to take a path expression as an argument. So I don't
think this is a good direction. Your proposed mods to json_get modify it
out of all recognition. If I offer you a horse and ask what colour you'd
like, asking for a lion instead isn't a good response :-)

(Repeating myself), I also suggest exposing the transform API so that it
will be easy to construct further functions as extensions. I'm not
trying to cover the field. The intention here is to provide some very
basic json accessors as core functions / operators.

cheers

andrew


#11Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#10)
Re: json accessors

On Thu, Nov 29, 2012 at 1:19 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 11/29/2012 01:06 PM, Merlin Moncure wrote:

so, just hashing out your proposal and making sure it's a reasonably
analogous implementation of xpath. Sleeping on it, I say mostly, but
not quite. how about some changes for json_get:

1) return setof (key, value) in the style of jquery each().
2) we need some way of indicating in the keytext path that we want to
unnest the collection pointed to by keytext or to just return it. for
example, ->* as indicator?
3) use double quotes, and make them optional (as hstore)
4) speaking of hstore, prefer => vs ->?

So I don't think your modifications are well thought out.

if you do at least #1 and #2, json_get I think can cover all the bases
for parsing json, meaning you could reproduce the behaviors of each
of your four proposed functions just as xpath does for xml. (you may still
want to add them for posterity or performance though). so no need for
json_each or json_array_unnest etc.

json_get is designed to return a single thing. What is more, returning a
(key, value) pair seems quite silly when you're passing the key in as an
argument. It's not designed to be json_path or json_query, and it's not
designed either to take a path expression as an argument. So I don't think
this is a good direction. Your proposed mods to json_get modify it out of
all recognition. If I offer you a horse and ask what colour you'd like,
asking for a lion instead isn't a good response :-)

(Repeating myself), I also suggest exposing the transform API so that it
will be easy to construct further functions as extensions. I'm not trying to
cover the field. The intention here is to provide some very basic json
accessors as core functions / operators.

Right. But you're not offering a horse to the farm...but to the zoo.
json is in core so I don't think you have the luxury of offering a
clunky API now with the expectation of a sleeker, faster one in the
future, as the old functions will sit around forever in the public
namespace. What is present in the API doesn't have to cover all
reasonable use cases, but it certainly should be expected to withstand
the test of time for the cases it does cover.

Sketch out how an object array of indeterminate size would be parsed
and placed into records with a set returning/array returning and a
non-set returning json_get: which is a better fit? xpath() doesn't
work iteratively and nobody has ever complained about that to my
recollection.

table: create table foo (a int, b int);
document: [{"a": 1, "b": 2}, {"a": 3, "b": 4}, ... {"a": 99999, "b": 100000}]

set returning json_get:
INSERT INTO foo
SELECT * FROM populate_record(null, hstore_to_json((json_get(*)).value));

assuming '*' is the 'expand this' operator in your 'keytext'
expression that I was suggesting. How would this work with your
proposed API? This is a very typical use case.

merlin


#12Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#11)
Re: json accessors

On 11/29/2012 04:52 PM, Merlin Moncure wrote:

On Thu, Nov 29, 2012 at 1:19 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 11/29/2012 01:06 PM, Merlin Moncure wrote:

so, just hashing out your proposal and making sure it's a reasonably
analogous implementation of xpath. Sleeping on it, I say mostly, but
not quite. how about some changes for json_get:

1) return setof (key, value) in the style of jquery each().
2) we need some way of indicating in the keytext path that we want to
unnest the collection pointed to by keytext or to just return it. for
example, ->* as indicator?
3) use double quotes, and make them optional (as hstore)
4) speaking of hstore, prefer => vs ->?

So I don't think your modifications are well thought out.

if you do at least #1 and #2, json_get I think can cover all the bases
for parsing json, meaning you could reproduce the behaviors of each
of your four proposed functions just as xpath does for xml. (you may still
want to add them for posterity or performance though). so no need for
json_each or json_array_unnest etc.

json_get is designed to return a single thing. What is more, returning a
(key, value) pair seems quite silly when you're passing the key in as an
argument. It's not designed to be json_path or json_query, and it's not
designed either to take a path expression as an argument. So I don't think
this is a good direction. Your proposed mods to json_get modify it out of
all recognition. If I offer you a horse and ask what colour you'd like,
asking for a lion instead isn't a good response :-)

(Repeating myself), I also suggest exposing the transform API so that it
will be easy to construct further functions as extensions. I'm not trying to
cover the field. The intention here is to provide some very basic json
accessors as core functions / operators.

Right. But you're not offering a horse to the farm...but to the zoo.
json is in core so I don't think you have the luxury of offering a
clunky API now with the expectation of a sleeker, faster one in the
future, as the old functions will sit around forever in the public
namespace. What is present in the API doesn't have to cover all
reasonable use cases, but it certainly should be expected to withstand
the test of time for the cases it does cover.

Sketch out how an object array of indeterminate size would be parsed
and placed into records with a set returning/array returning and a
non-set returning json_get: which is a better fit? xpath() doesn't
work iteratively and nobody has ever complained about that to my
recollection.

table: create table foo (a int, b int);
document: [{"a": 1, "b": 2}, {"a": 3, "b": 4}, ... {"a": 99999, "b": 100000}]

set returning json_get:
INSERT INTO foo
SELECT * FROM populate_record(null, hstore_to_json((json_get(*)).value));

assuming '*' is the 'expand this' operator in your 'keytext'
expression that I was suggesting. How would this work with your
proposed API? This is a very typical use case.

There are many things wrong with this. First, converting to hstore so
you can call populate_record is quite horrible and ugly and inefficient.
And it's dependent on having hstore loaded - you can't have an
hstore_to_jon in core because hstore itself isn't in core. If you want a
populate_record that takes data from json we should have one coded
direct. I'm happy to add it to the list as long as everyone understands
the limitations. Given a function to unnest the json array, which I
already suggested upthread, you could do what you suggested above much
more elegantly and directly. Also, BTW, you can't use * that way.
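
For illustration, the unnest-based composition being described might look
something like this (json_unnest and a json-taking populate_record are
both still hypothetical at this point, so the names and signatures here
are guesses):

INSERT INTO foo
SELECT (populate_record(null::foo, elem)).*
FROM json_unnest('[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'::json) AS elem;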

We are not replicating xpath here for json. Sorry, but that's not my
goal. If you want to code that up, be my guest.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#12)
Re: json accessors

On Thu, Nov 29, 2012 at 4:14 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

There are many things wrong with this. First, converting to hstore so you
can call populate_record is quite horrible and ugly and inefficient. And
it's dependent on having hstore loaded - you can't have an hstore_to_json in
core because hstore itself isn't in core. If you want a populate_record that
takes data from json we should have one coded direct. I'm happy to add it to
the list as long as everyone understands the limitations. Given a function
to unnest the json array, which I already suggested upthread, you could do
what you suggested above much more elegantly and directly.

I wasn't suggesting you added the hstore stuff and I understand
perfectly well the awkwardness of the hstore route. That said, this
is how people are going to use your API so it doesn't hurt to go
through the motions; I'm just feeling out how code in the wild would
shape up.

Anyways, my example was busted since you'd need an extra step to move
the set returning output from the json array unnest() into a
'populate_record' type function call.

So, AIUI I think you're proposing (I'm assuming optional quotes)
following my example above:

INSERT INTO foo(a,b)
SELECT
json_get_as_text(v, 'a')::int,
json_get_as_text(v, 'b')::int
FROM
json_each(<document>) v; /* gives you array of json (a,b) records */

a hypothetical 'json_to_record (cribbing usage from populate_record)'
variant might look like (please note, I'm not saying 'write this now',
just feeling it out)::

INSERT INTO foo(a,b)
SELECT r.*
FROM
json_each(<document>) v,
LATERAL
json_to_record(null::foo, v) r;

you're right: that's pretty clean.

A json_object_each(json) => (key, value) couldn't hurt either -- this
would handle those oddball cases of really wide objects that you
occasionally see in json. Plus as_text variants of both each and
object_each. If you're buying json_object_each, I think you can scrap
json_object_keys().

merlin

#14Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#13)
Re: json accessors

On 11/29/2012 06:34 PM, Merlin Moncure wrote:

On Thu, Nov 29, 2012 at 4:14 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

There are many things wrong with this. First, converting to hstore so you
can call populate_record is quite horrible and ugly and inefficient. And
it's dependent on having hstore loaded - you can't have an hstore_to_json in
core because hstore itself isn't in core. If you want a populate_record that
takes data from json we should have one coded direct. I'm happy to add it to
the list as long as everyone understands the limitations. Given a function
to unnest the json array, which I already suggested upthread, you could do
what you suggested above much more elegantly and directly.

I wasn't suggesting you added the hstore stuff and I understand
perfectly well the awkwardness of the hstore route. That said, this
is how people are going to use your API so it doesn't hurt to go
through the motions; I'm just feeling out how code in the wild would
shape up.

Anyways, my example was busted since you'd need an extra step to move
the set returning output from the json array unnest() into a
'populate_record' type function call.

So, AIUI I think you're proposing (I'm assuming optional quotes)
following my example above:

INSERT INTO foo(a,b)
SELECT
json_get_as_text(v, 'a')::int,
json_get_as_text(v, 'b')::int
FROM
json_each(<document>) v; /* gives you array of json (a,b) records */

a hypothetical 'json_to_record (cribbing usage from populate_record)'
variant might look like (please note, I'm not saying 'write this now',
just feeling it out)::

INSERT INTO foo(a,b)
SELECT r.*
FROM
json_each(<document>) v,
LATERAL
json_to_record(null::foo, v) r;

you're right: that's pretty clean.

A json_object_each(json) => (key, value) couldn't hurt either -- this
would handle those oddball cases of really wide objects that you
occasionally see in json. Plus as_text variants of both each and
object_each. If you're buying json_object_each, I think you can scrap
json_object_keys().

OK, so based on this discussion, I'm thinking of the following:

* keep the original functions and operators. json_keys is still
required for the case where the json is not flat.
* json_each(json) => setof (text, text)
errors if the json is not a flat object
* json_unnest(json) => setof json
errors if the json is not an array
* json_unnest_each => setof (int, text, text)
errors if the array is not an array of flat objects
* populate_record(record, json) => record
errors if the json isn't a flat object
* populate_recordset(record, json) => setof record
errors if the json is not an array of flat objects

Note that I've added a couple of things to deal with json that
represents a recordset (i.e. an array of objects). This is a very common
pattern and one well worth optimizing for.

I think that would let you do a lot of what you want pretty cleanly.
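
For example, the populate_recordset piece would make the earlier
insert-from-array case a one-liner (sketch only - the function doesn't
exist yet, so this just shows the intended shape):

INSERT INTO foo
SELECT * FROM populate_recordset(null::foo,
    '[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'::json);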

cheers

andrew

#15Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#14)
Re: json accessors

On Fri, Nov 30, 2012 at 8:38 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

OK, so based on this discussion, I'm thinking of the following:

ok, this is looking awesome -- couple naming suggestions (see inline):

* keep the original functions and operators. json_keys is still
required for the case where the json is not flat.
* json_each(json) => setof (text, text)
errors if the json is not a flat object
* json_unnest(json) => setof json
errors if the json is not an array

I wonder if usage of 'unnest' is appropriate: sql unnest()
*completely* unwraps the array to a list of scalars, whereas json
unnest() only peels off one level. If you agree with that (it's
debatable), how about json_array_each()?
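
To make that distinction concrete, a small Python analogy (purely
illustrative, not part of the proposal): SQL's unnest() behaves like a
full recursive flatten, while a one-level json unnest only peels off the
outermost array.

```python
def flatten_fully(arr):
    # Recursively unwrap nested lists down to scalars, like SQL unnest().
    out = []
    for item in arr:
        if isinstance(item, list):
            out.extend(flatten_fully(item))
        else:
            out.append(item)
    return out

def peel_one_level(arr):
    # Return just the top-level elements as-is, like the proposed
    # one-level json unnest.
    return list(arr)

doc = [[1, 2], [3, [4, 5]]]
print(flatten_fully(doc))   # [1, 2, 3, 4, 5]
print(peel_one_level(doc))  # [[1, 2], [3, [4, 5]]]
```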

* json_unnest_each => setof (int, text, text)
errors if the array is not an array of flat objects

I like this. Maybe json_object_each() if you agree with my analysis above.

* populate_record(record, json) => record
errors if the json isn't a flat object
* populate_recordset(record, json) => setof record
errors if the json is not an array of flat objects

Two questions:
1) is it possible for these to work without a polymorphic object
passed through as hstore does (null::foo)?
select populate_record(anyelement, record, json)

2) in keeping with naming style of json api, how about json_to_record,
json_to_recordset?
Maybe though keeping similarity with hstore convention is more important.

merlin

#16Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#15)
Re: json accessors

On 11/30/2012 09:51 AM, Merlin Moncure wrote:

Two questions:
1) is it possible for these to work without a polymorphic object
passed through as hstore does (null::foo)?
select populate_record(anyelement, record, json)

I don't understand the question. The API I'm suggesting is exactly in
line with hstore's, which uses a polymorphic parameter. I don't see how
it can not, and I don't understand why you would have 3 parameters.

cheers

andrew

#17Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#16)
Re: json accessors

On Fri, Nov 30, 2012 at 9:02 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 11/30/2012 09:51 AM, Merlin Moncure wrote:

Two questions:
1) is it possible for these to work without a polymorphic object
passed through as hstore does (null::foo)?
select populate_record(anyelement, record, json)

I don't understand the question. The API I'm suggesting is exactly in line
with hstore's, which uses a polymorphic parameter. I don't see how it can
not, and I don't understand why you would have 3 parameters.

my mistake: I misread the function as you write it. it's good as is.

merlin

#18Hannu Krosing
hannu@tm.ee
In reply to: Andrew Dunstan (#14)
Re: json accessors

On 11/30/2012 03:38 PM, Andrew Dunstan wrote:

On 11/29/2012 06:34 PM, Merlin Moncure wrote:

On Thu, Nov 29, 2012 at 4:14 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

There are many things wrong with this. First, converting to hstore so you
can call populate_record is quite horrible and ugly and inefficient. And
it's dependent on having hstore loaded - you can't have an hstore_to_json in
core because hstore itself isn't in core. If you want a populate_record that
takes data from json we should have one coded direct. I'm happy to add it to
the list as long as everyone understands the limitations. Given a function
to unnest the json array, which I already suggested upthread, you could do
what you suggested above much more elegantly and directly.

I wasn't suggesting you added the hstore stuff and I understand
perfectly well the awkwardness of the hstore route. That said, this
is how people are going to use your api so it doesn't hurt to go
through the motions; I'm just feeling out how code in the wild would
shape up.

Anyways, my example was busted since you'd need an extra step to move
the set returning output from the json array unnest() into a
'populate_record' type function call.

So, AIUI I think you're proposing (I'm assuming optional quotes)
following my example above:

INSERT INTO foo(a,b)
SELECT
json_get_as_text(v, 'a')::int,
json_get_as_text(v, 'b')::int
FROM
json_each(<document>) v; /* gives you array of json (a,b) records */

a hypothetical 'json_to_record (cribbing usage from populate_record)'
variant might look like (please note, I'm not saying 'write this now',
just feeling it out)::

INSERT INTO foo(a,b)
SELECT r.*
FROM
json_each(<document>) v,
LATERAL
json_to_record(null::foo, v) r;

you're right: that's pretty clean.

A json_object_each(json) => (key, value) couldn't hurt either -- this
would handle those oddball cases of really wide objects that you
occasionally see in json. Plus as_text variants of both each and
object_each. If you're buying json_object_each, I think you can scrap
json_object_keys().

OK, so based on this discussion, I'm thinking of the following:

* keep the original functions and operators. json_keys is still
required for the case where the json is not flat.
* json_each(json) => setof (text, text)
errors if the json is not a flat object

Why not json_each(json) => setof (text, json), with no erroring out?

If the json value represents text, it is easy to convert to text on the
query side.

* json_unnest(json) => setof json
errors if the json is not an array
* json_unnest_each => setof (int, text, text)
errors if the array is not an array of flat objects

json_unnest_each => setof (int, text, json)

* populate_record(record, json) => record
errors if the json isn't a flat object

errors if the values are not castable to the record's field types

nb! some nonflatness is castable. especially to json or hstore or record
types

* populate_recordset(record, json) => setof record
errors if the json is not an array of flat objects

ditto

Note that I've added a couple of things to deal with json that
represents a recordset (i.e. an array of objects). This is a very
common pattern and one well worth optimizing for.

I think that would let you do a lot of what you want pretty cleanly.

cheers

andrew

#19Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#18)
Re: json accessors

On 11/30/2012 10:04 AM, Hannu Krosing wrote:

OK, so based on this discussion, I'm thinking of the following:

* keep the original functions and operators. json_keys is still
required for the case where the json is not flat.
* json_each(json) => setof (text, text)
errors if the json is not a flat object

Why not json_each(json) => setof (text, json), with no erroring out?

If the json value represents text, it is easy to convert to text on the
query side.

Well, it would be possible, sure. I'm not sure how useful. Or we could
do both fairly easily. It's not as simple or efficient as you might
think to dequote / de-escape json string values, which is why the
original API had variants for returning both types of values. Maybe we
need a function for doing just that.
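
As a rough illustration of why that de-escaping is not entirely trivial
(a Python sketch only, nothing to do with the C implementation under
discussion): a JSON string token carries backslash escapes and \uXXXX
sequences, including surrogate pairs for non-BMP characters, all of which
must be decoded to produce the text value.

```python
import json

# A JSON string value as it appears inside a document: quoted, with a
# backslash escape, a \uXXXX escape, and a surrogate pair (non-BMP char).
raw = '"line1\\nline2 \\u00e9 \\ud83d\\ude00"'

# Dequoting/de-escaping the token yields the actual text value: the
# quotes are stripped, \n becomes a newline, \u00e9 becomes 'é', and
# the surrogate pair \ud83d\ude00 combines into a single emoji.
text = json.loads(raw)
print(repr(text))
```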

* json_unnest(json) => setof json
errors if the json is not an array
* json_unnest_each => setof (int, text, text)
errors if the array is not an array of flat objects

json_unnest_each => setof (int, text, json)

ditto.

* populate_record(record, json) => record
errors if the json isn't a flat object

errors if the values are not castable to the record's field types

nb! some nonflatness is castable. especially to json or hstore or
record types

If the record has a json field, certainly. If it has a record field,
fairly likely. hstore could probably be a problem given it's not a core
type. Similarly to the generation functions discussed in another thread,
I could possibly look up a cast from json to the non-core type and use
it. That might work for hstore.

I'll try to keep this as permissive as possible.

* populate_recordset(record, json) => setof record
errors if the json is not an array of flat objects

ditto

ditto ;-)

cheers

andrew

#20Hannu Krosing
hannu@tm.ee
In reply to: Andrew Dunstan (#19)
Re: json accessors

On 11/30/2012 04:29 PM, Andrew Dunstan wrote:

On 11/30/2012 10:04 AM, Hannu Krosing wrote:

OK, so based on this discussion, I'm thinking of the following:

* keep the original functions and operators. json_keys is still
required for the case where the json is not flat.
* json_each(json) => setof (text, text)
errors if the json is not a flat object

Why not json_each(json) => setof (text, json), with no erroring out?

If the json value represents text, it is easy to convert to text on the
query side.

Well, it would be possible, sure. I'm not sure how useful. Or we could
do both fairly easily. It's not as simple or efficient as you might
think to dequote / de-escape json string values, which is why the
original API had variants for returning both types of values. Maybe we
need a function for doing just that.

Btw, how does the current json type handle code pages - is json always utf-8
even when the server encoding is not?

If so then we could at least have a shortcut conversion of json to
utf8-text which can skip codepage changes.

----------------------
Hannu

#21Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#20)
#22Josh Berkus
josh@agliodbs.com
In reply to: Merlin Moncure (#15)
#23Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#22)
#24Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#23)
#25David E. Wheeler
david@kineticode.com
In reply to: Josh Berkus (#24)
#26David E. Wheeler
david@kineticode.com
In reply to: Merlin Moncure (#5)
#27Andrew Dunstan
andrew@dunslane.net
In reply to: David E. Wheeler (#26)
#28Andrew Dunstan
andrew@dunslane.net
In reply to: David E. Wheeler (#25)
#29David E. Wheeler
david@kineticode.com
In reply to: Andrew Dunstan (#27)
#30Merlin Moncure
mmoncure@gmail.com
In reply to: David E. Wheeler (#26)
#31Andrew Dunstan
andrew@dunslane.net
In reply to: David E. Wheeler (#29)
#32David E. Wheeler
david@kineticode.com
In reply to: Merlin Moncure (#30)
#33David E. Wheeler
david@kineticode.com
In reply to: Andrew Dunstan (#31)
#34Josh Berkus
josh@agliodbs.com
In reply to: David E. Wheeler (#32)
#35Merlin Moncure
mmoncure@gmail.com
In reply to: Josh Berkus (#34)
#36Andres Freund
andres@anarazel.de
In reply to: Josh Berkus (#34)
#37Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#34)
#38Andrew Dunstan
andrew@dunslane.net
In reply to: David E. Wheeler (#33)
#39Merlin Moncure
mmoncure@gmail.com
In reply to: David E. Wheeler (#32)
#40David E. Wheeler
david@kineticode.com
In reply to: Andrew Dunstan (#38)