json api WIP patch
Here is a patch for the first part of the JSON API that was recently
discussed. It includes the json parser hook infrastructure and
functions for json_get and friends, plus json_keys.
As is, this exposes the json lexer fully for use by the hook functions.
But I could easily be persuaded that this should be an opaque structure
with some constructor and getter functions - I don't think the hook
functions need or should be able to set anything in the lexer.
Work is proceeding on some of the more advanced functionality discussed.
cheers
andrew
Attachments:
jsonapi1.patch (text/x-patch, +1425 −453)
On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
Here is a patch for the first part of the JSON API that was recently
discussed. It includes the json parser hook infrastructure and functions
for json_get and friends, plus json_keys.
As is, this exposes the json lexer fully for use by the hook functions. But
I could easily be persuaded that this should be an opaque structure with
some constructor and getter functions - I don't think the hook functions
need or should be able to set anything in the lexer.
Work is proceeding on some of the more advanced functionality discussed.
This seems to contain a large number of spurious whitespace changes.
And maybe some other spurious changes. For example, I'm not sure why
this comment is truncated:
}
}
! /*
! * Check for trailing garbage. As in json_lex(), any alphanumeric stuff
! * here should be considered part of the token for error-reporting
! * purposes.
! */
for (p = s; JSON_ALPHANUMERIC_CHAR(*p); p++)
error = true;
lex->token_terminator = p;
if (error)
report_invalid_token(lex);
--- 730,739 ----
}
}
! /* Check for trailing garbage. */
for (p = s; JSON_ALPHANUMERIC_CHAR(*p); p++)
error = true;
+ lex->prev_token_terminator = lex->token_terminator;
lex->token_terminator = p;
if (error)
report_invalid_token(lex);
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 01/02/2013 04:45 PM, Robert Haas wrote:
On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
Here is a patch for the first part of the JSON API that was recently
discussed. It includes the json parser hook infrastructure and functions
for json_get and friends, plus json_keys.
As is, this exposes the json lexer fully for use by the hook functions. But
I could easily be persuaded that this should be an opaque structure with
some constructor and getter functions - I don't think the hook functions
need or should be able to set anything in the lexer.
Work is proceeding on some of the more advanced functionality discussed.
This seems to contain a large number of spurious whitespace changes.
I'm glad you're looking at it :-)
I did do a run of pgindent on the changed files before I cut the patch,
which might have made some of those changes.
I notice a couple of other infelicities too, which are undoubtedly my fault.
The original prototype of this was cut against some older code, and I
then tried to merge it with the current code base, and make sure that
all the regression tests passed. That might well have resulted in a
number of things that need review.
And maybe some other spurious changes. For example, I'm not sure why
this comment is truncated:
Another good question.
I'll make another pass over this and try to remove some of what's
annoying you.
cheers
andrew
On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
On 01/02/2013 04:45 PM, Robert Haas wrote:
On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
Here is a patch for the first part of the JSON API that was recently
discussed. It includes the json parser hook infrastructure and
functions for json_get and friends, plus json_keys.
Updated patch that contains most of the functionality I'm after. One
piece left is populate_recordset (populate a set of records from a
single json datum which is an array of objects, in one pass). That
requires a bit of thought.
I hope most of the whitespace issues are fixed.
cheers
andrew
Attachments:
jsonapi2.patch (text/x-patch, +2787 −435)
Hello
2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
On 01/02/2013 04:45 PM, Robert Haas wrote:
On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
Here is a patch for the first part of the JSON API that was recently
discussed. It includes the json parser hook infrastructure and
functions for json_get and friends, plus json_keys.
Updated patch that contains most of the functionality I'm after. One piece
left is populate_recordset (populate a set of records from a single json
datum which is an array of objects, in one pass). That requires a bit of
thought.
I hope most of the whitespace issues are fixed.
it is looking good
I have one note - is "json_each" a good name?
Regards
Pavel
cheers
andrew
On 01/04/2013 03:36 PM, Pavel Stehule wrote:
Hello
2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
On 01/02/2013 04:45 PM, Robert Haas wrote:
On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
Here is a patch for the first part of the JSON API that was recently
discussed. It includes the json parser hook infrastructure and
functions for json_get and friends, plus json_keys.
Updated patch that contains most of the functionality I'm after. One piece
left is populate_recordset (populate a set of records from a single json
datum which is an array of objects, in one pass). That requires a bit of
thought.
I hope most of the whitespace issues are fixed.
it is looking well
I have one note - is it "json_each" good name?
Possibly not, although hstore has each(). json_unnest might be even less
felicitous.
I'm expecting a good deal of bikeshedding - I'm trying to ignore those
issues for the most part because the functionality seems much more
important to me than the names.
The more people that pound on it and try to break it the happier I'll be.
cheers
andrew
2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
On 01/04/2013 03:36 PM, Pavel Stehule wrote:
Hello
2013/1/4 Andrew Dunstan <andrew@dunslane.net>:
On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
On 01/02/2013 04:45 PM, Robert Haas wrote:
On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
Here is a patch for the first part of the JSON API that was recently
discussed. It includes the json parser hook infrastructure and
functions for json_get and friends, plus json_keys.
Updated patch that contains most of the functionality I'm after. One piece
left is populate_recordset (populate a set of records from a single json
datum which is an array of objects, in one pass). That requires a bit of
thought.
I hope most of the whitespace issues are fixed.
it is looking well
I have one note - is it "json_each" good name?
Possibly not, although hstore has each(). json_unnest might be even less
felicitous.
I understand - but hstore isn't in core, so it should not be a precedent.
regexp_split_to_table
I am not a native speaker - it sounds a little bit strange, but maybe
that's because I am not a native speaker :)
Regards
Pavel
I'm expecting a good deal of bikeshedding - I'm trying to ignore those
issues for the most part because the functionality seems much more important
to me than the names.
The more people that pound on it and try to break it the happier I'll be.
cheers
andrew
On Fri, Jan 4, 2013 at 3:03 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I understand - but hstore isn't in core - so it should not be precedent
regexp_split_to_table
I am not native speaker, it sounds little bit strange - but maybe
because I am not native speaker :)
it's common usage: http://api.jquery.com/jQuery.each/
the patch looks fabulous. There are a few trivial whitespace issues
yet, and I noticed a leaked hstore comment @ 2440:
+ /*
+ * if the input hstore is empty, we can only skip the rest if we were
+ * passed in a non-null record, since otherwise there may be issues with
+ * domain nulls.
+ */
merlin
2013/1/7 Merlin Moncure <mmoncure@gmail.com>:
On Fri, Jan 4, 2013 at 3:03 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I understand - but hstore isn't in core - so it should not be precedent
regexp_split_to_table
I am not native speaker, it sounds little bit strange - but maybe
because I am not native speaker :)
it's common usage: http://api.jquery.com/jQuery.each/
ok
Regards
Pavel
the patch looks fabulous. There are a few trivial whitespace issues
yet, and I noticed a leaked hstore comment @ 2440:
+ /*
+ * if the input hstore is empty, we can only skip the rest if we were
+ * passed in a non-null record, since otherwise there may be issues with
+ * domain nulls.
+ */
merlin
On 01/07/2013 10:25 AM, Merlin Moncure wrote:
the patch looks fabulous. There are a few trivial whitespace issues
yet, and I noticed a leaked hstore comment @ 2440:
+ /*
+ * if the input hstore is empty, we can only skip the rest if we were
+ * passed in a non-null record, since otherwise there may be issues with
+ * domain nulls.
+ */
Here is a patch that has all the functionality I'm intending to provide.
The API is improved some, with the parser now keeping track of the
nesting levels instead of callers having to do so, and a constructor
function provided for JsonLexContext objects.
The processing functions have been extended to provide populate_record()
and populate_recordset() functions. The latter in particular could be
useful in decomposing a piece of json representing an array of flat
objects (a fairly common pattern) into a set of Postgres records in a
single pass.
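As a sketch of what that decomposition might look like (the record type and data here are invented for illustration; signatures per the patch):

```sql
-- hypothetical target type and data, illustrating one-pass decomposition
CREATE TYPE widget AS (a int, b text);

SELECT * FROM json_populate_recordset(null::widget,
    '[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]');
-- two rows: (1, x) and (2, y)

SELECT * FROM json_populate_record(null::widget, '{"a": 1, "b": "x"}');
-- one row: (1, x)
```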
The main thing I'm going to concentrate on now is making sure that this
doesn't leak memory. I'm sure there's some tightening required in that
area. Any eyeballs there will be greatly appreciated. There are also a
couple of very minor things to clean up.
You (Merlin) have kindly volunteered to work on documentation, so before
we go too far with that any bikeshedding on names, or on the
functionality being provided, should now take place.
cheers
andrew
Attachments:
jsonapi3.patch (text/x-patch, +3112 −425)
The processing functions have been extended to provide populate_record()
and populate_recordset() functions. The latter in particular could be
useful in decomposing a piece of json representing an array of flat
objects (a fairly common pattern) into a set of Postgres records in a
single pass.
So this would allow an 'insert into ... select ... from
<unpack-the-JSON>(...)'?
I had been wondering how to do such an insertion efficiently in the
context of SPI, but it seems that there is no SPI_copy equiv that would
allow a query parse and plan to be avoided.
Is this mechanism likely to be as fast as we can get at the moment in
contexts where copy is not feasible?
On 01/08/2013 01:45 AM, james wrote:
The processing functions have been extended to provide
populate_record() and populate_recordset() functions. The latter in
particular could be useful in decomposing a piece of json
representing an array of flat objects (a fairly common pattern) into
a set of Postgres records in a single pass.
So this would allow an 'insert into ... select ... from
<unpack-the-JSON>(...)'?
Yes.
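For example, assuming a target table t whose columns match the objects' keys (hypothetical schema, signatures per the patch):

```sql
-- hypothetical: t's rowtype doubles as the record template
CREATE TABLE t (a int, b text);

INSERT INTO t
SELECT * FROM json_populate_recordset(null::t,
    '[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]');
```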
I had been wondering how to do such an insertion efficiently in the
context of SPI, but it seems that there is no SPI_copy equiv that
would allow a query parse and plan to be avoided.
Your query above would need to be planned too, although the plan will be
trivial.
Is this mechanism likely to be as fast as we can get at the moment in
contexts where copy is not feasible?
You should not try to use it as a general bulk load facility. And it
will not be as fast as COPY for several reasons, including that the Json
parsing routines are necessarily much heavier than the COPY parse
routines, which have in any case been optimized over quite a long
period. Also, a single json datum is limited to no more than 1Gb. If you
have such a datum, parsing it involves having it in memory and then
taking a copy (I wonder if we could avoid that step - will take a look).
Then each object is decomposed into a hash table of key value pairs,
which is then used to construct the record datum. Each field name in
the result record is used to look up the value in the hash table - this
happens once in the case of populate_record() and once per object in the
array in the case of populate_recordset(). In the latter case the
resulting records are put into a tuplestore structure (which spills to
disk if necessary) which is then returned to the caller when all the
objects in the json array are processed. COPY doesn't have these sorts
of issues. It knows without having to look things up where each datum is
in each record, and it stashes the result straight into the target
table. It can read and insert huge numbers of rows without significant
memory implications.
Both these routines and COPY in non-binary mode use the data type input
routines to convert text values. In some cases (very notably timestamps)
these routines can easily be shown to be fantastically expensive
compared to binary input. This is part of what has led to the creation
of utilities like pg_bulkload.
Perhaps if you give us a higher level view of what you're trying to
achieve we can help you better.
cheers
andrew
I had been wondering how to do such an insertion efficiently in the context of SPI, but it seems that there is no SPI_copy equiv that would allow a query parse and plan to be avoided.
Your query above would need to be planned too, although the plan will be trivial.
Ah yes, I meant that I had not found a way to avoid it (for multi-row
inserts etc) from a stored proc context where I have SPI functions
available.
You should not try to use it as a general bulk load facility. And it
will not be as fast as COPY for several reasons, including that the Json
parsing routines are necessarily much heavier than the COPY parse
routines, which have in any case been optimized over quite a long
period. Also, a single json datum is limited to no more than 1Gb. If you
have such a datum, parsing it involves having it in memory and then
taking a copy (I wonder if we could avoid that step - will take a look).
Then each object is decomposed into a hash table of key value pairs,
which is then used to construct the record datum. Each field name in
the result record is used to look up the value in the hash table - this
happens once in the case of populate_record() and once per object in the
array in the case of populate_recordset(). In the latter case the
resulting records are put into a tuplestore structure (which spills to
disk if necessary) which is then returned to the caller when all the
objects in the json array are processed. COPY doesn't have these sorts
of issues. It knows without having to look things up where each datum is
in each record, and it stashes the result straight into the target
table. It can read and insert huge numbers of rows without significant
memory implications.
Yes - but I don't think I can use COPY from a stored proc context can I?
If I could use binary COPY from a stored proc that has received a
binary param and unpacked the data, it would be handy.
If SPI provided a way to perform a copy to a temp table and then some
callback on an iterator that yields rows to it, that would do the trick
I guess.
Perhaps if you give us a higher level view of what you're trying to achieve we can help you better.
I had been trying to identify a way to work with record sets where the
records might be used for insert, or for updates or deletion statements,
preferably without forming a large custom SQL statement that must then
be parsed and planned (and which would be a PITA if I wanted to use the
SQL-C preprocessor or some language bindings that like to prepare a
statement and execute with params).
The data I work with has a master-detail structure and insertion
performance matters, so I'm trying to limit manipulations to one
statement per table per logical operation even where there are multiple
detail rows.
Sometimes the network latency can be a pain too and that also suggests
an RPC with unpack and insert locally.
Cheers
James
On 01/08/2013 09:58 AM, Andrew Dunstan wrote:
If you have such a datum, parsing it involves having it in memory and
then taking a copy (I wonder if we could avoid that step - will take a
look).
Here is a Proof Of Concept patch against my development tip on what's
involved in getting the JSON lexer not to need a nul-terminated string
to parse. This passes regression, incidentally. The downside is that
processing is very slightly more complex, and that json_in() would need
to call strlen() on its input. The upside would be that the processing
routines I've been working on would no longer need to create copies of
their json arguments using text_to_cstring() just so they can get a
null-terminated string to process.
Consequent changes would modify the signature of makeJsonLexContext() so
its first argument would be a text* instead of a char* (and of course
its logic would change accordingly).
I could go either way. Thoughts?
cheers
andrew
On 01/08/2013 03:12 PM, Andrew Dunstan wrote:
On 01/08/2013 09:58 AM, Andrew Dunstan wrote:
If you have such a datum, parsing it involves having it in memory and
then taking a copy (I wonder if we could avoid that step - will take
a look).
Here is a Proof Of Concept patch against my development tip on what's
involved in getting the JSON lexer not to need a nul-terminated string
to parse. This passes regression, incidentally. The downside is that
processing is very slightly more complex, and that json_in() would
need to call strlen() on its input. The upside would be that the
processing routines I've been working on would no longer need to
create copies of their json arguments using text_to_cstring() just so
they can get a null-terminated string to process.
Consequent changes would modify the signature of makeJsonLexContext()
so its first argument would be a text* instead of a char* (and of
course its logic would change accordingly).
I could go either way. Thoughts?
this time with patch ...
Attachments:
jsonparser.patch (text/x-patch, +69 −64)
On 01/08/2013 03:07 PM, james wrote:
Yes - but I don't think I can use COPY from a stored proc context can
I? If I could use binary COPY from a stored proc that has received a
binary param and unpacked to the data, it would be handy.
You can use COPY from a stored procedure, but only to and from files.
If SPI provided a way to perform a copy to a temp table and then some
callback on an iterator that yields rows to it, that would do the
trick I guess.
SPI is useful, but it's certainly possible to avoid its use. After all,
that's what almost the whole backend does, including the COPY code. Of
course, it's a lot harder to write that way, which is part of why SPI
exists. Efficiency has its price.
cheers
andrew
On 1/7/13 5:15 PM, Andrew Dunstan wrote:
You (Merlin) have kindly volunteered to work on documentation, so before
we go too far with that any bikeshedding on names, or on the
functionality being provided, should now take place.
Hmm, I was going to say, this patch contains no documentation, so I have
no idea what it is supposed to do. "Recently discussed" isn't a good
substitute for describing what the patch is supposed to accomplish.
You can use COPY from a stored procedure, but only to and from files.
I think that's in the chocolate fireguard realm though as far as
efficiency for this sort of scenario goes, even if its handled by
retaining an mmap'd file as workspace.
If SPI provided a way to perform a copy to a temp table and then some callback on an iterator that yields rows to it, that would do the trick I guess.
SPI is useful, but it's certainly possible to avoid its use. After all,
that's what almost the whole backend does, including the COPY code. Of
course, it's a lot harder to write that way, which is part of why SPI
exists. Efficiency has its price.
So it is possible to use a lower level interface from a C stored proc?
SPI is the (only) documented direct function extension API, isn't it?
Is the issue with using the JSON data-to-record set that the parsing can
be costly? Perhaps it can be achieved with B64 of compressed protobuf,
or such. I don't mind if it seems a bit messy - the code can be
generated from the table easily enough, especially if I can use C++. I
guess an allocator that uses SPI_palloc would solve issues with memory
management on error?
On Tue, Jan 8, 2013 at 3:19 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On 1/7/13 5:15 PM, Andrew Dunstan wrote:
You (Merlin) have kindly volunteered to work on documentation, so before
we go too far with that any bikeshedding on names, or on the
functionality being provided, should now take place.
Hmm, I was going to say, this patch contains no documentation, so I have
no idea what it is supposed to do. "Recently discussed" isn't a good
substitute for describing what the patch is supposed to accomplish.
Why not? There are functional examples in the docs and the purpose of
the various functions was hashed out pretty well a couple weeks back,
deficiencies corrected, etc.
reference: http://postgresql.1045698.n5.nabble.com/json-accessors-td5733929.html
merlin
On 01/08/2013 04:32 PM, Merlin Moncure wrote:
On Tue, Jan 8, 2013 at 3:19 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On 1/7/13 5:15 PM, Andrew Dunstan wrote:
You (Merlin) have kindly volunteered to work on documentation, so before
we go too far with that any bikeshedding on names, or on the
functionality being provided, should now take place.
Hmm, I was going to say, this patch contains no documentation, so I have
no idea what it is supposed to do. "Recently discussed" isn't a good
substitute for describing what the patch is supposed to accomplish.
Why not? There are functional examples in the docs and the purpose of
the various functions was hashed out pretty well a couple weeks back,
deficiencies corrected, etc.
reference: http://postgresql.1045698.n5.nabble.com/json-accessors-td5733929.html
Well, at a high level the patch is meant to do two things: provide an
API that can be used to build JSON processing functions easily, and
provide some basic json processing functions built on the API. Those
functions provide similar capabilities to the accessor functions that
hstore has.
Perhaps also this will help. Here is the list of functions and operators
as currently implemented. I also have working operators for the get_path
functions which will be in a future patch.
All these are used in the included regression tests.
Name | Result data type | Argument data types
-------------------------+------------------+------------------------------------------------------------------------
json_array_length | integer | json
json_each | SETOF record | from_json json, OUT key text, OUT value json
json_each_as_text | SETOF record | from_json json, OUT key text, OUT value text
json_get | json | json, integer
json_get | json | json, text
json_get_as_text | text | json, integer
json_get_as_text | text | json, text
json_get_path | json | from_json json, VARIADIC path_elems text[]
json_get_path_as_text | text | from_json json, VARIADIC path_elems text[]
json_object_keys | SETOF text | json
json_populate_record | anyelement | anyelement, json
json_populate_recordset | SETOF anyelement | base anyelement, from_json json, use_json_as_text boolean DEFAULT false
json_unnest | SETOF json | from_json json, OUT value json
Name | Left arg type | Right arg type | Result type | Description
------+---------------+----------------+-------------+--------------------------------
-> | json | integer | json | get json array element
-> | json | text | json | get json object field
->> | json | integer | text | get json array element as text
->> | json | text | text | get json object field as text
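For example, with the operators listed above (illustrative queries; behavior as described by the table, where -> yields json and ->> yields text):

```sql
SELECT '[100, 200, 300]'::json -> 1;       -- json array element (0-based): 200
SELECT '{"a": {"b": "c"}}'::json -> 'a';   -- json object field: {"b": "c"}
SELECT '{"a": {"b": "c"}}'::json ->> 'a';  -- same field, returned as text
```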
cheers
andrew