additional json functionality

Started by Andrew Dunstan · over 12 years ago · 108 messages · hackers
#1 Andrew Dunstan
andrew@dunslane.net

In the last year I have created a number of modules providing extra JSON
functionality. I'd like to include some of these in release 9.4.
Candidates for inclusion are:

json_build: which provides extra functionality for building up
non-regular and arbitrarily complex json, as well as a way of
aggregating key value pairs as a json record. See
<https://github.com/pgexperts/json_build>

json_object: take a one or two dimensional array of text and turn it
into a json object, similar to the way hstore currently does. See
<https://bitbucket.org/qooleot/json_object>

json_to_record: turn json into a record or recordset, using the same
logic as json_populate_record and json_populate_recordset, but inferring
the return type from the call instead of from the first argument. See
<https://bitbucket.org/qooleot/json_to_record>

json_build is probably the most important in terms of additional
functionality.

Both json_build and json_to_record have had to copy non-exposed parts of
the internal JSON code, which is one of the many reasons I would
particularly like to include them. These extensions have all been created in
response to client requirements, so there is a definite use case for
all of them.

If there's agreement on taking these, I will prepare patches and submit
them by the 15th.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Craig Ringer
craig@2ndquadrant.com
In reply to: Andrew Dunstan (#1)
Re: additional json functionality

On 11/13/2013 02:34 AM, Andrew Dunstan wrote:

If there's agreement on taking these, I will prepare patches and submit
them by the 15th.

With JSON enhancement, my only concern is that there's work ongoing to
integrate the v2 development version of hstore with json, providing
typed hstore and an efficient binary storage format for json.

It might be worth seeing how that work is going and what functionality
needs to be added to it, rather than enhancing the existing json support
that may soon change dramatically.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


#3 Merlin Moncure
mmoncure@gmail.com
In reply to: Craig Ringer (#2)
Re: additional json functionality

On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 11/13/2013 02:34 AM, Andrew Dunstan wrote:

If there's agreement on taking these, I will prepare patches and submit
them by the 15th.

With JSON enhancement, my only concern is that there's work ongoing to
integrate the v2 development version of hstore with json, providing
typed hstore and an efficient binary storage format for json.

It might be worth seeing how that work is going and what functionality
needs to be added to it, rather than enhancing the existing json support
that may soon change dramatically.

I'm not so sure we should require hstore to do things like build
arbitrary json objects even though I agree that hstore will probably
displace json for most cases where you want to store nested data (as
opposed to (de-)serialize). Andrew's patches just fill out a couple
of cases not handled in the existing API. Putting all
the patches together, ISTM there might be a function or two too many.
I'm not sure why the json_ prefix was abandoned for build_json_object
and build_json_array.

Also, json_object is pretty weird to me, I'm not sure I see the
advantage of a new serialization format, and I don't agree with the
statement "but it is the caller's responsibility to ensure that keys
are not repeated.". I think the caller should have no such
responsibility. Keys should be able to be repeated. Also, I'm not sure
how the {k,v,k,v,k,v}... convention serialized into a string is very
useful in general practice. I greatly prefer the aggregation and the
variadic methods in json_build.

Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not thinking about)
*) changing json_build function names to get the json prefix
*) adding a json object constructor that takes two parallel arrays as
arguments.

merlin


#4 Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#3)
Re: additional json functionality

On 11/13/2013 09:45 AM, Merlin Moncure wrote:

On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

On 11/13/2013 02:34 AM, Andrew Dunstan wrote:

If there's agreement on taking these, I will prepare patches and submit
them by the 15th.

With JSON enhancement, my only concern is that there's work ongoing to
integrate the v2 development version of hstore with json, providing
typed hstore and an efficient binary storage format for json.

It might be worth seeing how that work is going and what functionality
needs to be added to it, rather than enhancing the existing json support
that may soon change dramatically.

I'm going to be fairly upset if I'm told I have to wait for the new
format work, and then I'm later told it's too late to bring this into 9.4.

I think these are really orthogonal issues. Adding a new serialization
format (which I have been discussing with Oleg and Teodor, and which I
hope to help in bringing to JSON) will make some things lots faster than
they now are, and might make some things easier or possible where now
they are hard or impossible, but it won't remove any functionality
requirement.

In particular, json_build, which lets you build up arbitrarily complex
and irregular json in a way that's just not possible without using a PL
right now, is quite an important requirement. I've given talks about it
and JSON users have been quite excited by the possibilities it opens up.

The patch for json_to_record is quite small (two functions), and it has
the advantage that unlike the json_populate_record functions you don't
need to have or create a named type to use it. I think that makes it
worth having in itself.

I'm not so sure we should require hstore to do things like build
arbitrary json objects even though I agree that hstore will probably
displace json for most cases where you want to store nested data (as
opposed to (de-)serialize).

I have no idea what this means.

The plan with the work that Oleg and Teodor are doing is to provide a
set of common code that can be used by either a binary json
representation (which will be able to be distinguished from a text
representation, so there would be no pg_upgrade problems) or nested
hstore. In effect, nested hstore and json would have pretty much
identical capabilities, so using one over the other should be largely a
matter of preference rather than a forced choice. Frankly, I believe the
audience for JSON is vastly larger, and I expect it to be the treeish
data format of choice for almost all users.

Andrew's patches just fill out a couple
of cases not handled in the existing API. Putting all
the patches together, ISTM there might be a function or two too many.
I'm not sure why the json_ prefix was abandoned for build_json_object
and build_json_array.

I'm quite happy to change it.

Also, json_object is pretty weird to me, I'm not sure I see the
advantage of a new serialization format,

What? There is no new serialization format. This is a way to generate a
json object in the existing format from a one or two dimensional array
of text. Cf. the existing function hstore(text[]) => hstore.

and I don't agree with the
statement "but it is the caller's responsibility to ensure that keys
are not repeated.". I think the caller should have no such
responsibility. Keys should be able to be repeated.

They can be repeated, as they can in the current json text format.
However, the function makes no attempt to deal with repeated keys. If a
key is repeated in the input it will be repeated in the output. In this
respect it differs from the hstore function.

Note too, that one effect of moving to a non-text representation of json
will be that duplicated keys will be resolved (last value will win). But
that's a much wider issue than this function.
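Andrew's point about duplicate-key resolution can be checked against a typical parser. A minimal Python sketch (the json module here is only an illustrative stand-in for any dict-based JSON stack, not anything proposed for Postgres):

```python
import json

# Parsers that map JSON objects onto native dictionaries resolve
# duplicate keys by letting the last value win -- the behavior
# described above for a non-text representation.
doc = '{"a": 1, "a": 2, "b": 3}'
parsed = json.loads(doc)
print(parsed)  # {'a': 2, 'b': 3}
```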

Also, I'm not sure
how the {k,v,k,v,k,v}...convention serialized into a string is very
useful in general practice. I greatly prefer the aggregation and the
variadic methods in json_build.

The extension was built before json_build. But it met a requirement that
existed at the time. It probably wouldn't be a tragedy to leave it out,
but there is probably a place for it just as there is for the hstore
function.

Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not thinking about)
*) changing json_build function names to get the json prefix
*) adding a json object constructor that takes two parallel arrays as
arguments.

The third point seems to conflict with the first. I'd only consider that
if we *do* add the one-array version of json_object.

cheers

andrew


#5 Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#4)
Re: additional json functionality

On Wed, Nov 13, 2013 at 9:32 AM, Andrew Dunstan <andrew@dunslane.net> wrote:

I'm not so sure we should require hstore to do things like build
arbitrary json objects even though I agree that hstore will probably
displace json for most cases where you want to store nested data (as
opposed to (de-)serialize).

I have no idea what this means.

What I'm saying there is I agree: what's going on with hstore is not
relevant here. json features currently provide standalone
serialization and deserialization minus a couple of edge cases that
you are fixing up here. Hstore will emerge as a json manipulation
engine.

Aside: I thought we blew it (as you know) by not unifying the hstore
and json APIs in the 9.2 cycle and now with the emerging json stuff I
really think so... it's way too late to do anything about it now even if
there was consensus on that point.

Also, json_object is pretty weird to me, I'm not sure I see the
advantage of a new serialization format,

What? There is no new serialization format. This is a way to generate a json
object in the existing format from a one or two dimensional array of text.
Cf. the existing function hstore(text[]) => hstore.

Right -- I thought it took text, not text[] -- withdrawn. I consider
a variant taking (text[], text[]) to be generally more practical than
the one argument version (either 1d or 2d variant). Dealing with 2d
arrays is a headache unless you really know what you're doing.

and I don't agree with the
statement "but it is the caller's responsibility to ensure that keys
are not repeated.". I think the caller should have no such
responsibility. Keys should be able to be repeated.

They can be repeated, as they can in the current json text format. However,
the function makes no attempt to deal with repeated keys. If a key is
repeated in the input it will be repeated in the output. In this respect it
differs from the hstore function.

Yes. and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet). In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.

Note too, that one effect of moving to a non-text representation of json
will be that duplicated keys will be resolved (last value will win). But
that's a much wider issue than this function.

Right, exactly. And I think this is a pretty serious problem with 'non
text json' unless there is a relatively robust and fast process to
recompose the json properly for serialization purposes (but that's
mostly off topic for your proposed patch).

Also, I'm not sure
how the {k,v,k,v,k,v}...convention serialized into a string is very
useful in general practice. I greatly prefer the aggregation and the
variadic methods in json_build.

The extension was built before json_build. But it met a requirement that
existed at the time. It probably wouldn't be a tragedy to leave it out, but
there is probably a place for it just as there is for the hstore function.

Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not thinking
about)
*) changing json_build function names to get the json prefix
*) adding a json object constructor that takes two parallel arrays as
arguments.

The third point seems to conflict with the first. I'd only consider that if
we *do* add the one-array version of json_object.

ok, agreed. so now I'm just saying to unify function names under the json
prefix and maybe add a (text[], text[]) variant for the object builder, or
maybe just drop json_object completely.

merlin


#6 Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#5)
Re: additional json functionality

On 11/13/2013 11:37 AM, Merlin Moncure wrote:

Yes. and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet). In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.

I think you're probably going to lose any argument that says we should
necessarily preserve key order (and possibly key duplication) in
objects. The standard doesn't support such a contention, either:

An object is an unordered collection of zero or more name/value
pairs

...

The names within an object SHOULD be unique.

Forcing us to preserve order and key duplication would be a pretty
effective barrier to any performance improvements.

cheers

andrew


#7 Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#6)
Re: additional json functionality

On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 11/13/2013 11:37 AM, Merlin Moncure wrote:

Yes. and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet). In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.

An object is an unordered collection of zero or more name/value
pairs
...

The names within an object SHOULD be unique.

Forcing us to preserve order and key duplication would be a pretty effective
barrier to any performance improvements.

SHOULD != MUST. Here is the definition of object per RFC 4627.

"An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. A single colon comes after each name, separating the name
from the value. A single comma separates a value from a following
name. The names within an object SHOULD be unique."

And SHOULD means
"3. SHOULD. This word, or the adjective "RECOMMENDED", mean that
there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course."

As far as I'm concerned, that settles things right there. Beyond that
(although they do say 'unordered' above), as a consequence of your
argument the json strings {"a": 1, "b": 2} and {"b": 1, "a": 2} should
be considered equivalent. Another consequence is that creating
particular legal constructions should be discouraged. I disagree with
this.

This is simply not the case with many json consuming clients. It's a
nice idea but not how things work universally and that's exactly why
the rules were hedged in the RFC. I have a couple of cases right now
where I'm producing key order sensitive json for some (admittedly not
very well designed) json consuming clients that are out of my control.

merlin


#8 Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#1)
Re: additional json functionality

On 11/13/2013 06:45 AM, Merlin Moncure wrote:

I'm not so sure we should require hstore to do things like build

Also, json_object is pretty weird to me, I'm not sure I see the
advantage of a new serialization format, and I don't agree with the
statement "but it is the caller's responsibility to ensure that keys
are not repeated.".

This is pretty standard in the programming languages I know of which use
JSON.

I think the caller should have no such
responsibility. Keys should be able to be repeated.

Apparently your experience with using JSON in practice has been fairly
different from mine; in the projects I work on, the JSON is being
constantly converted back and forth to hashes and dictionaries, which
means that ordering is not preserved and keys have to be unique (or
become unique within one conversion cycle). I think, based on the
language of the RFC and common practice, that it's completely valid for
us to require unique keys within JSON-manipulation routines.

Certainly the upcoming binary storage is going to require unique keys.
For that matter, both MongoDB and CouchDB store unique, unordered keys.
And even supporting CRUD functions (i.e. "update this key") is going to
require uniqueness.

Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not

thinking about)

*) changing json_build function names to get the json prefix
*) adding a json object constructor that takes two parallel arrays as
arguments.

I was with you until the third idea. Huh?

The scripting languages I use (Perl, Python) have functions which
convert a list/array to a hash/dictionary. In each case, the standard
input is a single list/array in the form [ k, v, k, v, k, v ]. Now,
while there are standard language functions which support munging two
parallel arrays into one hash (such as Python's zip()), these are less
frequently used. Supporting the zip() option without supporting the [
k, v ] array option would be a bizarre and puzzling approach to most
programmers I know. I can see three approaches which make sense:

1. we don't include json_object at all.
2. we include the existing json_object
3. we include json_object, plus a second json_object function which
takes two arrays
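The two construction conventions Josh contrasts, a single flat [ k, v, k, v ] array versus two zipped parallel arrays, can be sketched in Python; the helper names here are hypothetical illustrations, not part of any proposed API:

```python
def object_from_flat(pairs):
    """Build a dict from a flat [k, v, k, v, ...] list (the single-array style)."""
    if len(pairs) % 2 != 0:
        raise ValueError("array must have an even number of elements")
    # Even positions are keys, odd positions are values.
    return dict(zip(pairs[0::2], pairs[1::2]))

def object_from_parallel(keys, values):
    """Build a dict from two parallel arrays (the zip() style)."""
    if len(keys) != len(values):
        raise ValueError("arrays must be the same length")
    return dict(zip(keys, values))

print(object_from_flat(["a", 1, "b", 2]))        # {'a': 1, 'b': 2}
print(object_from_parallel(["a", "b"], [1, 2]))  # {'a': 1, 'b': 2}
```

Both produce the same object; the difference is purely in how the caller assembles the arguments.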

Keep in mind that all of Andrew's functions came out of real-life use
cases of writing applications which return JSON to the caller, so they
are based on real needs to fill holes in our JSON-building function library.

In the case of json_object, the need was to supply "column" labels
where, usually due to calculated columns, none exist in the input. Take
the example where I want to return a bunch of aggregates from a table as
a series of json objects with user-friendly labels:

SELECT build_json_object( 'dept', department, 'total_costs', sum(costs),
'running_total', running_sum() )
FROM ....

Where it becomes even more useful is when you want the json label to be
the result of a calculated expression:

SELECT build_json_object ( department, sum() )

Yes, you could do this with a two-array version as well; it's just not
more intuitive, and in cases where you have dozens of columns, puts you
in column-counting hell.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#9 Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#7)
Re: additional json functionality

On 11/13/2013 04:58 PM, Merlin Moncure wrote:

On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

On 11/13/2013 11:37 AM, Merlin Moncure wrote:

Yes. and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet). In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.

An object is an unordered collection of zero or more name/value
pairs
...

The names within an object SHOULD be unique.

Forcing us to preserve order and key duplication would be a pretty effective
barrier to any performance improvements.

SHOULD != MUST. Here is the definition of object per RFC 4627.

"An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. A single colon comes after each name, separating the name
from the value. A single comma separates a value from a following
name. The names within an object SHOULD be unique."

And SHOULD means
"3. SHOULD. This word, or the adjective "RECOMMENDED", mean that
there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course."

As far as I'm concerned, that settles things right there. Beyond that
(although they do say 'unordered' above), as a consequence of your
argument the json strings {"a": 1, "b": 2} and {"b": 1, "a": 2} should
be considered equivalent. Another consequence is that creating
particular legal constructions should be discouraged. I disagree with
this.

This is simply not the case with many json consuming clients. It's a
nice idea but not how things work universally and that's exactly why
the rules were hedged in the RFC. I have a couple of cases right now
where I'm producing key order sensitive json for some (admittedly not
very well designed) json consuming clients that are out of my control.

I understand the difference between "should" and "must". But there is
nothing that REQUIRES us to preserve key order or duplicate keys. If you
really need textual preservation, you should probably store the data as
text and convert it to json to do json-ish things to it. If not, we're
going to face huge demands to implement another type which almost
everyone but you will move to in rapid order because it performs so much
better. The strong consensus I have seen in discussions at conferences
and elsewhere is to go the way we're going, instead.

cheers

andrew


#10 Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Andrew Dunstan (#9)
Re: additional json functionality

On 14/11/13 11:33, Andrew Dunstan wrote:

On 11/13/2013 04:58 PM, Merlin Moncure wrote:

On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan <andrew@dunslane.net>
wrote:

On 11/13/2013 11:37 AM, Merlin Moncure wrote:

Yes. and I think this is one of the major advantages of the json API
vs hstore: you can serialize objects that hstore cannot -- at least
not without extra scaffolding (at least, AIUI, I haven't fully
grappled with the coming hstore stuff yet). In other words, just
because key order and cardinality is unimportant in an associative
array, it does not in any way follow it is similarly unimportant for
object serialization.

An object is an unordered collection of zero or more name/value
pairs
...

The names within an object SHOULD be unique.

Forcing us to preserve order and key duplication would be a pretty
effective
barrier to any performance improvements.

SHOULD != MUST. Here is the definition of object per RFC 4627.

"An object structure is represented as a pair of curly brackets
surrounding zero or more name/value pairs (or members). A name is a
string. A single colon comes after each name, separating the name
from the value. A single comma separates a value from a following
name. The names within an object SHOULD be unique."

And SHOULD means
"3. SHOULD. This word, or the adjective "RECOMMENDED", mean that
there may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course."

As far as I'm concerned, that settles things right there. Beyond that
(although they do say 'unordered' above), as a consequence of your
argument the json strings {"a": 1, "b": 2} and {"b": 1, "a": 2} should
be considered equivalent. Another consequence is that creating
particular legal constructions should be discouraged. I disagree with
this.

This is simply not the case with many json consuming clients. It's a
nice idea but not how things work universally and that's exactly why
the rules were hedged in the RFC. I have a couple of cases right now
where I'm producing key order sensitive json for some (admittedly not
very well designed) json consuming clients that are out of my control.

I understand the difference between "should" and "must". But there is
nothing that REQUIRES us to preserve key order or duplicate keys. If
you really need textual preservation, you should probably store the
data as text and convert it to json to do json-ish things to it. If
not, we're going to face huge demands to implement another type which
almost everyone but you will move to in rapid order because it
performs so much better. The strong consensus I have seen in
discussions at conferences and elsewhere is to go the way we're going,
instead.

cheers

andrew

I can see that both points of view are valid, in different contexts.

Would it be possible to have a boolean, such as 'strict', so that unique &
ordered semantics were only imposed when strict was TRUE? Alternatively,
separate functions to allow the same choice?

Cheers,
Gavin


#11 Merlin Moncure
mmoncure@gmail.com
In reply to: Josh Berkus (#8)
Re: additional json functionality

On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:

On 11/13/2013 06:45 AM, Merlin Moncure wrote:

I'm not so sure we should require hstore to do things like build

Also, json_object is pretty weird to me, I'm not sure I see the
advantage of a new serialization format, and I don't agree with the
statement "but it is the caller's responsibility to ensure that keys
are not repeated.".

This is pretty standard in the programming languages I know of which use
JSON.

I think the caller should have no such
responsibility. Keys should be able to be repeated.

Apparently your experience with using JSON in practice has been fairly
different from mine; in the projects I work on, the JSON is being
constantly converted back and forth to hashes and dictionaries, which
means that ordering is not preserved and keys have to be unique (or
become unique within one conversion cycle). I think, based on the
language of the RFC and common practice, that it's completely valid for
us to require unique keys within JSON-manipulation routines.

Common practice? The internet is littered with complaints about
documents being spontaneously re-ordered and/or de-duplicated in
various stacks. Other stacks provide mechanisms for explicit key
order handling (see here: http://docs.python.org/2/library/json.html).
Why do you think they did that?
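For reference, the mechanism the linked Python docs describe: json.loads accepts an object_pairs_hook, which hands the caller every key/value pair in document order before anything is collapsed into a dict. A small sketch:

```python
import json

doc = '{"z": 1, "a": 2, "a": 3}'

# Default decoding collapses the object into a dict
# (duplicates resolved, last value wins).
print(json.loads(doc))  # {'z': 1, 'a': 3}

# object_pairs_hook sees every pair, in order, so callers can
# preserve key order and duplicates when the consumer demands it.
pairs = json.loads(doc, object_pairs_hook=lambda p: p)
print(pairs)  # [('z', 1), ('a', 2), ('a', 3)]
```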

I use pg/JSON all over the place. In several cases I have to create
documents with ordered keys because the parser on the other side wants
them that way -- this is not a hypothetical argument. The current
json serialization API handles that just fine and the hstore stuff
coming down the pike will not. I guess that's a done deal based on
'performance'. I'm clearly not the only one to have complained about
this though.

merlin


#12 Mike Blackwell
mike.blackwell@rrd.com
In reply to: Josh Berkus (#8)
Re: additional json functionality

On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:

Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not

thinking about)

*) changing json_build function names to get the json prefix
*) adding a json object constructor that takes two parallel arrays as
arguments.

I was with you until the third idea. Huh?

I actually had a use case for this today, though with hstore, importing a
fixed-length record with something along the lines of:

hstore(
  ARRAY['field 1', 'field 2', 'field 3'],
  regexp_matches(fixed_field, '(.{4})(.{10})(.{5})')
)
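The same fixed-length-record idea, sketched in Python for illustration (the field names and the sample record are made up to match the widths in the regex above):

```python
import re

fields = ["field 1", "field 2", "field 3"]
fixed_field = "ab12abcdefghij12345"  # 4 + 10 + 5 characters

# Split the record by column widths, then zip the pieces with their names.
m = re.match(r"(.{4})(.{10})(.{5})", fixed_field)
record = dict(zip(fields, m.groups()))
print(record)  # {'field 1': 'ab12', 'field 2': 'abcdefghij', 'field 3': '12345'}
```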

Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
Mike.Blackwell@rrd.com
http://www.rrdonnelley.com

#13 Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#1)
Re: additional json functionality

Merlin,

I use pg/JSON all over the place. In several cases I have to create
documents with ordered keys because the parser on the other side wants
them that way -- this is not a hypothetical argument. The current
json serialization API handles that just fine and the hstore stuff
coming down the pike will not. I guess that's a done deal based on
'performance'. I'm clearly not the only one to have complained about
this though.

It's not just a matter of "performance". It's the basic conflict of
JSON as document format vs. JSON as data storage. For the latter,
unique, unordered keys are required, or certain functionality isn't
remotely possible: indexing, in-place key update, transformations, etc.

XML went through the same thing, which is part of how we got a bunch of
incompatible "dialects" of XML.

Now, your use case does show us that there's a case to be made for still
having text JSON even after we have binary JSON. There's a strong
simplicity argument against that, though ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


#14 Hannu Krosing
hannu@tm.ee
In reply to: Merlin Moncure (#11)
Re: additional json functionality

On 11/14/2013 12:09 AM, Merlin Moncure wrote:

On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:

On 11/13/2013 06:45 AM, Merlin Moncure wrote:

I'm not so sure we should require hstore to do things like build

Also, json_object is pretty weird to me, I'm not sure I see the
advantage of a new serialization format, and I don't agree with the
statement "but it is the caller's responsibility to ensure that keys
are not repeated.".

This is pretty standard in the programming languages I know of which use
JSON.

I think the caller should have no such
responsibility. Keys should be able to repeated.

Apparently your experience with using JSON in practice has been fairly
different from mine; the projects I work on, the JSON is being
constantly converted back and forth to hashes and dictionaries, which
means that ordering is not preserved and keys have to be unique (or
become unique within one conversion cycle). I think, based on the
language of the RFC and common practice, that it's completely valid for
us to require unique keys within JSON-manipulation routines.

Common practice? The internet is littered with complaints about
documents being spontaneously re-ordered and or de-duplicated in
various stacks. Other stacks provide mechanisms for explicit key
order handling (see here: http://docs.python.org/2/library/json.html).
Why do you think they did that?
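For reference, the explicit key-order handling that the linked Python
docs describe is the decoder's object_pairs_hook; a minimal sketch of
what it buys a caller:

```python
import json
from collections import OrderedDict

doc = '{"b": 1, "a": 2, "b": 3}'

# Decoding through a plain dict collapses duplicate keys (last value
# wins) and, on older Pythons, gives no ordering guarantee.
print(json.loads(doc))                                 # {'b': 3, 'a': 2}

# object_pairs_hook hands the raw key/value pairs to a callable, so the
# caller can preserve order, or detect/keep duplicates explicitly.
print(json.loads(doc, object_pairs_hook=OrderedDict))
print(json.loads(doc, object_pairs_hook=lambda p: p))  # [('b', 1), ('a', 2), ('b', 3)]
```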

I use pg/JSON all over the place. In several cases I have to create
documents with ordered keys because the parser on the other side wants
them that way -- this is not a hypothetical argument. The current
json serialization API handles that just fine and the hstore stuff
coming down the pike will not.

I guess we should not replace current JSON type with hstore based
one, but add something json-like based on nested hstore instead.

Maybe call it jsdoc or jdoc or jsobj or somesuch.

For some time I was also pretty perplexed by some PostgreSQL JSON
type discussions where JSON was not really being defined as the type
constructed from its string representation, and not even as a string
which results from serialising an existing javascript object, but
rather as source code, which can be parsed into a structured type.

So PostgreSQL "json" type is *not* a structured type like hstore is but
is really a string type with a few syntax checks.

Some of the json_* functions are then defined on top of this
"json-source" type which treat this source as if it were actual
structured type.

It is kind of like defining an "int-notation" type, which acts like an
integer when added to another integer, but is required to also keep its
original representation:

select '1+1'::"int-notation" + 2;
==> 4
select '1+1'::"int-notation"
==> 1+1

I guess that's a done deal based on
'performance'. I'm clearly not the only one to have complained about
this though.

I am pretty sure we cannot move to an internal object representation
and preserve the current "json source" behaviour.

This is why I recommend not replacing json, but rather adding another
built-in type for the real structured representation.

Then you can keep using the current json type for the earlier-quoted
uses of "processing instructions", and do real data manipulation on the
jsdoc/jsobj type.

Also, most of the current json functions should be moved to work on
jsobj instead, with an explicit cast from json to jsobj.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


#15David E. Wheeler
david@kineticode.com
In reply to: Gavin Flower (#10)
Re: additional json functionality

On Nov 13, 2013, at 2:41 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:

Would it be possible to have a boolean, such as 'strict', so that unique & ordered behaviour was only imposed when strict was TRUE? Alternatively, separate functions to allow the same choice?

It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.

Best,

David


#16Andrew Dunstan
andrew@dunslane.net
In reply to: Hannu Krosing (#14)
Re: additional json functionality

On 11/13/2013 07:01 PM, Hannu Krosing wrote:

I guess we should not replace current JSON type with hstore based
one, but add something json-like based on nested hstore instead.

Well, that's two voices for that course of action.

Interestingly, I don't think I heard a single voice for this either at
pgCon or pgOpen, although I spent large amounts of time at both talking
to people about Json, so I'd be interested to hear more voices.

It would actually simplify things in a way if we do that - we've been
working on a way of doing this that wouldn't upset pg_upgrade. This
would render that effort unnecessary.

However it will complicate things for users who will have to choose
between the data types, and function authors who will possibly have to
write versions of functions to work with both types.

Also, most of the current json functions should be moved to work on
jsobj instead, with an explicit cast from json to jsobj.

Sure, we can overload them - that's probably the least of our worries.

cheers

andrew


#17Andrew Dunstan
andrew@dunslane.net
In reply to: David E. Wheeler (#15)
Re: additional json functionality

On 11/13/2013 07:39 PM, David E. Wheeler wrote:

On Nov 13, 2013, at 2:41 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:

Would it be possible to have a boolean, such as 'strict', so that unique & ordered behaviour was only imposed when strict was TRUE? Alternatively, separate functions to allow the same choice?

It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.

I don't in the least understand how it could be a pretty printing
option. If we move to a binary rep using the hstore stuff order will be
destroyed and not stored anywhere, and duplicate keys will be lost. Once
that's done, how would a pretty print option restore the lost info?

cheers

andrew


#18David E. Wheeler
david@kineticode.com
In reply to: Andrew Dunstan (#17)
Re: additional json functionality

On Nov 13, 2013, at 4:45 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.

I don't in the least understand how it could be a pretty printing option. If we move to a binary rep using the hstore stuff order will be destroyed and not stored anywhere, and duplicate keys will be lost. Once that's done, how would a pretty print option restore the lost info?

I meant ordering the keys, usually in lexicographic order. I agree that preserving order is untenable.
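As an illustration of key ordering as a serialization-time option only
(this is Python's serializer, shown for the shape of the idea, not a
proposed Postgres API):

```python
import json

doc = {"b": 1, "a": {"d": 4, "c": 3}}

# sort_keys orders keys lexicographically at output time only; the
# in-memory object is untouched.
print(json.dumps(doc, sort_keys=True, indent=2))
```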

Best,

David


#19Garick Hamlin
ghamlin@isc.upenn.edu
In reply to: David E. Wheeler (#18)
Re: additional json functionality

On Wed, Nov 13, 2013 at 04:50:49PM -0800, David E. Wheeler wrote:

On Nov 13, 2013, at 4:45 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.

I don't in the least understand how it could be a pretty printing option.
If we move to a binary rep using the hstore stuff order will be destroyed
and not stored anywhere, and duplicate keys will be lost. Once that's done,
how would a pretty print option restore the lost info?

I meant ordering the keys, usually in lexicographic order. I agree that preserving order is untenable.

There is a canonical form.

http://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00

A canonical form would be very useful. That's a bit trickier than just
sorting the keys, and I don't know that there is an accepted canonical
form for json yet that can represent all json documents. (The canonical
form is not the pretty form, but I think the key ordering should be the
same.)

It might be nice to have a more general canonical form, if one emerges
from somewhere, that could encode any json, since without something
like this, hashing can only be well specified for the 'sensible subset
of json' used in security protocols.
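A sketch of the sort of canonicalization being discussed, using
Python's serializer (illustrative only; this handles exactly the
'sensible subset': unique keys, no reliance on key order, no tricky
float representations):

```python
import json
import hashlib

def canonical(doc):
    # One common approximation of a canonical form: parse, then
    # re-serialize with sorted keys and no insignificant whitespace.
    return json.dumps(json.loads(doc), sort_keys=True, separators=(",", ":"))

# Two semantically equal documents with different layout and key order
# canonicalize to the same bytes, so their hashes agree.
a = '{"b": 1, "a": 2}'
b = '{"a":2,  "b":1}'
assert canonical(a) == canonical(b)
print(hashlib.sha256(canonical(a).encode()).hexdigest())
```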

Garick


#20Merlin Moncure
mmoncure@gmail.com
In reply to: Hannu Krosing (#14)
Re: additional json functionality

On Wed, Nov 13, 2013 at 6:01 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:

On 11/14/2013 12:09 AM, Merlin Moncure wrote:

On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:

On 11/13/2013 06:45 AM, Merlin Moncure wrote:

I'm not so sure we should require hstore to do things like build

Also, json_object is pretty weird to me, I'm not sure I see the
advantage of a new serialization format, and I don't agree with the
statement "but it is the caller's responsibility to ensure that keys
are not repeated.".

This is pretty standard in the programming languages I know of which use
JSON.

I think the caller should have no such
responsibility. Keys should be able to repeated.

Apparently your experience with using JSON in practice has been fairly
different from mine; the projects I work on, the JSON is being
constantly converted back and forth to hashes and dictionaries, which
means that ordering is not preserved and keys have to be unique (or
become unique within one conversion cycle). I think, based on the
language of the RFC and common practice, that it's completely valid for
us to require unique keys within JSON-manipulation routines.

Common practice? The internet is littered with complaints about
documents being spontaneously re-ordered and or de-duplicated in
various stacks. Other stacks provide mechanisms for explicit key
order handling (see here: http://docs.python.org/2/library/json.html).
Why do you think they did that?

I use pg/JSON all over the place. In several cases I have to create
documents with ordered keys because the parser on the other side wants
them that way -- this is not a hypothetical argument. The current
json serialization API handles that just fine and the hstore stuff
coming down the pike will not.

I guess we should not replace current JSON type with hstore based
one, but add something json-like based on nested hstore instead.

Maybe call it jsdoc or jdoc or jsobj or somesuch.

This is exactly what needs to be done, full stop (how about: hstore).
It really comes down to this: changing the serialization behaviors
that have been in production for 2 releases (three if you count the
extension) is bad enough, but making impossible some legal json
constructions which are currently possible is an unacceptable
compatibility break. It's going to break applications I've currently
put into production with no clear workaround. This is quite frankly
not ok, and I'm calling foul. The RFC may claim that these
constructions are dubious but that's irrelevant. It's up to the
parser to decide that and when serializing you are not in control of
the parser.

Had the json type been stuffed into an extension, there would be a
clearer path to get to where you want to go since we could have walled
off the old functionality and introduced side by side API calls. As
things stand now, I don't see a clean path to do that.

I use pg/JSON all over the place. In several cases I have to create
documents with ordered keys because the parser on the other side wants
them that way -- this is not a hypothetical argument. The current
json serialization API handles that just fine and the hstore stuff
coming down the pike will not. I guess that's a done deal based on
'performance'. I'm clearly not the only one to have complained about
this though.

It's not just a matter of "performance". It's the basic conflict of
JSON as document format vs. JSON as data storage. For the latter,
unique, unordered keys are required, or certain functionality isn't
remotely possible: indexing, in-place key update, transformations, etc.

On Wed, Nov 13, 2013 at 5:20 PM, Josh Berkus <josh@agliodbs.com> wrote:

It's not just a matter of "performance". It's the basic conflict of
JSON as document format vs. JSON as data storage. For the latter,
unique, unordered keys are required, or certain functionality isn't
remotely possible: indexing, in-place key update, transformations, etc.

That's not very convincing. What *exactly* is impossible, and why do
you think it justifies breaking compatibility with current
applications? The way forward seems pretty straightforward: given
that hstore is getting nesting power and is moving closer to the json
way of doing things it is essentially 'binary mode json'. I'm ok with
de-duplication and key ordering when moving into that structure since
it's opt in and doesn't break the serialization behaviors we have
today. If you want to go further and unify the types then you have to
go through the design work to maintain compatibility.

Furthermore, I bet the performance argument isn't so clear-cut either.
The current json type is probably faster at bulk serialization
precisely because you *don't* need to deduplicate and reorder keys: the
serialization operates without context. It will certainly be much
better for in place manipulations but it's not nearly as simple as you
are making it out to be.
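A toy sketch of the distinction (illustrative Python, not the actual
Postgres code paths):

```python
import json

# Context-free ("streaming") serialization: each key/value pair is
# emitted as it arrives, so caller-chosen order and even duplicate
# keys survive untouched.
def stream_object(pairs):
    return "{" + ",".join(
        json.dumps(k) + ":" + json.dumps(v) for k, v in pairs) + "}"

pairs = [("b", 1), ("a", 2), ("b", 3)]
print(stream_object(pairs))        # {"b":1,"a":2,"b":3}

# A dict-backed ("binary") representation must materialize the whole
# object first; the duplicate key collapses before serialization.
print(json.dumps(dict(pairs)))
```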

merlin


#21Hannu Krosing
hannu@tm.ee
In reply to: Andrew Dunstan (#16)
#22Hannu Krosing
hannu@tm.ee
In reply to: Josh Berkus (#13)
#23Merlin Moncure
mmoncure@gmail.com
In reply to: Hannu Krosing (#21)
#24Hannu Krosing
hannu@tm.ee
In reply to: Merlin Moncure (#20)
#25Hannu Krosing
hannu@tm.ee
In reply to: Merlin Moncure (#23)
#26Merlin Moncure
mmoncure@gmail.com
In reply to: Hannu Krosing (#25)
#27David E. Wheeler
david@kineticode.com
In reply to: Merlin Moncure (#20)
#28Merlin Moncure
mmoncure@gmail.com
In reply to: David E. Wheeler (#27)
#29Hannu Krosing
hannu@tm.ee
In reply to: Merlin Moncure (#28)
#30David G. Johnston
david.g.johnston@gmail.com
In reply to: Hannu Krosing (#29)
#31Merlin Moncure
mmoncure@gmail.com
In reply to: Hannu Krosing (#29)
#32David E. Wheeler
david@kineticode.com
In reply to: Merlin Moncure (#31)
#33Merlin Moncure
mmoncure@gmail.com
In reply to: David E. Wheeler (#32)
#34Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#33)
#35Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#34)
#36Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#1)
#37Merlin Moncure
mmoncure@gmail.com
In reply to: Josh Berkus (#36)
#38Andres Freund
andres@anarazel.de
In reply to: Josh Berkus (#36)
#39David E. Wheeler
david@kineticode.com
In reply to: Andrew Dunstan (#34)
#40Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#1)
#41David G. Johnston
david.g.johnston@gmail.com
In reply to: Merlin Moncure (#35)
#42Kenneth Marshall
In reply to: Josh Berkus (#40)
#43Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kenneth Marshall (#42)
#44Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#43)
#45David E. Wheeler
david@kineticode.com
In reply to: Andrew Dunstan (#44)
#46Hannu Krosing
hannu@tm.ee
In reply to: Merlin Moncure (#33)
#47Merlin Moncure
mmoncure@gmail.com
In reply to: Hannu Krosing (#46)
#48Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#1)
#49David G. Johnston
david.g.johnston@gmail.com
In reply to: Josh Berkus (#48)
#50David G. Johnston
david.g.johnston@gmail.com
In reply to: David G. Johnston (#49)
#51Josh Berkus
josh@agliodbs.com
In reply to: David E. Wheeler (#27)
#52Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#51)
#53Hannu Krosing
hannu@tm.ee
In reply to: Josh Berkus (#48)
#54David E. Wheeler
david@kineticode.com
In reply to: Hannu Krosing (#53)
#55Hannu Krosing
hannu@tm.ee
In reply to: David E. Wheeler (#54)
#56Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#1)
#57David E. Wheeler
david@kineticode.com
In reply to: Hannu Krosing (#55)
#58David G. Johnston
david.g.johnston@gmail.com
In reply to: David E. Wheeler (#57)
#59Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: David E. Wheeler (#57)
#60Hannu Krosing
hannu@tm.ee
In reply to: David E. Wheeler (#57)
#61Hannu Krosing
hannu@tm.ee
In reply to: Dimitri Fontaine (#59)
#62David E. Wheeler
david@kineticode.com
In reply to: Dimitri Fontaine (#59)
#63Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: David E. Wheeler (#62)
#64David E. Wheeler
david@kineticode.com
In reply to: Dimitri Fontaine (#63)
#65Andrew Dunstan
andrew@dunslane.net
In reply to: Dimitri Fontaine (#59)
#66Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Andrew Dunstan (#65)
#67Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: David E. Wheeler (#57)
#68Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: David G. Johnston (#58)
#69Andrew Dunstan
andrew@dunslane.net
In reply to: Dimitri Fontaine (#66)
#70Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#1)
#71David E. Wheeler
david@kineticode.com
In reply to: Josh Berkus (#70)
#72Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: David E. Wheeler (#71)
#73Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#70)
#74David E. Wheeler
david@kineticode.com
In reply to: Andrew Dunstan (#73)
#75Hannu Krosing
hannu@tm.ee
In reply to: Andrew Dunstan (#73)
#76Peter Eisentraut
peter_e@gmx.net
In reply to: Josh Berkus (#48)
#77Merlin Moncure
mmoncure@gmail.com
In reply to: Andrew Dunstan (#73)
#78Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#77)
#79Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#1)
#80Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#1)
#81Merlin Moncure
mmoncure@gmail.com
In reply to: Josh Berkus (#80)
#82Robert Haas
robertmhaas@gmail.com
In reply to: Hannu Krosing (#29)
#83David E. Wheeler
david@kineticode.com
In reply to: Robert Haas (#82)
#84Robert Haas
robertmhaas@gmail.com
In reply to: David E. Wheeler (#83)
#85Josh Berkus
josh@agliodbs.com
In reply to: Andrew Dunstan (#1)
#86Robert Haas
robertmhaas@gmail.com
In reply to: Josh Berkus (#85)
#87Andrew Dunstan
andrew@dunslane.net
In reply to: Josh Berkus (#85)
#88David G. Johnston
david.g.johnston@gmail.com
In reply to: Andrew Dunstan (#87)
#89Gavin Flower
GavinFlower@archidevsys.co.nz
In reply to: Robert Haas (#82)
#90Andrew Dunstan
andrew@dunslane.net
In reply to: David G. Johnston (#88)
#91David G. Johnston
david.g.johnston@gmail.com
In reply to: Gavin Flower (#89)
#92Andrew Dunstan
andrew@dunslane.net
In reply to: David G. Johnston (#88)
#93Merlin Moncure
mmoncure@gmail.com
In reply to: Josh Berkus (#85)
#94Andrew Dunstan
andrew@dunslane.net
In reply to: Merlin Moncure (#93)
#95Robert Haas
robertmhaas@gmail.com
In reply to: David G. Johnston (#88)
#96Andrew Dunstan
andrew@dunslane.net
In reply to: Robert Haas (#95)
#97Hannu Krosing
hannu@tm.ee
In reply to: Josh Berkus (#79)
#98David G. Johnston
david.g.johnston@gmail.com
In reply to: Hannu Krosing (#97)
#99Josh Berkus
josh@agliodbs.com
In reply to: Josh Berkus (#8)
#100Bruce Momjian
bruce@momjian.us
In reply to: Josh Berkus (#51)
#101Andrew Dunstan
andrew@dunslane.net
In reply to: Bruce Momjian (#100)
#102Josh Berkus
josh@agliodbs.com
In reply to: David E. Wheeler (#27)
#103Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#98)
#104Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#103)
#105Robert Haas
robertmhaas@gmail.com
In reply to: Bruce Momjian (#100)
#106Andres Freund
andres@anarazel.de
In reply to: Josh Berkus (#102)
#107Maciej Gajewski
maciej.gajewski0@gmail.com
In reply to: Bruce Momjian (#100)
#108Josh Berkus
josh@agliodbs.com
In reply to: David E. Wheeler (#27)