Add jsonb_compact(...) for whitespace-free jsonb to text
Hi,
The default text representation of jsonb adds whitespace in between
key/value pairs (after the colon ":") and after successive properties
(after the comma ","):
postgres=# SELECT '{"b":2,"a":1}'::jsonb::text;
text
------------------
{"a": 1, "b": 2}
(1 row)
AFAIK, there's also no guarantee on the specific order of the resulting
properties in the text representation either. I would suppose it's fixed
for a given jsonb value within a database major version but across major
versions it could change (if the underlying representation changes).
I originally ran into this when comparing the hashes of the text
representation of jsonb columns. The results didn't match up because the
javascript function JSON.stringify(...) does not add any extra whitespace.
It'd be nice to have a stable text representation of a jsonb value with
minimal whitespace. The latter would also save a few bytes per record in
text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
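The hash mismatch described above is easy to reproduce outside the database. A minimal Python sketch (the literal strings below mirror Postgres's jsonb text output and JavaScript's JSON.stringify output, respectively):

```python
import hashlib
import json

# jsonb::text adds a space after each colon and comma; JSON.stringify does not.
pg_text = '{"a": 1, "b": 2}'  # what '{"b":2,"a":1}'::jsonb::text produces
js_text = json.dumps({"a": 1, "b": 2}, separators=(",", ":"))  # JSON.stringify-style

print(js_text)  # {"a":1,"b":2}
# Semantically identical JSON, but digests of the two text forms differ:
print(hashlib.sha256(pg_text.encode()).hexdigest() ==
      hashlib.sha256(js_text.encode()).hexdigest())  # False
```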
Attached is a *very* work in progress patch that adds a
jsonb_compact(jsonb)::text function. It generates a text representation
without extra whitespace but does not yet try to enforce a stable order of
the properties within a jsonb value.
Here's some sample usage:
postgres=# SELECT x::text, jsonb_compact(x) FROM (VALUES ('{}'::jsonb),
('{"a":1,"b":2}'), ('[1,2,3]'), ('{"a":{"b":1}}')) AS t(x);
x | jsonb_compact
------------------+---------------
{} | {}
{"a": 1, "b": 2} | {"a":1,"b":2}
[1, 2, 3] | [1,2,3]
{"a": {"b": 1}} | {"a":{"b":1}}
(4 rows)
There aren't any tests yet but I'd like to continue working on it for
inclusion in 9.7. I'm posting it now to see if there's interest in the idea
and get some feedback on the approach (this is my first real patch...).
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
Attachments:
add_jsonb_compact.patch (text/x-patch; charset=US-ASCII, +66 −11)
On 04/24/2016 06:02 PM, Sehrope Sarkuni wrote:
AFAIK, there's also no guarantee on the specific order of the
resulting properties in the text representation either. I would
suppose it's fixed for a given jsonb value within a database major
version but across major versions it could change (if the underlying
representation changes).
The order is fixed and very unlikely to change, as it was chosen quite
deliberately to help ensure efficient processing. Any change in on-disk
representation of data types is something we work very hard to avoid, as
it makes it impossible to run pg_upgrade.
It's true that the storage order of keys is not terribly intuitive.
Note that the json type, unlike jsonb, preserves exactly the white space
and key order of its input. In fact, the input is exactly what it stores.
cheers
andrew
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
* Andrew Dunstan (andrew@dunslane.net) wrote:
On 04/24/2016 06:02 PM, Sehrope Sarkuni wrote:
AFAIK, there's also no guarantee on the specific order of the
resulting properties in the text representation either. I would
suppose it's fixed for a given jsonb value within a database major
version but across major versions it could change (if the
underlying representation changes).
The order is fixed and very unlikely to change, as it was chosen
quite deliberately to help ensure efficient processing. Any change
in on-disk representation of data types is something we work very
hard to avoid, as it makes it impossible to run pg_upgrade.
We do, from time-to-time, change on-disk formats in a
backwards-compatible way though. In any case, it's my understanding
that we don't *guarantee* any ordering currently and therefore we should
discourage users from depending on it. If we *are* going to guarantee
ordering, then we should document what that ordering is.
Thanks!
Stephen
On Sun, Apr 24, 2016 at 9:27 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Andrew Dunstan (andrew@dunslane.net) wrote:
On 04/24/2016 06:02 PM, Sehrope Sarkuni wrote:
AFAIK, there's also no guarantee on the specific order of the
resulting properties in the text representation either. I would
suppose it's fixed for a given jsonb value within a database major
version but across major versions it could change (if the
underlying representation changes).
The order is fixed and very unlikely to change, as it was chosen
quite deliberately to help ensure efficient processing. Any change
in on-disk representation of data types is something we work very
hard to avoid, as it makes it impossible to run pg_upgrade.
We do, from time-to-time, change on-disk formats in a
backwards-compatible way though. In any case, it's my understanding
that we don't *guarantee* any ordering currently and therefore we should
discourage users from depending on it. If we *are* going to guarantee
ordering, then we should document what that ordering is.
Yes that's the idea, namely to have a fixed text format that will not
change across releases. If the on-disk representation already supports
that then this could just be a doc change (assuming there's agreement that
it's a good idea and said guarantee will be maintained).
Separately, I think the compact (i.e. whitespace free) output is useful on
its own. It adds up to two bytes per key/value pair (one after the colon
and one after the comma) so the more keys you have the more the savings.
Here's a (contrived) example to show the size difference when serializing
information_schema.columns. The row_to_json(...) function returns
whitespace free output (as json, not jsonb) so it's a proxy for
jsonb_compact(...). It comes out to 7.5% smaller than the default jsonb text
format:
app=> SELECT
MAX((SELECT COUNT(*) FROM json_object_keys(x))) AS num_keys,
AVG(length(x::text)) AS json_text,
AVG(length(x::jsonb::text)) AS jsonb_text,
AVG(length(x::text)) / AVG(length(x::jsonb::text)) AS ratio
FROM (SELECT row_to_json(z.*) AS x
FROM information_schema.columns z) t;
num_keys | json_text | jsonb_text | ratio
----------+-----------------------+-----------------------+------------------------
44 | 1077.0748522652659225 | 1164.0748522652659225 | 0.92526253803121012857
(1 row)
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
Hi,
The default text representation of jsonb adds whitespace in between
key/value pairs (after the colon ":") and after successive properties
(after the comma ","):
postgres=# SELECT '{"b":2,"a":1}'::jsonb::text;
text
------------------
{"a": 1, "b": 2}
(1 row)
AFAIK, there's also no guarantee on the specific order of the resulting
properties in the text representation either. I would suppose it's fixed
for a given jsonb value within a database major version but across major
versions it could change (if the underlying representation changes).
I originally ran into this when comparing the hashes of the text
representation of jsonb columns. The results didn't match up because the
javascript function JSON.stringify(...) does not add any extra whitespace.
It'd be nice to have a stable text representation of a jsonb value with
minimal whitespace. The latter would also save a few bytes per record in
text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
+1
I cannot comment on the patch itself, but I welcome jsonb_compact() or some
way to get JSON with no inserted whitespace.
* Ryan Pedela (rpedela@datalanche.com) wrote:
On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
The default text representation of jsonb adds whitespace in between
key/value pairs (after the colon ":") and after successive properties
(after the comma ","):
[...]
It'd be nice to have a stable text representation of a jsonb value with
minimal whitespace. The latter would also save a few bytes per record in
text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
+1
I cannot comment on the patch itself, but I welcome jsonb_compact() or some
way to get JSON with no inserted whitespace.
As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace. It's not like
representing numbers using ASCII characters is terribly efficient
either.
Compression might be another option, though that's certainly less
flexible and only (easily) used in combination with SSL, today.
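Stephen's point can be put in rough numbers. A hedged Python sketch (illustrative data, not a benchmark): the raw savings from stripping the two spaces per pair are real, but largely vanish once the stream is compressed.

```python
import gzip
import json

# 100-key record, serialized with jsonb::text-style spacing and fully compact.
doc = {f"key_{i}": i for i in range(100)}
pretty = json.dumps(doc, separators=(", ", ": "))   # space after comma and colon
compact = json.dumps(doc, separators=(",", ":"))    # no optional whitespace

raw_saving = len(pretty) - len(compact)             # 199: one space per colon/comma
gz_pretty = len(gzip.compress(pretty.encode()))
gz_compact = len(gzip.compress(compact.encode()))
print(raw_saving, gz_pretty - gz_compact)           # compressed sizes are close
```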
Thanks!
Stephen
On Sun, Apr 24, 2016 at 8:16 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
Note that the json type, unlike jsonb, preserves exactly the white space and
key order of its input. In fact, the input is exactly what it stores.
That is true, but the json serialization functions (to_json etc)
really ought to have the same whitespace strategy as jsonb's text
output. Of the two ways it's currently done, the json serialization
variant seems better, but a completely compact variant seems like a
good idea on the basis of efficiency.
merlin
On 26 April 2016 at 12:49, Stephen Frost <sfrost@snowman.net> wrote:
* Ryan Pedela (rpedela@datalanche.com) wrote:
On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com>
wrote:
The default text representation of jsonb adds whitespace in between
key/value pairs (after the colon ":") and after successive properties
(after the comma ","):
[...]
It'd be nice to have a stable text representation of a jsonb value with
minimal whitespace. The latter would also save a few bytes per record in
text output formats, on the wire, and in backups (ex: COPY ... TO
STDOUT).
+1
I cannot comment on the patch itself, but I welcome jsonb_compact() or
some
way to get JSON with no inserted whitespace.
As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace. It's not like
representing numbers using ASCII characters is terribly efficient
either.
+1
Dave Cramer
davec@postgresintl.com
www.postgresintl.com
On Tue, Apr 26, 2016 at 11:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Ryan Pedela (rpedela@datalanche.com) wrote:
On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
The default text representation of jsonb adds whitespace in between
key/value pairs (after the colon ":") and after successive properties
(after the comma ","):
[...]
It'd be nice to have a stable text representation of a jsonb value with
minimal whitespace. The latter would also save a few bytes per record in
text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
+1
I cannot comment on the patch itself, but I welcome jsonb_compact() or some
way to get JSON with no inserted whitespace.
As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace. It's not like
representing numbers using ASCII characters is terribly efficient
either.
-1
This will benefit pretty much nobody unless you are writing a hand
crafted C application that consumes and processes the data directly.
I'd venture to guess this is a tiny fraction of pg users these days.
I do not understand at all the objection to removing whitespace.
Extra whitespace does nothing but pad the document as humans will
always run the document through a prettifier tuned to their specific
requirements (generally starting with, intelligent placement of
newlines) if reading directly.
Also, binary formats are not always smaller than text formats.
Compression might be another option, though that's certainly less
flexible and only (easily) used in combination with SSL, today.
right, exactly.
merlin
* Merlin Moncure (mmoncure@gmail.com) wrote:
On Tue, Apr 26, 2016 at 11:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace. It's not like
representing numbers using ASCII characters is terribly efficient
either.
-1
This will benefit pretty much nobody unless you are writing a hand
crafted C application that consumes and processes the data directly.
That's not accurate. All that's needed is for the libraries which
either wrap libpq or re-implement it to be updated to understand the
format and then convert the data into whatever structure makes sense for
the language (or library that the language includes for working with
JSON data).
One of the unfortunate realities with JSON is that there isn't a
terribly good binary representation, afaik. As I understand it, BSON
doesn't support all the JSON structures that we do; if it did, I'd
suggest we provide a way to convert our structure into BSON.
I'd venture to guess this is a tiny fraction of pg users these days.
I do not understand at all the objection to removing whitespace.
Extra whitespace does nothing but pad the document as humans will
always run the document through a prettifier tuned to their specific
requirements (generally starting with, intelligent placement of
newlines) if reading directly.
The objection is that it's a terribly poor solution as it simply makes
things ugly for a pretty small amount of improvement. Looking at it
from the perspective of just "whitespace is bad!" it might look like a
good answer to just remove whitespace, but we should be looking at it
from the perspective of "how do we make this more efficient?". Under
that lens, removing whitespace appears to be minimally effective
whereas passing the data back in a binary structure looks likely to
greatly improve the efficiency on a number of levels.
Also, binary formats are not always smaller than text formats.
While true, I don't think that would be true in this case.
Of course, there's nothing like actually trying it and seeing.
Thanks!
Stephen
On Sun, Apr 24, 2016 at 3:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
It'd be nice to have a stable text representation of a jsonb value with
minimal whitespace. The latter would also save a few bytes per record in
text output formats, on the wire, and in backups (ex: COPY ... TO STDOUT).
Attached is a *very* work in progress patch that adds a
jsonb_compact(jsonb)::text function. It generates a text representation
without extra whitespace but does not yet try to enforce a stable order of
the properties within a jsonb value.
This doesn't impact backups unless you do away with the function and change
the output i/o function for the type. In that scenario the function is no
longer necessary.
David J.
On Sun, Apr 24, 2016 at 3:02 PM, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
Attached is a *very* work in progress patch that adds a
jsonb_compact(jsonb)::text function. It generates a text representation
without extra whitespace but does not yet try to enforce a stable order of
the properties within a jsonb value.
I think that having a jsonb_compact function that complements the existing
jsonb_pretty function is a well scoped and acceptable feature. I do not
believe that it should also take on the role of canonicalization.
I'd suggest that any discussions regarding stability of jsonb output be
given its own thread.
That topic also seems separate from a discussion on how to implement a
binary transport protocol for jsonb.
Lastly would be whether we change our default text representation so that
users utilizing COPY get the added benefit of a maximally minimized text
representation.
As an aside on the last topic, has there ever been considered to have a way
to specify a serialization function to use for a given type (or maybe
column) specified in a copy command?
Something like: COPY [...] WITH (jsonb USING jsonb_compact)
I'm thinking this would be hard and undesirable given the role copy plays and
limited intelligence that it has in order to maximize its efficiency in
fulfilling its role.
Backups get compressed already so bandwidth seems the bigger goal there.
Otherwise I'd say that we lack any kind of overwhelming evidence that
making such a change would be warranted.
While these are definitely related topics it doesn't seem like any are
pre-requisites for the others. I think this thread is going to become hard
to follow and trail off if it continues to try to address all of these topics
randomly as people see fit to reply. And it will quickly become hard for
anyone to jump in and understand the topics at hand.
David J.
On Tue, Apr 26, 2016 at 10:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
* Ryan Pedela (rpedela@datalanche.com) wrote:
On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com>
wrote:
The default text representation of jsonb adds whitespace in between
key/value pairs (after the colon ":") and after successive properties
(after the comma ","):
[...]
It'd be nice to have a stable text representation of a jsonb value with
minimal whitespace. The latter would also save a few bytes per record in
text output formats, on the wire, and in backups (ex: COPY ... TO
STDOUT).
+1
I cannot comment on the patch itself, but I welcome jsonb_compact() or
some
way to get JSON with no inserted whitespace.
As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace. It's not like
representing numbers using ASCII characters is terribly efficient
either.
Why build a Ferrari when a skateboard would suffice? Besides, that doesn't
help one of the most common cases for JSONB: REST APIs.
Now that PG fully supports JSON, a user can use PG to construct the JSON
payload of a REST API request. Then the web server would simply be a
pass-through for the JSON payload. I personally have this use case, it is
not hypothetical. However currently, a user must parse the JSON string from
PG and re-stringify it to minimize the whitespace. Given that HTTP is
text-based, removing all extraneous whitespace is the best way to compress
it, and on top of that you can do gzip compression. Unless you are
suggesting that the binary format is just a gzipped version of the
minimized text format, I don't see how a binary format helps at all in the
REST API case.
In addition, every JSON implementation I have ever seen fully minimizes
JSON by default. PG appears to deviate from standard practice for no
apparent reason.
On Thu, Apr 28, 2016 at 10:00 AM, Ryan Pedela <rpedela@datalanche.com>
wrote:
On Tue, Apr 26, 2016 at 10:49 AM, Stephen Frost <sfrost@snowman.net>
wrote:* Ryan Pedela (rpedela@datalanche.com) wrote:
On Sun, Apr 24, 2016 at 4:02 PM, Sehrope Sarkuni <sehrope@jackdb.com>
wrote:
The default text representation of jsonb adds whitespace in between
key/value pairs (after the colon ":") and after successive properties
(after the comma ","):
[...]
It'd be nice to have a stable text representation of a jsonb value
with
minimal whitespace. The latter would also save a few bytes per record
in
text output formats, on the wire, and in backups (ex: COPY ... TO
STDOUT).
+1
I cannot comment on the patch itself, but I welcome jsonb_compact() or
some
way to get JSON with no inserted whitespace.
As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace. It's not like
representing numbers using ASCII characters is terribly efficient
either.
Why build a Ferrari when a skateboard would suffice? Besides, that doesn't
help one of the most common cases for JSONB: REST APIs.
I'm agreeing with this sentiment. This isn't an either-or situation so
argue the white-space removal on its own merits. The fact that we might
implement a binary representation in the future doesn't, for me, influence
whether we make this white-space change now.
Now that PG fully supports JSON, a user can use PG to construct the JSON
payload of a REST API request. Then the web server would simply be a
pass-through for the JSON payload. I personally have this use case, it is
not hypothetical.
However currently, a user must parse the JSON string from PG and
re-stringify it to minimize the whitespace. Given that HTTP is text-based,
removing all extraneous whitespace is the best way to compress it, and on
top of that you can do gzip compression.
Can you clarify what you mean by "and on top of that you can do gzip
compression"?
Unless you are suggesting that the binary format is just a gzipped version
of the minimized text format, I don't see how a binary format helps at all
in the REST API case.
No, I'm pretty sure you still end up with uncompressed text in the
application layer.
In addition, every JSON implementation I have ever seen fully minimizes
JSON by default. PG appears to deviate from standard practice for no
apparent reason.
Sorry to nit-pick but that's called convention - the standard is likely
silent on this point. And it's not like this was done in a vacuum - why is
this only coming up now and not, say, during the beta? Is it surprising
that this seemingly easy-to-overlook dynamic was not explicitly addressed
by the author and reviewer of the patch, especially when implementation of
said feature consisted of a lot more things of greater import and impact?
David J.
On Wed, Apr 27, 2016 at 05:05:18PM -0400, Stephen Frost wrote:
* Merlin Moncure (mmoncure@gmail.com) wrote:
On Tue, Apr 26, 2016 at 11:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace. It's not like
representing numbers using ASCII characters is terribly efficient
either.
-1
This will benefit pretty much nobody unless you are writing a hand
crafted C application that consumes and processes the data directly.
That's not accurate. All that's needed is for the libraries which
either wrap libpq or re-implement it to be updated to understand the
format and then convert the data into whatever structure makes sense for
the language (or library that the language includes for working with
JSON data).
One of the unfortunate realities with JSON is that there isn't a
terribly good binary representation, afaik. As I understand it, BSON
doesn't support all the JSON structures that we do; if it did, I'd
suggest we provide a way to convert our structure into BSON.
How about MessagePack?
http://msgpack.org/index.html
Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David G. Johnston wrote:
On Thu, Apr 28, 2016 at 10:00 AM, Ryan Pedela <rpedela@datalanche.com>
wrote:
In addition, every JSON implementation I have ever seen fully minimizes
JSON by default. PG appears to deviate from standard practice for no
apparent reason.
Sorry to nit-pick but that's called convention - the standard is likely
silent on this point. And it's not like this was done in a vacuum - why is
this only coming up now and not, say, during the beta? Is it surprising
that this seemingly easy-to-overlook dynamic was not explicitly addressed
by the author and reviewer of the patch, especially when implementation of
said feature consisted of a lot more things of greater import and impact?
Actually we did have someone come up with a patch to "normalize" how
JSON stuff was output, because our code seems to do it in three
different, inconsistent ways. And our response was for them to get the
heck outta here, because we're so much in love with our current
practice that we don't need to refactor the three implementations into a
single one.
Personally I don't see any reason we need to care one bit about how the
irrelevant whitespace is laid out, in other words why shouldn't we just
strip them all out? Surely there's no backwards compatibility argument
there. If somebody wants to see a nicely laid out structure they can
use the prettification function.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 27, 2016 at 7:09 PM, David G. Johnston <
david.g.johnston@gmail.com> wrote:
On Sun, Apr 24, 2016 at 3:02 PM, Sehrope Sarkuni <sehrope@jackdb.com>
wrote:
Attached is a *very* work in progress patch that adds a
jsonb_compact(jsonb)::text function. It generates a text representation
without extra whitespace but does not yet try to enforce a stable order of
the properties within a jsonb value.
I think that having a jsonb_compact function that complements the
existing jsonb_pretty function is a well scoped and acceptable feature. I
do not believe that it should also take on the role of canonicalization.
I'd suggest that any discussions regarding stability of jsonb output be
given its own thread.
I'm fine with removing the stability aspect. I think it's nice-to-have but
it definitely complicates things and has longer term consequences.
That topic also seems separate from a discussion on how to implement a
binary transport protocol for jsonb.
Defining a binary format for jsonb is definitely out of scope.
Lastly would be whether we change our default text representation so that
users utilizing COPY get the added benefit of a maximally minimized text
representation.
I see this applying to both COPY and the text format on the wire. The
latter has the added benefit that it works with existing clients without
any driver changes.
Outside of being a bit more pleasant in psql, I don't see a point in the
added whitespace for jsonb::text. Even in psql it only helps with small
fields as anything big isn't really legible without indenting it via
jsonb_pretty(...).
As an aside on the last topic, has there ever been considered to have a
way to specify a serialization function to use for a given type (or maybe
column) specified in a copy command?
Something like: COPY [...] WITH (jsonb USING jsonb_compact)
I'm thinking this would be hard and undesirable given the role copy plays and
limited intelligence that it has in order to maximize its efficiency in
fulfilling its role.
Backups get compressed already so bandwidth seems the bigger goal there.
Otherwise I'd say that we lack any kind of overwhelming evidence that
making such a change would be warranted.
While these are definitely related topics it doesn't seem like any are
pre-requisites for the others. I think this thread is going to become hard
to follow and trail off if it continues to try to address all of these topics
randomly as people see fit to reply. And it will quickly become hard for
anyone to jump in and understand the topics at hand.
That's a really cool idea but agree it's way out of scope for this.
I had a related idea, maybe something similar could be done for psql to set
a jsonb output format. That way you could automatically prettify jsonb
fields client side.
Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
On 04/28/2016 04:29 PM, Alvaro Herrera wrote:
David G. Johnston wrote:
On Thu, Apr 28, 2016 at 10:00 AM, Ryan Pedela <rpedela@datalanche.com>
wrote:
In addition, every JSON implementation I have ever seen fully minimizes
JSON by default. PG appears to deviate from standard practice for no
apparent reason.
Sorry to nit-pick but that's called convention - the standard is likely
silent on this point. And it's not like this was done in a vacuum - why is
this only coming up now and not, say, during the beta? Is it surprising
that this seemingly easy-to-overlook dynamic was not explicitly addressed
by the author and reviewer of the patch, especially when implementation of
said feature consisted of a lot more things of greater import and impact?
Actually we did have someone come up with a patch to "normalize" how
JSON stuff was output, because our code seems to do it in three
different, inconsistent ways. And our response was for them to get the
heck outta here, because we're so much in love with our current
practice that we don't need to refactor the three implementations into a
single one.
That's a pretty bad mischaracterization of the discussion. What was
proposed was broken, as I pointed out to the OP, and then again later to
you when you asked about it. What he wanted to achieve simply couldn't
be done the way he was trying to achieve it.
Regarding the present proposal:
I wouldn't necessarily be opposed to us having one or more of the following:
a) suppressing whitespace addition in all json generation and text
output, possibly governed by a GUC setting so we could maintain
behaviour compatibility if required
b) a function to remove whitespace from json values, but otherwise
preserve things like key order
c) a function to pretty-print json similar to the output from jsonb, but
again preserving key order
d) a function to reorder keys in json so they were sorted according to
the relevant collation.
None of these things except possibly the last should be terribly
difficult to do.
cheers
andrew
On Fri, Apr 29, 2016 at 3:18 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
On 04/28/2016 04:29 PM, Alvaro Herrera wrote:
Actually we did have someone come up with a patch to "normalize" how
JSON stuff was output, because our code seems to do it in three
different, inconsistent ways. And our response was for them to get the
heck outta here, because we're so much in love with our current
practice that we don't need to refactor the three implementations into a
single one.
That's a pretty bad mischaracterization of the discussion. What was
proposed was broken, as I pointed out to the OP, and then again later to
you when you asked about it. What he wanted to achieve simply couldn't be
done the way he was trying to achieve it.
Yeah, the original request was pretty invalid, but when I've proposed to
resubmit just the normalization of whitespace nobody has shown enthusiasm
about the idea either:
/messages/by-id/CACACo5QKOiZ-00Jf6W2Uf0Pst05qRekQ9UzssyBL0m9FGKdS2Q@mail.gmail.com
Regarding the present proposal:
I wouldn't necessarily be opposed to us having one or more of the
following:
a) suppressing whitespace addition in all json generation and text output,
possibly governed by a GUC setting so we could maintain behaviour
compatibility if required
I'm not thrilled about GUC that would silently break stuff. That being
said, if someone's code is dependent on exact placement of whitespace in
the JSON text, it's pretty broken already and it's just a matter of time
when they hit an issue there.
b) a function to remove whitespace from json values, but otherwise
preserve things like key order
c) a function to pretty-print json similar to the output from jsonb, but
again preserving key order
d) a function to reorder keys in json so they were sorted according to the
relevant collation.

None of these things except possibly the last should be terribly difficult
to do.
My vote goes to remove all optional whitespace by default and have a single
function to prettify it. Key reordering can then be handled with an
optional parameter to such prettifying function.
It would probably make sense to model this function after Python's
"dump-to-JSON-string" function:
https://docs.python.org/2/library/json.html#json.dumps
with optional parameters for sorting the keys, indentation size, and
punctuation. This way all the prettiness enhancements could be contained
in a single function without the need for a generalized interface used in
many places.
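Python's json.dumps already exposes exactly these knobs; a quick illustration of the compact and prettified forms:

```python
import json

data = {"b": 2, "a": 1}
# compact form: no space after ':' or ','
print(json.dumps(data, separators=(",", ":"), sort_keys=True))  # {"a":1,"b":2}
# prettified form: sorted keys, two-space indent, one key per line
print(json.dumps(data, indent=2, sort_keys=True))
```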
How about that?
--
Alex
On Wed, Apr 27, 2016 at 4:05 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Merlin Moncure (mmoncure@gmail.com) wrote:
On Tue, Apr 26, 2016 at 11:49 AM, Stephen Frost <sfrost@snowman.net> wrote:
As I mentioned to Sehrope on IRC, at least for my 2c, if you want a
compact JSON format to reduce the amount of traffic over the wire or to
do things with on the client side, we should probably come up with a
binary format, rather than just hack out the whitespace. It's not like
representing numbers using ASCII characters is terribly efficient
either.

-1

This will benefit pretty much nobody unless you are writing a hand
crafted C application that consumes and processes the data directly.

That's not accurate. All that's needed is for the libraries which
either wrap libpq or re-implement it to be updated to understand the
format and then convert the data into whatever structure makes sense for
the language (or library that the language includes for working with
JSON data).
Sure, that's pretty easy. Note, I cowrote the only libpq wrapper
that demystifies pg binary formats, libpqtypes. I can tell you that
binary formats are much faster than text formats in any cases where
parsing is non trivial -- geo types, timestamp types, containers etc.
However I would be very surprised if postgres binary format json
would replace language parsing of json in any popular language like
java for common usage.
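For a rough sense of the size difference between text and binary encodings being discussed, here is a small Python illustration (a network-order int32 is assumed for the binary side; PostgreSQL's actual binary formats vary per type):

```python
import struct

n = 1234567890
as_text = str(n).encode("ascii")     # 10 bytes on the wire as ASCII text
as_binary = struct.pack("!i", n)     # 4 bytes as a network-order int32
print(len(as_text), len(as_binary))  # 10 4
```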
I'll go further. Postgres json support has pretty much made our
binary formats obsolete. The main problem with text format data was
sending complex structured data to the client over our overlapping
escape mechanisms; client side parsing was slow and in certain
scenarios backslashes would proliferate exponentially. json support
eliminates all of those problems and the performance advantages of
binary support (mainly parsing of complex types) rarely justify the
development headaches. These days, for the vast majority of data
traffic to the application it's a single row, single column json
coming in and out of the database.
I'd venture to guess this is a tiny fraction of pg users these days.
I do not understand at all the objection to removing whitespace.
Extra whitespace does nothing but pad the document as humans will
always run the document through a prettifier tuned to their specific
requirements (generally starting with, intelligent placement of
newlines) if reading directly.

The objection is that it's a terribly poor solution as it simply makes
things ugly for a pretty small amount of improvement. Looking at it
from the perspective of just "whitespace is bad!"
Whitespace is bad, because it simply pads documents on every stage of
processing. You simply can't please everyone in terms of where it
should go so you don't and reserve that functionality for
prettification functions. json is for *data serialization*. We
should not inject extra characters for aesthetics in the general case;
reducing memory consumption by 10% on both the client and server
during parse is a feature.
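The actual savings depend on the shape of the document (the 10% figure above is an estimate); a quick way to measure it for a given document with Python's json module:

```python
import json

doc = {"id": 12345, "tags": ["a", "b", "c"], "attrs": {"x": 1, "y": 2}}
spaced = json.dumps(doc)                          # default: spaces after ':' and ','
compact = json.dumps(doc, separators=(",", ":"))  # no optional whitespace
saved = 1 - len(compact) / len(spaced)
print(len(spaced), len(compact), f"{saved:.0%}")
```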
Andrew mentions several solutions. I like them all except I would
prefer not to introduce a GUC for controlling the output format. I do
not think it's a good idea to set the expectation that clients can
rely on byte-for-byte stable text output for any type, including json.
merlin