psql JSON output format

Started by Christoph Bergover 2 years ago33 messageshackers

myon@debian.org

over 2 years ago

I noticed psql was lacking JSON formatting of query results which I
need for a follow-up patch. It also seems useful generally, so here's
a patch:

postgres=# \pset format json
Output format is json.
postgres=# select * from (values ('one', 2, 'three'), ('four', 5, 'six')) as sub(a, b, c);
[
{ "a": "one", "b": "2", "c": "three" },
{ "a": "four", "b": "5", "c": "six" }
]
postgres=# \x
Expanded display is on.
postgres=# select * from (values ('one', 2, 'three'), ('four', 5, 'six')) as sub(a, b, c);
[{
"a": "one",
"b": "2",
"c": "three"
},{
"a": "four",
"b": "5",
"c": "six"
}]
postgres=#

Both normal and expanded output format are optimized for readability
while still saving screen space.

Both formats output the same JSON structure, an array of objects.
Other variants like array-of-arrays or line-separated objects
("jsonline") might be possible, but I didn't want to overengineer it.

On the command line, the format is selected by `psql --json` and `psql -J`.
(I'm not attached to the short option, but -J was free and it's in
line with `psql -H` to select HTML.)

Christoph

Christoph Berg

myon@debian.org

over 2 years ago

In reply to: Christoph Berg (#1)

Re: psql JSON output format

Re: To PostgreSQL Hackers

On the command line, the format is selected by `psql --json` and `psql -J`.

Among other uses, it enables easy post-processing of psql output using `jq`:

$ psql -lJ | jq
[
{
"Name": "myon",
"Owner": "myon",
"Encoding": "UTF8",
"Locale Provider": "libc",
"Collate": "de_DE.utf8",
"Ctype": "de_DE.utf8",
"ICU Locale": null,
"ICU Rules": null,
"Access privileges": null
},
...
]

Christoph

Jelte Fennema-Nio

postgres@jeltef.nl

over 2 years ago

In reply to: Christoph Berg (#1)

Re: psql JSON output format

On Mon, 18 Dec 2023 at 15:56, Christoph Berg <myon@debian.org> wrote:

I noticed psql was lacking JSON formatting of query results which I
need for a follow-up patch.

This seems useful to me too, but my usecases would also be solved (and
possibly better solved) by adding JSON support to COPY as proposed
here: /messages/by-id/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=kcg@mail.gmail.com

I'm wondering if your follow-up patch would be better served by that too or not.

Christoph Berg

myon@debian.org

over 2 years ago

In reply to: Jelte Fennema-Nio (#3)

Re: psql JSON output format

Re: Jelte Fennema-Nio

This seems useful to me too, but my usecases would also be solved (and
possibly better solved) by adding JSON support to COPY as proposed
here: /messages/by-id/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=kcg@mail.gmail.com

Thanks for the pointer, I had not scrolled back enough to see that
thread.

I'm happy to see that this patch is also settling on "array of
objects".

I'm wondering if your follow-up patch would be better served by that too or not.

I'd need it to work on query results. Which could of course be wrapped
into "copy (select whatever) to stdout (format json)", but doing it in
psql without mangling the query is cleaner. And (see the other mail),
the psql format selection works nicely with existing queries like
`psql -l`.

And "copy format json" wouldn't support \x expanded mode.

We'd want both patches even if they do the same thing on two different
levels, I'd say.

Christoph

Jelte Fennema-Nio

postgres@jeltef.nl

over 2 years ago

In reply to: Christoph Berg (#4)

Re: psql JSON output format

On Mon, 18 Dec 2023 at 16:38, Christoph Berg <myon@debian.org> wrote:

We'd want both patches even if they do the same thing on two different
levels, I'd say.

Makes sense. One thing I was still wondering is if it wouldn't be
easier to wrap all queries in "copy (select whatever) to stdout
(format json)" automatically when the -J flag is passed psql. Because
it would be nice not to have to implement this very similar logic in
two places.

But I guess that approach does not work for commands that don't work
inside COPY, i.e. DML and DDL. I'm assuming your current patch works
fine with DML/DDL. If that's indeed the case then I agree it makes
sense to have this patch. And another big benefit is that it wouldn't
require a new Postgres server function for the json functionality of
psql.

Dean Rasheed

dean.a.rasheed@gmail.com

over 2 years ago

In reply to: Jelte Fennema-Nio (#5)

Re: psql JSON output format

On Mon, 18 Dec 2023 at 16:34, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

On Mon, 18 Dec 2023 at 16:38, Christoph Berg <myon@debian.org> wrote:

We'd want both patches even if they do the same thing on two different
levels, I'd say.

Makes sense.

I can see the appeal in this feature. However, as it stands, this
isn't compatible with copy format json, and I think it would need to
duplicate quite a lot of the JSON output code in client-side code to
make it compatible.

Consider, for example:

CREATE TABLE foo(col json);
INSERT INTO foo VALUES ('"str_value"');

copy foo to stdout with (format json) produces this:

{"col":"str_value"}

which is as expected. However, psql -Jc "select * from foo" produces

[
{ "col": "\"str_value\"" }
]

The problem is, various datatypes such as boolean, number types, json,
and jsonb must not be quoted and escaped, since that would change them
to strings or double-encode them in the result. And then there are
domain types built on top of those types, and arrays, etc. See, for
example, the logic in json_categorize_type(). I think that trying to
duplicate that client-side is doomed to failure.

Regards,
Dean

Laurenz Albe

laurenz.albe@cybertec.at

over 2 years ago

In reply to: Dean Rasheed (#6)

Re: psql JSON output format

On Mon, 2024-01-08 at 18:43 +0000, Dean Rasheed wrote:

I can see the appeal in this feature. However, as it stands, this
isn't compatible with copy format json, and I think it would need to
duplicate quite a lot of the JSON output code in client-side code to
make it compatible.

Consider, for example:

CREATE TABLE foo(col json);
INSERT INTO foo VALUES ('"str_value"');

copy foo to stdout with (format json) produces this:

{"col":"str_value"}

which is as expected. However, psql -Jc "select * from foo" produces

[
{ "col": "\"str_value\"" }
]

The problem is, various datatypes such as boolean, number types, json,
and jsonb must not be quoted and escaped, since that would change them
to strings or double-encode them in the result.

I agree that such data types should not be double quoted.

And then there are
domain types built on top of those types, and arrays, etc. See, for
example, the logic in json_categorize_type(). I think that trying to
duplicate that client-side is doomed to failure.

Perhaps. But maybe "printTableContent" could be extended to contain
a boolean array "quote_for_json" that is set in "printTableAddHeader"
based on the underlying data type, similar to how "aligns" is set now.
Detecting array types might be a challenge.

Domains might not be a problem, since "PQftype()" seems to return the
base data type for domain values.

Yours,
Laurenz Albe

Christoph Berg

myon@debian.org

over 2 years ago

In reply to: Laurenz Albe (#7)

Re: psql JSON output format

Re: Dean Rasheed

I can see the appeal in this feature. However, as it stands, this
isn't compatible with copy format json, and I think it would need to
duplicate quite a lot of the JSON output code in client-side code to
make it compatible.

I can see we probably wouldn't want two different output formats named
json, but the general idea of "allow psql to format results as json of
strings" makes a lot of sense, so we should try to make it work. Does
it even have to be compatible?

If the code required is the same, it could be moved to libpgcommon.

The problem is, various datatypes such as boolean, number types, json,
and jsonb must not be quoted and escaped, since that would change them
to strings or double-encode them in the result. And then there are
domain types built on top of those types, and arrays, etc. See, for
example, the logic in json_categorize_type(). I think that trying to
duplicate that client-side is doomed to failure.

Can we try to make it work first, before we declare the perfect the
enemy of the good?

I'll note that the current code uses PG's string representation of
strings which is meant to be round-trip safe when fed back into the
server. So quoted numeric values aren't a problem at all. (And that
part is fixable.)

The real problem here is that COPY json violates that by pulling json
values up one syntax level. "Normal" cases will be fixable by just
looking for json(b) and printing that unquoted. And composite types
with jsonb members... are these really only half-quoted?!

Re: Laurenz Albe

The problem is, various datatypes such as boolean, number types, json,
and jsonb must not be quoted and escaped, since that would change them
to strings or double-encode them in the result.

I agree that such data types should not be double quoted.

I left that out so far because it didn't make a practical difference,
but that's fixable.

And then there are
domain types built on top of those types, and arrays, etc. See, for
example, the logic in json_categorize_type(). I think that trying to
duplicate that client-side is doomed to failure.

Perhaps. But maybe "printTableContent" could be extended to contain
a boolean array "quote_for_json" that is set in "printTableAddHeader"
based on the underlying data type, similar to how "aligns" is set now.
Detecting array types might be a challenge.

Domains might not be a problem, since "PQftype()" seems to return the
base data type for domain values.

Thanks, I'll give that a try.

Christoph

Import Notes

Reply to msg id not found: 2f429e755a7ad5725bd056cefc7acf5bc45ca7bf.camel@cybertec.atCAEZATCWZWEapZU8AOc8rwEukaO_FBG-Xue0y+FCFaDFHDPUAQ@mail.gmail.com | Resolved by subject fallback

Dean Rasheed

dean.a.rasheed@gmail.com

over 2 years ago

In reply to: Christoph Berg (#8)

Re: psql JSON output format

On Tue, 9 Jan 2024 at 09:43, Christoph Berg <myon@debian.org> wrote:

I can see we probably wouldn't want two different output formats named
json, but the general idea of "allow psql to format results as json of
strings" makes a lot of sense, so we should try to make it work. Does
it even have to be compatible?

I would say that they should be compatible, in the sense that an
external tool parsing the outputs should regard them as equal, but
maybe there are different expectations for the two features.

I'll note that the current code uses PG's string representation of
strings which is meant to be round-trip safe when fed back into the
server. So quoted numeric values aren't a problem at all. (And that
part is fixable.)

I'm not sure that being round-trip safe is a necessary goal here, but
again, it's about the expectations for the feature. I was imagining
that the goal was to produce something that an external tool would
parse, rather than something Postgres would read back in. So not
quoting numeric values seems desirable to produce output that better
reflects the semantic content of the data (though it doesn't affect it
being round-trip safe).

The real problem here is that COPY json violates that by pulling json
values up one syntax level. "Normal" cases will be fixable by just
looking for json(b) and printing that unquoted. And composite types
with jsonb members... are these really only half-quoted?!

What to do with composites is an interesting point in question. COPY
format json will turn a composite into a JSON object whose keys are
the field names. That's useful if you want to use an external tool to
parse the result and get at the individual fields, but it's not
round-trip safe. OTOH, this patch outputs the Postgres string
representation of the object, which might be round-trip safe, but is
not very convenient for any other tool to read.

Regards,
Dean

#10

Christoph Berg

myon@debian.org

over 2 years ago

In reply to: Dean Rasheed (#9)

Re: psql JSON output format

Re: Dean Rasheed

I'll note that the current code uses PG's string representation of
strings which is meant to be round-trip safe when fed back into the
server. So quoted numeric values aren't a problem at all. (And that
part is fixable.)

I'm not sure that being round-trip safe is a necessary goal here, but
again, it's about the expectations for the feature. I was imagining
that the goal was to produce something that an external tool would
parse, rather than something Postgres would read back in. So not
quoting numeric values seems desirable to produce output that better
reflects the semantic content of the data (though it doesn't affect it
being round-trip safe).

Getting it print numeric/boolean without quotes was actually easy, as
well as json(b). Implemented as the attached v2 patch.

But: not quoting json means that NULL and 'null'::json will both be
rendered as 'null'. That strikes me as a pretty undesirable conflict.
Does the COPY patch also do that?

OTOH, this patch outputs the Postgres string representation of the
object, which might be round-trip safe, but is not very convenient
for any other tool to read.

For my use case, I need something that can be fed back into PG.
Reassembling all the json parts back into proper values would be a
pretty hard problem.

Perhaps there should be two output formats, one that's roundtrip-safe,
and one that represents json structures and composite values nicely.
Adding format-specific options could also be used to switch the output
between "array of json objects" and "one json object per line".

Christoph

#11

Dean Rasheed

dean.a.rasheed@gmail.com

over 2 years ago

In reply to: Christoph Berg (#10)

Re: psql JSON output format

[cc'ing Joe]

On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote:

Getting it print numeric/boolean without quotes was actually easy, as
well as json(b). Implemented as the attached v2 patch.

But: not quoting json means that NULL and 'null'::json will both be
rendered as 'null'. That strikes me as a pretty undesirable conflict.
Does the COPY patch also do that?

Yes. Perhaps what needs to happen is for a NULL column to be omitted
entirely from the output. I think the COPY TO json patch would have to
do that if COPY FROM json were to be added later, to make it
round-trip safe.

OTOH, this patch outputs the Postgres string representation of the
object, which might be round-trip safe, but is not very convenient
for any other tool to read.

For my use case, I need something that can be fed back into PG.
Reassembling all the json parts back into proper values would be a
pretty hard problem.

What is your use case? It seems like what you want is quite different
from the COPY patch.

Regards,
Dean

#12

Laurenz Albe

laurenz.albe@cybertec.at

about 2 years ago

In reply to: Dean Rasheed (#11)

Re: psql JSON output format

On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote:

On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote:

Getting it print numeric/boolean without quotes was actually easy, as
well as json(b). Implemented as the attached v2 patch.

But: not quoting json means that NULL and 'null'::json will both be
rendered as 'null'. That strikes me as a pretty undesirable conflict.
Does the COPY patch also do that?

Yes. Perhaps what needs to happen is for a NULL column to be omitted
entirely from the output. I think the COPY TO json patch would have to
do that if COPY FROM json were to be added later, to make it
round-trip safe.

I think the behavior is fine as it is. I'd expect both NULL and JSON "null"
to be rendered as "null". I think the main use case for a feature like this
is people who need the result in JSON for further processing somewhere else.

"Round-trip safety" is not so important. If you want to move data from
PostgreSQL to PostgreSQL, you use the plain or the binary format.
The CSV format by default renders NULL and empty strings identical, and
I don't think anybody objects to that.

Yours,
Laurenz Albe

#13

Andrew Dunstan

andrew@dunslane.net

about 2 years ago

In reply to: Laurenz Albe (#12)

Re: psql JSON output format

On 2024-01-16 Tu 11:07, Laurenz Albe wrote:

On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote:

On Tue, 9 Jan 2024 at 14:35, Christoph Berg<myon@debian.org> wrote:

Getting it print numeric/boolean without quotes was actually easy, as
well as json(b). Implemented as the attached v2 patch.

But: not quoting json means that NULL and 'null'::json will both be
rendered as 'null'. That strikes me as a pretty undesirable conflict.
Does the COPY patch also do that?

Yes. Perhaps what needs to happen is for a NULL column to be omitted
entirely from the output. I think the COPY TO json patch would have to
do that if COPY FROM json were to be added later, to make it
round-trip safe.

I think the behavior is fine as it is. I'd expect both NULL and JSON "null"
to be rendered as "null". I think the main use case for a feature like this
is people who need the result in JSON for further processing somewhere else.

"Round-trip safety" is not so important. If you want to move data from
PostgreSQL to PostgreSQL, you use the plain or the binary format.
The CSV format by default renders NULL and empty strings identical, and
I don't think anybody objects to that.

This is absolutely not true. The docs say about CSV format:

A NULL is output as the NULL parameter string and is not quoted,
while a non-NULL value matching the NULL parameter string is quoted.
For example, with the default settings, a NULL is written as an
unquoted empty string, while an empty string data value is written
with double quotes ("").

CSV format with default settings is and has been from the beginning
designed to be round trippable.

cheers

andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com

#14

Robert Haas

robertmhaas@gmail.com

about 2 years ago

In reply to: Laurenz Albe (#12)

Re: psql JSON output format

On Tue, Jan 16, 2024 at 11:07 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

"Round-trip safety" is not so important. If you want to move data from
PostgreSQL to PostgreSQL, you use the plain or the binary format.
The CSV format by default renders NULL and empty strings identical, and
I don't think anybody objects to that.

As Andrew says, the part about the CSV format is not correct, but I
also don't think I agree with the larger point, either. I believe that
round-trip safety is a really desirable property. Is it absolutely
necessary in every case? Maybe not. But, it shouldn't be lacking
without a good reason, either, at least IMHO. If you postulate that
people are moving data from A to B, it is reasonable to think that
eventually someone is going to want to move some data from B back to
A. If that turns out to be hard, they'll be sad. We shouldn't make
people sad without a good reason.

--
Robert Haas
EDB: http://www.enterprisedb.com

#15

Laurenz Albe

laurenz.albe@cybertec.at

about 2 years ago

In reply to: Andrew Dunstan (#13)

Re: psql JSON output format

On Tue, 2024-01-16 at 11:49 -0500, Andrew Dunstan wrote:

On 2024-01-16 Tu 11:07, Laurenz Albe wrote:

On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote:

On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote:

Getting it print numeric/boolean without quotes was actually easy, as
well as json(b). Implemented as the attached v2 patch.

But: not quoting json means that NULL and 'null'::json will both be
rendered as 'null'. That strikes me as a pretty undesirable conflict.
Does the COPY patch also do that?

Yes. Perhaps what needs to happen is for a NULL column to be omitted
entirely from the output. I think the COPY TO json patch would have to
do that if COPY FROM json were to be added later, to make it
round-trip safe.

I think the behavior is fine as it is. I'd expect both NULL and JSON "null"
to be rendered as "null". I think the main use case for a feature like this
is people who need the result in JSON for further processing somewhere else.

"Round-trip safety" is not so important. If you want to move data from
PostgreSQL to PostgreSQL, you use the plain or the binary format.
The CSV format by default renders NULL and empty strings identical, and
I don't think anybody objects to that.

This is absolutely not true.

CSV format with default settings is and has been from the beginning designed
to be round trippable.

Sorry for being unclear. I wasn't talking about COPY, but about the psql
output format:

CREATE TABLE xy (a integer, b text);

INSERT INTO xy VALUES (1, 'one'), (2, NULL), (3, '');

\pset format csv
Output format is csv.

TABLE xy;
a,b
1,one
2,
3,

Yours,
Laurenz Albe

#16

Laurenz Albe

laurenz.albe@cybertec.at

about 2 years ago

In reply to: Robert Haas (#14)

Re: psql JSON output format

On Tue, 2024-01-16 at 14:12 -0500, Robert Haas wrote:

On Tue, Jan 16, 2024 at 11:07 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

"Round-trip safety" is not so important. If you want to move data from
PostgreSQL to PostgreSQL, you use the plain or the binary format.
The CSV format by default renders NULL and empty strings identical, and
I don't think anybody objects to that.

As Andrew says, the part about the CSV format is not correct, but I
also don't think I agree with the larger point, either. I believe that
round-trip safety is a really desirable property. Is it absolutely
necessary in every case? Maybe not. But, it shouldn't be lacking
without a good reason, either, at least IMHO. If you postulate that
people are moving data from A to B, it is reasonable to think that
eventually someone is going to want to move some data from B back to
A. If that turns out to be hard, they'll be sad. We shouldn't make
people sad without a good reason.

As mentioned in my other mail, I was talking about the psql output
format "csv" rather than about COPY.

I agree that it is desirable to lose as little information as possible.
But if we want to format query output as JSON, we have a couple of
requirements that cannot all be satisfied:

1. lose no information ("round-trip safe")

2. don't double quote numbers, booleans and other JSON values

3. don't skip any table column in the output

Christoph's original patch didn't satisfy #2, and his current version
doesn't satisfy #1. Do you think that skipping NULL columns would be
the best solution? We don't do that in the to_json() function, which
also renders SQL NULL as JSON null.

I think the argument for round-trip safety of psql output is tenuous.
There is no way for psql to ingest JSON as input format, and the patch
to add JSON as COPY format only supports COPY TO. And unless you can
name the exact way that the data written by psql will be loaded into
PostgreSQL again, all that remains is an (understandable) unease about
losing the distiction between SQL NULL and JSON null.

We have jsonb_populate_record() to convert JSON back to a table row,
but that function will convert both missing columns and a JSON null
to SQL NULL:

CREATE TABLE xy (id integer, j jsonb);

\pset null '∅'

SELECT * FROM jsonb_populate_record(NULL::xy, '{"id":1,"j":null}');

id │ j
════╪═══
1 │ ∅
(1 row)

SELECT * FROM jsonb_populate_record(NULL::xy, '{"id":1}');

id │ j
════╪═══
1 │ ∅
(1 row)

Indeed, there doesn't seem to be a way to generate JSON null with that
function.

So I wouldn't worry about round-trip safety too much, and my preference
is how the current patch does it. I am not dead set against a solution
that omits NULL columns in the output, though.

Yours,
Laurenz Albe

#17

Andrew Dunstan

andrew@dunslane.net

about 2 years ago

In reply to: Laurenz Albe (#15)

Re: psql JSON output format

On 2024-01-17 We 03:52, Laurenz Albe wrote:

On Tue, 2024-01-16 at 11:49 -0500, Andrew Dunstan wrote:

On 2024-01-16 Tu 11:07, Laurenz Albe wrote:

On Tue, 2024-01-09 at 16:51 +0000, Dean Rasheed wrote:

On Tue, 9 Jan 2024 at 14:35, Christoph Berg <myon@debian.org> wrote:

Getting it print numeric/boolean without quotes was actually easy, as
well as json(b). Implemented as the attached v2 patch.

But: not quoting json means that NULL and 'null'::json will both be
rendered as 'null'. That strikes me as a pretty undesirable conflict.
Does the COPY patch also do that?

Yes. Perhaps what needs to happen is for a NULL column to be omitted
entirely from the output. I think the COPY TO json patch would have to
do that if COPY FROM json were to be added later, to make it
round-trip safe.

I think the behavior is fine as it is. I'd expect both NULL and JSON "null"
to be rendered as "null". I think the main use case for a feature like this
is people who need the result in JSON for further processing somewhere else.

"Round-trip safety" is not so important. If you want to move data from
PostgreSQL to PostgreSQL, you use the plain or the binary format.
The CSV format by default renders NULL and empty strings identical, and
I don't think anybody objects to that.

This is absolutely not true.

CSV format with default settings is and has been from the beginning designed
to be round trippable.

Sorry for being unclear. I wasn't talking about COPY, but about the psql
output format:

CREATE TABLE xy (a integer, b text);

INSERT INTO xy VALUES (1, 'one'), (2, NULL), (3, '');

\pset format csv
Output format is csv.

TABLE xy;
a,b
1,one
2,
3,

I think the reason nobody's complained about it is quite possibly that
very few people have used it. That's certainly the case with me - if I'd
noticed it I would have complained.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

#18

Robert Haas

robertmhaas@gmail.com

about 2 years ago

In reply to: Laurenz Albe (#16)

Re: psql JSON output format

On Wed, Jan 17, 2024 at 4:30 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

As mentioned in my other mail, I was talking about the psql output
format "csv" rather than about COPY.

Oh. Well, I think it's sad that the psql format csv has that property.
Why doesn't it adopt COPY's handling?

I agree that it is desirable to lose as little information as possible.
But if we want to format query output as JSON, we have a couple of
requirements that cannot all be satisfied:

1. lose no information ("round-trip safe")

2. don't double quote numbers, booleans and other JSON values

3. don't skip any table column in the output

Christoph's original patch didn't satisfy #2, and his current version
doesn't satisfy #1. Do you think that skipping NULL columns would be
the best solution? We don't do that in the to_json() function, which
also renders SQL NULL as JSON null.

Let me start by clarifying that I'm OK with sacrificing
round-trippability here as long as we do it thoughtfully.
"Round-trippability is important but X is more important and we cannot
have both for Y reasons" seems like a potentially fine argument to me;
I'm only objecting to an argument of the form "round-trippability
doesn't even matter." My previous comment was a bit of a drive-by
remark on that specifically rather than a strong opinion about what
exactly we ought to do here.

I guess the specifically issue here is around a json(b) column that is
null at the SQL level vs one that contains a JSON null. How do we
distinguish those cases? I think entirely omitting null columns could
be a way forward, but I don't know if that would cause other problems
for users.

I'm not quite sure that addresses all the issues, though. For
instance, consider that 1.00::numeric and 1.0::numeric are equal but
distinguishable. If those get rendered into the JSON unquoted as 1.00
and 1.0, respectively, is that going to round-trip properly? What
about float8 values where extra_float_digits=3 is needed to properly
round trip? If we take PostgreSQL's array data types and turn them
into JSON arrays, what happens with non-default bounds? I know how
we're going to turn '{1,2}'::int[] into a JSON array, or at least I
assume I do, but what in the world are we going to do about
'[-3:-2]={1,2}'?

As much as I think round-trippability is good, getting it to 100% here
is probably a good bit of work. And maybe that work isn't worth doing
or involves too much collateral damage. But I do think it has positive
value. If we produce output that could be ingested back into PG later
with the right tool, that leaves the door open for someone to build
the tool later even if we don't have it today. If we produce output
that loses information, no tool built later can make up for the loss.

--
Robert Haas
EDB: http://www.enterprisedb.com

#19

Laurenz Albe

laurenz.albe@cybertec.at

about 2 years ago

In reply to: Robert Haas (#18)

Re: psql JSON output format

On Wed, 2024-01-17 at 14:52 -0500, Robert Haas wrote:

Let me start by clarifying that I'm OK with sacrificing
round-trippability here as long as we do it thoughtfully.

Got you.

I'm not quite sure that addresses all the issues, though. For
instance, consider that 1.00::numeric and 1.0::numeric are equal but
distinguishable. If those get rendered into the JSON unquoted as 1.00
and 1.0, respectively, is that going to round-trip properly? What
about float8 values where extra_float_digits=3 is needed to properly
round trip? If we take PostgreSQL's array data types and turn them
into JSON arrays, what happens with non-default bounds? I know how
we're going to turn '{1,2}'::int[] into a JSON array, or at least I
assume I do, but what in the world are we going to do about
'[-3:-2]={1,2}'?

As much as I think round-trippability is good, getting it to 100% here
is probably a good bit of work.

I would go as far as saying that the attempt to preserve all that is
futile, if you are bound to JSON as format.

But I do think it has positive
value. If we produce output that could be ingested back into PG later
with the right tool, that leaves the door open for someone to build
the tool later even if we don't have it today. If we produce output
that loses information, no tool built later can make up for the loss.

I am all for losing as little information as possible, but I think
that this discussion is going off on a tangent. After all, we are not
talking about a data export tool here, we are talking about psql.
I don't see anybody complain that float8 values lose precision in
the default aligned format, or that empty strings and NULL values
look the same in aligned format. Why do the goalposts move for the
JSON output format? I don't think psql output should be considered
a form of backup.

I'd say that we should strive to preserve whatever information we
easily can, and we shouldn't worry about the rest.

Can we get consensus that SQL NULL columns should be omitted from the
output, and the rest left as it currently is?

Yours,
Laurenz Albe

#20

Christoph Berg

myon@debian.org

about 2 years ago

In reply to: Laurenz Albe (#19)

Re: psql JSON output format

Re: Laurenz Albe

But I do think it has positive
value. If we produce output that could be ingested back into PG later
with the right tool, that leaves the door open for someone to build
the tool later even if we don't have it today. If we produce output
that loses information, no tool built later can make up for the loss.

I am all for losing as little information as possible, but I think
that this discussion is going off on a tangent. After all, we are not
talking about a data export tool here, we are talking about psql.

I've just posted the other patch where I need the JSON format:
/messages/by-id/Za6EfXeewwLRS_fs@msg.df7cb.de

There, I need to be able to read back the query output into psql,
while at the same time being human-readable so the user can sanely
edit the data in an editor. The default "aligned" format is only
human-readable, while CSV is mostly only machine-readable. JSON is the
best option between the two, I think.

What I did now in v3 of this patch is to print boolean and numeric
values (ints, floats, numeric) without quotes, while adding the quotes
back to json. This solves the NULL vs 'null'::json problem.

I don't see anybody complain that float8 values lose precision in
the default aligned format, or that empty strings and NULL values
look the same in aligned format. Why do the goalposts move for the
JSON output format? I don't think psql output should be considered
a form of backup.

Fwiw, not quoting numbers in JSON won't have any of these problems if
the JSON reader just passes the strings read through. (Which PG's JSON
parser does.)

Can we get consensus that SQL NULL columns should be omitted from the
output, and the rest left as it currently is?

I think that would be an interesting option for a JSON export format.
The psql JSON format is more for human inspection, where omitting the
columns might create confusion. (We could make it a pset parameter of
the format, but I think the default should be to show NULL columns.)

Christoph

#21

Laurenz Albe

laurenz.albe@cybertec.at

about 2 years ago

In reply to: Christoph Berg (#20)

#22