Bypassing NULL elements in row_to_json function

Started by Michael Nolan, about 10 years ago · 16 messages · pgsql-general
#1 Michael Nolan
htfoot@gmail.com

I'm looking at the possibility of using JSON as a data exchange format
with some apps running on both PCs and Macs.

The table I would be exporting has a lot of NULL values in it. Is
there any way to skip the NULL values in the row_to_json function and
include only the fields that are non-null?
--
Mike Nolan
nolan@tssi.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Raymond O'Donnell
rod@iol.ie
In reply to: Michael Nolan (#1)
Re: Bypassing NULL elements in row_to_json function

On 08/04/2016 16:31, Michael Nolan wrote:

I'm looking at the possibility of using JSON as a data exchange format
with some apps running on both PCs and Macs.

The table I would be exporting has a lot of NULL values in it. Is
there any way to skip the NULL values in the row_to_json function and
include only the fields that are non-null?

You could use a CTE to filter out the nulls (not tested - I haven't used
JSON in PG (yet!)):

with no_nulls as (
select ... from my_table
where whatever is not null
)
select row_to_json(....) from no_nulls;
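Spelled out with a hypothetical table invented just for illustration (my_table(id integer, val text)) - still untested, and note that this filters out whole rows, not individual keys:

```sql
-- my_table(id integer, val text) is a made-up example table.
WITH no_nulls AS (
    SELECT id, val
    FROM my_table
    WHERE val IS NOT NULL   -- drops entire rows where val is NULL
)
SELECT row_to_json(no_nulls) FROM no_nulls;
```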

Ray.

--
Raymond O'Donnell :: Galway :: Ireland
rod@iol.ie


#3 David G. Johnston
david.g.johnston@gmail.com
In reply to: Raymond O'Donnell (#2)
Re: Bypassing NULL elements in row_to_json function

On Fri, Apr 8, 2016 at 8:53 AM, Raymond O'Donnell <rod@iol.ie> wrote:

On 08/04/2016 16:31, Michael Nolan wrote:

I'm looking at the possibility of using JSON as a data exchange format
with some apps running on both PCs and Macs.

The table I would be exporting has a lot of NULL values in it. Is
there any way to skip the NULL values in the row_to_json function and
include only the fields that are non-null?

You could use a CTE to filter out the nulls (not tested - I haven't used
JSON in PG (yet!)):

with no_nulls as (
select ... from my_table
where whatever is not null
)
select row_to_json(....) from no_nulls;

One of us is confused. I'm reading this as "I want a row_to_json call to
generate objects with different keys depending on whether a given key would
have a null" - in which case exclude the key.

I think one would have to simply allow row_to_json to populate the keys
with null values and then post-process them away:

json_strip_nulls(row_to_json(...))

http://www.postgresql.org/docs/current/static/functions-json.html
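For example (a minimal sketch, with an inline VALUES list standing in for a real table):

```sql
-- Plain row_to_json keeps keys whose values are null:
SELECT row_to_json(t)
FROM (VALUES (1, NULL::text), (2, 'x')) AS t(a, b);
-- {"a":1,"b":null} and {"a":2,"b":"x"}

-- json_strip_nulls (9.5+) removes any object field whose value is null:
SELECT json_strip_nulls(row_to_json(t))
FROM (VALUES (1, NULL::text), (2, 'x')) AS t(a, b);
-- {"a":1} and {"a":2,"b":"x"}
```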

David J.

#4 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Michael Nolan (#1)
Re: Bypassing NULL elements in row_to_json function

On 04/08/2016 08:31 AM, Michael Nolan wrote:

I'm looking at the possibility of using JSON as a data exchange format
with some apps running on both PCs and Macs.

The table I would be exporting has a lot of NULL values in it. Is
there any way to skip the NULL values in the row_to_json function and
include only the fields that are non-null?

I guess it depends on your data.

Are the NULLs all in one field or scattered across fields?

Imagining this scenario:

fld_1      fld_2      fld_3
'val1_1'   NULL       'val1_3'
NULL       'val2_2'   'val2_3'
'val3_1'   'val3_2'   NULL

How do you deal with the holes(NULL) on the receiving end?


--
Adrian Klaver
adrian.klaver@aklaver.com


#5 Michael Nolan
htfoot@gmail.com
In reply to: Adrian Klaver (#4)
Re: Bypassing NULL elements in row_to_json function

It looks like json_strip_nulls() may be what I need. I'm currently on 9.3,
which doesn't have that function, but I may be in a position to upgrade to 9.5
this summer. I think the apps that would be receiving the data can deal
with any resulting 'holes' in the data set by just setting them to null.
--
Mike Nolan

#6 Michael Nolan
htfoot@gmail.com
In reply to: Michael Nolan (#5)
Re: Bypassing NULL elements in row_to_json function

I was able to try it on a test server; the combination of row_to_json() and
json_strip_nulls() worked exactly as I had hoped. Stripping nulls reduced
the JSON file by over 50%. (The data I needed to export has around 900,000
rows, so it gets quite large.)

I've got a test file I can make available to app developers.

My next task is to find out if validating and importing a JSON file into a
table is as easy as exporting a table in JSON turned out to be. Thanks for
the help.
--
Mike Nolan

#7 Michael Nolan
htfoot@gmail.com
In reply to: Michael Nolan (#6)
Re: Bypassing NULL elements in row_to_json function

2nd Followup: It turns out that loading a table from a JSON string is more
complicated than going from a table to JSON, perhaps for good reason.
There does not appear to be a direct inverse to the row_to_json() function,
but it wasn't difficult for me to write a PHP program that takes the JSON
file I created the other day and converts it back to a series of inserts,
recreating the original table.

Of course this simple program does NO validation (not that this file needed
any), so if the JSON string is not well-formed for any of a number of
reasons, or if it is not properly mapped to the table into which the
inserts are made, an insert could fail or result in incorrect data.
--
Mike Nolan

#8 David G. Johnston
david.g.johnston@gmail.com
In reply to: Michael Nolan (#7)
Re: Bypassing NULL elements in row_to_json function

On Sat, Apr 9, 2016 at 9:48 PM, Michael Nolan <htfoot@gmail.com> wrote:

2nd Followup: It turns out that loading a table from a JSON string is
more complicated than going from a table to JSON, perhaps for good reason.
There does not appear to be a direct inverse to the row_to_json() function,
but it wasn't difficult for me to write a PHP program that takes the JSON
file I created the other day and converts it back to a series of inserts,
recreating the original table.

Of course this simple program does NO validation (not that this file
needed any), so if the JSON string is not well-formed for any of a number
of reasons, or if it is not properly mapped to the table into which the
inserts are made, an insert could fail or result in incorrect data.
--
Mike Nolan

See: http://www.postgresql.org/docs/9.5/interactive/functions-json.html

json_populate_record(base anyelement, from_json json)
json_populate_recordset(base anyelement, from_json json)

These exist in 9.3 too... though if you are going heavy on JSON I'd suggest
doing whatever you can to keep up with recent releases.
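A minimal (untested) sketch, using a throwaway table invented for the example:

```sql
-- Hypothetical target table, for illustration only.
CREATE TEMP TABLE demo (id varchar(8), name varchar(40), st varchar(2));

-- json_populate_record: one JSON object -> one row of type demo.
SELECT * FROM json_populate_record(NULL::demo,
    '{"id":"1","name":"NOLAN"}');

-- json_populate_recordset: a JSON array of objects -> a set of rows;
-- keys missing from an object simply come back as NULL.
INSERT INTO demo
SELECT * FROM json_populate_recordset(NULL::demo,
    '[{"id":"1","name":"NOLAN"},{"id":"2","st":"NY"}]');
```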

David J.

#9 Michael Nolan
htfoot@gmail.com
In reply to: David G. Johnston (#8)
Re: Bypassing NULL elements in row_to_json function

On Sun, Apr 10, 2016 at 2:30 AM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Sat, Apr 9, 2016 at 9:48 PM, Michael Nolan <htfoot@gmail.com> wrote:

2nd Followup: It turns out that loading a table from a JSON string is
more complicated than going from a table to JSON, perhaps for good reason.
There does not appear to be a direct inverse to the row_to_json() function,
but it wasn't difficult for me to write a PHP program that takes the JSON
file I created the other day and converts it back to a series of inserts,
recreating the original table.

Of course this simple program does NO validation (not that this file
needed any), so if the JSON string is not well-formed for any of a number
of reasons, or if it is not properly mapped to the table into which the
inserts are made, an insert could fail or result in incorrect data.
--
Mike Nolan

​See: http://www.postgresql.org/docs/9.5/interactive/functions-json.html

​json_populate_record(base anyelement, from_json json)
json_populate_recordset(base anyelement, from_json json)

Exists in 9.3 too...though if you are going heavy json I'd suggest doing
whatever you can to keep up with the recent releases.

David J.

If there's a way to use the json_populate_record() or
json_populate_recordset() functions to load a table from a JSON file (e.g.,
using COPY), it would be nice if it were better documented. I did find a
tool that loads a JSON file into a table (pgfutter), and even loaded one
row from that table into another table using json_populate_record(), but
the 'subquery returned multiple rows' issue wouldn't let me do the entire
table.

But that still doesn't deal with validating individual fields or checking
that the JSON is complete and consistent with the table to be loaded.
--
Mike Nolan

#10 David G. Johnston
david.g.johnston@gmail.com
In reply to: Michael Nolan (#9)
Re: Bypassing NULL elements in row_to_json function

On Sun, Apr 10, 2016 at 7:49 AM, Michael Nolan <htfoot@gmail.com> wrote:

On Sun, Apr 10, 2016 at 2:30 AM, David G. Johnston <
david.g.johnston@gmail.com> wrote:

On Sat, Apr 9, 2016 at 9:48 PM, Michael Nolan <htfoot@gmail.com> wrote:

2nd Followup: It turns out that loading a table from a JSON string is
more complicated than going from a table to JSON, perhaps for good reason.
There does not appear to be a direct inverse to the row_to_json() function,
but it wasn't difficult for me to write a PHP program that takes the JSON
file I created the other day and converts it back to a series of inserts,
recreating the original table.

Of course this simple program does NO validation (not that this file
needed any), so if the JSON string is not well-formed for any of a number
of reasons, or if it is not properly mapped to the table into which the
inserts are made, an insert could fail or result in incorrect data.
--
Mike Nolan

​See: http://www.postgresql.org/docs/9.5/interactive/functions-json.html

​json_populate_record(base anyelement, from_json json)
json_populate_recordset(base anyelement, from_json json)

Exists in 9.3 too...though if you are going heavy json I'd suggest doing
whatever you can to keep up with the recent releases.

David J.

If there's a way to use the json_populate_record() or
json_populate_recordset() functions to load a table from a JSON file (eg,
using copy), it would be nice if it was better documented. I did find a
tool that loads a JSON file into a table (pgfutter), and even loaded one
row from that table into another table using json_populate_record(), but
the 'subquery returned multiple rows' issue wouldn't let me do the entire
table.

But that still doesn't deal with validating individual fields or checking
that the JSON is complete and consistent with the table to be loaded.

It isn't that involved once you've learned generally how to call normal
record functions and also set-returning functions (the latter must be in
the FROM clause of the query). If you provide what you attempted it
becomes easier to explain away your misunderstanding.

It doesn't work with COPY. You have to write an explicit INSERT+SELECT
query where the text of the JSON is a parameter. Your client library
should let you do this. If you are using "psql", which doesn't support
parameters, you end up having to store the JSON in a psql variable and
reference that in the function call.
INSERT INTO %I SELECT * FROM json_populate_recordset(null::%I, $1)

The function ensures that column order is consistent, so "INSERT INTO %I"
is all you need to write.
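In psql that might look something like this (untested sketch; my_table is a stand-in for the real target table):

```sql
-- psql only: store the JSON text in a psql variable, then interpolate it
-- as a properly quoted literal with :'j'.
\set j '[{"a":1,"b":"x"},{"a":2}]'
INSERT INTO my_table
SELECT * FROM json_populate_recordset(NULL::my_table, :'j'::json);
```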



Data validation is why we invented CHECK constraints - if you need more
functionality than the simple mechanical conversion from a JSON object to a
table row, you will need to write code somewhere to do the additional work.
All json_populate_record(set) promises is that the above command will work.

I suppose the way you'd write your attempt that failed would be similar to:

INSERT INTO %I
SELECT rec.* FROM src_table_with_json,
LATERAL json_populate_record(null::%I, src_table_with_json.json_column) AS rec;

Again, seeing what you actually did would be helpful - I'm having trouble
imagining what you did to provoke that particular error.

David J.

#11 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Michael Nolan (#9)
Re: Bypassing NULL elements in row_to_json function

On 04/10/2016 07:49 AM, Michael Nolan wrote:

On Sun, Apr 10, 2016 at 2:30 AM, David G. Johnston
<david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:

On Sat, Apr 9, 2016 at 9:48 PM, Michael Nolan <htfoot@gmail.com
<mailto:htfoot@gmail.com>>wrote:

2nd Followup: It turns out that loading a table from a JSON
string is more complicated than going from a table to JSON,
perhaps for good reason. There does not appear to be a direct
inverse to the row_to_json() function, but it wasn't difficult
for me to write a PHP program that takes the JSON file I created
the other day and converts it back to a series of inserts,
recreating the original table.

Of course this simple program does NO validation (not that this
file needed any), so if the JSON string is not well-formed for
any of a number of reasons, or if it is not properly mapped to
the table into which the inserts are made, an insert could fail
or result in incorrect data.
--
Mike Nolan

​See: http://www.postgresql.org/docs/9.5/interactive/functions-json.html

​json_populate_record(base anyelement, from_json json)
json_populate_recordset(base anyelement, from_json json)

Exists in 9.3 too...though if you are going heavy json I'd suggest
doing whatever you can to keep up with the recent releases.

David J.

If there's a way to use the json_populate_record() or
json_populate_recordset() functions to load a table from a JSON file
(eg, using copy), it would be nice if it was better documented. I did
find a tool that loads a JSON file into a table (pgfutter), and even
loaded one row from that table into another table using
json_populate_record(), but the 'subquery returned multiple rows' issue
wouldn't let me do the entire table.

Does the receiving table have the same structure as the sending table?

Is the receiving table already populated with data?

But that still doesn't deal with validating individual fields or
checking that the JSON is complete and consistent with the table to be
loaded.

Well, you know the JSON is not complete, as you dropped all the fields in
each row that had NULL values.
Validation is a more complex subject, and honestly something I do not
think could be accomplished in straight SQL. In other words, it would
need to be run through some ETL tool. I use Python, so as an example:

https://petl.readthedocs.org/en/latest/

In particular:

https://petl.readthedocs.org/en/latest/io.html#json-files

https://petl.readthedocs.org/en/latest/transform.html#validation


--
Adrian Klaver
adrian.klaver@aklaver.com


#12 Michael Nolan
htfoot@gmail.com
In reply to: David G. Johnston (#10)
Re: Bypassing NULL elements in row_to_json function

Here's what I did:

\d gold1604_test
Table "uscf.gold1604_test"
Column | Type | Modifiers
--------+------+-----------
data | json |

Some sample data:

{"id":"10000001","name":"MISNER, J NATHAN","st":"NY","exp":"2012-05-31","sts":"A"}
{"id":"10000002","name":"MISNER, JUDY","st":"TN","exp":"2007-07-31","sts":"I"}
{"id":"10000003","name":"MISNER, J AMSCHEL","st":"NY","exp":"2007-05-31","sts":"A"}

uscf-> \d goldmast_test
Table "uscf.goldmast_test"
Column | Type | Modifiers
--------+-----------------------+-----------
id | character varying(8) |
name | character varying(40) |
st | character varying(2) |
exp | date |
sts | character(1) |
supp | date |
rrtg | character varying(8) |
qrtg | character varying(8) |
brtg | character varying(8) |
oqrtg | character varying(8) |
obrtg | character varying(8) |
fid | character varying(12) |

insert into goldmast_test select * from
json_populate_record(NULL::"goldmast_test", (select * from gold1604_test
limit 1) )
produces:
uscf=> select * from goldmast_test;
    id    |       name       | st |    exp     | sts | supp | rrtg | qrtg | brtg | oqrtg | obrtg | fid
----------+------------------+----+------------+-----+------+------+------+------+-------+-------+-----
 10000001 | MISNER, J NATHAN | NY | 2012-05-31 | A   |      |      |      |      |       |       |
(1 row)

The fact that the null values were stripped out is not an issue here.

But,
uscf=> insert into goldmast_test select * from
json_populate_record(NULL::"goldmast_test", (select * from gold1604_test
limit 2) )
uscf-> \g
ERROR: more than one row returned by a subquery used as an expression

Is there a way to get around the one row per subquery issue?
--
Mike Nolan

#13 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Michael Nolan (#12)
Re: Bypassing NULL elements in row_to_json function

On 04/10/2016 08:39 AM, Michael Nolan wrote:

Here's what I did:

\d gold1604_test
Table "uscf.gold1604_test"
Column | Type | Modifiers
--------+------+-----------
data | json |

Some sample data:

{"id":"10000001","name":"MISNER, J NATHAN","st":"NY","exp":"2012-05-31","sts":"A"}
{"id":"10000002","name":"MISNER, JUDY","st":"TN","exp":"2007-07-31","sts":"I"}
{"id":"10000003","name":"MISNER, J AMSCHEL","st":"NY","exp":"2007-05-31","sts":"A"}

uscf-> \d goldmast_test
Table "uscf.goldmast_test"
Column | Type | Modifiers
--------+-----------------------+-----------
id | character varying(8) |
name | character varying(40) |
st | character varying(2) |
exp | date |
sts | character(1) |
supp | date |
rrtg | character varying(8) |
qrtg | character varying(8) |
brtg | character varying(8) |
oqrtg | character varying(8) |
obrtg | character varying(8) |
fid | character varying(12) |

insert into goldmast_test select * from
json_populate_record(NULL::"goldmast_test", (select * from gold1604_test
limit 1) )
produces:
uscf=> select * from goldmast_test;
    id    |       name       | st |    exp     | sts | supp | rrtg | qrtg | brtg | oqrtg | obrtg | fid
----------+------------------+----+------------+-----+------+------+------+------+-------+-------+-----
 10000001 | MISNER, J NATHAN | NY | 2012-05-31 | A   |      |      |      |      |       |       |
(1 row)

The fact that the null values were stripped out is not an issue here.

But,
uscf=> insert into goldmast_test select * from
json_populate_record(NULL::"goldmast_test", (select * from gold1604_test
limit 2) )
uscf-> \g
ERROR: more than one row returned by a subquery used as an expression

Is there a way to get around the one row per subquery issue?

Per David's post:

http://www.postgresql.org/docs/9.5/interactive/functions-json.html

json_populate_recordset(base anyelement, from_json json)

Expands the outermost array of objects in from_json to a set of rows
whose columns match the record type defined by base (see note below).

select * from json_populate_recordset(null::myrowtype,
  '[{"a":1,"b":2},{"a":3,"b":4}]')


--
Adrian Klaver
adrian.klaver@aklaver.com


#14 David G. Johnston
david.g.johnston@gmail.com
In reply to: Michael Nolan (#12)
Re: Bypassing NULL elements in row_to_json function

On Sun, Apr 10, 2016 at 8:39 AM, Michael Nolan <htfoot@gmail.com> wrote:

Here's what I did:

\d gold1604_test
Table "uscf.gold1604_test"
Column | Type | Modifiers
--------+------+-----------
data | json |

Some sample data:

{"id":"10000001","name":"MISNER, J NATHAN","st":"NY","exp":"2012-05-31","sts":"A"}
{"id":"10000002","name":"MISNER, JUDY","st":"TN","exp":"2007-07-31","sts":"I"}
{"id":"10000003","name":"MISNER, J AMSCHEL","st":"NY","exp":"2007-05-31","sts":"A"}

(I think) PostgreSQL assumes that there is only a single top-level json
element, whether it be an array or an object. The first thing you'd have
to do is split on the newline and create a PostgreSQL text array.

But,
uscf=> insert into goldmast_test select * from
json_populate_record(NULL::"goldmast_test", (select * from gold1604_test
limit 2) )
uscf-> \g
ERROR: more than one row returned by a subquery used as an expression

Is there a way to get around the one row per subquery issue?

​Yes, use LATERAL.

Something like the following should work (not tested):

INSERT INTO goldmast_test
SELECT jpr.*
FROM gold1604_test
LATERAL json_populate_record(null::goldmast_test", data) AS jpr

ideally you could just do (not tested):

INSERT INTO goldmast_test
SELECT jpr.*
FROM (SELECT t::json AS j
      FROM unnest(string_to_array(?::text, E'\n')) AS un (t)) AS src,
LATERAL json_populate_record(null::goldmast_test, src.j) AS jpr

Where the "?::text" is a placeholder for the textual JSON being handed to the
query, thus avoiding the temporary "gold1604_test" table.

David J.

#15 Michael Nolan
htfoot@gmail.com
In reply to: David G. Johnston (#14)
Re: Bypassing NULL elements in row_to_json function

In case it wasn't clear, the sample data was 3 rows of data. (There are
actually around 890K rows in the table pgfutter built from the JSON file.)
--
Mike Nolan

#16 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: David G. Johnston (#14)
Re: Bypassing NULL elements in row_to_json function

On 04/10/2016 09:24 AM, David G. Johnston wrote:

On Sun, Apr 10, 2016 at 8:39 AM, Michael Nolan <htfoot@gmail.com
<mailto:htfoot@gmail.com>>wrote:

Here's what I did:

\d gold1604_test
Table "uscf.gold1604_test"
Column | Type | Modifiers
--------+------+-----------
data | json |

Some sample data:

{"id":"10000001","name":"MISNER, J NATHAN","st":"NY","exp":"2012-05-31","sts":"A"}
{"id":"10000002","name":"MISNER, JUDY","st":"TN","exp":"2007-07-31","sts":"I"}
{"id":"10000003","name":"MISNER, J AMSCHEL","st":"NY","exp":"2007-05-31","sts":"A"}

​(I think) PostgreSQL assumes that there is only a single top-level json
element, whether it be an array or an object. The first thing you'd
have to do is split on the newline and create a PostgreSQL text array.

But,
uscf=> insert into goldmast_test select * from
json_populate_record(NULL::"goldmast_test", (select * from
gold1604_test limit 2) )
uscf-> \g
ERROR: more than one row returned by a subquery used as an expression

Is there a way to get around the one row per subquery issue?

​Yes, use LATERAL.

Something like the following should work (not tested):

INSERT INTO goldmast_test
SELECT jpr.*
FROM gold1604_test
LATERAL json_populate_record(null::goldmast_test", data) AS jpr

I can confirm this works after a little clean up:

test=> INSERT INTO goldmast_test
SELECT jpr.*
FROM gold1604_test,
LATERAL json_populate_record(null::"goldmast_test", data) AS jpr;
INSERT 0 3

test=> select * from goldmast_test ;
    id    |       name        | st |    exp     | sts | supp | rrtg | qrtg | brtg | oqrtg | obrtg | fid
----------+-------------------+----+------------+-----+------+------+------+------+-------+-------+-----
 10000001 | MISNER, J NATHAN  | NY | 2012-05-31 | A   |      |      |      |      |       |       |
 10000002 | MISNER, JUDY      | TN | 2007-07-31 | I   |      |      |      |      |       |       |
 10000003 | MISNER, J AMSCHEL | NY | 2007-05-31 | A   |      |      |      |      |       |       |
(3 rows)

While trying to figure out how it works, I discovered that LATERAL is not
necessary:

test=> INSERT INTO goldmast_test
SELECT jpr.*
FROM gold1604_test,
json_populate_record(null::"goldmast_test", data) AS jpr;
INSERT 0 3

test=> select * from goldmast_test ;
    id    |       name        | st |    exp     | sts | supp | rrtg | qrtg | brtg | oqrtg | obrtg | fid
----------+-------------------+----+------------+-----+------+------+------+------+-------+-------+-----
 10000001 | MISNER, J NATHAN  | NY | 2012-05-31 | A   |      |      |      |      |       |       |
 10000002 | MISNER, JUDY      | TN | 2007-07-31 | I   |      |      |      |      |       |       |
 10000003 | MISNER, J AMSCHEL | NY | 2007-05-31 | A   |      |      |      |      |       |       |
 10000001 | MISNER, J NATHAN  | NY | 2012-05-31 | A   |      |      |      |      |       |       |
 10000002 | MISNER, JUDY      | TN | 2007-07-31 | I   |      |      |      |      |       |       |
 10000003 | MISNER, J AMSCHEL | NY | 2007-05-31 | A   |      |      |      |      |       |       |
(6 rows)

ideally you could just do (not tested):

INSERT INTO goldmast_test
SELECT jpr.*
FROM (SELECT t::json FROM unnest(string_to_array(?::text, E'\n') un
(t))) src j
LATERAL json_populate_record(null::goldmast_test", j) AS jpr

Where the "?::text" is placeholder for the textual JSON being handed to
the query thus avoiding the temporary "gold1604_test" table.

David J.

--
Adrian Klaver
adrian.klaver@aklaver.com
