nested xml/json to table

Started by Wim Bertelsabout 3 years ago3 messagesgeneral
Jump to latest
#1Wim Bertels
wim.bertels@ucll.be

Hello,

in python pandas there is for example a json_normalize function,
i didn't find something similar or better in postgresql?

what would be the general idea: "easily" convert an hierarchical
structure like json or xml to a table; for example creating columns by
appending the key-names when going doing down the three, using null for
empty values, adding more columns as needed by the given structure.
(1-way operation)

a few conceptual gists:
jsonX=
{
"glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Sta..";
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-m..",
"GlossSeeAlso": 
["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
}

select *
from json_to_table('jsonX');
-- generated columns with no data/only nulls could be removed..
-- arrays could be unnested in the process as well

glossary | glossary.title | glossary.title.GlossDiv.title | ..
-------------------------------------------------------------
null | example glossary | S |
..

the last column: glossary.title.GlossDiv.GlossList.GlossEntry.GlossSee
with value "markup"

---

what if there are different structures that need to be combined?
(they could be added in the same manner as before)

jsonY=
{
s1:[{
"f1": "a",
"f2": "b",
"f3": { "f3.1": "c",
"f3.2": "d"}
},
{
"f1": "e",
"f4": "g"
}
]
}

select *
from json_to_table('jsonY');
-- generated columns with no data/only nulls could be removed..
-- separator sign is untrusted

s1 | s1.f1 | s1.f2 | s1.f3 | s1.f3.f3.1 | s1.f3.f3.2 | s1.f4
-------------------------------------------------------------
null| a | b | null | c | d | null
null| e | null | null | null | null | g

any ideas or suggestions (apart from plpython)?
Wim

#2Thomas Kellerer
shammat@gmx.net
In reply to: Wim Bertels (#1)
Re: nested xml/json to table

Wim Bertels schrieb am 17.03.2023 um 11:05:

what would be the general idea: "easily" convert an hierarchical
structure like json or xml to a table; for example creating columns by
appending the key-names when going doing down the three, using null for
empty values, adding more columns as needed by the given structure.
(1-way operation)

a few conceptual gists:
jsonX=
{
"glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Sta..";
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-m..",
"GlossSeeAlso": 
["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
}

select *
from json_to_table('jsonX');
-- generated columns with no data/only nulls could be removed..
-- arrays could be unnested in the process as well

glossary | glossary.title | glossary.title.GlossDiv.title | ..
-------------------------------------------------------------
null | example glossary | S |
..

the last column: glossary.title.GlossDiv.GlossList.GlossEntry.GlossSee
with value "markup"

---

what if there are different structures that need to be combined?
(they could be added in the same manner as before)

jsonY=
{
s1:[{
"f1": "a",
"f2": "b",
"f3": { "f3.1": "c",
"f3.2": "d"}
},
{
"f1": "e",
"f4": "g"
}
]
}

select *
from json_to_table('jsonY');
-- generated columns with no data/only nulls could be removed..
-- separator sign is untrusted

s1 | s1.f1 | s1.f2 | s1.f3 | s1.f3.f3.1 | s1.f3.f3.2 | s1.f4
-------------------------------------------------------------
null| a | b | null | c | d | null
null| e | null | null | null | null | g

You can't have a function that returns a different set of columns each time you call it
(without specifying the output columns - which you don't want).

I have once written a function to flatten a JSON hierarchy to multiple rows.

Applied to your first example it would return the following:

path | key | value
-------------------------------------------------+--------------+-----------------
/glossary | title | example glossary
/glossary/GlossDiv | title | S
/glossary/GlossDiv/GlossList/GlossEntry | ID | SGML
/glossary/GlossDiv/GlossList/GlossEntry | Abbrev | ISO 8879:1986
/glossary/GlossDiv/GlossList/GlossEntry | SortAs | SGML
/glossary/GlossDiv/GlossList/GlossEntry | Acronym | SGML
/glossary/GlossDiv/GlossList/GlossEntry | GlossSee | markup
/glossary/GlossDiv/GlossList/GlossEntry | GlossTerm | Sta..
/glossary/GlossDiv/GlossList/GlossEntry/GlossDef | para | A meta-m..
/glossary/GlossDiv/GlossList/GlossEntry/GlossDef | GlossSeeAlso | ["GML", "XML"]

And the following for the second example:

path | key | value
-------+------+------
/s1 | f1 | a
/s1 | f2 | b
/s1/f3 | f3.1 | c
/s1/f3 | f3.2 | d
/s1 | f1 | e
/s1 | f4 | g

Thomas

Attachments:

flatten_json.sqltext/plain; charset=UTF-8; name=flatten_json.sqlDownload
#3Wim Bertels
wim.bertels@ucll.be
In reply to: Thomas Kellerer (#2)
Re: nested xml/json to table

Thomas Kellerer schreef op vr 17-03-2023 om 11:21 [+0100]:

Wim Bertels schrieb am 17.03.2023 um 11:05:

what would be the general idea: "easily" convert an hierarchical
structure like json or xml to a table; for example creating columns
by
appending the key-names when going doing down the three, using null
for
empty values, adding more columns as needed by the given structure.
(1-way operation)

You can't have a function that returns a different set of columns
each time you call it
(without specifying the output columns - which you don't want).

Hello Thomas,

thanks for the feedback,
i was wondering in the likes of existing built-in functions or
extensions (not CREATE FUNCTION)

mvg,
Wim