nested xml/json to table
Hello,
in python pandas there is for example a json_normalize function,
i didn't find something similar or better in postgresql?
what would be the general idea: "easily" convert an hierarchical
structure like json or xml to a table; for example creating columns by
appending the key-names when going doing down the three, using null for
empty values, adding more columns as needed by the given structure.
(1-way operation)
a few conceptual gists:
jsonX=
{
"glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Sta..";
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-m..",
"GlossSeeAlso":
["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
}
select *
from json_to_table('jsonX');
-- generated columns with no data/only nulls could be removed..
-- arrays could be unnested in the process as well
glossary | glossary.title | glossary.title.GlossDiv.title | ..
-------------------------------------------------------------
null | example glossary | S |
..
the last column: glossary.title.GlossDiv.GlossList.GlossEntry.GlossSee
with value "markup"
---
what if there are different structures that need to be combined?
(they could be added in the same manner as before)
jsonY=
{
s1:[{
"f1": "a",
"f2": "b",
"f3": { "f3.1": "c",
"f3.2": "d"}
},
{
"f1": "e",
"f4": "g"
}
]
}
select *
from json_to_table('jsonY');
-- generated columns with no data/only nulls could be removed..
-- separator sign is untrusted
s1 | s1.f1 | s1.f2 | s1.f3 | s1.f3.f3.1 | s1.f3.f3.2 | s1.f4
-------------------------------------------------------------
null| a | b | null | c | d | null
null| e | null | null | null | null | g
any ideas or suggestions (apart from plpython)?
Wim
Wim Bertels schrieb am 17.03.2023 um 11:05:
what would be the general idea: "easily" convert an hierarchical
structure like json or xml to a table; for example creating columns by
appending the key-names when going doing down the three, using null for
empty values, adding more columns as needed by the given structure.
(1-way operation)a few conceptual gists:
jsonX=
{
"glossary": {
"title": "example glossary",
"GlossDiv": {
"title": "S",
"GlossList": {
"GlossEntry": {
"ID": "SGML",
"SortAs": "SGML",
"GlossTerm": "Sta..";
"Acronym": "SGML",
"Abbrev": "ISO 8879:1986",
"GlossDef": {
"para": "A meta-m..",
"GlossSeeAlso":
["GML", "XML"]
},
"GlossSee": "markup"
}
}
}
}
}select *
from json_to_table('jsonX');
-- generated columns with no data/only nulls could be removed..
-- arrays could be unnested in the process as wellglossary | glossary.title | glossary.title.GlossDiv.title | ..
-------------------------------------------------------------
null | example glossary | S |
..the last column: glossary.title.GlossDiv.GlossList.GlossEntry.GlossSee
with value "markup"---
what if there are different structures that need to be combined?
(they could be added in the same manner as before)jsonY=
{
s1:[{
"f1": "a",
"f2": "b",
"f3": { "f3.1": "c",
"f3.2": "d"}
},
{
"f1": "e",
"f4": "g"
}
]
}select *
from json_to_table('jsonY');
-- generated columns with no data/only nulls could be removed..
-- separator sign is untrusteds1 | s1.f1 | s1.f2 | s1.f3 | s1.f3.f3.1 | s1.f3.f3.2 | s1.f4
-------------------------------------------------------------
null| a | b | null | c | d | null
null| e | null | null | null | null | g
You can't have a function that returns a different set of columns each time you call it
(without specifying the output columns - which you don't want).
I have once written a function to flatten a JSON hierarchy to multiple rows.
Applied to your first example it would return the following:
path | key | value
-------------------------------------------------+--------------+-----------------
/glossary | title | example glossary
/glossary/GlossDiv | title | S
/glossary/GlossDiv/GlossList/GlossEntry | ID | SGML
/glossary/GlossDiv/GlossList/GlossEntry | Abbrev | ISO 8879:1986
/glossary/GlossDiv/GlossList/GlossEntry | SortAs | SGML
/glossary/GlossDiv/GlossList/GlossEntry | Acronym | SGML
/glossary/GlossDiv/GlossList/GlossEntry | GlossSee | markup
/glossary/GlossDiv/GlossList/GlossEntry | GlossTerm | Sta..
/glossary/GlossDiv/GlossList/GlossEntry/GlossDef | para | A meta-m..
/glossary/GlossDiv/GlossList/GlossEntry/GlossDef | GlossSeeAlso | ["GML", "XML"]
And the following for the second example:
path | key | value
-------+------+------
/s1 | f1 | a
/s1 | f2 | b
/s1/f3 | f3.1 | c
/s1/f3 | f3.2 | d
/s1 | f1 | e
/s1 | f4 | g
Thomas
Attachments:
flatten_json.sqltext/plain; charset=UTF-8; name=flatten_json.sqlDownload
Thomas Kellerer schreef op vr 17-03-2023 om 11:21 [+0100]:
Wim Bertels schrieb am 17.03.2023 um 11:05:
what would be the general idea: "easily" convert an hierarchical
structure like json or xml to a table; for example creating columns
by
appending the key-names when going doing down the three, using null
for
empty values, adding more columns as needed by the given structure.
(1-way operation)You can't have a function that returns a different set of columns
each time you call it
(without specifying the output columns - which you don't want).
Hello Thomas,
thanks for the feedback,
i was wondering in the likes of existing built-in functions or
extensions (not CREATE FUNCTION)
mvg,
Wim