Decomposing xml into table

Started by Surafel Temesgen over 5 years ago · 9 messages
#1 Surafel Temesgen
surafel3000@gmail.com

In PostgreSQL there is a function table_to_xml that maps a table's content
to an XML value, but there is no functionality to decompose XML back into a
table. That could be useful in systems that use XML for transport only, or
where there is a need to migrate to a database system in order to use
database functionality. I propose to add this by extending COPY to handle an
XML format as well, because the file-parsing and tuple-formation functions
are already there, and it also seems to me that implementing it without
using an XML library is simpler.
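
For reference, the existing direction looks like this (a minimal sketch; the
table name and columns are made up):

-- map an example table's content to an XML value
CREATE TABLE people (id int, name text);
INSERT INTO people VALUES (1, 'Ada'), (2, 'Grace');
SELECT table_to_xml('people', true, false, '');
-- produces something like:
-- <people>
--   <row><id>1</id><name>Ada</name></row>
--   <row><id>2</id><name>Grace</name></row>
-- </people>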

Comments?

regards

Surafel

#2 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Surafel Temesgen (#1)
Re: Decomposing xml into table

On Mon, 22 Jun 2020 at 20:49, Surafel Temesgen <surafel3000@gmail.com>
wrote:

In PostgreSQL there is a function table_to_xml that maps a table's content
to an XML value, but there is no functionality to decompose XML back into a
table. That could be useful in systems that use XML for transport only, or
where there is a need to migrate to a database system in order to use
database functionality. I propose to add this by extending COPY to handle an
XML format as well, because the file-parsing and tuple-formation functions
are already there, and it also seems to me that implementing it without
using an XML library is simpler.

Did you try the xmltable function?

https://www.postgresql.org/docs/10/functions-xml.html

Regards

Pavel

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Surafel Temesgen (#1)
Re: Decomposing xml into table

Surafel Temesgen <surafel3000@gmail.com> writes:

In PostgreSQL there is a function table_to_xml that maps a table's content
to an XML value, but there is no functionality to decompose XML back into a
table

Huh? XMLTABLE does that, and it's even SQL-standard.

I propose to add this by extending COPY to handle an XML format as well,
because the file-parsing and tuple-formation functions are already there

Big -1 on that. COPY is not for general-purpose data transformation.
The more unrelated features we load onto it, the slower it will get,
and probably also the more buggy and unmaintainable. There's also a
really fundamental mismatch, in that COPY is designed to do row-by-row
processing with essentially no cross-row state. How would you square
that with the inherently nested nature of XML?

and it also seems to
me that implementing it without using an XML library is simpler

I'm not in favor of implementing our own XML functionality, at least
not unless we go all the way and remove the dependency on libxml2
altogether. That wouldn't be a terrible idea --- libxml2 has a long
and sad track record of bugs, including security issues. But it'd be
quite a big job, and it'd still have nothing to do with COPY.

The big-picture question here, though, is why expend effort on XML at all?
It seems like JSON is where it's at these days for that problem space.

regards, tom lane

#4 Surafel Temesgen
surafel3000@gmail.com
In reply to: Pavel Stehule (#2)
Re: Decomposing xml into table

Hey Pavel,

On Mon, Jun 22, 2020 at 9:59 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Did you try the xmltable function?

Yes, I know it, but I am proposing changing the given XML data into relational
form and inserting it into the desired table at once.

regards
Surafel

#5 Pavel Stehule
pavel.stehule@gmail.com
In reply to: Surafel Temesgen (#4)
Re: Decomposing xml into table

On Tue, 23 Jun 2020 at 13:59, Surafel Temesgen <surafel3000@gmail.com>
wrote:

Hey Pavel,

On Mon, Jun 22, 2020 at 9:59 PM Pavel Stehule <pavel.stehule@gmail.com>
wrote:

Did you try the xmltable function?

Yes, I know it, but I am proposing changing the given XML data into relational
form and inserting it into the desired table at once.

It is a question of how common it is. Because there is no common format
for XML, I agree with Tom that it should not be part of core. An import from
XML can be done with COPY ... FROM PROGRAM

or some special tools like https://github.com/okbob/pgimportdoc

There is too much variability, so some special external tool will be better
(cleaner, more user friendly).
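
For example, something along these lines (a sketch only; the xmlstarlet
invocation, file path, and table layout are assumptions, and COPY ... FROM
PROGRAM needs the appropriate server-side privileges):

-- flatten the XML to CSV outside the server, then load it with COPY
COPY people (id, name)
FROM PROGRAM 'xmlstarlet sel -t -m /people/row -v id -o "," -v name -n /tmp/people.xml'
WITH (FORMAT csv);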

Regards

Pavel

#6 Surafel Temesgen
surafel3000@gmail.com
In reply to: Tom Lane (#3)
Re: Decomposing xml into table

Hey Tom,

On Mon, Jun 22, 2020 at 10:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Big -1 on that. COPY is not for general-purpose data transformation.
The more unrelated features we load onto it, the slower it will get,
and probably also the more buggy and unmaintainable.

What handling a new format adds, performance-wise, is a check in a few
places, and I don't think that would have a noticeable impact. As far as I
can see, COPY is extensible by design, and I don't think adding an
additional format would be a huge undertaking.

There's also a
really fundamental mismatch, in that COPY is designed to do row-by-row
processing with essentially no cross-row state. How would you square
that with the inherently nested nature of XML?

In the XML case the difference is the row delimiter. In XML mode the user
would specify a row-delimiter tag name; everything from a start tag of that
name up to its matching end tag would be treated as a single row, and each
piece of text content in between would become a column value.
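
To illustrate, with purely hypothetical syntax (the input file, the ROW_TAG
option, and the table are all made up; nothing like this exists today):

-- hypothetical input file /tmp/people.xml:
--   <people>
--     <row><id>1</id><name>Ada</name></row>
--     <row><id>2</id><name>Grace</name></row>
--   </people>
-- hypothetical invocation, naming "row" as the row-delimiter tag:
COPY people (id, name) FROM '/tmp/people.xml' WITH (FORMAT xml, ROW_TAG 'row');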

The big-picture question here, though, is why expend effort on XML at all?
It seems like JSON is where it's at these days for that problem space.

There are legacy systems, and I think XML is still popular.

regards
Surafel

#7 Thomas Kellerer
shammat@gmx.net
In reply to: Surafel Temesgen (#4)
Re: Decomposing xml into table

Surafel Temesgen wrote on 23.06.2020 at 13:59:

Did you try the xmltable function?

Yes, I know it, but I am proposing changing the given XML data into
relational form and inserting it into the desired table at once.

Well, xmltable() does change the XML data to a relational form, and
the result can be used directly to insert into a table:

insert into target_table (...)
select ...
from xmltable(...);
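
For instance (a minimal sketch; the document, the table, and the column
names are made up):

INSERT INTO people (id, name)
SELECT x.id, x.name
FROM xmltable('/people/row'
              PASSING xmlparse(document
                '<people>
                   <row><id>1</id><name>Ada</name></row>
                   <row><id>2</id><name>Grace</name></row>
                 </people>')
              COLUMNS id int PATH 'id',
                      name text PATH 'name') AS x;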

#8 Andreas Karlsson
andreas@proxel.se
In reply to: Surafel Temesgen (#1)
Re: Decomposing xml into table

On 6/22/20 8:49 PM, Surafel Temesgen wrote:

Comments?

I feel it would make more sense to add features like this to an external
tool, e.g. pgloader. But even if we add it to the core PostgreSQL project,
I feel the XML parsing should be done in the client, not in the database
server. The COPY command is already very complex.

Andreas

#9 Chapman Flack
chap@anastigmatix.net
In reply to: Thomas Kellerer (#7)
Re: Decomposing xml into table

On 06/23/20 08:57, Thomas Kellerer wrote:

Surafel Temesgen wrote on 23.06.2020 at 13:59:

Did you try the xmltable function?

Yes, I know it, but I am proposing changing the given XML data into
relational form and inserting it into the desired table at once.

Well, xmltable() does change the XML data to a relational form, and
the result can be used directly to insert into a table:

insert into target_table (...)
select ...
from xmltable(...);

The use case that I imagine might be driving this would be where the XML
source is not deeply or elaborately nested, but is yuuge. In such a case,
PostgreSQL's XML handling and xmltable will not be doing beautiful things:

- the data coming from the frontend will have to be completely buffered
in backend memory, and then parsed as XML once by the XML data type
input routine, only for the purpose of confirming it's XML. The unparsed
form is what becomes the Datum value, which means

- xmltable gets to parse it a second time, again all in memory, and then
generate the set-returning function result tuples from it.

- as I last understood it [1], even the tuples generated as a result
get all piled up in a tuplestore before the next part of the (what
you would otherwise hope to call) "pipeline" can happen. (There may
be work on better pipelining that part.)

So I would say for that use case, it will be hard to do better than an
external process acting as a filter: XML in, COPY-formatted tuples out.
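
From psql, that could look something like this (a sketch; xml_to_csv.py is a
hypothetical filter that would read the XML with a streaming parser and emit
CSV):

-- convert on the client and feed COPY from a pipe, so the whole document
-- never has to sit in server memory at once
\copy people (id, name) from program 'python3 xml_to_csv.py /tmp/huge.xml' with (format csv)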

The XML-processing library I'm most familiar with, Saxon, can do some
sophisticated analysis of an XML Query or XSLT transformation and
determine when it can be done while consuming the XML in streaming
mode rather than building a complete tree first. (The open-source
"community edition" doesn't have that trick, only the paid editions,
but they're otherwise compatible, so you can prototype stuff using
the community edition, and then drop in a paid version and poof, it
goes faster.)

On 06/23/20 08:25, Surafel Temesgen wrote:

On Mon, Jun 22, 2020 at 10:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The big-picture question here, though, is why expend effort on XML
at all?
It seems like JSON is where it's at these days for that problem space.

There are legacy systems, and I think XML is still popular.

I had an interesting conversation about that at PGCon a year ago, with
someone who crystallized this idea better than I had at the time (but
may or may not want his name on it):

We tend to repeat a cycle of: a new technology is introduced, minimal
at first, then develops a good ecosystem of sophisticated tooling, then
looks complicated and gets replaced with something minimal that needs to
repeat the same process.

By this point, we're on to 3.x versions of XML Query, XPath, and XSLT,
very mature languages that can express sophisticated transformations
and optimize the daylights out of them.

JSON now has JSONPATH, which is coming along, and relearning the lessons
of XPath and XQuery, and by the time it has, there will be something else
that's appealing because it looks more minimal, and we'll be having the
"why expend effort on JSON at all?" conversation.

Regards,
-Chap

[1]: /messages/by-id/12389.1563746057@sss.pgh.pa.us