xml import/export tools and performance

Started by Ivan Sergio Borgonovoover 16 years ago5 messagesgeneral
Jump to latest
#1Ivan Sergio Borgonovo
mail@webthatworks.it

I need to import/export through xml files.

I was looking for tools/examples/HOWTO for postgresql.

Right now I still don't know the details of the requirements.

I know I won't need a GUI.

I know one of the exporting parties will be a MS SQL 2005 server, so
it would be nice if there was an easy way to import xml generated
with the FOR XML AUTO, XMLSCHEMA ('forpg').

I'd like to have a tool that can write XSD from queries
automatically.

I may need to strictly specify one of the xml output format since
one of the importing parties pretend to support xml with something
that is more like a tagged csv.

Currently I'm mostly looking around to see where it would be
convenient to move the boundaries of the problem.

I'd be curious about what kind of performance impact they may have
compared to COPY.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#2Brian Modra
brian@zwartberg.com
In reply to: Ivan Sergio Borgonovo (#1)
Re: xml import/export tools and performance

2009/11/6 Ivan Sergio Borgonovo <mail@webthatworks.it>:

I need to import/export through xml files.

I was looking for tools/examples/HOWTO for postgresql.

Right now I still don't know the details of the requirements.

I know I won't need a GUI.

I know one of the exporting parties will be a MS SQL 2005 server, so
it would be nice if there was an easy way to import xml generated
with the FOR XML AUTO, XMLSCHEMA ('forpg').

I'd like to have a tool that can write XSD from queries
automatically.

I may need to strictly specify one of the xml output format since
one of the importing parties pretend to support xml with something
that is more like a tagged csv.

Currently I'm mostly looking around to see where it would be
convenient to move the boundaries of the problem.

I'd be curious about what kind of performance impact they may have
compared to COPY.

thanks

I don't know of any such tools, but they may exist, if not:

Tell me the XML format you want in/out of postgres, and I can write a
tool for you. I had a quick look just now to see what the MS Access
format is, and its typically difficult to wade through the moutains of
useless information that people have posted on teh web about Microsoft
products, so i lost interest. However, if you can point me in the
right direction, so I can see the DTD or examples, or both... then I
can tell you how easy/difficult it would be to write a converter.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
Brian Modra Land line: +27 23 5411 462
Mobile: +27 79 69 77 082
5 Jan Louw Str, Prince Albert, 6930
Postal: P.O. Box 2, Prince Albert 6930
South Africa
http://www.zwartberg.com/

#3John R Pierce
pierce@hogranch.com
In reply to: Ivan Sergio Borgonovo (#1)
Re: xml import/export tools and performance

Ivan Sergio Borgonovo wrote:

I know one of the exporting parties will be a MS SQL 2005 server, so
it would be nice if there was an easy way to import xml generated
with the FOR XML AUTO, XMLSCHEMA ('forpg').

Microsoft SQL Server has a pretty good data translation tool, it used to
be called DTS, but I think its called something else now... if you
install a ODBC or OLEDB Postgres driver ontp the SQL Server system, and
configure your postgres server to allow the MS SQL Server to be able to
connect to it, you can use this service to bulk transfer data between
Postgres and MS SQL Server, either full tables or specific queries.

ahh, its called SSIS now, here's a pretty good summary of it,
http://en.wikipedia.org/wiki/SQL_Server_Integration_Services

and some info on MS's webpile about it...
http://www.microsoft.com/sqlserver/2005/en/us/integration-services.aspx
http://technet.microsoft.com/en-us/library/cc917721.aspx

if you go this route, you don't have to mess with any sort of
import/export files, XML or otherwise, just move the data directly
between the databases over the network.

personally, I find XML is extremely inefficient. the idea of sending
the name of every field with each data record just seems like a bad
idea. you end up with more metadata than actual data.

#4Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: John R Pierce (#3)
Re: xml import/export tools and performance

On Fri, 06 Nov 2009 01:28:33 -0800
John R Pierce <pierce@hogranch.com> wrote:

Ivan Sergio Borgonovo wrote:

I know one of the exporting parties will be a MS SQL 2005
server, so it would be nice if there was an easy way to import
xml generated with the FOR XML AUTO, XMLSCHEMA ('forpg').

Microsoft SQL Server has a pretty good data translation tool, it
used to be called DTS, but I think its called something else

SSIS... being used to DTS I think that in terms of usability it made
a step back... but still XML support in MS SQL looks reasonably neat.

and some info on MS's webpile about it...
http://www.microsoft.com/sqlserver/2005/en/us/integration-services.aspx
http://technet.microsoft.com/en-us/library/cc917721.aspx

if you go this route, you don't have to mess with any sort of
import/export files, XML or otherwise, just move the data directly
between the databases over the network.

Unfortunately that's not the only source of input/output and
internally the DB schema may be a bit different and towards some
destinations we just have one choice.

personally, I find XML is extremely inefficient. the idea of
sending the name of every field with each data record just seems
like a bad idea. you end up with more metadata than actual data.

It is.

We need at least one xml output, we may have some xml input and some
xml output. On some parties we don't have constraints and we just
thought that xml could be a common exchange format. Knowing nothing
about the tools available for pg, some example of real usage etc...
I was looking around to see where to start to cut the cake.

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#5ries van Twisk
pg@rvt.dds.nl
In reply to: Ivan Sergio Borgonovo (#1)
Re: xml import/export tools and performance

On Nov 6, 2009, at 2:52 AM, Ivan Sergio Borgonovo wrote:

I need to import/export through xml files.

I was looking for tools/examples/HOWTO for postgresql.

Right now I still don't know the details of the requirements.

I know I won't need a GUI.

I know one of the exporting parties will be a MS SQL 2005 server, so
it would be nice if there was an easy way to import xml generated
with the FOR XML AUTO, XMLSCHEMA ('forpg').

I'd like to have a tool that can write XSD from queries
automatically.

I may need to strictly specify one of the xml output format since
one of the importing parties pretend to support xml with something
that is more like a tagged csv.

Currently I'm mostly looking around to see where it would be
convenient to move the boundaries of the problem.

I'd be curious about what kind of performance impact they may have
compared to COPY.

thanks

Ivan,

I have been using JasperETL for this purpose. But like what others say,
XML is very in efficient for this task.

Currently I am in the process if importing XML feeds into PostgreSQL.
At first I used the ETL tool to write to the correct table by creating
joins
into the ETL tool.

However, this slows things down quite a bit (but the process is VERY
manageable).
Currently I keep teh route within JasperETL as short as possible in
teh following steps

- load XML
- remove columns I don't need
- cleanup some data using a javarow
- load into a staging table

From there I execute a stored procedure to normalize the data (create
additional records where needed in
other foreign tables) and push data into production. Triggers on my
production table
do some additional work for tsearch2 and some other small stuff.

In my case, on a 32Bit system loading a 200Mb XML file consumes 1Gb
real memory.
On 64Bit systems with Java 1.6 you can push this further down the road
though.

if you can export to CSV and import using copy into PG, I would go for
that route
if you can control the format of the CSV well and reliable, because
detecting errors
in a CSV is much harder then in XML (missing column in CSV can mess up
data)

Ries