[OT?] ETL tools

Started by Jose Gonzalez Gomezover 20 years ago6 messagesgeneral
Jump to latest
#1Jose Gonzalez Gomez
jgonzalez.openinput@gmail.com

Hi there,

I have started working on a new place where they have a lot of medical
data stored in relational databases they later exploit with
statistical software. The situation is a bit chaotic as they're using
a lot of local Access databases, some databases hosted in an old
version of Microsoft SQL Server and a lot of data in other non
relational files (SPSS, Excel, ...). I was hoping to impose a bit of
order and I started installing a current version of PostgreSQL to host
all the databases they're using.

I now have two problems:

1. I have to migrate all the data they currently have to the new
system, maybe making some transformations in the way.

2. They're continously receiving new data in heterogeneous formats and
I need some way to import this data in a easy way into the existing
databases.

Googling a bit I have discovered that what I probably need is an ETL
(extract, transform and load) tool, preferably open sourced and very
PostgreSQL friendly... does anybody on this list have experience with
such kind of tools? Would anybody recommend anything? Maybe I could
try another approach?

Thanks in advance, best regards
Jose

#2Roger Hand
RHand@kailea.com
In reply to: Jose Gonzalez Gomez (#1)
Re: [OT?] ETL tools

Jose Gonzalez wrote:

The situation is a bit chaotic as they're using
a lot of local Access databases, some databases hosted in an old
version of Microsoft SQL Server and a lot of data in other non
relational files (SPSS, Excel, ...). I was hoping to impose a bit of
order and I started installing a current version of PostgreSQL to host
all the databases they're using.
...
Maybe I could try another approach?

Personally, I would write code (Java or whatever) to do the work. There will almost certainly be cases where you need to do special data massaging, or special rules for special cases, and that will be a lot easier to do when you are in complete control of what happens. I would be afraid that an ETL tool ...

1. Would have a tedious learning curve.
2. You would discover (after x hours) that it doesn't do something you absolutely need to be able to do

Then again, I haven't used any ETL tools (well, not for a long, long time), unless you count PGAdmin [http://www.pgadmin.org/]

The PGAdmin-II app had an excellent MS SQL Server -> Postgres data conversion plug-in. I used it many, many times with zero problems, with both SQL Server 7 and 2000. Unfortunately, the last I checked the current PGAdmin-III app doesn't seem to have or support this plugin, and the PGAdmin-II app doesn't work with Postgres 8 iirc. If, somehow, this converter was available again you could give it a shot, but I don't think it supported much more than straight table copy type stuff.

The problem with writing the code is that you'll need to do it from a platform that can access all the data sources. I've used Java for these types of tasks.

Postgres, of course, has a JDBC driver, so there's no problem there. MS SQL Server 2000 has a Microsoft JDBC driver, but I have used one that I bought that works with SQL Server 7, which did not come with a Microsoft JDBC driver. So if you're using a pre-2000 version of SQL Server you will need to hunt up a JDBC driver. (Actually, the ODBC-JDBC bridge exists ... that is not recommended for any kind of real world use, but maybe would work for a one-time pull.)

I've successfully accessed Excel data from Java using the free Java Excel API [http://www.andykhan.com/jexcelapi/index.html]

I've used the sun sun.jdbc.odbc.JdbcOdbcDriver driver to access MS Access files. Again, this uses the bridge, but for just reading data would probably prove adequate.

Good luck!

-Roger

Show quoted text

Jose

#3Robert Treat
xzilla@users.sourceforge.net
In reply to: Jose Gonzalez Gomez (#1)
Re: [OT?] ETL tools

On Wednesday 24 August 2005 03:50, Jose Gonzalez Gomez wrote:

Googling a bit I have discovered that what I probably need is an ETL
(extract, transform and load) tool, preferably open sourced and very
PostgreSQL friendly... does anybody on this list have experience with
such kind of tools? Would anybody recommend anything? Maybe I could
try another approach?

You might want to take a look at kettle, which is part of the bizgres package.
I haven't used it but it seemed pretty flexible when I saw the demo at oscon,
and there certainly pg friendly.
http://www.bizgres.org/pages.php?pg=downloads
--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

#4Jose Gonzalez Gomez
jgonzalez.openinput@gmail.com
In reply to: Robert Treat (#3)
Re: [OT?] ETL tools

2005/8/24, Robert Treat <xzilla@users.sourceforge.net>:

On Wednesday 24 August 2005 03:50, Jose Gonzalez Gomez wrote:

Googling a bit I have discovered that what I probably need is an ETL
(extract, transform and load) tool, preferably open sourced and very
PostgreSQL friendly... does anybody on this list have experience with
such kind of tools? Would anybody recommend anything? Maybe I could
try another approach?

You might want to take a look at kettle, which is part of the bizgres package.
I haven't used it but it seemed pretty flexible when I saw the demo at oscon,
and there certainly pg friendly.
http://www.bizgres.org/pages.php?pg=downloads

I knew about that, but unfortunately there doesn't seem to be any
version for Windows (I develop in Windows :o( ) and I neither have
find anywhere to download just the ketl tool instead of the whole
bizgres server... if I don't find anything else I may try to download
the Linux version and give it a try...

Thanks a lot, best regards
Jose

#5David Fetter
david@fetter.org
In reply to: Jose Gonzalez Gomez (#1)
Re: [OT?] ETL tools

On Wed, Aug 24, 2005 at 09:50:06AM +0200, Jose Gonzalez Gomez wrote:

Hi there,

I have started working on a new place where they have a lot of medical
data stored in relational databases they later exploit with
statistical software. The situation is a bit chaotic as they're using
a lot of local Access databases, some databases hosted in an old
version of Microsoft SQL Server and a lot of data in other non
relational files (SPSS, Excel, ...). I was hoping to impose a bit of
order and I started installing a current version of PostgreSQL to host
all the databases they're using.

I now have two problems:

1. I have to migrate all the data they currently have to the new
system, maybe making some transformations in the way.

I generally use Perl for these kinds of tasks. One way that makes
this process less manual is with DBI-Link

http://pgfoundry.org/projects/dbi-link/

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!

#6Bob Ippolito
bob@redivi.com
In reply to: Jose Gonzalez Gomez (#4)
Re: [OT?] ETL tools

On Aug 24, 2005, at 6:00 AM, Jose Gonzalez Gomez wrote:

2005/8/24, Robert Treat <xzilla@users.sourceforge.net>:

On Wednesday 24 August 2005 03:50, Jose Gonzalez Gomez wrote:

Googling a bit I have discovered that what I probably need is an ETL
(extract, transform and load) tool, preferably open sourced and very
PostgreSQL friendly... does anybody on this list have experience
with
such kind of tools? Would anybody recommend anything? Maybe I could
try another approach?

You might want to take a look at kettle, which is part of the
bizgres package.
I haven't used it but it seemed pretty flexible when I saw the
demo at oscon,
and there certainly pg friendly.
http://www.bizgres.org/pages.php?pg=downloads

I knew about that, but unfortunately there doesn't seem to be any
version for Windows (I develop in Windows :o( ) and I neither have
find anywhere to download just the ketl tool instead of the whole
bizgres server... if I don't find anything else I may try to download
the Linux version and give it a try...

I'm pretty sure that the ketl tool is written in Java, so it should
run on Windows. Just download the source tarball, it's in there, you
don't have to install everything.

-bob