Getting Mysql data into Postgres: least painful methods?

Started by Ken Tanzer on 2013-01-11 · 9 messages · pgsql-general
#1 Ken Tanzer
ken.tanzer@gmail.com

I'm wondering if anyone can point me towards a good method for moving mysql
data into Postgres? I've done some web searching, and found documentation
from various years, but it's not clear what's current and what works best.
Much of what I found seems to be flame war material (why Postgres is
better), or is both old and seemingly involved and complex.

Here's the fuller description of what I'm trying to do. I've got a dataset
(a UMLS Metathesaurus subset) that I need to get into a Postgres
database. It's all reference data, and so will be read-only. There's no
functions or logic involved. I anticipate having to update it at least
quarterly, so I'd like to get to a well-grooved import process.

The data as distributed can be had in Oracle or Mysql formats. (I already
gave them my two cents to include Postgres.) I did see some information
about modifying the Mysql distribution files to make them
Postgres-compatible, but I thought (perhaps foolishly) it would be easier
to bring them into Mysql, and from there export them to Postgres.

A recurring idea seemed to be to use:

mysqldump -v --compatible=postgresql umls_test > dumpfile.sql

followed by

sed -i "s/\\\'/\'\'/g" dumpfile.sql

but that didn't bring me much success. I figure this has to be a fairly
common need, and hopefully by 2013 there's an easy solution. Thanks in
advance!

Ken

--
AGENCY Software
A data system that puts you in control
http://agency-software.org/
ken.tanzer@agency-software.org
(253) 245-3801

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ken Tanzer (#1)
Re: Getting Mysql data into Postgres: least painful methods?

On 01/11/2013 03:54 PM, Ken Tanzer wrote:

but that didn't bring me much success. I figure this has to be a fairly
common need, and hopefully by 2013 there's an easy solution. Thanks in
advance!

Have you looked at Foreign Data Wrappers (FDW):

http://www.postgresql.org/docs/9.1/static/sql-createforeigndatawrapper.html

If you use Python there is Multicorn:
http://multicorn.org/

There is also mysql_fdw:
http://wiki.postgresql.org/wiki/Foreign_data_wrappers#mysql_fdw
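
On 9.1+ the setup would be roughly along these lines (a sketch only -- the
option names differ between mysql_fdw versions, so check the wrapper's own
docs; the table, columns, and credentials here are made up):

CREATE EXTENSION mysql_fdw;

CREATE SERVER mysql_umls FOREIGN DATA WRAPPER mysql_fdw
    OPTIONS (host '127.0.0.1', port '3306');   -- some versions call this 'address'

CREATE USER MAPPING FOR CURRENT_USER SERVER mysql_umls
    OPTIONS (username 'umls', password 'secret');

CREATE FOREIGN TABLE mrconso_mysql (cui text, lat text, str text)
    SERVER mysql_umls OPTIONS (dbname 'umls_test', table_name 'MRCONSO');

-- then pull it into a local table
CREATE TABLE mrconso AS SELECT * FROM mrconso_mysql;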


--
Adrian Klaver
adrian.klaver@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3 wd
wd@wdicc.com
In reply to: Ken Tanzer (#1)
Re: Getting Mysql data into Postgres: least painful methods?

You can search Google for "mysql2pg":
https://www.google.com/search?q=mysql2pg


#4 Rich Shepard
rshepard@appl-ecosys.com
In reply to: Ken Tanzer (#1)
Re: Getting Mysql data into Postgres: least painful methods?

On Fri, 11 Jan 2013, Ken Tanzer wrote:

I'm wondering if anyone can point me towards a good method for moving
mysql data into Postgres?

I had to do this last year with the ITIS (Integrated Taxonomic Information
System) maintained by the US Geological Survey.

Some MySQL keywords were immediately recognizable, and I used Emacs's
global search-and-replace to change them to Postgres words. Then I tried
reading individual tables into a newly created database and redirected
errors to a disk file. I fixed the errors postgres identified, dropped the
table, and repeated until there were no errors. It took a bit of time but
worked just fine.
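
Each pass amounted to something like this (database and file names here are
made up):

psql -d itis -f taxonomic_units.sql 2>> load_errors.log   # fix what's in the log, drop, repeat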

Then I sent the USGS database maintainer a dump of the postgres database
because he wanted to migrate from mysql to postgres there. I think of it as
a public service. :-)

Rich

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5 John R Pierce
pierce@hogranch.com
In reply to: Ken Tanzer (#1)
Re: Getting Mysql data into Postgres: least painful methods?

On 1/11/2013 3:54 PM, Ken Tanzer wrote:

Here's the fuller description of what I'm trying to do. I've got a
dataset (a UMLS Metathesaurus subset) that I need to get into a
Postgres database. It's all reference data, and so will be
read-only. There's no functions or logic involved. I anticipate
having to update it at least quarterly, so I'd like to get to a
well-grooved import process.

how many tables? if it's just one or a couple of tables, can you get the
data as CSV? then it would be trivial to import into postgres, using
the COPY command (or \copy from psql)...
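
for instance, something like (made-up table and file names):

COPY mrconso FROM '/path/on/the/server/MRCONSO.csv' WITH CSV;   -- server side, needs superuser
\copy mrconso from 'MRCONSO.csv' with csv                        -- client side, from psql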

another alternative: investigate "ETL" tools. these are general-purpose
data manglers that can connect to a source database (usually any of about
20 are supported), extract data, transform it if needed, and load it into
a destination database (again from a list of 20 or so typically supported)

#6 Ken Tanzer
ken.tanzer@gmail.com
In reply to: John R Pierce (#5)
Re: Getting Mysql data into Postgres: least painful methods?

Thanks for all the responses. I would totally go with the FDW, were I not
stuck using (someone else's) 8.4.

In this case it looks like 24 tables, with CSV-like import files totaling
7G.

Since there didn't seem to be a clean, simple and automated path from mysql
to postgres, I'm back to skipping mysql entirely and just trying to modify
the mysql files to feed directly into postgres.

To that end, they have to be transformed a bit, which I've written a bit of
script to accomplish. I'm wondering if there's a way to avoid creating
another 7G of slightly-modified import files before feeding them to
postgres. Specifically, is there a way to do something like

\copy my_table FROM '`cat my_import_file | my_transform_script`'

My 2 goals here are to be somewhat efficient (by not duplicating the input
files), and to keep this all within a transaction. I could have the script
transform each file separately and pipe it to postgres:

(echo 'copy mytable from stdin...' ; cat my_import_file |
my_transform_script ) | psql

but I'm thinking that there's no way to group those all into a transaction.
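
One possibility I haven't tested: generate a single SQL stream, with each
table's data inline after a COPY ... FROM stdin, and pipe the whole thing to
one psql session, something like (names are placeholders, and the transform
would have to emit COPY's tab-separated text format):

{
  echo 'BEGIN;'
  echo 'COPY my_table_1 FROM stdin;'
  my_transform_script < my_import_file_1
  printf '\\.\n'                            # end-of-data marker for COPY
  echo 'COPY my_table_2 FROM stdin;'
  my_transform_script < my_import_file_2
  printf '\\.\n'
  echo 'COMMIT;'
} | psql --set ON_ERROR_STOP=1 -q my_database

With ON_ERROR_STOP set, any failure should bail out before the COMMIT, so
nothing partial gets kept.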

Hopefully this makes sense, and any suggestions welcome. Thanks.

Ken


--
AGENCY Software
A data system that puts you in control
http://agency-software.org/
ken.tanzer@agency-software.org
(253) 245-3801

#7 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ken Tanzer (#6)
Re: Getting Mysql data into Postgres: least painful methods?

On 01/15/2013 04:09 PM, Ken Tanzer wrote:

Thanks for all the responses. I would totally go with the FDW, were I
not stuck using (someone else's) 8.4.

8.4 supports FDW so I will assume you do not have the permissions to
create one.

In this case it looks like 24 tables, with CSV-like import files
totaling 7G.

Since there didn't seem to be a clean, simple and automated path from
mysql to postgres, I'm back to skipping mysql entirely and just trying
to modify the mysql files to feed directly into postgres.

To that end, they have to be transformed a bit, which I've written a bit
of script to accomplish. I'm wondering if there's a way to avoid
creating another 7G of slightly-modified import files before feeding
them to postgres. Specifically, is there a way to do something like

\copy my_table FROM '`cat my_import_file | my_transform_script`'

The way I have done this is to create a script using Python that follows
this flow:

MySQL --> MySQLdb module --> Data transform --> psycopg2 --> Postgres
OR
csv --> csv module ------^

In the script you can set up the transactions as you like, per row,
batches, or everything in one transaction.
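
A stripped-down sketch of that flow (the file, table, and column names are
made up):

#!/usr/bin/env python
# csv -> transform -> psycopg2, with one transaction for the whole file
import csv
import psycopg2

def transform(row):
    # per-row cleanup goes here (escaping, date formats, etc.)
    return row

conn = psycopg2.connect("dbname=umls")
cur = conn.cursor()
try:
    with open('MRCONSO.csv') as f:
        for row in csv.reader(f, delimiter='|'):
            cur.execute("INSERT INTO mrconso (cui, lat, str) VALUES (%s, %s, %s)",
                        transform(row))
    conn.commit()   # nothing is visible until this commit
except Exception:
    conn.rollback()
    raise
finally:
    conn.close()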


--
Adrian Klaver
adrian.klaver@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8 Ken Tanzer
ken.tanzer@gmail.com
In reply to: Adrian Klaver (#7)
Re: Getting Mysql data into Postgres: least painful methods?

8.4 supports FDW so I will assume you do not have the permissions to
create one.

Thanks but I'm confused--the doc page you mentioned says the mysql FDW
isn't supported until 9.1.

In this case it looks like 24 tables, with CSV-like import files
totaling 7G.

Since there didn't seem to be a clean, simple and automated path from
mysql to postgres, I'm back to skipping mysql entirely and just trying
to modify the mysql files to feed directly into postgres.

To that end, they have to be transformed a bit, which I've written a bit
of script to accomplish. I'm wondering if there's a way to avoid
creating another 7G of slightly-modified import files before feeding
them to postgres. Specifically, is there a way to do something like

\copy my_table FROM '`cat my_import_file | my_transform_script`'

The way I have done this is to create a script using Python that follows
this flow:

MySQL --> MySQLdb module --> Data transform --> psycopg2 --> Postgres
OR
csv --> csv module ------^

In the script you can set up the transactions as you like, per row,
batches, or everything in one transaction.

I'm also not sure about this. Are we both talking about a process that
bypasses mysql itself, and transforms its input files on the fly? I want to
write a script that will...

#!/bin/sh
... my script stuff...
... create tables...
psql \copy from (transformed mysql file 1)
psql \copy from (transformed mysql file 2)
psql \copy from (transformed mysql file 3)
... more script stuff...

without re-writing the mysql files, and within one transaction. I can't
tell if your answer was getting at that, or something else.
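
(One idea I haven't tried: point \copy at named pipes, with the transform
scripts running in the background, so nothing extra gets written to disk and
it all stays in one transaction. Something like, with placeholder names:

mkfifo /tmp/t1.pipe /tmp/t2.pipe
my_transform_script < mysql_file_1 > /tmp/t1.pipe &
my_transform_script < mysql_file_2 > /tmp/t2.pipe &
psql --set ON_ERROR_STOP=1 my_database <<'EOF'
BEGIN;
\copy my_table_1 from '/tmp/t1.pipe'
\copy my_table_2 from '/tmp/t2.pipe'
COMMIT;
EOF
rm /tmp/t1.pipe /tmp/t2.pipe

No idea yet whether \copy is happy reading from a fifo.)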

Ken


--
AGENCY Software
A data system that puts you in control
http://agency-software.org/
ken.tanzer@agency-software.org
(253) 245-3801

#9 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Ken Tanzer (#8)
Re: Getting Mysql data into Postgres: least painful methods?

On 01/15/2013 04:41 PM, Ken Tanzer wrote:

8.4 supports FDW so I will assume you do not have the permissions to
create one.

Thanks but I'm confused--the doc page you mentioned says the mysql FDW
isn't supported until 9.1.

My apologies. The FDW infrastructure appeared in 8.4, but the extension
mechanism that the mysql FDW and Multicorn use exists only in 9.1+.

--
Adrian Klaver
adrian.klaver@gmail.com

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general