Binary data migration from MSSQL

Started by Roman Failover 23 years ago4 messagesgeneral
Jump to latest
#1Roman Fail
rfail@posportal.com

Getting the management to let go of their death grip on MSSQL was the first (big!) hurdle, but I succeeded.

Now I need to prove that PostgreSQL will work for us, but I can't seem to find a painless way to migrate my binary data. I've searched the MS docs, Postgres docs, pg TechDocs, List archives, Google, you name it. I have a table in MSSQL 2000 with 20 million rows, about 60 columns. One of the columns is of type VARBINARY.

** pgAdminII Migration Wizard (1.4.12): ignores the binary column when importing. To be fair, this possibility is mentioned in the pgAdminII docs. However, it is totally painless for all my other non-binary tables and I love it!

** MS DTS Transformation: Migrates the data (including varbinary) perfectly. However, the migration fails at about 400,000 rows because it runs out of memory (1.5 GB physical memory on the MSSQL machine). I'm trying to just import small chunks of the table using a query for the source, but it is very slow and tedious even over Gigabit Ethernet.

** bcp: I'd like to use bcp to do a text import, but I can't figure out how to make the binary data work on either end. It appears to me that Postgres COPY requires literal binary data to be in octal format (e.g. '\\047'). bcp outputs the binary data as an non-escaped ASCII string of hex values (e.g. DF9B52A3). I guess I could write a Java program to convert the hex to escaped octal, then run COPY. This would probably be pretty slow (both from a 20-million row performance standpoint and my personal productivity).

Is there a better way to do this?

Source machine:
Dual P3 1.1 GHz, 1.5 GB RAM, single 18GB SCSI disk
Win2k Server SP3, MS SQL 2000 SP3, pgAdmin 1.4.12, pgODBC 7.02.00.05

Destination machine:
Dual Xeon 2.0 GHz, 4.0 GB RAM, 104GB 6-disk RAID-10 dedicated to /usr/local/pgsql/data
RedHat 8.0
PostgreSQL 7.3.1 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903

Thanks!!!

Roman Fail
Sr. Web Application Developer
POS Portal, Inc.

#2Joe Conway
mail@joeconway.com
In reply to: Roman Fail (#1)
Re: Binary data migration from MSSQL

Roman Fail wrote:

** bcp: I'd like to use bcp to do a text import, but I can't figure out how
to make the binary data work on either end. It appears to me that Postgres
COPY requires literal binary data to be in octal format (e.g. '\\047').
bcp outputs the binary data as an non-escaped ASCII string of hex values
(e.g. DF9B52A3). I guess I could write a Java program to convert the hex to
escaped octal, then run COPY. This would probably be pretty slow (both from
a 20-million row performance standpoint and my personal productivity).

Is there a better way to do this?

I'd lean toward bcp as the fastest method, even if you need to do some data
massaging. In MSSQL 2000 it appears that bcp can bulk copy a query in similar
fashion to a table. Perhaps you could do the binary-to-octal transformation in
the export query from MSSQL?

Joe

#3Roman Fail
rfail@posportal.com
In reply to: Joe Conway (#2)
Re: Binary data migration from MSSQL

** bcp: I'd like to use bcp to do a text import, but I can't figure out how
to make the binary data work on either end. It appears to me that Postgres
COPY requires literal binary data to be in octal format (e.g. '\\047').
bcp outputs the binary data as an non-escaped ASCII string of hex values
(e.g. DF9B52A3). I guess I could write a Java program to convert the hex to
escaped octal, then run COPY. This would probably be pretty slow (both from
a 20-million row performance standpoint and my personal productivity).

Is there a better way to do this?

I'd lean toward bcp as the fastest method, even if you need to do some data
massaging. In MSSQL 2000 it appears that bcp can bulk copy a query in similar
fashion to a table. Perhaps you could do the binary-to-octal transformation in
the export query from MSSQL?

Joe

Thanks for the idea, Joe. I wrote a 'varbinary-to-octal_string' UDF on the MSSQL Server that I could call within a query on the bcp command line. After some tweaking, it worked great and takes about an hour to dump all 20 million rows to text (which I think is pretty fast!).

Wes Sheldah suggested modifying the bcp output with a perl script, but I opted not to do it that way. Although I don't know perl, I could have just as easily used Java. The drawback of doing it this way is that it's harder for others to follow if there is another language involved. I know everyone who wants to do this will already have a MSSQL server, and that's all you need to create the modified file.

I'd be happy to write up what I did and contribute it to techdocs, if someone would let me know where to send the document. I looked around on the techdocs site and didn't see any instructions. I'm still curious why the binary data won't migrate using pgAdminII, but I guess I'll ask that question on another list.

Roman Fail
Sr. Web Application Developer
POS Portal, Inc.

#4Joe Conway
mail@joeconway.com
In reply to: Roman Fail (#3)
Re: Binary data migration from MSSQL

Roman Fail wrote:

Thanks for the idea, Joe. I wrote a 'varbinary-to-octal_string' UDF on the
MSSQL Server that I could call within a query on the bcp command line.
After some tweaking, it worked great and takes about an hour to dump all 20
million rows to text (which I think is pretty fast!).

Wes Sheldah suggested modifying the bcp output with a perl script, but I
opted not to do it that way. Although I don't know perl, I could have just
as easily used Java. The drawback of doing it this way is that it's harder
for others to follow if there is another language involved. I know
everyone who wants to do this will already have a MSSQL server, and that's
all you need to create the modified file.

I'd be happy to write up what I did and contribute it to techdocs, if
someone would let me know where to send the document. I looked around on
the techdocs site and didn't see any instructions. I'm still curious why
the binary data won't migrate using pgAdminII, but I guess I'll ask that
question on another list.

I think Justin Clift is the right person to help you there -- Justin?

Joe