[Solved] SQL Server to PostgreSQL

Started by Roderick A. Anderson over 25 years ago · 22 messages · hackers, general
#1 Roderick A. Anderson
raanders@altoplanos.net
hackers, general

Thanks to everyone on this list, and especially to Jeffrey Rhines, Chris
Knight, Chris Bitmead, and Sevo Stille.

The solution turned out to be very simple. After catching a SCSI bus
speed mismatch that caused an NT Backup 'Restore' failure, I
discovered that the actual data was in .mdb files! I copied the files to a
system running MS Access (Office 97) and was able to export them to a
delimited format, which went into PostgreSQL with very few problems.
Mostly there were split lines which the \copy command didn't like; I
corrected those by hand.

I was able to get the table format by using MS Access. The only question
left is: what is the corresponding field type in PostgreSQL for a memo
field in SQL Server/Access (varchar(nnnn))?

Again thanks for all the help,
Rod
--
Roderick A. Anderson
raanders@altoplanos.net Altoplanos Information Systems, Inc.
Voice: 208.765.6149 212 S. 11th Street, Suite 5
FAX: 208.664.5299 Coeur d'Alene, ID 83814

#2 Tressens Lionel
tressens@etud.insa-tlse.fr
In reply to: Roderick A. Anderson (#1)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

On 22.08.00 at 09:37, "Roderick A. Anderson" wrote to me:

)I was able to get the table format by using MS Access. Only question left
)is what is the corresponding field type in PostgreSQL for a memo field in
)SQL Server/Access (varchar(nnnn))?

'text' type perhaps ?

Lionel
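
[Editor's note: as an illustration of Lionel's suggestion — the table and
column names here are invented — an Access memo column migrated into
PostgreSQL could simply be declared as text, which carries no declared
length limit:]

```sql
-- Hypothetical schema for the migrated data; names are invented.
CREATE TABLE customer_notes (
    id   integer PRIMARY KEY,
    note text        -- no declared length limit, suitable for memo data
);

-- Delimited data loads into it the same as into any other column:
INSERT INTO customer_notes VALUES (1, 'An arbitrarily long memo...');
```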

#3 Roderick A. Anderson
raanders@altoplanos.net
In reply to: Roderick A. Anderson (#1)
hackers, general
I lied! [Solved] SQL Server to PostgreSQL

I hate it when I do this: see an answer I want and run with it rather
than finding the real answer.

It turned out the data files (.mdb) _didn't_ belong to the database. They
were a piece of the database that was used for a report.

Back to the old grind wheel.

Rod
--
Roderick A. Anderson
raanders@altoplanos.net Altoplanos Information Systems, Inc.
Voice: 208.765.6149 212 S. 11th Street, Suite 5
FAX: 208.664.5299 Coeur d'Alene, ID 83814

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tressens Lionel (#2)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

Tressens Lionel <tressens@etud.insa-tlse.fr> writes:

On 22.08.00 at 09:37, "Roderick A. Anderson" wrote to me:
)I was able to get the table format by using MS Access. Only question left
)is what is the corresponding field type in PostgreSQL for a memo field in
)SQL Server/Access (varchar(nnnn))?

'text' type perhaps ?

Uh ... what's wrong with varchar(n) ?

regards, tom lane

#5 Vince Vielhaber
vev@michvhf.com
In reply to: Tom Lane (#4)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

On Tue, 22 Aug 2000, Tom Lane wrote:

Tressens Lionel <tressens@etud.insa-tlse.fr> writes:

On 22.08.00 at 09:37, "Roderick A. Anderson" wrote to me:
)I was able to get the table format by using MS Access. Only question left
)is what is the corresponding field type in PostgreSQL for a memo field in
)SQL Server/Access (varchar(nnnn))?

'text' type perhaps ?

Uh ... what's wrong with varchar(n) ?

How big can our n be for varchar? By looking at his description I'm
thinking SQL Server allows a large n.

Vince.
--
==========================================================================
Vince Vielhaber -- KA8CSH email: vev@michvhf.com http://www.pop4.net
128K ISDN from $22.00/mo - 56K Dialup from $16.00/mo at Pop4 Networking
Online Campground Directory http://www.camping-usa.com
Online Giftshop Superstore http://www.cloudninegifts.com
==========================================================================

#6 Jeffrey A. Rhines
jrhines@email.com
In reply to: Tressens Lionel (#2)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

I've wondered that myself, actually. What are the benefits and
drawbacks of going with one over the other, besides the obvious 255-char
field length limit for varchar? The reasons to stay away from "memo"
fields in other serious RDBMSs are typically more difficult maintenance,
significantly lower performance, and the need for special function calls
to get the data out. Do any of those apply to PG?

Jeff

Tom Lane wrote:


Tressens Lionel <tressens@etud.insa-tlse.fr> writes:

On 22.08.00 at 09:37, "Roderick A. Anderson" wrote to me:
)I was able to get the table format by using MS Access. Only question left
)is what is the corresponding field type in PostgreSQL for a memo field in
)SQL Server/Access (varchar(nnnn))?

'text' type perhaps ?

Uh ... what's wrong with varchar(n) ?

regards, tom lane

#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jeffrey A. Rhines (#6)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

"Jeffrey A. Rhines" <jrhines@email.com> writes:

Uh ... what's wrong with varchar(n) ?

I've wondered that myself, actually. What are the benefits and
drawbacks to going with one over the other, besides the obvious 255-char
field length limit for varchar?

AFAIK there has *never* been a 255-char limit on char or varchar in
pgsql ... you must be thinking of Some Other DBMS.

The limit for these datatypes in 7.0 and before is BLCKSZ less some
overhead --- ~8000 bytes in a default setup. Beginning in 7.1 it's
an essentially arbitrary number. I set it at 10Mb in current sources,
but there's no strong reason for that number over any other. In theory
it could be up to 1Gb, but as Jan Wieck points out in a nearby thread,
you probably wouldn't like the performance of shoving gigabyte-sized
text values around. We need to think about offering API functions that
will allow reading and writing huge field values in bite-sized chunks.

There's no essential performance difference between char(n), varchar(n),
and text in Postgres, given the same-sized data value. char(n)
truncates or blank-pads to exactly n characters; varchar(n) truncates
if more than n characters; text never truncates nor pads. Beyond that
they are completely identical in storage requirements. Pick one based
on the semantics you want for your application.

regards, tom lane
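
[Editor's note: Tom's distinction can be seen directly with explicit
casts. Note that later PostgreSQL releases raise an error on over-length
INSERTs into varchar(n) rather than silently truncating, though explicit
casts still truncate:]

```sql
-- char(n) blank-pads (or truncates) to exactly n characters:
SELECT 'ab'::char(4);          -- stored blank-padded to 4 characters

-- varchar(n) truncates on an explicit cast:
SELECT 'abcdef'::varchar(3);   -- 'abc'

-- text never truncates nor pads:
SELECT 'abcdef'::text;         -- 'abcdef'
```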

#8 Franck Martin
Franck@sopac.org
In reply to: Tom Lane (#7)
general
RE: [Solved] SQL Server to PostgreSQL

As we are talking about 7.1 and huge field size...

MS-SQL has a function that allows you to retrieve part of a field, a kind
of mid$(field, start, length). This would be a good addition indeed.

One last problem: PG does not allow storing binary data. I'm not talking
about the current implementation of BLOBs, but about what will happen in
7.1...

If I want to store an image in a field, I cannot do that in 7.1, because
the data sent by 7.1 and received in libpq must be formatted as ASCII. I
haven't played around to see if I could create a user type called
varbinary(n), which would output just the contents of a buffer via
varbinary_out... Maybe varchar does it already (even if there is a \0?).

I know I should submit this problem to the hacker list, but I don't want to
subscribe to hacker just to submit one message...

BTW is there an alpha/beta release of PG 7.1 ?

Franck Martin
Database Development Officer
SOPAC South Pacific Applied Geoscience Commission
Fiji
E-mail: franck@sopac.org <mailto:franck@sopac.org>
Web site: http://www.sopac.org/

This e-mail is intended for its recipients only. Do not forward this
e-mail without approval. The views expressed in this e-mail are not
necessarily the views of SOPAC.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Wednesday, August 23, 2000 3:11 PM
To: Jeffrey A. Rhines
Cc: PostgreSQL::General List
Subject: Re: [GENERAL] [Solved] SQL Server to PostgreSQL

"Jeffrey A. Rhines" <jrhines@email.com> writes:

Uh ... what's wrong with varchar(n) ?

I've wondered that myself, actually. What are the benefits and
drawbacks to going with one over the other, besides the obvious 255-char
field length limit for varchar?

AFAIK there has *never* been a 255-char limit on char or varchar in
pgsql ... you must be thinking of Some Other DBMS.

The limit for these datatypes in 7.0 and before is BLCKSZ less some
overhead --- ~8000 bytes in a default setup. Beginning in 7.1 it's
an essentially arbitrary number. I set it at 10Mb in current sources,
but there's no strong reason for that number over any other. In theory
it could be up to 1Gb, but as Jan Wieck points out in a nearby thread,
you probably wouldn't like the performance of shoving gigabyte-sized
text values around. We need to think about offering API functions that
will allow reading and writing huge field values in bite-sized chunks.

There's no essential performance difference between char(n), varchar(n),
and text in Postgres, given the same-sized data value. char(n)
truncates or blank-pads to exactly n characters; varchar(n) truncates
if more than n characters; text never truncates nor pads. Beyond that
they are completely identical in storage requirements. Pick one based
on the semantics you want for your application.

regards, tom lane

#9 Craig Johannsen
cjohan@home.com
In reply to: Tressens Lionel (#2)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

I think the ODBC spec limits varchar to 255 bytes.
Some ODBC drivers enforce that limit.

Tom Lane wrote:


"Jeffrey A. Rhines" <jrhines@email.com> writes:

Uh ... what's wrong with varchar(n) ?

I've wondered that myself, actually. What are the benefits and
drawbacks to going with one over the other, besides the obvious 255-char
field length limit for varchar?

AFAIK there has *never* been a 255-char limit on char or varchar in
pgsql ... you must be thinking of Some Other DBMS.

[snip]
regards, tom lane

#10 Martijn van Oosterhout
kleptog@svana.org
In reply to: Franck Martin (#8)
general
Re: [Solved] SQL Server to PostgreSQL

Franck Martin wrote:

As we are talking about 7.1 and huge field size...

MS-SQL has a function that allows you to retrieve part of a field, a kind
of mid$(field, start, length). This would be a good addition indeed.

does substr() not handle this?
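
[Editor's note: it does — substr(string, from, count) is the direct
equivalent of mid$; the table and column below are invented:]

```sql
-- Equivalent of mid$(field, 1, 100): the first 100 characters of a memo.
SELECT substr(note, 1, 100) FROM customer_notes;

-- On a literal:
SELECT substr('PostgreSQL', 1, 8);   -- 'Postgres'
```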

One last problem: PG does not allow storing binary data. I'm not talking
about the current implementation of BLOBs, but about what will happen in
7.1...

If I want to store an image in a field, I cannot do that in 7.1, because
the data sent by 7.1 and received in libpq must be formatted as ASCII. I
haven't played around to see if I could create a user type called
varbinary(n), which would output just the contents of a buffer via
varbinary_out... Maybe varchar does it already (even if there is a \0?).

One thing I've thought about is creating an escape char to delimit this
sort of thing. Maybe a control-A, followed by four bytes giving the
length, followed by that many bytes of data. Escape \0's with ^A0 and a
real ^A with ^A^A.

I would love something like this because I always get tripped up by
fields I'm inserting containing quotes and other characters that need to
be escaped.
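
[Editor's note: the quoting trip-ups Martijn mentions are conventionally
handled by doubling single quotes inside SQL string literals, and by
backslash-escaping special characters in COPY input:]

```sql
-- A literal single quote is written as two single quotes:
INSERT INTO customer_notes (id, note) VALUES (2, 'it''s escaped');

SELECT 'it''s escaped';   -- it's escaped
```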

I know I should submit this problem to the hacker list, but I don't want to
subscribe to hacker just to submit one message...

Heh, I know what you mean...

--
Martijn van Oosterhout <kleptog@cupid.suninternet.com>
http://cupid.suninternet.com/~kleptog/

#11 Radoslaw Stachowiak
radek@alter.pl
In reply to: Tom Lane (#7)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

*** Tom Lane <tgl@sss.pgh.pa.us> [Tuesday, 22.August.2000, 23:11 -0400]:

There's no essential performance difference between char(n), varchar(n),
and text in Postgres, given the same-sized data value. char(n)
truncates or blank-pads to exactly n characters; varchar(n) truncates
if more than n characters; text never truncates nor pads. Beyond that
they are completely identical in storage requirements.

[.rs.]

Does varchar(188) take 188 bytes (+ bytes for length storage) every
time, no matter whether it contains 'my text' or 'my long 188 char
text.....'?

--
radoslaw.stachowiak.........................................http://alter.pl/

#12 Martin Christensen
knightsofspamalot-factotum@mail1.stofanet.dk
In reply to: Radoslaw Stachowiak (#11)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

"Radoslaw" == Radoslaw Stachowiak <radek@alter.pl> writes:

Radoslaw> Does varchar(188) takes 188 bytes (+ bytes for length
Radoslaw> storage) every time, no matter if it contains 'my text' or
Radoslaw> 'my long 188 char text.....' ?

The way I understand it, varchar(n) is variable-length, while char(n)
is fixed-length. Thus the behaviour you describe above is that of
char(n).

Martin

--
GPG public key: http://home1.stofanet.dk/factotum/gpgkey.txt

#13 Jan Wieck
JanWieck@Yahoo.com
In reply to: Martin Christensen (#12)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

Martin Christensen wrote:

"Radoslaw" == Radoslaw Stachowiak <radek@alter.pl> writes:

Radoslaw> Does varchar(188) takes 188 bytes (+ bytes for length
Radoslaw> storage) every time, no matter if it contains 'my text' or
Radoslaw> 'my long 188 char text.....' ?

The way I understand it, varchar(n) is variable-length, while char(n)
is fixed-length. Thus the behaviour you describe above is that of
char(n).

Right for any pre-7.1 version.

From 7.1 on, the system will try to compress all types
internally stored as variable length (char(), varchar(), text,
and some more). So the real number of bytes for a char(188)
will be "at maximum 192 - probably less".

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #
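
[Editor's note: the difference Radoslaw asks about can be checked with
octet_length(): char(n) is space-padded to its full width, while
varchar(n) stores only the actual characters plus a short length header.
Exact on-disk numbers vary with version, compression, and encoding:]

```sql
-- char(188) is blank-padded, so its octet length is always 188
-- (assuming a single-byte encoding):
SELECT octet_length('my text'::char(188));     -- 188

-- varchar(188) stores only what was put in:
SELECT octet_length('my text'::varchar(188));  -- 7
```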

#14 Ian Turner
vectro@pipeline.com
In reply to: Jan Wieck (#13)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

From 7.1 on the system will try to compress all types
internally stored as variable length (char(), varchar(), text
and some more). So the real amount of bytes for a char(188)
will be "at maximum 192 - probably less".

Don't variable-length records incur a performance overhead? In that case,
ought I to be able to specify the length for a record if I know ahead of
time that it will be the same in every case? :o

Ian Turner
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.1 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE5sabgfn9ub9ZE1xoRAhayAKCwMjh/5tYlg8zZiAimJlgFSfCLsQCghBce
Gxx6X8sSwIACIHvdbxBsgGQ=
=bogc
-----END PGP SIGNATURE-----

#15 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ian Turner (#14)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

Ian Turner <vectro@pipeline.com> writes:

Don't variable-length records incur a performance overhead?

Only to the extent that the system can't cache offset information for
later columns in that table. While someone evidently once thought that
was worthwhile, I've never seen the column-access code show up as a
particularly hot spot in any profile I've run. I doubt you could
actually measure any difference, let alone show it to be important
enough to be worth worrying about.

In any case, char(n) will still do what you want for reasonable-size
records. The TOAST code only kicks in when the total tuple size exceeds
BLCKSZ/4 ... and at that point, compression is a good idea in any case.

Now that you mention it, though, doesn't TOAST break heapam's assumption
that char(n) is fixed length? Seems like we'd better either remove that
assumption or mark char(n) nontoastable. Any opinions which is better?

regards, tom lane

#16 Jan Wieck
JanWieck@Yahoo.com
In reply to: Tom Lane (#15)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

Tom Lane wrote:

Now that you mention it, though, doesn't TOAST break heapam's assumption
that char(n) is fixed length? Seems like we'd better either remove that
assumption or mark char(n) nontoastable. Any opinions which is better?

Is the saved overhead from assuming char(n) is fixed really
big enough that it's worth NOT gaining the TOAST advantages?
After the GB benchmarks we know that we have some spare
performance to waste for such things :-)

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck@Yahoo.com #

#17 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jan Wieck (#16)
hackers, general
Viability of VARLENA_FIXED_SIZE()

Jan Wieck <janwieck@Yahoo.com> writes:

Tom Lane wrote:

Now that you mention it, though, doesn't TOAST break heapam's assumption
that char(n) is fixed length? Seems like we'd better either remove that
assumption or mark char(n) nontoastable. Any opinions which is better?

Is the saved overhead from assuming char(n) is fixed really
big enough that it's worth NOT gaining the TOAST advantages?

No, I don't think so. Instead of pulling out the code entirely,
however, we could extend the VARLENA_FIXED_SIZE macro to also check
whether attstorage = 'p' before reporting that a char(n) field is
fixed-size. Then someone who's really intent on keeping the old
behavior could hack the attribute entry to make it so.

I seem to recall that your original idea for TOAST included an ALTER
command to allow adjustment of attstorage settings, but that didn't
get done did it? Seems like it would be risky to change the setting
except on an empty table.

Not sure if any of this is worth keeping, or if we should just simplify
the code in heaptuple.c to get rid of the notion of "fixed size"
varlena attributes. It's certainly not going to be a mainstream case
anymore, so I question whether the check has any hope of saving more
cycles than it costs. Yet it seems a shame to wipe out this hack
entirely...

regards, tom lane

#18 Thomas Lockhart
lockhart@alumni.caltech.edu
In reply to: Jan Wieck (#16)
hackers, general
Re: Viability of VARLENA_FIXED_SIZE()

Not sure if any of this is worth keeping, or if we should just simplify
the code in heaptuple.c to get rid of the notion of "fixed size"
varlena attributes. It's certainly not going to be a mainstream case
anymore, so I question whether the check has any hope of saving more
cycles than it costs. Yet it seems a shame to wipe out this hack
entirely...

Not sure if this is relevant (but when does that stop me ;):

The only truly "fixed length" string from a storage standpoint is for
single-byte encodings (and Unicode, I suppose). Eventually, we will need
the notion of both "octet length" *and* "character length" in our
backend code, and for non-ASCII encodings nothing will be of fixed octet
length anyway.

- Thomas

#19 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#15)
hackers, general
Re: [Solved] SQL Server to PostgreSQL


Ian Turner <vectro@pipeline.com> writes:

Don't variable-length records incur a performance overhead?

Only to the extent that the system can't cache offset information for
later columns in that table. While someone evidently once thought that
was worthwhile, I've never seen the column-access code show up as a
particularly hot spot in any profile I've run. I doubt you could
actually measure any difference, let alone show it to be important
enough to be worth worrying about.

It clearly is a hot spot. That monster macro, fastgetattr(), in
heapam.h is in there for a reason. It accounts for about 5% in the
straight sequential-scan case, last I heard from someone who ran a test.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
#20 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#15)
hackers, general
Re: [Solved] SQL Server to PostgreSQL

Ian Turner <vectro@pipeline.com> writes:

Don't variable-length records incur a performance overhead?

Only to the extent that the system can't cache offset information for
later columns in that table. While someone evidently once thought that
was worthwhile, I've never seen the column-access code show up as a
particularly hot spot in any profile I've run. I doubt you could
actually measure any difference, let alone show it to be important
enough to be worth worrying about.

In any case, char(n) will still do what you want for reasonable-size
records. The TOAST code only kicks in when the total tuple size exceeds
BLCKSZ/4 ... and at that point, compression is a good idea in any case.

My logic is that I use char() when I want the length to be fixed, like
2-letter state codes, and varchar() where I just want a maximum allowed,
like last names. I use text for arbitrary-length stuff. Tom is right
that although there is a small performance difference, it is better just
to use the right type.

Now that you mention it, though, doesn't TOAST break heapam's assumption
that char(n) is fixed length? Seems like we'd better either remove that
assumption or mark char(n) nontoastable. Any opinions which is better?

I am sure Jan handled that.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
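
[Editor's note: Bruce's convention, sketched as a schema — the table and
column names are invented:]

```sql
CREATE TABLE customers (
    id        integer PRIMARY KEY,
    state     char(2),       -- truly fixed length: 2-letter state code
    last_name varchar(40),   -- variable, but with a sanity maximum
    notes     text           -- arbitrary length
);
```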
#21 Bruce Momjian
bruce@momjian.us
In reply to: Jan Wieck (#16)
hackers, general
#22 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#21)
general