equivalent of sqlload?

Started by John Holland over 27 years ago · 10 messages · general
#1John Holland
jholland@isr.umd.edu

Oracle has a facility for loading in massive data files, sqlload I think
is the name. I'd like to load some big data into PostgreSQL 6.4 for
testing purposes. Is there a way built in or would a perl or c hack be the
way to go?

john holland

#2Sferacarta Software
sferac@bo.nettuno.it
In reply to: John Holland (#1)
Re: [GENERAL] equivalent of sqlload?

Hello John,

Wednesday, 25 November 98, you wrote:

JH> Oracle has a facility for loading in massive data files, sqlload I think
JH> is the name. I'd like to load some big data into PostgreSQL 6.4 for
JH> testing purposes. Is there a way built in or would a perl or c hack be the
JH> way to go?

JH> john holland

use:

copy [binary] classname [with oids]
    to|from 'filename'|stdin|stdout
    [using delimiters 'delim']

or:
\copy classname from filename
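
For example, to load a tab-delimited file (the table name and file path here are made up for illustration):

```sql
-- Server-side load: the file is read by the backend process
COPY mytable FROM '/tmp/mydata.txt' USING DELIMITERS '	';

-- Client-side alternative from psql: the file is read on the client machine
\copy mytable from /tmp/mydata.txt
```

The server-side form requires the file to be readable by the postgres backend; \copy reads it through the psql client instead.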

-Jose'-

#3Herouth Maoz
herouth@oumail.openu.ac.il
In reply to: John Holland (#1)
Re: [GENERAL] equivalent of sqlload?

At 16:45 +0200 on 25/11/98, John Holland wrote:

Oracle has a facility for loading in massive data files, sqlload I think
is the name. I'd like to load some big data into PostgreSQL 6.4 for
testing purposes. Is there a way built in or would a perl or c hack be the
way to go?

Use the COPY command. It is preferable not to define indices on the table
(or to drop any existing ones), then do the massive load with COPY, and
then define the indices afterwards.
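
A sketch of that workflow, with hypothetical table, column, and index names:

```sql
DROP INDEX mytable_idx;                      -- drop the index before loading
COPY mytable FROM '/tmp/mydata.txt';         -- bulk load with no index maintenance per row
CREATE INDEX mytable_idx ON mytable (col1);  -- rebuild the index once, at the end
```

Building the index once over the finished table is much cheaper than updating it for every row inserted.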

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

#4Michael A. Koerber
mak@ll.mit.edu
In reply to: Herouth Maoz (#3)
Re: [GENERAL] equivalent of sqlload?

I am running v6.3.2 under Linux and have found that the "copy" command
works only for small amounts of data. When trying to "copy" several
thousand records I notice that system RAM and swap space continue to get
eaten until there is no further memory available. "psql" then fails.
What remains is a .../pgdata/base/XYZ directory with the table being
copied into. That table may be several (tens, hundreds) of Meg in size,
but a "psql -d XYZ -c 'select count(*) from table'" will only return a zero
count.

Now...I have broken the "copy" process down into smaller chunks and make
multiple calls to "copy". I have a total of about 5.4 million records and
the job isn't done yet...my Pentium 433 has been working on this copy for
over 24 hours.

I don't know if there are any changes that can be made to speed this type
of process up, but this is definitely a black-mark.

mike

Dr Michael A. Koerber
MIT/Lincoln Laboratory
781-981-3250


#5Jeff Hoffmann
jeff@remapcorp.com
In reply to: Michael A. Koerber (#4)
Re: [GENERAL] equivalent of sqlload?

I am running v6.3.2 under Linux and have found that the "copy" command
works only for small amounts of data.

i wouldn't say for only small amounts of data -- i've loaded over 5 million
records (700+ MB) into a table with copy. i don't know how long it took
because i just let it run overnight (it made a couple of indexes, too), but
it didn't crash (running on a PPro 180 with 96 MB RAM) and was done in the
morning.

When trying to "copy" several
thousand records I notice that system RAM and swap space continue to get
eaten until there is no further memory available. "psql" then fails.
What remains is a .../pgdata/base/XYZ directory with the table being
copied into. That table may be several (tens, hundreds) of Meg in size,
but a "psql -d XYZ -c 'select count(*) from table'" will only return a zero
count.

you probably ran out of memory for the server process. check out "limit"
(or "ulimit") -- you should be able to bump up the datasize to 64m or so
(that's what mine is normally set to; i don't think i had to adjust it for
the 5 million record+ table)

I don't know if there are any changes that can be made to speed this type
of process up, but this is definitely a black-mark.

it is kind of ugly, but it gets the job done.

#6Michael A. Koerber
mak@ll.mit.edu
In reply to: Jeff Hoffmann (#5)
Re: [GENERAL] equivalent of sqlload?

Where am I looking for "limit" or "ulimit"?

mike


#7Jeff Hoffmann
jeff@remapcorp.com
In reply to: Michael A. Koerber (#6)
Re: [GENERAL] equivalent of sqlload?

On Wed, 25 Nov 1998, Jeff Hoffmann wrote:

you probably ran out of memory for the server process. check out "limit"
(or "ulimit") -- you should be able to bump up the datasize to 64m or so
(that's what mine is normally set to; i don't think i had to adjust it for
the 5 million record+ table)

Where am I looking for "limit" or "ulimit"?

mike

sorry -- it's a shell thing. do a man on whatever shell you're using and it
should explain. C shells (and derivatives) use limit; Bourne shells (and
derivatives) use ulimit.

basically just kill the postmaster, run one of these commands, and restart
the postmaster and things should be fine...

for csh/tcsh:
% limit datasize 64m

for sh/bash:
$ ulimit -d 65536

there's also something in one of the FAQs; searching for "palloc" should
find it

jeff

#8Michael A. Koerber
mak@ll.mit.edu
In reply to: Jeff Hoffmann (#7)
Re: [GENERAL] equivalent of sqlload?

Thanks. The defaults for my system are
mak> ulimit -a
core file size (blocks)     1000000
data seg size (kbytes)      unlimited
file size (blocks)          unlimited
max memory size (kbytes)    unlimited
stack size (kbytes)         8192
cpu time (seconds)          unlimited
max user processes          256
pipe size (512 bytes)       8
open files                  256
virtual memory (kbytes)     2105343

Seems that these should be okay.

Ideas?
tnx,

mike


#9Herouth Maoz
herouth@oumail.openu.ac.il
In reply to: Michael A. Koerber (#4)
Re: [GENERAL] equivalent of sqlload?

At 21:01 +0200 on 25/11/98, Michael A. Koerber wrote:

Now...I have broken the "copy" process down into smaller chunks and make
multiple calls to "copy". I have a total of about 5.4 million records and
the job isn't done yet...my Pentium 433 has been working on this copy for
over 24 hours.

Try putting all the separate COPYs in one transaction.

BEGIN TRANSACTION;
COPY...
COPY...
COPY...
END;

Without the surrounding transaction, each COPY is its own transaction and
incurs per-transaction overhead. I don't know how much time this will save,
though. Are you sure you dropped all the indices? Declaring a primary key
implicitly creates a unique index, so watch out for that as well.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma

#10Anton de Wet
adw@obsidian.co.za
In reply to: Herouth Maoz (#9)
Re: [GENERAL] equivalent of sqlload?

Now...I have broken the "copy" process down into smaller chunks and make
multiple calls to "copy". I have a total of about 5.4 million records and
the job isn't done yet...my Pentium 433 has been working on this copy for
over 24 hours.

I have seen pretty much the same effect. To test this I set up a script that
copied in 10000 records at a time. The time each copy took increased
linearly (starting at about 2.0 seconds on my PII-233) by approximately
0.5 seconds per 10000 records. That means that by the time you have 5
million records it takes more than 4 minutes to insert 10000 records.

Yes, I checked, there were NO indices defined on the table.
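
As a rough sanity check of those numbers (assuming a 2.0 s base cost plus about 0.5 s of extra time per 10,000 rows already in the table, as observed above):

```python
def batch_seconds(rows_already_loaded, base=2.0, slope_per_10k=0.5):
    """Projected time for one 10,000-row COPY batch, assuming the
    linear slowdown reported in this thread."""
    return base + slope_per_10k * (rows_already_loaded / 10_000)

# By the time 5 million rows are loaded, one 10,000-row batch takes:
print(batch_seconds(5_000_000))  # 252.0 seconds, i.e. over 4 minutes
```

That matches the "more than 4 minutes per 10000 records" figure, and summing the batches explains a multi-day total load for 5.4 million rows.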

Anton