equivalent of sqlload?
Oracle has a facility for loading in massive data files, sqlload I think
is the name. I'd like to load some big data into PostgreSQL 6.4 for
testing purposes. Is there a way built in or would a perl or c hack be the
way to go?
john holland
Hello John,
Wednesday, 25 November 98, you wrote:
JH> Oracle has a facility for loading in massive data files, sqlload I think
JH> is the name. I'd like to load some big data into PostgreSQL 6.4 for
JH> testing purposes. Is there a way built in or would a perl or c hack be the
JH> way to go?
JH> john holland
use:
copy [binary] classname [with oids]
to|from 'filename'|stdin|stdout
[using delimiters 'delim']
or:
\copy classname from filename
-Jose'-
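Jose's COPY syntax can be sketched concretely. This is a hypothetical example: the table name (mytable) and file name (data.tsv) are invented, and it relies on COPY's default tab delimiter. The psql commands are only printed, since running them needs a live server:

```shell
# Hypothetical example -- the table (mytable) and file (data.tsv) are
# invented; they do not come from the thread.

# Build a small tab-delimited input file (COPY's default delimiter is tab).
printf '1\talpha\n2\tbeta\n3\tgamma\n' > data.tsv

# Server-side form: the backend reads the file, so use an absolute path
# readable by the postgres user.  Printed here rather than executed.
printf "COPY mytable FROM '%s/data.tsv';\n" "$(pwd)"

# Client-side form: psql itself reads the file, no server-side file access.
printf '%s\n' '\copy mytable from data.tsv'
```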
At 16:45 +0200 on 25/11/98, John Holland wrote:
Oracle has a facility for loading in massive data files, sqlload I think
is the name. I'd like to load some big data into PostgreSQL 6.4 for
testing purposes. Is there a way built in or would a perl or c hack be the
way to go?
Use the COPY command. It is preferable not to define indices (or to drop
the indices) on the table, then do the massive load with COPY, and then
define the indices.
Herouth
--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma
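Herouth's drop-then-recreate sequence can be written out as a psql script. This is only a sketch; every name in it (mydb, bigtable, bigtable_id_idx, the data file path) is invented:

```shell
# Write the load script.  All object names are hypothetical.
cat > bulkload.sql <<'EOF'
DROP INDEX bigtable_id_idx;                       -- 1. drop the index
COPY bigtable FROM '/tmp/bigdata.tsv';            -- 2. bulk load, no index upkeep
CREATE UNIQUE INDEX bigtable_id_idx
    ON bigtable (id);                             -- 3. rebuild the index once
EOF

# Running it needs a live server:
#   psql -d mydb -f bulkload.sql
printf 'wrote %s\n' bulkload.sql
```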
I am running v6.3.2 under Linux and have found that the "copy" command
works only for small amounts of data. When trying to "copy" several
thousand records I notice that system RAM and swap space continue to get
eaten until there is no further memory available. "psql" then fails.
What remains is a .../pgdata/base/XYZ file system with the table being
copied into. That table may be several (tens, hundreds) of Meg in size,
but a "psql -d XYZ -c 'select count(*) from table'" will only return a zero
count.
Now...I have broken the "copy" process down into smaller chunks, making
multiple calls to "copy". I have a total of about 5.4 million records and
the job isn't done yet...my Pentium 433 has been working on this copy for
over 24 hours.
I don't know if there are any changes that can be made to speed this type
of process up, but this is definitely a black mark.
mike
Dr Michael A. Koerber
MIT/Lincoln Laboratory
781-981-3250
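Mike's workaround (breaking one huge load into multiple smaller COPY calls) amounts to splitting the input into fixed-size chunks, roughly like this Python sketch; the chunk size and record format are made up for illustration:

```python
# Split a large delimiter-separated dump into fixed-size chunks so each
# chunk can be loaded with its own COPY command.  Sizes are illustrative.
def split_into_chunks(lines, chunk_size):
    """Yield successive lists of at most chunk_size lines."""
    chunk = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final partial chunk

# Example: 25 tab-delimited records in chunks of 10 -> 3 chunks (10, 10, 5).
records = [f"{i}\tvalue{i}" for i in range(25)]
chunks = list(split_into_chunks(records, 10))
print([len(c) for c in chunks])   # [10, 10, 5]
```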
I am running v6.3.2 under Linux and have found that the "copy" command
works only for small amounts of data.
i wouldn't say for only small amounts of data -- i've loaded over 5 million
records (700+ MB) into a table with copy. i don't know how long it took
because i just let it run overnight (it made a couple of indexes, too), but
it didn't crash (running on a PPro 180 with 96 MB RAM) and was done in the
morning.
When trying to "copy" several
thousand records I notice that system RAM and swap space continue to get
eaten until there is no further memory available. "psql" then fails.
What remains is a .../pgdata/base/XYZ file system with the table being
copied into. That table may be several (tens, hundreds) of Meg in size,
but a "psql -d XYZ -c 'select count(*) from table'" will only return a zero
count.
you probably ran out of memory for the server process. check out "limit"
(or "ulimit") -- you should be able to bump up the datasize to 64m or so
(that's what mine is normally set to; i don't think i had to adjust it for
the 5 million record+ table)
I don't know if there are any changes that can be made to speed this type
of process up, but this is definitely a black-mark.
it is kind of ugly, but it gets the job done.
Where am I looking for "limit" or "ulimit"?
mike
On Wed, 25 Nov 1998, Jeff Hoffmann wrote:
you probably ran out of memory for the server process. check out "limit"
(or "ulimit") -- you should be able to bump up the datasize to 64m or so
(that's what mine is normally set to; i don't think i had to adjust it for
the 5 million record+ table)
sorry -- it's a shell thing. do a man on whatever shell you're using and it
should explain. C shells (and derivatives) use limit; Bourne shells (and
derivatives) use ulimit.
basically just kill the postmaster, run one of these commands, and restart
the postmaster and things should be fine...
for csh/tcsh:
% limit datasize 64m
for sh/bash:
$ ulimit -d 65536
there's also something in one of the FAQs; searching for "palloc" should
find it
jeff
Thanks. The defaults for my system are
mak> ulimit -a
core file size (blocks) 1000000
data seg size (kbytes) unlimited
file size (blocks) unlimited
max memory size (kbytes) unlimited
stack size (kbytes) 8192
cpu time (seconds) unlimited
max user processes 256
pipe size (512 bytes) 8
open files 256
virtual memory (kbytes) 2105343
Seems that these should be okay.
Ideas?
tnx,
mike
At 21:01 +0200 on 25/11/98, Michael A. Koerber wrote:
Now...I have broken the "copy" process down into smaller chunks and make
multiple calls to "copy". I have a total of about 5.4 million records and
the job isn't done yet...my Pentium 433 has been working on this copy for
over 24 hours.
Try putting all the separate COPYs in one transaction.
BEGIN TRANSACTION;
COPY...
COPY...
COPY...
END;
Without the surrounding transaction, each copy is a transaction and
requires transaction overhead. I don't know how much time this will save,
though. Are you sure you dropped all the indices? When you declare a
primary key it declares a unique index, so watch out for that as well.
Herouth
--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma
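Herouth's point about per-transaction overhead is general, not PostgreSQL-specific. It can be illustrated with SQLite (standard library, so it runs anywhere) standing in for PostgreSQL; the `with conn:` block plays the role of the surrounding BEGIN/END, committing once instead of once per statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")

rows = [(i, f"val{i}") for i in range(10000)]

# One explicit transaction around all the inserts, analogous to
#   BEGIN; COPY...; COPY...; END;
# Without it, each statement would pay its own commit overhead.
with conn:                                  # BEGIN ... COMMIT
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

count = conn.execute("SELECT count(*) FROM t").fetchone()[0]
print(count)  # 10000
```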
Now...I have broken the "copy" process down into smaller chunks and make
multiple calls to "copy". I have a total of about 5.4 million records and
the job isn't done yet...my Pentium 433 has been working on this copy for
over 24 hours.
I have seen pretty much the same effect. To test this I set up a script that
copied in 10000 records at a time. The time it took for each copy
increased linearly (starting at about 2.0 seconds on my PII-233) by approx
0.5 seconds per 10000. That means that by the time you have 5 million
records it takes more than 4 minutes to insert 10000 records.
Yes, I checked, there were NO indices defined on the table.
Anton
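Anton's figures can be checked arithmetically: with 2.0 s for the first 10,000-record batch and roughly 0.5 s added per subsequent batch, the 500th batch (the one that crosses the 5-million mark) takes about 251.5 s, i.e. over four minutes, and the full 5 million records come to about 17.6 hours of COPY time alone:

```python
# Check Anton's linear-slowdown figures.  The 2.0 s intercept and
# 0.5 s/batch slope are his measurements; the rest is arithmetic.
BATCH = 10_000
START = 2.0      # seconds for the first batch
SLOPE = 0.5      # extra seconds per additional batch

def batch_time(n):
    """Time in seconds for the n-th 10,000-record batch (1-based)."""
    return START + SLOPE * (n - 1)

last = 5_000_000 // BATCH                   # 500 batches total
print(batch_time(last))                     # 251.5 -> over 4 minutes
total = sum(batch_time(n) for n in range(1, last + 1))
print(round(total / 3600, 1))               # 17.6 hours in total
```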