Recommended technique for large imports?

Started by Stephen Bacon over 23 years ago · 3 messages · general
#1 Stephen Bacon
sbacon@13x.com

Hello,

I'm running a Tomcat-based web app (Tomcat 3.3 under Linux 7.3) with
PostgreSQL (7.2.1) as the back end. I need to add new import
functionality. From previous experience with this site's importer, I'm
worried that the import can take so long that the user's browser times
out waiting for the process to complete. (That only ever happens when
they're importing a lot of records while the system is under heavy
demand; the main set of tables has a lot of indexes, so the loop /
insert method can take a while.)

Of course the data gets in there, but the user can end up with a
404-type error anyway, and no one likes to see that.

Now I know the COPY command is much faster because it avoids the
per-statement overhead of individual INSERTs, but building the COPY
input and passing it via JDBC seems iffy (or via C, PHP, etc., for that
matter).

Can anyone give a recommended technique for this sort of process?

Basically (I think) I need to do something like:

Start transaction
Turn off indexing for this transaction
loop 1..n
    insert record X
end loop
Turn indexing back on
Commit / End transaction
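
If it helps, here's roughly what I picture in JDBC (table and index
names are made up, and since I don't think PostgreSQL has a literal
"turn off indexing" command, I'm guessing the closest thing is dropping
and recreating the index inside the transaction):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class IndexDropImport {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass");
        conn.setAutoCommit(false);                  // start transaction

        Statement ddl = conn.createStatement();
        ddl.execute("DROP INDEX items_name_idx");   // "turn off indexing"

        // stand-in for the uploaded records
        String[][] rows = { { "widget", "10" }, { "gadget", "25" } };
        PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO items (name, qty) VALUES (?, ?)");
        for (int i = 0; i < rows.length; i++) {     // loop 1..n
            ins.setString(1, rows[i][0]);
            ins.setInt(2, Integer.parseInt(rows[i][1]));
            ins.executeUpdate();
        }

        // "turn indexing back on"
        ddl.execute("CREATE INDEX items_name_idx ON items (name)");
        conn.commit();                              // commit / end transaction
        conn.close();
    }
}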

thanks,
-Steve

(apologies for the cross-post, but I figured it's not specifically
JDBC-related)

#2 Jeff Davis
pgsql@j-davis.com
In reply to: Stephen Bacon (#1)
Re: Recommended technique for large imports?

You didn't seem concerned about the upload time of the data; wouldn't
that take a while as well?

Anyway, assuming you can get the uploaded file in full before the
browser times out, one thing you can do is write all of the data to a
temp file and COPY from there.
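
A rough sketch of what I mean (made-up file path and table name; note
that a server-side COPY ... FROM 'file' reads the file on the database
machine and, if I remember right, needs superuser privileges):

import java.io.FileWriter;
import java.io.PrintWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CopyImport {
    public static void main(String[] args) throws Exception {
        // 1. Dump the uploaded rows to a file the backend can read,
        //    in COPY's default tab-delimited format.
        String[][] rows = { { "widget", "10" }, { "gadget", "25" } };
        PrintWriter out = new PrintWriter(new FileWriter("/tmp/import.dat"));
        for (int i = 0; i < rows.length; i++) {
            out.println(rows[i][0] + "\t" + rows[i][1]);
        }
        out.close();

        // 2. One statement loads the whole file.
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass");
        Statement st = conn.createStatement();
        st.execute("COPY items FROM '/tmp/import.dat'");
        conn.close();
    }
}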

You can even do that via an async query so that your application doesn't wait
for the import to finish. Note: there might be some issues with async queries
if you want to close the connection to the backend when the HTTP connection
is finished (i.e. when your script is done), or if you want to return an
error if the import fails. I suggest reading the docs before messing
too much with async queries.
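
On the Java side the moral equivalent is to run the import on a
background thread so the servlet can respond right away. A minimal
sketch (made-up table name, and no real error reporting -- a production
version should record failures somewhere the user can check later,
since there is no browser left to complain to):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AsyncImport {
    public static void startImport(final String dataFile) {
        // Fire and forget: the servlet thread returns to the user
        // immediately while the worker runs the bulk load.
        new Thread(new Runnable() {
            public void run() {
                try {
                    Class.forName("org.postgresql.Driver");
                    Connection conn = DriverManager.getConnection(
                            "jdbc:postgresql://localhost/mydb", "user", "pass");
                    Statement st = conn.createStatement();
                    // dataFile is trusted here; don't build SQL from
                    // user input in real code
                    st.execute("COPY items FROM '" + dataFile + "'");
                    conn.close();
                } catch (Exception e) {
                    e.printStackTrace(); // sketch only; log this properly
                }
            }
        }).start();
    }
}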

You can also try the approach you outlined in your message (drop the
indexes, insert in a loop, then rebuild them). I think that's the
second-fastest way (COPY being the fastest).

Regards,
Jeff

#3 Sam Varshavchik
mrsam@courier-mta.com
In reply to: Stephen Bacon (#1)
Re: Recommended technique for large imports?

Stephen Bacon writes:

Now I know the COPY command is much faster because it avoids the
per-statement overhead of individual INSERTs, but building the COPY
input and passing it via JDBC seems iffy (or via C, PHP, etc., for that
matter).

I think someone was working on a COPY implementation for JDBC, but I
don't think it's there yet.

Can anyone give a recommended technique for this sort of process?

Feed a few thousand INSERTs to addBatch(), then call executeBatch().
That seems to be the fastest way to import data at this time.
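
Something like this, with a made-up table name (untested sketch):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class BatchImport {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "pass");
        conn.setAutoCommit(false);    // one commit at the end, not per row

        Statement st = conn.createStatement();
        int pending = 0;
        for (int i = 0; i < 10000; i++) {
            st.addBatch("INSERT INTO items (name, qty) VALUES ('row"
                    + i + "', " + i + ")");
            if (++pending == 2000) {  // flush every few thousand rows
                st.executeBatch();
                pending = 0;
            }
        }
        if (pending > 0) {
            st.executeBatch();        // flush the remainder
        }
        conn.commit();
        conn.close();
    }
}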