Optimizing large data loads

Started by John Wells · over 20 years ago · 3 messages · general
#1 John Wells
jb@sourceillustrated.com

Hi guys,

We have a Java process that uses Hibernate to load approximately 14 GB of
data. On a dual-proc 2.4 GHz Xeon with 2048 MB RAM, it's currently
taking over 13 hours to load (PostgreSQL 7.4.8). We're flushing from
Hibernate every 50 records.

I've set fsync to false in postgresql.conf, and we've turned autocommit
off in our code. Is there anything else I could try to temporarily speed
up inserts?
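
For reference, the loop is essentially a flush-every-N pattern like the
sketch below (class, entity, and variable names are simplified stand-ins,
not our actual code):

import java.util.List;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class BatchLoader {
    // Flush and clear the session at a fixed interval so Hibernate's
    // first-level cache doesn't grow without bound during a large load.
    public void load(SessionFactory factory, List<?> rows) {
        Session session = factory.openSession();
        Transaction tx = session.beginTransaction();
        int flushInterval = 50; // the interval we use today
        for (int i = 0; i < rows.size(); i++) {
            session.save(rows.get(i));
            if (i > 0 && i % flushInterval == 0) {
                session.flush();  // push the pending INSERTs to PostgreSQL
                session.clear();  // evict entities from the session cache
            }
        }
        tx.commit();
        session.close();
    }
}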

Thanks very much for your help.

John

#2 Richard Huxton
dev@archonet.com
In reply to: John Wells (#1)
Re: Optimizing large data loads

John Wells wrote:

Hi guys,

We have a Java process that uses Hibernate to load approximately 14 GB of
data. On a dual-proc 2.4 GHz Xeon with 2048 MB RAM, it's currently
taking over 13 hours to load (PostgreSQL 7.4.8). We're flushing from
Hibernate every 50 records.

I've set fsync to false in postgresql.conf, and we've turned autocommit
off in our code. Is there anything else I could try to temporarily speed
up inserts?

You don't say what the limitations of Hibernate are. Usually you might
look to:
1. Use COPY not INSERTs
2. If not, batch your INSERTs into BEGIN/COMMIT transactions of, say,
100-1000 rows (see the sketch after this list)
3. Turn fsync off
4. Drop constraints/triggers/indexes while you load your data, then
re-create them (also sketched below)
5. Increase sort_mem/work_mem in your postgresql.conf when recreating
indexes etc.
6. Use multiple processes to make sure the I/O is maxed out.

Any of those do-able?
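
If Hibernate will let you drop to plain JDBC for the bulk load, 2 and 4
together look roughly like this sketch (table, column, and index names
are invented, and the row source is a dummy loop):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class BulkLoadSketch {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret");
        conn.setAutoCommit(false); // we commit explicitly, in batches

        // Item 4: drop the index up front, rebuild it once at the end.
        Statement ddl = conn.createStatement();
        ddl.execute("DROP INDEX items_name_idx");
        conn.commit();

        // Item 2: group INSERTs into transactions of ~1000 rows.
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO items (id, name) VALUES (?, ?)");
        int batchSize = 1000;
        for (int i = 0; i < 100000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "row " + i);
            ps.addBatch();
            if (i % batchSize == batchSize - 1) {
                ps.executeBatch();
                conn.commit(); // one transaction per batch
            }
        }
        ps.executeBatch();
        conn.commit(); // pick up the final partial batch

        // Rebuild the index in a single pass over the loaded data.
        ddl.execute("CREATE INDEX items_name_idx ON items (name)");
        conn.commit();

        ps.close();
        ddl.close();
        conn.close();
    }
}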

--
Richard Huxton
Archonet Ltd

#3 John Wells
jb@sourceillustrated.com
In reply to: Richard Huxton (#2)
Re: Optimizing large data loads

Richard Huxton said:

You don't say what the limitations of Hibernate are. Usually you might
look to:
1. Use COPY not INSERTs

Not an option, unfortunately.

2. If not, batch your INSERTs into BEGIN/COMMIT transactions of, say, 100-1000 rows

We're using 50 per commit... we can easily raise this, I suppose.
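
Presumably along these lines: bump the interval and let the JDBC driver
batch as well (the property is Hibernate's standard
hibernate.jdbc.batch_size; the value 500 is a guess, not something we've
measured):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class ConfigSketch {
    public static SessionFactory build() {
        Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml
        // Let Hibernate send INSERTs through JDBC in batches rather
        // than one statement at a time; 500 is an arbitrary example.
        cfg.setProperty("hibernate.jdbc.batch_size", "500");
        return cfg.buildSessionFactory();
    }
}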

3. Turn fsync off

Done.

4. Drop constraints/triggers/indexes while you load your data, then re-create them

Hmmm... we'll have to think about this a bit. Not a bad idea, but I'm not
sure how we can make it work in our situation.

5. Increase sort_mem/work_mem in your postgresql.conf when recreating
indexes etc.
6. Use multiple processes to make sure the I/O is maxed out.

5 falls in line with 4, and 6 is definitely doable.
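
For 6, I'm picturing something like the sketch below: split the input into
chunks and give each worker its own session/connection (the pool size of 4
and the pre-chunked input are arbitrary choices, not measured ones):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

public class ParallelLoadSketch {
    // Run several loaders at once so multiple PostgreSQL backends
    // keep the I/O subsystem busy.
    public static void load(SessionFactory factory, List<List<?>> chunks)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (List<?> chunk : chunks) {
            pool.submit(() -> {
                Session session = factory.openSession(); // one session per worker
                Transaction tx = session.beginTransaction();
                for (Object row : chunk) {
                    session.save(row);
                }
                tx.commit();
                session.close();
            });
        }
        pool.shutdown();
        pool.awaitTermination(7, TimeUnit.DAYS); // generous upper bound
    }
}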

Thanks for the suggestions!

John