PostgreSQL Performance issue
Dear All Experts,
I am using a PostgreSQL 8.3.5 database on a 64-bit Windows OS.
I have a batch program written in Java which processes the data
and populates it into tables in the Postgres database.
I have 622,000 records, but it is taking almost four and a half
hours to load them into the tables.
I have a simple function in the database which is called from the Java batch
program to populate the records into the tables from flat files.
My database server has the following configuration.
Database Server
PostgreSQL v8.3.5
Operating System
Windows 2003 Server 64 bit, Service Pack 2
CPU
2 * Quad Core AMD Opteron Processor 2376 @ 2.29 GHz
Memory
16 GB RAM
Disk Space
total 2.5 TB [C drive - 454 GB & D drive 1.99 TB]
and I have set my postgresql.conf parameters as below.
======================================================================
#------------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#------------------------------------------------------------------------------
# - Memory -
shared_buffers = 1GB
temp_buffers = 256MB
max_prepared_transactions = 100
work_mem = 512MB
maintenance_work_mem = 512MB
# - Free Space Map -
max_fsm_pages = 1600000
max_fsm_relations = 10000
#------------------------------------------------------------------------------
# WRITE AHEAD LOG
#------------------------------------------------------------------------------
wal_buffers = 5MB # min 32kB
checkpoint_segments = 32
checkpoint_completion_target = 0.9
#------------------------------------------------------------------------------
# QUERY TUNING
#------------------------------------------------------------------------------
# - Planner Method Configuration -
enable_hashagg = on
enable_hashjoin = on
enable_indexscan = on
enable_mergejoin = on
enable_nestloop = on
enable_seqscan = on
enable_sort = on
effective_cache_size = 8GB
========================================================================
Please advise me on the best or optimum way to set these parameters to
achieve better performance.
Also note that when I set shared_buffers to 2GB or higher, Postgres
throws the error "shared_buffer size cannot be more than size_t".
I would be very grateful if anyone could help me with this.
Many thanks
On 27 April 2010 09:11, <A.Bhattacharya@sungard.com> wrote:
[quoted original message trimmed]
Are these all being sent in one transaction? Can't you use COPY to bulk-insert
into the database? If not, can you insert in batches (say, 1,000 at
a time)? Have you got any triggers, constraints, or complicated domains on the
table you're inserting into?
Thom
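A minimal sketch of the COPY route Thom suggests, driven from Java via the JDBC driver's CopyManager API (this assumes a driver build that ships org.postgresql.copy.CopyManager; the database name, table name, and file path below are placeholders, not the poster's actual schema):

import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class CopyLoad {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
        // One COPY statement streams the whole flat file into the table,
        // replacing hundreds of thousands of individual INSERTs.
        CopyManager copy = new CopyManager((BaseConnection) conn);
        long rows = copy.copyIn(
                "COPY my_table FROM STDIN WITH DELIMITER ','",
                new FileReader("records.txt"));
        System.out.println("Loaded " + rows + " rows");
        conn.close();
    }
}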
Most likely you are inserting one row per transaction. Set autocommit to
false and commit only after all the inserts are done.
-n.
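A sketch of that advice in JDBC, assuming a hypothetical table my_table: autocommit goes off, and commits happen every 1,000 rows (per Thom's batching suggestion) instead of once per insert.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchLoad {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
        conn.setAutoCommit(false);   // no longer one commit per insert

        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO my_table (id, payload) VALUES (?, ?)");
        for (int i = 0; i < 622000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "row " + i);
            ps.addBatch();
            if (i % 1000 == 999) {   // flush and commit every 1,000 rows
                ps.executeBatch();
                conn.commit();
            }
        }
        ps.executeBatch();           // flush and commit the remainder
        conn.commit();
        ps.close();
        conn.close();
    }
}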
On 27-04-2010 13:41, A.Bhattacharya@sungard.com wrote:
[quoted original message trimmed]
On Tue, Apr 27, 2010 at 10:11, <A.Bhattacharya@sungard.com> wrote:
Dear All Experts,
I am using a PostgreSQL 8.3.5 database on a 64-bit Windows OS.
You really need to upgrade. At least to 8.3.10. It has many important
bugfixes.
I have a batch program written in Java which processes the data and
populates it into tables in the Postgres database. I have 622,000 records,
but it is taking almost four and a half hours to load them into the tables. [...]
Disk Space
total 2.5 TB [C drive - 454 GB & D drive 1.99 TB]
The interesting point is not how much disk you have, but what configuration
you have it in. Either way, 622k records in 4 1/2 hours is obviously crappy
even for a single disk.
and I have set my postgresql.conf parameters as below.
shared_buffers = 1GB
You might want to try lowering that one drastically, to say 128MB. In some
cases this has been known to give better performance on Windows. Not in all
cases though, so you have to try it out.
temp_buffers = 256MB
max_prepared_transactions = 100
Are you really using 2-phase commit on the system? If not, turn this off.
This is prepared transactions, not prepared statements.
========================================================================
Please advise me on the best or optimum way to set these parameters to
achieve better performance. Also note that when I set shared_buffers to
2GB or higher, Postgres throws the error "shared_buffer size cannot be more than size_t".
That is normal, since your binary is 32-bit. In fact, having shared_buffers
at 1GB may give you some trouble with your fairly high work_mem as well, as
the *total* amount of memory in the process is limited. That's another
reason to try a lower shared_buffers.
(other than that, read the comments from Thom)
--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
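Pulling Magnus's suggestions together, a hedged sketch of the postgresql.conf changes to benchmark on this box; these are starting points to test, not definitive settings:

shared_buffers = 128MB           # much lower often works better on a 32-bit Windows build
work_mem = 16MB                  # 512MB per sort/hash is risky when process address space is limited
max_prepared_transactions = 0    # only needed for two-phase commit; 0 disables it
effective_cache_size = 8GB       # planner hint only; can stay as-is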
A.Bhattacharya@sungard.com wrote:
[quoted original message trimmed]
With one thread doing inserts, the other 7 cores will be idle, but
you're almost certainly disk-I/O-bound.
OTOH, if you're calling a function (is that pl/pgsql, pl/java, pl/perl,
or what?) for each insert, you could be compute-bound on that single
core. It really depends on what that function is doing.
Thanks a lot for your help. However, I am new to the Postgres database,
so it would be nice if you could let me know how to set autocommit off.
I know that issuing "\set Autocommit Off" from the psql client should
turn it off, but unfortunately it doesn't.
Many thanks for your help.
From: Magnus Hagander [mailto:magnus@hagander.net]
Sent: Tuesday, April 27, 2010 2:19 PM
To: Bhattacharya, A
Cc: pgsql-general
Subject: Re: [GENERAL] PostgreSQL Performance issue
[quoted message trimmed]
On 27 Apr 2010, at 10:11, <A.Bhattacharya@sungard.com> wrote:
Dear All Experts,
I am using a PostgreSQL 8.3.5 database on a 64-bit Windows OS. I have a batch program written in Java which processes the data and populates it into tables in the Postgres database. I have 622,000 records, but it is taking almost four and a half hours to load them into the tables.
Wow, that's averaging almost 40s per insert! I don't think those inserts not being in a single transaction explains that much of a delay. Not that changing that doesn't help, but there's more going on here.
Maybe you have some foreign key constraints involved that don't have indexes on the foreign keys? In that case I expect you either have a lot of foreign references, or only a few but from rather large tables (several million rows at least), or something in between.
Any other constraints that could be relevant?
I have a simple function in db which is being called from Java batch program to populate the records into tables from flat files.
Did you verify that most of the time is spent waiting on the database?
I'm not entirely sure what you mean by the above. Do you have a batch program that starts the java interpreter for each flat file?
Are they running synchronously (one after the other) or parallel? In the latter case you may be waiting on locks.
Is the data that your program needs to insert in one line in the flat file or does it need to collect data from multiple lines throughout the file?
How much memory does your java program use, could it be that it causes postgres to be swapped out?
Did you do any benchmarking on your "simple function in db" or on the queries it performs (assuming it does perform any)?
I have the below system configuration for my database server.
Database Server
PostgreSQL v8.3.5
Operating System
Windows 2003 Server 64 bit, Service Pack 2
CPU
2 * Quad Core AMD Opteron Processor 2376 @ 2.29 GHz
Memory
16 GB RAM
Disk Space
total 2.5 TB [C drive - 454 GB & D drive 1.99 TB]
A system like that should be able to insert that small a number of records in no time.
Also note that when I set shared_buffers to 2GB or higher, Postgres throws the error "shared_buffer size cannot be more than size_t".
That's odd... Is this a 32-bit Postgres build or is a 64-bit Windows incapable of assigning more than a 32-bit number for the amount of shared memory? Are you running in some kind of 32-bit compatibility mode maybe (PAE comes to mind)?
That said, I haven't used Windows for anything more serious than gaming since last century - I'm not exactly an expert on its behaviour.
Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.
I am curious to know how much of your delay is due to PostgreSQL and how
much to your Java batch program. If you comment out the call to the
database function, so that you are reading your input file but not doing
anything with the data, how long does your batch program take to run?
RobR
On 27 Apr 2010, at 11:15, <A.Bhattacharya@sungard.com> wrote:
Thanks a lot for your help. However, I am new to the Postgres database, so it would be nice if you could let me know how to set autocommit off.
I know that issuing "\set Autocommit Off" from the psql client should turn it off, but unfortunately it doesn't.
It's a client-side setting, not a server-side one. Hence you need to turn it off in your java program, probably in the database connection settings.
P.S. Your message was so mangled that I couldn't find which part of the original message you were referring to, so I deleted the remainder. Top-posting is considered bad form in mailing-lists.
Alban Hertroys
--
Screwing up is an excellent way to attach something to the ceiling.
On Tue, Apr 27, 2010 at 5:17 AM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:
On 27 Apr 2010, at 10:11, <A.Bhattacharya@sungard.com> wrote:
I am using a PostgreSQL 8.3.5 database on a 64-bit Windows OS. I have a batch program written in Java which processes the data and populates it into tables in the Postgres database. I have 622,000 records, but it is taking almost four and a half hours to load them into the tables.
Wow, that's averaging almost 40s per insert! I don't think those inserts not being in a single transaction explains that much of a delay. Not that changing that doesn't help, but there's more going on here.
You got that backwards: it's 40 inserts/sec. This is about par for
the course with Windows-style fsync on slow hardware, IIRC. The solution is
to use transactions, or play with fsync.
merlin
Thanks a lot for your help. However, I am new to the Postgres database,
so it would be nice if you could let me know how to set autocommit off.
I know that issuing "\set Autocommit Off" from the psql client should
turn it off, but unfortunately it doesn't.
It's a client-side setting, not a server-side one. Hence you need to turn it
off in your java program, probably in the database connection settings.
P.S. Your message was so mangled that I couldn't find which part of the
original message you were referring to, so I deleted the remainder.
Top-posting is considered bad form in mailing-lists.
Alban Hertroys
--------------------------------------------------------------------------------------------------------
Actually, top posting is considered bad form in THIS mailing list. All the
others I participate in top post regularly. That said though, it is
considered good form to post in the manner expected by the list that one is
posting to. ;-)
Bayless
A.Bhattacharya@sungard.com wrote:
I have 622,000 records, but it is taking almost four and a half hours
to load them into the tables. I have a simple function in the db which is
called from the Java batch program to populate the records into the tables from flat files.
Four likely possibilities here, in order of how easy they are to test
for and therefore resolve:
1) Your hardware doesn't handle commits very well. You can turn off the
synchronous_commit setting while doing the data loading to see if that
helps. See http://www.postgresql.org/docs/8.3/static/wal-async-commit.html
2) You're doing a commit after every single insert. See
http://java.sun.com/docs/books/tutorial/jdbc/basics/transactions.html
for an intro to disabling this. Common practice is to commit every 100
to 1,000 inserts instead.
3) The simple function in your database is executing very slowly.
4) The design of the database involves a lot of overhead, such as
difficult to process foreign keys or an excess of indexes on some tables.
--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
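For Greg's first possibility, synchronous_commit can be turned off for just the loading session rather than server-wide. A sketch under the assumption that the load runs over a single JDBC connection (the connection details are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class AsyncCommitLoad {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
        Statement st = conn.createStatement();
        // Session-level setting: a crash can lose the last few commits,
        // but the database stays consistent. Fine for a reloadable batch.
        st.execute("SET synchronous_commit = off");
        // ... run the bulk load on this same connection ...
        st.close();
        conn.close();
    }
}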
2010/4/27 Greg Smith <greg@2ndquadrant.com>:
A.Bhattacharya@sungard.com wrote:
I have *622,000 number of records *but it is taking almost *4 and half
hours* to load these data into the tables.
Without the schema and the queries, all you can get is guessing.
--
Vincenzo Romano
NotOrAnd Information Technologies
NON QVIETIS MARIBVS NAVTA PERITVS
I am curious to know how much of your delay is due to PostgreSQL and how
much to your Java batch program. If you comment out the call to the
database function, so that you are reading your input file but not doing
anything with the data, how long does your batch program take to run?
RobR
--
The Java program takes hardly a minute to process all the flat files, but
when inserting the records into the db, the entire process takes
more than 4 hours.
Many thanks
Hello there,
1. Try using the COPY command; you will see a significant decrease in the loading
time.
2. Turn off autocommit, and remove foreign key constraints if it is only a
one-time load; this will also help decrease the load time.
Try these options and let us know how it went.
We load around 6M rows of data into a table using the COPY command; it takes a few
minutes, and our system configuration is not that high either. Also,
one more thing: we use a Linux box over here.
Do a small test to see how long it takes to do 1,000 inserts into a
similar table, and send us your timings and the definition of the table. I will
compare against mine.
Thanks
Deepak
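A sketch of the small test Deepak asks for, timing 1,000 single-row inserts inside one transaction; the table and columns are placeholders to be swapped for the real definition:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InsertTimer {
    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
        conn.setAutoCommit(false);
        PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO test_table (id, payload) VALUES (?, ?)");
        long start = System.currentTimeMillis();
        for (int i = 0; i < 1000; i++) {
            ps.setInt(1, i);
            ps.setString(2, "payload " + i);
            ps.executeUpdate();   // one round-trip per row, on purpose
        }
        conn.commit();
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("1000 inserts took " + elapsed + " ms");
        ps.close();
        conn.close();
    }
}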
On Tue, Apr 27, 2010 at 10:09 PM, <A.Bhattacharya@sungard.com> wrote:
[quoted message trimmed]