Tuning / performance questions
Hello experts! (and other like me).
**
We have a system, that due to recent events is getting a lot heavier use.
The application makes extensive use of functions. These functions would
typically run sub-second but now can take several seconds.
I'm wondering what general ways there are to monitor and improve
performance? We look at pgadmin's server status but that only sees the
function being run. Additionally, is there a good way to 'explain' a
function? Or do you have to execute the function steps individually and
explain those?
The server is running on suse 11.4 with 8 vcpu and 32Gb ram on a virtual
machine.
Running pg_version returns 'PostgreSQL 9.1.0 on x86_64-unknown-linux-gnu,
compiled by gcc (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision
167585], 64-bit' and select pg_size_pretty(pg_database_size('nrgdb'));
returns 63 GB.
The server typically has up to 500 connections with a max of 750
connections.
Below are the non-default values of our configuration file.
Any thoughts on what we should look at?
Thanks,
Bryan
listen_addresses = '*' # what IP address(es) to listen on;
max_connections = 750 # (change requires restart)
superuser_reserved_connections = 9 # (change requires restart)
shared_buffers = 8192MB # min 128kB or max_connections*16kB
temp_buffers = 64MB # min 800kB
max_prepared_transactions = 250 # can be 0 or more
work_mem = 512MB # min 64kB
maintenance_work_mem = 1GB # min 1MB
fsync = off # turns forced synchronization on
or off
full_page_writes = off # recover from partial page writes
wal_buffers = 16MB # min 32kB
commit_delay = 1000 # range 0-100000, in microseconds
commit_siblings = 5 # range 1-1000
checkpoint_segments = 50 # in logfile segments, min 1, 16MB
each
checkpoint_timeout = 5min # range 30s-1h
checkpoint_warning = 1min # 0 is off
effective_cache_size = 16GB
log_destination = 'stderr' # Valid values are combinations of
logging_collector = on
Log_directory = '/var/log/postgres' # Directory where log files
are written
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.
log_rotation_age = 1d # Automatic rotation of logfiles
will
log_rotation_size = 20MB # Automatic rotation of logfiles
will
log_min_messages = info # Values, in order of decreasing detail:
log_min_duration_statement = 10000 # -1 is disabled, 0 logs all statements
log_line_prefix = '%t %p %u@%h: ' # Special values for Pgsi
log_statement = 'none' # none, ddl, mod, all
log_duration = off
autovacuum = on # enable autovacuum subprocess?
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8' # locale for system error
message
lc_monetary = 'en_US.UTF-8' # locale for monetary
formatting
lc_numeric = 'en_US.UTF-8' # locale for number
formatting
lc_time = 'en_US.UTF-8' # locale for time formatting
Bryan Montgomery wrote:
We have a system, that due to recent events is getting a lot
heavier use. The application makes extensive use of functions.
These functions would typically run sub-second but now can take
several seconds.
The server is running on suse 11.4 with 8 vcpu and 32Gb ram on a
virtual machine.Running pg_version returns 'PostgreSQL 9.1.0 on
x86_64-unknown-linux-gnu, compiled by gcc (SUSE Linux) 4.5.1
20101208 [gcc-4_5-branch revision 167585], 64-bit' and select
pg_size_pretty(pg_database_size('nrgdb')); returns 63 GB.
The server typically has up to 500 connections with a max of 750
connections.
max_connections = 750 # (change requires restart)
temp_buffers = 64MB # min 800kB
work_mem = 512MB # min 64kB
750 * ((512 MB) + (64 MB)) = 421.87500 gigabytes
Once a connection uses memory for temp_buffers, it doesn't release it
for as long as that connection runs. Each connection can allocate
work_mem for each node of a query plan. Even though maximum usage can
be more than one allocation per connection, rule of thumb is to
assume just one. You are up to needing 422 GB + shared memory + OS
space (including some buffers and cache) + whatever else you run on
this VM. You have 32 GB. You will almost certainly have problems at
high load.
Try putting pgbouncer in front of the database, configured to use
transaction mode and accept 750 user connections while only keeping
20 or so database conections open.
max_prepared_transactions = 250 # can be 0 or more
Are you really using two phase commit and a transaction manager?
(Don't confuse this setting with something related to prepared
*statements* -- prepared *transacitons* are a separate issue.) Even
if you are using prepared transactions, do you really expect your
transaction manager to let 250 transactions pile up in the database
between the first and second phase of commit?
BTW, you should be monitoring this table for old prepared
transactions that to prevent problems with bloat.
fsync = off
full_page_writes = off
You didn't mention your backup scheme, but be prepared for the fact
that with these settings, if the VM (or its underlying OS or
hardward) fails, your database will be corrupted and you may have no
choice but to use your backup.
commit_delay = 1000 # range 0-100000, in microseconds
commit_siblings = 5 # range 1-1000
These settings are notoriously hard to configure from the default
without actually making things worse. Be very sure you know what you
are doing and have carefully benchmarked this against your real
workload; otherwise it is probably better to put these back to the
defaults.
There may be some other fine-tuning opportunities, but these issues
should be fixed first, and it would be best to have an actual query
that is performing poorly to try to tune some of the other settings.
-Kevin
Import Notes
Resolved by subject fallback
Thanks for including your configuration and version; it makes things
much easier.
Reply follows inline.
On 11/06/2012 09:04 PM, Bryan Montgomery wrote:
I'm wondering what general ways there are to monitor and improve
performance? We look at pgadmin's server status but that only sees the
function being run. Additionally, is there a good way to 'explain' a
function? Or do you have to execute the function steps individually
and explain those?
See the auto_explain contrib module. It can explain statements within
functions, as well as the functions themselves.
http://www.postgresql.org/docs/current/static/auto-explain.html
<http://www.postgresql.org/docs/devel/static/auto-explain.html>
The server typically has up to 500 connections with a max of 750
connections.
Get a connection pooler. Urgently. See
http://wiki.postgresql.org/wiki/PgBouncer . It is extremely unlikely
that your server is running efficiently with that many concurrent
connections actively working. Reducing it to (say) 100 and using
transaction-level connection pooling may boost performance significantly.
work_mem = 512MB # min 64kB
That's really dangerous with your connection count. If many connections
actually use that, you'll run out of RAM in a hurry and enter nasty
paging storm. If possible, reduce it, then raise it selectively in
transactions where you know a high work_mem is needed.
fsync = off # turns forced synchronization
on or off
So you don't value your data and don't mind if you lose all of it,
permanently and unrecoverably, if your server loses power or the host OS
hard crashes?
It's much safer to use `synchronous_commit = off` and a commit_delay. If
that isn't enough, get fast-flushing storage like a good raid controller
with a battery backed cache you can put in write-back mode, or some high
quality SSDs with power-protected write caches.
full_page_writes = off # recover from partial page writes
As above: I hope your data isn't important to you.
--
Craig Ringer
Thanks for the insight. I know why some of the settings were set, but there
are others that have been highlighted that now don't make sense. I'll
strongly recommend adjusting those now.
On Tue, Nov 6, 2012 at 8:54 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
Show quoted text
Thanks for including your configuration and version; it makes things
much easier.Reply follows inline.
On 11/06/2012 09:04 PM, Bryan Montgomery wrote:
I'm wondering what general ways there are to monitor and improve
performance? We look at pgadmin's server status but that only sees the
function being run. Additionally, is there a good way to 'explain' a
function? Or do you have to execute the function steps individually and
explain those?See the auto_explain contrib module. It can explain statements within
functions, as well as the functions themselves.The server typically has up to 500 connections with a max of 750
connections.Get a connection pooler. Urgently. See
http://wiki.postgresql.org/wiki/PgBouncer . It is extremely unlikely that
your server is running efficiently with that many concurrent connections
actively working. Reducing it to (say) 100 and using transaction-level
connection pooling may boost performance significantly.work_mem = 512MB # min 64kB
That's really dangerous with your connection count. If many connections
actually use that, you'll run out of RAM in a hurry and enter nasty paging
storm. If possible, reduce it, then raise it selectively in transactions
where you know a high work_mem is needed.fsync = off # turns forced synchronization
on or offSo you don't value your data and don't mind if you lose all of it,
permanently and unrecoverably, if your server loses power or the host OS
hard crashes?It's much safer to use `synchronous_commit = off` and a commit_delay. If
that isn't enough, get fast-flushing storage like a good raid controller
with a battery backed cache you can put in write-back mode, or some high
quality SSDs with power-protected write caches.full_page_writes = off # recover from partial page
writesAs above: I hope your data isn't important to you.
--
Craig Ringer
On 11/07/2012 01:29 PM, Bryan Montgomery wrote:
Thanks for the insight. I know why some of the settings were set, but
there are others that have been highlighted that now don't make sense.
I'll strongly recommend adjusting those now.
Good. If you've been running with fsync=off all this time I hope you've
also been keeping good backups.
BTW, I should've sent you a link to
http://wiki.postgresql.org/wiki/Number_Of_Database_Connections in my
prior email, where I recommended connection pooling.
--
Craig Ringer