Tuning / performance questions

Started by Bryan Montgomeryover 13 years ago5 messagesgeneral

monty@english.net

over 13 years ago

Hello experts! (and other like me).
**

We have a system, that due to recent events is getting a lot heavier use.
The application makes extensive use of functions. These functions would
typically run sub-second but now can take several seconds.

I'm wondering what general ways there are to monitor and improve
performance? We look at pgadmin's server status but that only sees the
function being run. Additionally, is there a good way to 'explain' a
function? Or do you have to execute the function steps individually and
explain those?

The server is running on suse 11.4 with 8 vcpu and 32Gb ram on a virtual
machine.

Running pg_version returns 'PostgreSQL 9.1.0 on x86_64-unknown-linux-gnu,
compiled by gcc (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision
167585], 64-bit' and select pg_size_pretty(pg_database_size('nrgdb'));
returns 63 GB.

The server typically has up to 500 connections with a max of 750
connections.

Below are the non-default values of our configuration file.

Any thoughts on what we should look at?

Thanks,

Bryan

listen_addresses = '*' # what IP address(es) to listen on;

max_connections = 750 # (change requires restart)

superuser_reserved_connections = 9 # (change requires restart)

shared_buffers = 8192MB # min 128kB or max_connections*16kB

temp_buffers = 64MB # min 800kB

max_prepared_transactions = 250 # can be 0 or more

work_mem = 512MB # min 64kB

maintenance_work_mem = 1GB # min 1MB

fsync = off # turns forced synchronization on
or off

full_page_writes = off # recover from partial page writes

wal_buffers = 16MB # min 32kB

commit_delay = 1000 # range 0-100000, in microseconds

commit_siblings = 5 # range 1-1000

checkpoint_segments = 50 # in logfile segments, min 1, 16MB
each

checkpoint_timeout = 5min # range 30s-1h

checkpoint_warning = 1min # 0 is off

effective_cache_size = 16GB

log_destination = 'stderr' # Valid values are combinations of

logging_collector = on

Log_directory = '/var/log/postgres' # Directory where log files
are written

log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.

log_rotation_age = 1d # Automatic rotation of logfiles
will

log_rotation_size = 20MB # Automatic rotation of logfiles
will

log_min_messages = info # Values, in order of decreasing detail:

log_min_duration_statement = 10000 # -1 is disabled, 0 logs all statements

log_line_prefix = '%t %p %u@%h: ' # Special values for Pgsi

log_statement = 'none' # none, ddl, mod, all

log_duration = off

autovacuum = on # enable autovacuum subprocess?

datestyle = 'iso, mdy'

lc_messages = 'en_US.UTF-8' # locale for system error
message

lc_monetary = 'en_US.UTF-8' # locale for monetary
formatting

lc_numeric = 'en_US.UTF-8' # locale for number
formatting

lc_time = 'en_US.UTF-8' # locale for time formatting

Kevin Grittner

Kevin.Grittner@wicourts.gov

over 13 years ago

In reply to: Bryan Montgomery (#1)

Re: Tuning / performance questions

Bryan Montgomery wrote:

We have a system, that due to recent events is getting a lot
heavier use. The application makes extensive use of functions.
These functions would typically run sub-second but now can take
several seconds.

The server is running on suse 11.4 with 8 vcpu and 32Gb ram on a
virtual machine.

Running pg_version returns 'PostgreSQL 9.1.0 on
x86_64-unknown-linux-gnu, compiled by gcc (SUSE Linux) 4.5.1
20101208 [gcc-4_5-branch revision 167585], 64-bit' and select
pg_size_pretty(pg_database_size('nrgdb')); returns 63 GB.

The server typically has up to 500 connections with a max of 750
connections.

max_connections = 750 # (change requires restart)

temp_buffers = 64MB # min 800kB

work_mem = 512MB # min 64kB

750 * ((512 MB) + (64 MB)) = 421.87500 gigabytes

Once a connection uses memory for temp_buffers, it doesn't release it
for as long as that connection runs. Each connection can allocate
work_mem for each node of a query plan. Even though maximum usage can
be more than one allocation per connection, rule of thumb is to
assume just one. You are up to needing 422 GB + shared memory + OS
space (including some buffers and cache) + whatever else you run on
this VM. You have 32 GB. You will almost certainly have problems at
high load.

Try putting pgbouncer in front of the database, configured to use
transaction mode and accept 750 user connections while only keeping
20 or so database conections open.

max_prepared_transactions = 250 # can be 0 or more

Are you really using two phase commit and a transaction manager?
(Don't confuse this setting with something related to prepared
*statements* -- prepared *transacitons* are a separate issue.) Even
if you are using prepared transactions, do you really expect your
transaction manager to let 250 transactions pile up in the database
between the first and second phase of commit?

BTW, you should be monitoring this table for old prepared
transactions that to prevent problems with bloat.

fsync = off
full_page_writes = off

You didn't mention your backup scheme, but be prepared for the fact
that with these settings, if the VM (or its underlying OS or
hardward) fails, your database will be corrupted and you may have no
choice but to use your backup.

commit_delay = 1000 # range 0-100000, in microseconds

commit_siblings = 5 # range 1-1000

These settings are notoriously hard to configure from the default
without actually making things worse. Be very sure you know what you
are doing and have carefully benchmarked this against your real
workload; otherwise it is probably better to put these back to the
defaults.

There may be some other fine-tuning opportunities, but these issues
should be fixed first, and it would be best to have an actual query
that is performing poorly to try to tune some of the other settings.

-Kevin

Import Notes

Resolved by subject fallback

Craig Ringer

craig@2ndquadrant.com

over 13 years ago

In reply to: Bryan Montgomery (#1)

Re: Tuning / performance questions

Thanks for including your configuration and version; it makes things
much easier.

Reply follows inline.

On 11/06/2012 09:04 PM, Bryan Montgomery wrote:

I'm wondering what general ways there are to monitor and improve
performance? We look at pgadmin's server status but that only sees the
function being run. Additionally, is there a good way to 'explain' a
function? Or do you have to execute the function steps individually
and explain those?

See the auto_explain contrib module. It can explain statements within
functions, as well as the functions themselves.

http://www.postgresql.org/docs/current/static/auto-explain.html
<http://www.postgresql.org/docs/devel/static/auto-explain.html>

The server typically has up to 500 connections with a max of 750
connections.

Get a connection pooler. Urgently. See
http://wiki.postgresql.org/wiki/PgBouncer . It is extremely unlikely
that your server is running efficiently with that many concurrent
connections actively working. Reducing it to (say) 100 and using
transaction-level connection pooling may boost performance significantly.

work_mem = 512MB # min 64kB

That's really dangerous with your connection count. If many connections
actually use that, you'll run out of RAM in a hurry and enter nasty
paging storm. If possible, reduce it, then raise it selectively in
transactions where you know a high work_mem is needed.

fsync = off # turns forced synchronization
on or off

So you don't value your data and don't mind if you lose all of it,
permanently and unrecoverably, if your server loses power or the host OS
hard crashes?

It's much safer to use `synchronous_commit = off` and a commit_delay. If
that isn't enough, get fast-flushing storage like a good raid controller
with a battery backed cache you can put in write-back mode, or some high
quality SSDs with power-protected write caches.

full_page_writes = off # recover from partial page writes

As above: I hope your data isn't important to you.

--
Craig Ringer

Bryan Montgomery

monty@english.net

over 13 years ago

In reply to: Craig Ringer (#3)

Re: Tuning / performance questions

Thanks for the insight. I know why some of the settings were set, but there
are others that have been highlighted that now don't make sense. I'll
strongly recommend adjusting those now.

On Tue, Nov 6, 2012 at 8:54 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:

Show quoted text

Thanks for including your configuration and version; it makes things
much easier.

Reply follows inline.

On 11/06/2012 09:04 PM, Bryan Montgomery wrote:

I'm wondering what general ways there are to monitor and improve
performance? We look at pgadmin's server status but that only sees the
function being run. Additionally, is there a good way to 'explain' a
function? Or do you have to execute the function steps individually and
explain those?

See the auto_explain contrib module. It can explain statements within
functions, as well as the functions themselves.

http://www.postgresql.org/docs/current/static/auto-explain.html<http://www.postgresql.org/docs/devel/static/auto-explain.html>

The server typically has up to 500 connections with a max of 750
connections.

Get a connection pooler. Urgently. See
http://wiki.postgresql.org/wiki/PgBouncer . It is extremely unlikely that
your server is running efficiently with that many concurrent connections
actively working. Reducing it to (say) 100 and using transaction-level
connection pooling may boost performance significantly.

work_mem = 512MB # min 64kB

That's really dangerous with your connection count. If many connections
actually use that, you'll run out of RAM in a hurry and enter nasty paging
storm. If possible, reduce it, then raise it selectively in transactions
where you know a high work_mem is needed.

fsync = off # turns forced synchronization
on or off

So you don't value your data and don't mind if you lose all of it,
permanently and unrecoverably, if your server loses power or the host OS
hard crashes?

It's much safer to use `synchronous_commit = off` and a commit_delay. If
that isn't enough, get fast-flushing storage like a good raid controller
with a battery backed cache you can put in write-back mode, or some high
quality SSDs with power-protected write caches.

full_page_writes = off # recover from partial page
writes

As above: I hope your data isn't important to you.

--
Craig Ringer

Craig Ringer

craig@2ndquadrant.com

over 13 years ago

In reply to: Bryan Montgomery (#4)

Re: Tuning / performance questions

On 11/07/2012 01:29 PM, Bryan Montgomery wrote:

Thanks for the insight. I know why some of the settings were set, but
there are others that have been highlighted that now don't make sense.
I'll strongly recommend adjusting those now.

Good. If you've been running with fsync=off all this time I hope you've
also been keeping good backups.

BTW, I should've sent you a link to
http://wiki.postgresql.org/wiki/Number_Of_Database_Connections in my
prior email, where I recommended connection pooling.

--
Craig Ringer