Estimated resources for a 500 connections instance (VM)
psql (9.6.0, server 11.3) on linux
We've ramped up usage on a PG server (a VM, I have no choice about this)
and are approaching the 100 connections limit. We could increase the
limit, but I've read that this can lead to a degradation in performance. If
we bump it up to 500, what kind of compute resources would that require
(mem/cpu)? Any estimates?
Thanks!
Hello.
Have you checked
https://wiki.postgresql.org/wiki/Number_Of_Database_Connections?
For rough estimates you can also check http://pgconfigurator.cybertec.at/
or https://pgtune.leopard.in.ua/#/.
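The wiki page above gives a widely quoted starting-point formula for sizing a connection pool. As a rough sketch (the core and spindle counts below are made-up example values, not a recommendation for your VM):

```python
def pool_size(core_count: int, effective_spindle_count: int) -> int:
    """Starting-point pool size from the PostgreSQL wiki's
    'Number Of Database Connections' page:
    connections = (core_count * 2) + effective_spindle_count.
    Treat the result as a baseline to tune, not a hard rule."""
    return core_count * 2 + effective_spindle_count

# Example: a hypothetical 8-core VM whose storage behaves like one spindle
print(pool_size(8, 1))  # -> 17
```

The point of the formula is that the number of connections the server can usefully run concurrently is tied to cores and I/O channels, not to how many clients want to connect.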
On Tue, Apr 7, 2020 at 18:25, David Gauthier <davegauthierpg@gmail.com>
wrote:
On Tue, 2020-04-07 at 12:24 -0400, David Gauthier wrote:
You should upgrade to the latest minor release.
Don't increase max_connections. The main problem is that the more connections
there are, the greater the likelihood that too many of them will be active,
overloading the database machine.
This can for example happen if a DDL statement has to wait behind a lock.
Then other queries will "pile up" behind it, and as soon as the DDL is done
or canceled, the avalanche will break loose.
The better solution is to use a connection pool. If your application doesn't
have one, use pgBouncer.
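For the PgBouncer route, a minimal configuration might look like the sketch below. The database name, paths, and pool sizes are illustrative placeholders, not recommendations:

```ini
; minimal pgbouncer.ini sketch -- all values are illustrative
[databases]
; clients connect to pgbouncer, which fans into a small server-side pool
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction   ; server connection is reused between transactions
default_pool_size = 20    ; placeholder; size to what the server can run
max_client_conn = 500     ; many clients share the small server pool
```

One caveat worth checking against the client scripts: in transaction pooling mode, session state (prepared statements, temporary tables, session GUCs) does not survive across transactions; session pooling is the safer default if the scripts rely on any of that.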
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
After looking at some of the factors that can affect this, I think it may
be important to know that most of the connections will be almost idle (in
terms of interacting with the DB). The "users" are perl/dbi scripts which
connect to the DB and spend the vast majority of the time doing things
other than interacting with the DB. So a connection is consumed, but it's
not really working very hard with the DB per se. I am cleaning up some of
that code by strategically connecting/disconnecting only when a DB
interaction is required. But for my edification, is it roughly true that 2
connections working with the DB 100% of the time is equivalent to 20
connections @ 10% = 200 connections @ 1% (if you know what I mean)?
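The arithmetic behind that question can be sketched out: the expected number of busy backends is indeed the same in all three scenarios, but the fixed per-connection overhead is not. The ~10 MB per backend figure below is my own illustrative assumption (it varies with work_mem and workload), not a number from this thread:

```python
# Back-of-the-envelope comparison of the three scenarios above.
# Assumption (illustrative, not measured): each PostgreSQL backend holds
# a fixed amount of memory even while idle; ~10 MB is a ballpark figure.
MB_PER_BACKEND = 10

scenarios = [
    (2, 1.00),    # 2 connections busy 100% of the time
    (20, 0.10),   # 20 connections busy 10% of the time
    (200, 0.01),  # 200 connections busy 1% of the time
]

for n_conns, utilization in scenarios:
    expected_active = n_conns * utilization      # average busy backends
    fixed_memory_mb = n_conns * MB_PER_BACKEND   # paid even when idle
    print(f"{n_conns:>4} conns @ {utilization:>4.0%}: "
          f"~{expected_active:.1f} active on average, "
          f"~{fixed_memory_mb} MB of backend memory")
```

All three rows show ~2 backends active on average, but the fixed memory grows tenfold at each step, and (as noted elsewhere in the thread) nothing guarantees the idle connections stay idle at the worst moment.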
On Tue, 7 Apr 2020 at 21:52, David Gauthier <davegauthierpg@gmail.com>
wrote:
Hi,
Every open connection consumes a bit of resources, which is not a big deal
if you keep open a few more connections than you strictly need.
However, when you keep a few hundred idle connections open, those resources
add up quickly. So don't do that if possible.
Likewise, establishing a new connection is a resource-costly process. So
don't do that either if possible.
Long story short, if those connections don't use many different users,
then (as others already suggested) connection pooling will be the best
solution.
Regards,
Sándor
David Gauthier <davegauthierpg@gmail.com> writes:
Based on that additional info, I would definitely follow Laurenz's
suggestion. Long time since I used Perl DBI, but I'm pretty sure there
is support for connection pools, or you can use one of the PG
connection pooling solutions.
There is a fixed memory allocation per connection, so 2 connections at
100% is not the same as 20 connections @ 10%.
Using a connection pool is usually the first thing I will setup. If
additional connections are still required, then I would increase the
limit in small jumps - definitely would not go from 100 to 500.
BTW, running PG on a virtual machine is not an issue in itself - this is
very common these days. However, I would ensure you are up-to-date wrt the
latest minor release for that version and would use clients with the same
version as the master.
--
Tim Cross
On 4/7/20 2:23 PM, Sándor Daku wrote:
And from my experience pgbouncer is very easy to include in your
stack. (I have not tried pgpool.)
On 4/7/20 12:51 PM, David Gauthier wrote:
Well, to get a sense of the load you could use top; within top:
1) Hit the u key and enter postgres as the user
2) Hit the c key to get the full command line
That should result in something like:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM    TIME+ COMMAND
  978 postgres  20   0   72072   7688   6548 S 0.000 0.096  0:00.04 /usr/lib/systemd/systemd --user
  981 postgres  20   0  117124   2704     52 S 0.000 0.034  0:00.00 (sd-pam)
 1201 postgres  20   0  184292  20396  19520 S 0.000 0.254  0:01.19 /usr/local/pgsql12/bin/postmaster -D /usr/local/pgsql12/data
 1255 postgres  20   0   38364   4192   3332 S 0.000 0.052  0:00.00 postgres: logger
 1263 postgres  20   0  184408   5916   5016 S 0.000 0.074  0:00.00 postgres: checkpointer
 1264 postgres  20   0  184424   5520   4640 S 0.000 0.069  0:00.18 postgres: background writer
 1265 postgres  20   0  184292   9500   8620 S 0.000 0.118  0:00.18 postgres: walwriter
 1266 postgres  20   0  185116   7756   6520 S 0.000 0.096  0:01.32 postgres: autovacuum launcher
 1267 postgres  20   0   39488   5316   3780 S 0.000 0.066  0:02.86 postgres: stats collector
 1268 postgres  20   0  184844   6064   4980 S 0.000 0.075  0:00.00 postgres: logical replication launcher
14478 postgres  20   0  185252   9612   8184 S 0.000 0.119  0:00.00 postgres: postgres production [local] idle
14507 postgres  20   0  185348  11380   9848 S 0.000 0.141  0:00.00 postgres: aklaver task_manager ::1(45202) idle
--
Adrian Klaver
adrian.klaver@aklaver.com
On Tue, 2020-04-07 at 15:51 -0400, David Gauthier wrote:
But for my edification, is it roughly true that 2 connections working with the DB 100%
of the time is equivalent to 20 connections @ 10% = 200 connections @ 1 % (if you know what I mean) ?
Roughly, yes.
There is a certain overhead, as each query has to take a snapshot of the
database, which has to consider all active connections.
But the main problem is that you have no way to ensure that all those
idle connections stay idle.
Imagine that most of these sessions issue short statements
against "table1". There are also some longer SELECTs.
No problem so far.
Now along comes a statement that blocks the table for a very short
time, like a CREATE INDEX CONCURRENTLY.
Unfortunately, that ACCESS EXCLUSIVE lock has to wait for one of
the longer SELECTs to finish.
Now all those moderately short statements cannot run any more, but
they have to queue behind the ACCESS EXCLUSIVE lock. For a
while, nothing happens. Then, as soon as the CREATE INDEX CONCURRENTLY
has got its lock, done its work and finished, all hell breaks loose,
because all sessions start executing their statements at the same time.
I have seen database servers go down because of such events, and
the problem is the high number of connections, even if they are idle
most of the time.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com