sensible configuration of max_connections

Started by Chris Withers on 7 February 2020 · 9 messages · general
#1 Chris Withers
chris@withers.org

Hi All,

What's a sensible way to pick the number to use for max_connections?

I'm looking after a reasonably sized multi-tenant cluster, where the
master handles all the load and there's a slave in case of hardware
failure in the master.
The machine is used to host what I suspect are mainly Django
applications, so lots of short web requests; I'm not sure how much, if
any, connection pooling Django's ORM does.

I arbitrarily picked 1000 for max_connections and haven't had any
problems, but we're onboarding another app on Monday that handles a few
million requests per day, and I'm thinking about potential problems
related to the number of available connections.

What's "too much" for max_connections? What happens when you set it to
high? What factors affect that number?

cheers,

Chris

#2 Chris Ellis
chris@intrbiz.com
In reply to: Chris Withers (#1)
Fwd: sensible configuration of max_connections

Hi Chris

On Fri, 7 Feb 2020, 08:36 Chris Withers, <chris@withers.org> wrote:

Hi All,

What's a sensible way to pick the number to use for max_connections?

Sensible in this context is somewhat variable. Each connection in
PostgreSQL is allocated a backend process, and these are not the
lightest-weight of things.

Each connection takes up space in shared memory, as mentioned in the manual.

I'm looking after a reasonably sized multi-tenant cluster, where the
master handles all the load and there's a slave in case of hardware
failure in the master.
The machine is used to host what I suspect are mainly Django
applications, so lots of short web requests; I'm not sure how much, if
any, connection pooling Django's ORM does.

I arbitrarily picked 1000 for max_connections and haven't had any
problems, but we're onboarding another app on Monday that handles a few
million requests per day, and I'm thinking about potential problems
related to the number of available connections.

What's "too much" for max_connections? What happens when you set it to
high? What factors affect that number?

When sizing max_connections you need to trade off how many connections your
application will use at peak vs how much RAM and CPU you have.

Each connection is capable of allocating work_mem and has a stack etc.

As such you don't want max_connections to be able to run your system out of
RAM.

Given your situation I'd very seriously look at connection pooling using
PgBouncer or similar. That way you can run with a far smaller
max_connections and still cope with applications configured with large
usually idle connection pools.
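
For example, something like this (a rough sketch; PG 10+, and it needs a
role that can see other sessions, e.g. a member of pg_monitor) will show
how close you currently get to that ceiling:

    -- Client backends currently connected vs. the configured ceiling
    SELECT count(*) FILTER (WHERE backend_type = 'client backend') AS client_backends,
           current_setting('max_connections')::int                 AS max_connections
    FROM pg_stat_activity;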

cheers,

Chris

Regards,
Chris Ellis

#3 Chris Withers
chris@withers.org
In reply to: Chris Ellis (#2)
Re: Fwd: sensible configuration of max_connections

On 07/02/2020 12:49, Chris Ellis wrote:

What's "too much" for max_connections? What happens when you set it to

high? What factors affect that number?

When sizing max_connections you need to trade off how many connections
your application will use at peak vs how much RAM and CPU you have.

Each connection is capable of allocating work_mem and has a stack etc.

As such you don't want max_connections to be able to run your system
out of RAM.

Sure, but that's where I'm trying to find out what's sensible. The box
has 196GB of memory, most of that in hugepages, an 18-core Intel Skylake
with HT on giving 36 logical cores, and tonnes of SSD for storage. How
would I turn that spec into a sensible number for max_connections? As
that number grows, what contention points in postgres will start
creaking (shared memory, where the IPC happens)?

In case I forgot to say, this is PostgreSQL 11...

Chris

PS: definitely thinking of PgBouncer, but still trying to figure out
what to sensibly set max_connections to.

#4 Justin
zzzzz.graf@gmail.com
In reply to: Chris Ellis (#2)
Re: sensible configuration of max_connections

Hi Chris Withers

As stated, each connection uses a certain amount of resources, and it's
very easy to configure PostgreSQL so that even a small number of
connections will eat up all the RAM.

work_mem is the biggest consumer of resources: let's say it's set to 5 MB
per connection; at 1000 connections, that's 5,000 MB that can be
allocated.
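
As a rough back-of-the-envelope, assuming each backend allocates just one
work_mem at a time (which is optimistic), you can pull the numbers from
the running server:

    -- Worst-case sketch: every allowed backend allocating one work_mem at once
    SELECT pg_size_pretty(current_setting('max_connections')::int
                          * pg_size_bytes(current_setting('work_mem'))) AS work_mem_ceiling,
           current_setting('shared_buffers') AS shared_buffers;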

Whether a connection pooler will work may depend on how security is laid
out in the application. If the application understands how to keep
tenants out of each other's data, with no leaks, then yes. If each tenant
is assigned a schema or a specific database, a connection pooler can
still work, but the configuration is going to be difficult.

#5 Steve Atkins
steve@blighty.com
In reply to: Chris Withers (#3)
Re: Fwd: sensible configuration of max_connections

On 07/02/2020 13:18, Chris Withers wrote:

On 07/02/2020 12:49, Chris Ellis wrote:

What's "too much" for max_connections? What happens when you set it to

high? What factors affect that number?

When sizing max_connections you need to trade off how many
connections your application will use at peak vs how much RAM and CPU
you have.

Each connection is capable of allocating work_mem and has a stack etc.

As such you don't want max_connections to be able to run your system
out of RAM.

Sure, but that's where I'm trying to find out what's sensible. The box
has 196GB of memory, most of that in hugepages, an 18-core Intel Skylake
with HT on giving 36 logical cores, and tonnes of SSD for storage. How
would I turn that spec into a sensible number for max_connections? As
that number grows, what contention points in postgres will start
creaking (shared memory, where the IPC happens)?

The max_connections setting is an upper limit after which PostgreSQL
will reject connections. You don't really want to hit that limit; rather,
you want to keep the number of concurrent connections to a reasonable
number (and have max_connections somewhere above that).

Each connection is a postgresql process, so active connections are
competing for resources and even idle connections take up some RAM.
Creating a new connection is launching a new process (and doing some
setup) so it's relatively expensive.

Doing some sort of connection pooling is a good idea, especially for
web-apps that connect, do a few short queries and disconnect. Django is
probably doing a passable job at pooling already, so you might want to
see how many connections it's using under normal load. Adding a
dedicated pooler in between Django and PostgreSQL would give you more
flexibility and might be a good idea, but if what Django is already
doing is reasonable you may not need it.
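
Something like this (a rough sketch; again it needs a role that can see
other users' sessions) will show what those connections are doing at any
given instant:

    -- Break current client sessions down by state: active, idle, idle in transaction, ...
    SELECT state, count(*)
    FROM pg_stat_activity
    WHERE backend_type = 'client backend'
    GROUP BY state
    ORDER BY count(*) DESC;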

What's a good number of active connections to aim for? It probably
depends on whether they tend to be CPU-bound or IO-bound, but I've seen
the rule of thumb of "around twice the number of CPU cores" tossed
around, and it's probably a decent place to start, then run it under
normal load and see how it behaves - cpu usage, RAM, IO, request latency
and throughput.

Cheers,
  Steve

#6 Michael Lewis
mlewis@entrata.com
In reply to: Justin (#4)
Re: sensible configuration of max_connections

On Fri, Feb 7, 2020 at 6:29 AM Justin <zzzzz.graf@gmail.com> wrote:

work_mem is the biggest consumer of resources: let's say it's set to 5 MB
per connection; at 1000 connections, that's 5,000 MB that can be allocated.

Clarification: work_mem is used per operation (sort, hash, etc.) and could
be allocated many times over by a complicated query, and/or with parallel
processing enabled. It could be that a single connection uses 10x work_mem
or more.
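
As a contrived illustration, take a hypothetical plan with three sort
nodes running under a gather with two parallel workers; the worst case is
roughly:

    -- (1 leader + 2 workers) * 3 sort nodes * work_mem, give or take
    SELECT pg_size_pretty((1 + 2) * 3
           * pg_size_bytes(current_setting('work_mem'))) AS rough_worst_case;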

https://www.postgresql.org/docs/current/runtime-config-resource.html

#7 Sam Gendler
sgendler@ideasculptor.com
In reply to: Steve Atkins (#5)
Re: Fwd: sensible configuration of max_connections

On Fri, Feb 7, 2020 at 5:36 AM Steve Atkins <steve@blighty.com> wrote:

What's a good number of active connections to aim for? It probably depends
on whether they tend to be CPU-bound or IO-bound, but I've seen the rule of
thumb of "around twice the number of CPU cores" tossed around, and it's
probably a decent place to start, then run it under normal load and see how
it behaves - cpu usage, RAM, IO, request latency and throughput.

Back in the day of spinning media, when I was more active on the list and
PostgreSQL 9 was just released, conventional wisdom for a starting number
was 2*cores + 1*spindles, if memory serves. You can set max_connections
higher, but that was the number you wanted to have active, and then adjust
for workload - OLTP vs warehouse, how much disk access vs buffer cache,
etc.
Benchmarks, at the time, showed that performance started to fall off due to
contention if the number of processes got much larger. I imagine that the
speed of storage today would maybe make 3 or 4x core count a pretty
reasonable place to start. There will be a point of diminishing returns
somewhere, but you can probably construct your own benchmarks to determine
where that point is likely to be for your workload.
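
For the box described earlier in the thread (36 logical cores, SSD), a
hypothetical starting point under those rules of thumb might be:

    -- Ballpark active-connection targets only; validate with your own benchmarks
    SELECT 2 * 36 AS old_rule_of_thumb,        -- 2*cores + 1*spindles, ~0 spindles on SSD
           4 * 36 AS faster_storage_upper_end; -- 3-4x core count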

I was doing a lot of java development at the time, and tended to use a
connection pool per application server rather than an external connection
pool in front of postgresql, just for ease of administration, so I might
have more connections than the desired pool size, but I tried to keep the
number of active connections under that limit and set max_connections to a
value that prevented me from being locked out if each application server
was at max pool size. I'm not sure how well that strategy would work now
that autoscaling is so ubiquitous, since there is memory allocation
overhead associated even with idle connections and the sheer number of
per-server pools could get quite high.

#8 Justin
zzzzz.graf@gmail.com
In reply to: Sam Gendler (#7)
Re: Fwd: sensible configuration of max_connections

On Fri, Feb 7, 2020 at 1:56 PM Sam Gendler <sgendler@ideasculptor.com>
wrote:

Benchmarks, at the time, showed that performance started to fall off due
to contention if the number of processes got much larger. I imagine that
the speed of storage today would maybe make 3 or 4x core count a pretty
reasonable place to start. There will be a point of diminishing returns
somewhere, but you can probably construct your own benchmarks to determine
where that point is likely to be for your workload.

I wonder if anyone has run a benchmark like that lately? Doing such a
benchmark may be worthwhile, given that so much is now running either in
the cloud, or in a VM, or in some other kind of container. All this
abstraction from the hardware layer has surely had an impact on the
numbers and rules of thumb...

I still run on real hardware and spinning disk.

#9 Sam Gendler
sgendler@ideasculptor.com
In reply to: Justin (#8)
Re: Fwd: sensible configuration of max_connections

On Fri, Feb 7, 2020 at 11:14 AM Justin <zzzzz.graf@gmail.com> wrote:

On Fri, Feb 7, 2020 at 1:56 PM Sam Gendler <sgendler@ideasculptor.com>
wrote:

Benchmarks, at the time, showed that performance started to fall off due
to contention if the number of processes got much larger. I imagine that
the speed of storage today would maybe make 3 or 4x core count a pretty
reasonable place to start. There will be a point of diminishing returns
somewhere, but you can probably construct your own benchmarks to determine
where that point is likely to be for your workload.

I wonder if anyone has run a benchmark like that lately? Doing such a
benchmark may be worthwhile, given that so much is now running either in
the cloud, or in a VM, or in some other kind of container. All this
abstraction from the hardware layer has surely had an impact on the
numbers and rules of thumb...

I still run on real hardware and spinning disk.

To be honest, I don't even know if the old rule of thumb would still
apply, given the changes that have likely occurred within the PostgreSQL
codebase over the course of a decade. But there were plenty of people
benchmarking and writing about how to administer large installations and
do performance tuning back then, and I imagine they still exist today.
They'll probably chime in on this thread soon enough.

A quick Amazon search for 'postgresql performance' turns up plenty of
books on the topic that address more recent versions of the db. I'd go hit
the O'Reilly bookshelf website and use a trial membership to see what they
have to say (I generally consider the O'Reilly bookshelf, which gives you
access to pretty much all books by all technical publishers, to be an
invaluable tool and worth every penny). I wouldn't be surprised if the
PostgreSQL documentation itself provides insight as to appropriate
numbers, but no one ever reads the manual any longer.