Thousands of parallel connections
Is there any practical limit on the number of parallel connections that a
PostgreSQL server can service? We're in the process of setting up a system
that will require up to 10000 connections open in parallel. The query load
is not the problem, but we're wondering about the number of connections.
Does anyone have experience with these kinds of numbers?
Peter Eisentraut wrote:
Is there any practical limit on the number of parallel connections that a
PostgreSQL server can service? We're in the process of setting up a system
that will require up to 10000 connections open in parallel. The query load
is not the problem, but we're wondering about the number of connections.
Does anyone have experience with these kinds of numbers?
No experience, but a little thinking and elementary school math tells
me that you'd need a huge amount of RAM to support 10000 connections,
since postgres is multi-process. Our typical postgres process eats 5-40
megs of memory, depending on activity. So even if it was just 5 megs,
with 10k connections we are talking about 50G of RAM. If these
connections are idle, it would be a plain waste of resources.
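Michal's arithmetic is easy to check with a quick back-of-envelope sketch (the 5 MB per-backend figure is his low-end estimate from above, not a measured constant):

```python
# Back-of-envelope RAM estimate for 10k PostgreSQL backends.
# per_backend_mb is Michal's low-end estimate, not a measured value.
connections = 10_000
per_backend_mb = 5
total_gb = connections * per_backend_mb / 1024
print(f"~{total_gb:.0f} GB of private memory")  # ~49 GB -- "50G" in round numbers
```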
I suggest you look into some sort of connection pooling.
--
Michal Taborsky
http://www.taborsky.cz
Hi guys,
Peter is definitely not a newbie on this list, so I'm sure he already
thought about some kind of pooling if applicable... but then I'm
dead-curious what kind of application could possibly rule out connection
pooling even if it means so many open connections? Please give us some
light, Peter...
Cheers,
Csaba.
On Mon, 2004-08-16 at 15:53, Michal Taborsky wrote:
Peter Eisentraut wrote:
Is there any practical limit on the number of parallel connections that a
PostgreSQL server can service? We're in the process of setting up a system
that will require up to 10000 connections open in parallel. The query load
is not the problem, but we're wondering about the number of connections.
Does anyone have experience with these kinds of numbers?
No experience, but a little thinking and elementary school math tells
me that you'd need a huge amount of RAM to support 10000 connections,
since postgres is multi-process. Our typical postgres process eats 5-40
megs of memory, depending on activity. So even if it was just 5 megs,
with 10k connections we are talking about 50G of RAM. If these
connections are idle, it would be a plain waste of resources.
I suggest you look into some sort of connection pooling.
On Monday, 16 August 2004 16:20, Csaba Nagy wrote:
Peter is definitely not a newbie on this list, so I'm sure he already
thought about some kind of pooling if applicable... but then I'm
dead-curious what kind of application could possibly rule out connection
pooling even if it means so many open connections? Please give us some
light, Peter...
There is already a connection pool in front of the real server, but the
connection pool doesn't help you if you have in fact 10000 concurrent
requests; it only saves connection-start effort. (You could make the
connection pool server queue the requests, but that is not the point of this
exercise.) I didn't quite consider the RAM question, but the machine is
almost big enough that it wouldn't matter. I'm thinking more in terms of the
practical limits of the internal structures or the (Linux 2.6) kernel.
Michal Taborsky <michal@taborsky.cz> writes:
Peter Eisentraut wrote:
Is there any practical limit on the number of parallel connections that a
PostgreSQL server can service? We're in the process of setting up a system
that will require up to 10000 connections open in parallel. The query load
is not the problem, but we're wondering about the number of connections.
Does anyone have experience with these kinds of numbers?
No experience, but a little thinking and elementary school math tells
me that you'd need a huge amount of RAM to support 10000 connections,
since postgres is multi-process. Our typical postgres process eats 5-40
megs of memory, depending on activity. So even if it was just 5 megs,
with 10k connections we are talking about 50G of RAM. If these
connections are idle, it would be a plain waste of resources.
5-40 megs sounds high, unless you run very complex queries. I wonder
whether you aren't counting Postgres shared memory in that "per process"
figure. (Most implementations of "top" are not very good about
distinguishing shared and private memory, FWIW.)
But even estimating just a meg or two of private space apiece, the total
is daunting.
You'd also have to worry about overstressing the kernel --- allowing
for 50 or so open files per process, which is not a lot, you need a
half-million-entry open files table.
I suggest you look into some sort of connection pooling.
Agreed. If you are not actually *running* 10000 queries at a time,
it'd be better to try to pool the connections.
regards, tom lane
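Tom's kernel-table figure follows from the same kind of multiplication, and on Linux it can be compared against the system-wide cap (a sketch; the 50-files-per-backend allowance is his rough figure from above):

```python
# Compare the implied open-file demand with the kernel's configured limit.
# fs.file-max is the system-wide open-file cap on Linux.
processes = 10_000
files_per_process = 50          # Tom's rough per-backend allowance ("not a lot")
needed = processes * files_per_process
print(needed, "open-file entries needed")   # 500000 -- the half-million-entry table

try:
    with open("/proc/sys/fs/file-max") as f:
        limit = int(f.read().split()[0])
    print("kernel fs.file-max:", limit,
          "- sufficient" if limit >= needed else "- too low")
except FileNotFoundError:
    pass  # not running on Linux
```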
Really, this seems like it would be a pretty strong case for a
replicated database... assuming not all 10000 clients will need to be
doing modifications. Or if they do, that they could open up a separate,
temporary connection with the master db.
On Aug 16, 2004, at 7:37 AM, Peter Eisentraut wrote:
On Monday, 16 August 2004 16:20, Csaba Nagy wrote:
Peter is definitely not a newbie on this list, so I'm sure he already
thought about some kind of pooling if applicable... but then I'm
dead-curious what kind of application could possibly rule out connection
pooling even if it means so many open connections? Please give us some
light, Peter...
There is already a connection pool in front of the real server, but the
connection pool doesn't help you if you have in fact 10000 concurrent
requests; it only saves connection-start effort. (You could make the
connection pool server queue the requests, but that is not the point of this
exercise.) I didn't quite consider the RAM question, but the machine is
almost big enough that it wouldn't matter. I'm thinking more in terms of the
practical limits of the internal structures or the (Linux 2.6) kernel.
Centuries ago, Nostradamus foresaw when peter_e@gmx.net (Peter Eisentraut) would write:
Is there any practical limit on the number of parallel connections that a
PostgreSQL server can service? We're in the process of setting up a system
that will require up to 10000 connections open in parallel. The query load
is not the problem, but we're wondering about the number of connections.
Does anyone have experience with these kinds of numbers?
We commonly have a thousand connections open, on some servers, and
while it works, we consider there to be something problematic about
it. It tends to lead to using spinlocks a lot.
You might want to look into pgpool:
<http://www2b.biglobe.ne.jp/~caco/pgpool/index-e.html>
Jan Wieck has tried it out with his version of the TPC-W benchmark,
and found that it allowed cutting down on the _true_ number of
connections, and was very helpful in improving performance under
conditions where the application imagined it needed a lot of
connections.
--
(reverse (concatenate 'string "gro.gultn" "@" "enworbbc"))
http://www.ntlug.org/~cbbrowne/spiritual.html
"The last good thing written in C was Franz Schubert's Symphony number
9." -- Erwin Dieterich
[snip]
requests, it only saves connection start effort. (You could make the
connection pool server queue the requests, but that is not the point of this
exercise.) I didn't quite consider the RAM question, but the machine is
[snip]
Well, I would disagree here. If the connections are not busy all the
time, a queueing connection pool will help you drastically reduce the
effective number of needed concurrent connections. We do this with good
results, and the programming overhead is practically null; the queueing
can be hidden in the pool itself (which of course must be well written,
but that's a one-time job).
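The kind of queueing pool Csaba describes can be sketched in a few lines. This is a minimal illustration, not production code; the `connect` factory is a placeholder, not a real driver call:

```python
import queue

class QueueingPool:
    """A fixed number of real connections; extra callers block in a queue."""

    def __init__(self, connect, size):
        self._q = queue.Queue()
        for _ in range(size):       # open only `size` real connections
            self._q.put(connect())

    def acquire(self, timeout=None):
        # Blocks while all connections are busy -- this queueing is what
        # keeps 10000 clients from needing 10000 backends.
        return self._q.get(timeout=timeout)

    def release(self, conn):
        self._q.put(conn)

# Usage with a dummy factory: 3 real connections serve any number of callers.
pool = QueueingPool(connect=lambda: object(), size=3)
c = pool.acquire()
pool.release(c)
```

The callers never see the queueing; it is hidden inside acquire(), which is the point Csaba makes about the overhead living in the pool rather than in the application code.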
Just my 2c,
Csaba.
Tom Lane wrote:
Michal Taborsky <michal@taborsky.cz> writes:
Peter Eisentraut wrote:
Is there any practical limit on the number of parallel connections that a
PostgreSQL server can service? We're in the process of setting up a system
that will require up to 10000 connections open in parallel. The query load
is not the problem, but we're wondering about the number of connections.
Does anyone have experience with these kinds of numbers?
No experience, but a little thinking and elementary school math tells
me that you'd need a huge amount of RAM to support 10000 connections,
since postgres is multi-process. Our typical postgres process eats 5-40
megs of memory, depending on activity. So even if it was just 5 megs,
with 10k connections we are talking about 50G of RAM. If these
connections are idle, it would be a plain waste of resources.
5-40 megs sounds high, unless you run very complex queries. I wonder
whether you aren't counting Postgres shared memory in that "per process"
figure. (Most implementations of "top" are not very good about
distinguishing shared and private memory, FWIW.)
But even estimating just a meg or two of private space apiece, the total
is daunting.
I did an Ariadne+PostgreSQL evaluation last week for the company where I work,
and I learned that:
with 250 MB you can open up to 80 concurrent queries
with 500 MB you can open up to 120 concurrent queries
from there on, for each additional 250 MB you can have ~40 more connections
If you break these rules, the machine thrashes...
Peter would then need 61 GB for 10000 connections, which is quite amazing :-)
Regards
Gaetano Mendola
Gaetano Mendola <mendola@bigfoot.com> writes:
I did an Ariadne+PostgreSQL evaluation last week for the company where I work,
and I learned that:
with 250 MB you can open up to 80 concurrent queries
with 500 MB you can open up to 120 concurrent queries
from there on, for each additional 250 MB you can have ~40 more connections
That does not add up: the graph can't have a negative y-intercept.
There should be a substantial cost to run the postmaster at all,
and then an essentially fixed cost per connection --- assuming
that all the connections are running similar queries, of course.
You're telling us the first 40 connections require zero RAM.
regards, tom lane
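Tom's objection can be checked directly from the two data points Gaetano gives (a sketch; the 250 MB / 500 MB figures are Gaetano's):

```python
# Fit a line through Gaetano's two (RAM, max-connections) points and
# recover the intercepts Tom is objecting to.
pts = [(250, 80), (500, 120)]            # (MB of RAM, concurrent queries)
(x1, y1), (x2, y2) = pts
slope = (y2 - y1) / (x2 - x1)            # 0.16 connections per MB (~40 per 250 MB)
conn_at_zero_ram = y1 - slope * x1       # 40.0: "the first 40 connections are free"
mb_at_zero_conns = -conn_at_zero_ram / slope   # -250.0 MB: the negative y-intercept
print(slope, conn_at_zero_ram, mb_at_zero_conns)
```

Either way you draw the graph, the fit implies that 40 connections run in zero RAM, or that zero connections consume -250 MB; both are impossible for a fixed cost-per-backend model.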
Tom Lane wrote:
| Gaetano Mendola <mendola@bigfoot.com> writes:
|
|>I did an Ariadne+PostgreSQL evaluation last week for the company where I work,
|>and I learned that:
|>with 250 MB you can open up to 80 concurrent queries
|>with 500 MB you can open up to 120 concurrent queries
|>from there on, for each additional 250 MB you can have ~40 more connections
|
|
| That does not add up: the graph can't have a negative y-intercept.
| There should be a substantial cost to run the postmaster at all,
| and then an essentially fixed cost per connection --- assuming
| that all the connections are running similar queries, of course.
| You're telling us the first 40 connections require zero RAM.
I was not speaking about a single process's memory consumption; I was
speaking in general. Indeed, I don't know why, but it seems the first
concurrent queries are less expensive. I was able to "confirm" this
rule up to 2 GB; I don't know what happens beyond that.
BTW the machine is a single processor with HT enabled.
Regards
Gaetano Mendola
Tom Lane wrote:
Gaetano Mendola <mendola@bigfoot.com> writes:
I did an Ariadne+PostgreSQL evaluation last week for the company where I work,
and I learned that:
with 250 MB you can open up to 80 concurrent queries
with 500 MB you can open up to 120 concurrent queries
from there on, for each additional 250 MB you can have ~40 more connections
That does not add up: the graph can't have a negative y-intercept.
There should be a substantial cost to run the postmaster at all,
and then an essentially fixed cost per connection --- assuming
that all the connections are running similar queries, of course.
You're telling us the first 40 connections require zero RAM.
That is strange. Is it really linear or does the cost go up somewhat
after the first few?
Chris Travers <chris@metatrontech.com> writes:
Tom Lane wrote:
That does not add up: the graph can't have a negative y-intercept.
There should be a substantial cost to run the postmaster at all,
and then an essentially fixed cost per connection --- assuming
that all the connections are running similar queries, of course.
You're telling us the first 40 connections require zero RAM.
That is strange. Is it really linear or does the cost go up somewhat
after the first few?
Well, if you have significant contention problems then the speed could
be worse than linear --- but he was talking about memory usage. AFAICS,
a backend doing a particular query should need X amount of RAM pretty
much independently of how many others there are. The only data structure
I can think of that would be impacted at all is QuerySnapshot, and at
4 bytes per sibling backend it's *way* down in the noise...
regards, tom lane
Tom Lane wrote:
Chris Travers <chris@metatrontech.com> writes:
Tom Lane wrote:
That does not add up: the graph can't have a negative y-intercept.
There should be a substantial cost to run the postmaster at all,
and then an essentially fixed cost per connection --- assuming
that all the connections are running similar queries, of course.
You're telling us the first 40 connections require zero RAM.
That is strange. Is it really linear or does the cost go up somewhat
after the first few?
Well, if you have significant contention problems then the speed could
be worse than linear --- but he was talking about memory usage. AFAICS,
a backend doing a particular query should need X amount of RAM pretty
much independently of how many others there are. The only data structure
I can think of that would be impacted at all is QuerySnapshot, and at
4 bytes per sibling backend it's *way* down in the noise...
regards, tom lane
Then my final question is whether part of the answer could be increased
shared memory and other performance-related settings, in order to
gracefully handle the larger number of connections.
Best Wishes,
Chris Travers
Metatron Technology Consulting