Monitoring number of backends

Started by Andy Colson · over 12 years ago · 22 messages · general
#1 Andy Colson
andy@squeakycode.net

Hi all.

My website is about to get a little more popular. I'm trying to add in
some measurements to determine an upper limit of how many concurrent
database connections I'm currently using.

I've started running this:

SELECT sum(numbackends) AS count, sum(xact_commit) as ttlcommit FROM
pg_stat_database;

Every 10 seconds or so. I don't think it's what I want, though. It seems
way too small. I'm guessing that it's not a measure of the previous 10
seconds; it's a count of how many backends are in use at the exact
moment I run the SQL.

Is there a cumulative count someplace?
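There is no built-in cumulative peak; numbackends is a point-in-time gauge, so a sampler has to remember the highest value it has seen itself. A minimal sketch of folding periodic readings into a peak (illustrative Python; the function and variable names are assumptions, not from any Postgres API):

```python
# Hypothetical sketch: numbackends is a point-in-time gauge, so to
# approximate a peak you sample it periodically and keep the maximum
# yourself.  Spikes between samples are still invisible.

def track_peak(samples):
    """Fold successive numbackends readings into the peak observed."""
    peak = 0
    for count in samples:
        if count > peak:
            peak = count
    return peak

# e.g. readings taken every 10 seconds
readings = [3, 7, 5, 12, 9]
print(track_peak(readings))  # -> 12
```

The shorter the sampling interval, the less likely a brief spike slips between two samples; only log-based counting (see later in the thread) catches every connection.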

Thanks for your time,

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2 Stephen Frost
sfrost@snowman.net
In reply to: Andy Colson (#1)
Re: Monitoring number of backends

Andy,

* andy (andy@squeakycode.net) wrote:

My website is about to get a little more popular. I'm trying to add
in some measurements to determine an upper limit of how many
concurrent database connections I'm currently using.

PG is really *much* happier if you have only one backend per CPU in your
system. The way to get there is by using a connection pooler like
pg_bouncer and configuring it based on how many CPUs you have.

pg_bouncer can also provide stats for you.

Thanks,

Stephen

#3 Andy Colson
andy@squeakycode.net
In reply to: Stephen Frost (#2)
Re: Monitoring number of backends

On 10/22/2013 12:59 PM, Stephen Frost wrote:

Andy,

* andy (andy@squeakycode.net) wrote:

My website is about to get a little more popular. I'm trying to add
in some measurements to determine an upper limit of how many
concurrent database connections I'm currently using.

PG is really *much* happier if you have only one backend per CPU in your
system. The way to get there is by using a connection pooler like
pg_bouncer and configuring it based on how many CPUs you have.

pg_bouncer can also provide stats for you.

Thanks,

Stephen

Hum.. I had not thought of that. My current setup uses 40 max
connections, and I don't think I've ever hit it. I use apache and php,
and my db connections are not persistent.

If I did plug in pg_bouncer, is it worth switching my PHP from pg_connect
to pg_pconnect?

I'd bet plugging in pg_bouncer now while I'm not too busy would help me
grow in the long run, huh? I like the sound of that. Thanks!

-Andy


#4 Stephen Frost
sfrost@snowman.net
In reply to: Andy Colson (#3)
Re: Monitoring number of backends

* andy (andy@squeakycode.net) wrote:

If I did plug in pg_bouncer, is it worth switching my php from
pg_connect to pg_pconnect?

No, let pg_bouncer manage the connection pooling. Having two levels of
pooling isn't a good idea (and pg_bouncer does a *much* better job of it
anyway, imv..).

I'd bet plugging in pg_bouncer now while I'm not too busy would help
me grow in the long run, huh? I like the sound of that. Thanks!

Yes. Note that, as you scale, you can actually connect pg_bouncers to
other pg_bouncers and play other fun games like having pg_bouncer switch
which servers it's connecting to transparently to the app..

Thanks,

Stephen

#5 John R Pierce
pierce@hogranch.com
In reply to: Stephen Frost (#2)
Re: Monitoring number of backends

On 10/22/2013 10:59 AM, Stephen Frost wrote:

PG is really *much* happier if you have only one backend per CPU in your
system. The way to get there is by using a connection pooler like
pg_bouncer and configuring it based on how many CPUs you have.

Actually, I've found peak throughput on a decent multicore server with
lots of RAM and lots of disk IO parallelism (e.g., a big RAID 10) is around
2X the socket*hyperthread*core count... so, for instance, on a modern
2-socket E5-2665 kind of server, that's 2 x 8 cores with 2 threads per
core: 16 total cores, 32 total hardware threads, so about 64
database connections would be peak, given a decent RAID 10 of lots of
SAS2 10k/15k disks.

--
john r pierce 37N 122W
somewhere on the middle of the left coast

#6 John R Pierce
pierce@hogranch.com
In reply to: Andy Colson (#3)
Re: Monitoring number of backends

On 10/22/2013 11:25 AM, andy wrote:

Hum.. I had not thought of that. My current setup uses 40 max
connections, and I don't think I've ever hit it. I use apache and
php, and my db connections are not persistent.

With that style of PHP programming, you're getting some HUGE overhead in
connect/disconnect per web page. Putting pg_bouncer in the middle
will make a HUGE improvement, possibly a second per page load on a busy
server.

You can even set it up in parallel with things live: first install
pgbouncer and configure it to listen on an alternate port, say 15432,
and verify you can connect and everything looks good (psql -p 15432 ...).
Then tweak your PHP app's database config to use that port instead of
5432...
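A side-by-side trial like that could look something like the following pgbouncer.ini. Only the alternate port 15432 comes from the thread itself; every other value here is an illustrative assumption:

```ini
; Sketch of a side-by-side pgbouncer trial (illustrative values;
; only port 15432 is from the thread).
[databases]
* = host=127.0.0.1 port=5432   ; forward everything to the live Postgres

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 15432            ; alternate port so 5432 stays untouched
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session            ; pgbouncer's default; safest to start with
```

Once `psql -p 15432` behaves identically to a direct connection, the PHP side only needs its port changed.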

--
john r pierce 37N 122W
somewhere on the middle of the left coast


#7 Stephen Frost
sfrost@snowman.net
In reply to: John R Pierce (#5)
Re: Monitoring number of backends

* John R Pierce (pierce@hogranch.com) wrote:

On 10/22/2013 10:59 AM, Stephen Frost wrote:

PG is really *much* happier if you have only one backend per CPU in your
system. The way to get there is by using a connection pooler like
pg_bouncer and configuring it based on how many CPUs you have.

Actually, I've found peak throughput on a decent multicore server
with lots of RAM and lots of disk IO parallelism (e.g., a big RAID 10)
is around 2X the socket*hyperthread*core count... so, for instance,
on a modern 2-socket E5-2665 kind of server, that's 2 x 8 cores with
2 threads per core: 16 total cores, 32 total hardware threads,
so about 64 database connections would be peak, given a decent
RAID 10 of lots of SAS2 10k/15k disks.

Sure. As always with performance: test, test, test on gear that is as
close to identical to the prod gear (or the prod gear itself, if you can
get away with it) as possible. Every workload is different.

Thanks,

Stephen

#8 Andy Colson
andy@squeakycode.net
In reply to: John R Pierce (#6)
Re: Monitoring number of backends

On 10/22/2013 2:18 PM, John R Pierce wrote:

On 10/22/2013 11:25 AM, andy wrote:

Hum.. I had not thought of that. My current setup uses 40 max
connections, and I don't think I've ever hit it. I use apache and
php, and my db connections are not persistent.

With that style of PHP programming, you're getting some HUGE overhead in
connect/disconnect per web page. Putting pg_bouncer in the middle
will make a HUGE improvement, possibly a second per page load on a busy
server.

No, actually, I don't think my connect overhead is huge. My apache and
postgres are on the same box, and it connects using a unix socket.
Perhaps if my apache and db were on different boxes it would be a problem.

My page response time is sub-second, and I run quite a few queries to
build the page. But also, my server isn't too busy at the moment. The
load is around 0.3 to 0.5 when it's busy.

Stephen said:

If I did plug in pg_bouncer, is it worth switching my php from
pg_connect to pg_pconnect?

No, let pg_bouncer manage the connection pooling. Having two levels of
pooling isn't a good idea (and pg_bouncer does a *much* better job of it
anyway, imv..).

So you say DO use persistent connections, and Stephen says DON'T use
them. Although there are a few new players: assuming Apache, pgbouncer
and postgres are all on the same box, and I'm using unix sockets as much
as possible, it probably doesn't matter if I use non-persistent
connections from PHP.

But if I need to move the db to its own box... then should I move
pgbouncer there too?

Assuming db is on a different box, persistent connections from php to
postgres might be bad. But how about persistent connections to pgbouncer?

Thinking about it, if I need to move the db, I'll leave pgbouncer on the
webserver box. That way I can unix-socket from php to pgbouncer
(non-persistent, cuz it's fast enough anyway), and let pgbouncer do its
own thing to the database box. Seem like a reasonable sort of thing?

-Andy


#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andy Colson (#8)
Re: Monitoring number of backends

andy <andy@squeakycode.net> writes:

On 10/22/2013 2:18 PM, John R Pierce wrote:

that style of php programming, you're getting some HUGE overhead in
connect/disconnect per web page. putting pg_bouncer in the middle
will make a HUGE improvement, possibly a second per page load on a busy
server.

No, actually, I don't think my connect overhead is huge. My apache and
postgres are on the same box, and it connects using unix socket.

You're ignoring the fact that PG backends have a pretty considerable
startup transient. By the time a backend has gotten its caches populated
enough to be efficient, it's expended a lot of cycles. You might be
getting away with this approach under low load, but it will bite you in
painful places eventually.

regards, tom lane


#10 John R Pierce
pierce@hogranch.com
In reply to: Andy Colson (#8)
Re: Monitoring number of backends

On 10/22/2013 1:13 PM, andy wrote:

No, actually, I don't think my connect overhead is huge. My apache
and postgres are on the same box, and it connects using unix socket.
Perhaps if my apache on db were on different boxes it would be a problem.

Each postgres connection, if you're not using a pool, requires a fork()
of the postgres process. fork() is inherently an expensive operation,
especially for a moderately large and fairly complex piece of software
like postgresql.

--
john r pierce 37N 122W
somewhere on the middle of the left coast


#11 Andy Colson
andy@squeakycode.net
In reply to: Tom Lane (#9)
Re: Monitoring number of backends

On 10/22/2013 3:44 PM, Tom Lane wrote:

andy <andy@squeakycode.net> writes:

On 10/22/2013 2:18 PM, John R Pierce wrote:

that style of php programming, you're getting some HUGE overhead in
connect/disconnect per web page. putting pg_bouncer in the middle
will make a HUGE improvement, possibly a second per page load on a busy
server.

No, actually, I don't think my connect overhead is huge. My apache and
postgres are on the same box, and it connects using unix socket.

You're ignoring the fact that PG backends have a pretty considerable
startup transient. By the time a backend has gotten its caches populated
enough to be efficient, it's expended a lot of cycles. You might be
getting away with this approach under low load, but it will bite you in
painful places eventually.

regards, tom lane

but it will bite you in
painful places eventually.

:-) heh.

Well I think PG is even more impressive now. My server is on a VM, and
I'm pretty much doing things the slow way, and I get a page back in
500ms. And this is a busy time of day.

Of course, I'm right next to the server. Anyone wanna check page times
for me?

http://jasper.iowaassessors.com/parcel.php?gid=99680

I'm talking JUST parcel.php ... the maps and photos don't count.

Thanks all.

-Andy


#12 Stephen Frost
sfrost@snowman.net
In reply to: John R Pierce (#10)
Re: Monitoring number of backends

* John R Pierce (pierce@hogranch.com) wrote:

On 10/22/2013 1:13 PM, andy wrote:

No, actually, I don't think my connect overhead is huge. My
apache and postgres are on the same box, and it connects using a
unix socket. Perhaps if my apache and db were on different boxes it
would be a problem.

each postgres connection, if you're not using a pool, requires a
fork() of the postgres process. fork is inherently an expensive
process, especially for a moderately large and fairly complex piece
of software like postgresql.

As Tom points out, it's really PG that makes new connections slow;
fork(), while it can be slow on some platforms, really is small potatoes
compared to PG opening a database, populating caches, etc.

Thanks,

Stephen

#13 Andy Colson
andy@squeakycode.net
In reply to: Stephen Frost (#2)
Re: Monitoring number of backends

On 10/22/2013 12:59 PM, Stephen Frost wrote:

Andy,

* andy (andy@squeakycode.net) wrote:

My website is about to get a little more popular. I'm trying to add
in some measurements to determine an upper limit of how many
concurrent database connections I'm currently using.

PG is really *much* happier if you have only one backend per CPU in your
system. The way to get there is by using a connection pooler like
pg_bouncer and configuring it based on how many CPUs you have.

pg_bouncer can also provide stats for you.

Thanks,

Stephen

Ahh, bummer, man. PgBouncer doesn't work so well when you have lots of databases. I have about 90 databases, and the website could connect to any one of them on any request. (They are all about equally likely to be hit.)

In my pgbouncer.ini I set

[databases]
* =
[pgbouncer]
max_client_conn = 200
default_pool_size = 20

but that made each database its own pool, so 90*20 possible connections (but then capped at 200, I assume). Postgres has only a max_connections of 40, so I started getting "FATAL: sorry, too many clients already" errors.

I set max_client_conn = 40
and default_pool_size = 3,

so it should stop erroring out, but is this type of setup even going to help me? If pgbouncer has 40 connections open to 40 different databases, and then a request comes in for a db it doesn't have, it'll have to drop one and build a new connection, yes? Won't that be slower than just connecting right to postgres in the first place?

I need a Most-Recently-Used pool, not a per-database pool. Is there a way to do this with pgbouncer? (With a large number of databases)

-Andy


#14 John R Pierce
pierce@hogranch.com
In reply to: Andy Colson (#13)
Re: Monitoring number of backends

On 10/22/2013 7:45 PM, Andy Colson wrote:

Ahh, bummer, man. PgBouncer doesn't work so well when you have lots
of databases. I have about 90 databases, and the website could connect to
any one of them on any request. (They are all about equally likely
to be hit.)

That scenario would be better met by using schemas in a single database.

--
john r pierce 37N 122W
somewhere on the middle of the left coast


#15 David Kerr
dmk@mr-paradox.net
In reply to: Andy Colson (#1)
Re: Monitoring number of backends

On Tue, Oct 22, 2013 at 12:41:58PM -0500, andy wrote:
- Hi all.
-
- My website is about to get a little more popular. I'm trying to add in
- some measurements to determine an upper limit of how many concurrent
- database connections I'm currently using.
-
- I've started running this:
-
- SELECT sum(numbackends) AS count, sum(xact_commit) as ttlcommit FROM
- pg_stat_database;
-
- Every 10 seconds or so. I don't think its what I want though. It seems
- way too small. I'm guessing that its not a measure of the previous 10
- seconds. Its a count of how many backends are in use at the exact
- moment I run the sql.
-
- Is there a cumulative count someplace?
-
- Thanks for your time,
-
- -Andy

You've gotten good info from the other guys on how to scale your DB,
but to answer your original question: I usually use this query:

select count(*) from pg_stat_activity where state <> 'idle';

That gives you the # of "active" connections to your database and is
something you want to try to get good metrics on.

Idle connections have some overhead, but if Active > # of CPUs your performance
starts to degrade. Now, really, that's pretty normal, but ideally you need to
know what ratio of active connections to # of CPUs still gives you acceptable
performance. And that's really based on your app and hardware.
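That rule of thumb can be sketched as a tiny check (illustrative Python; the 1.0-per-CPU threshold is the heuristic described above, not a hard limit, and the real acceptable ratio is workload-dependent):

```python
# Sketch of the active-connections-vs-CPUs heuristic: once active
# (non-idle) backends exceed the CPU count, contention becomes likely.
# Feed it the result of:
#   select count(*) from pg_stat_activity where state <> 'idle';

def connection_pressure(active_backends, cpu_count):
    """Return the active-to-CPU ratio; values above 1.0 hint at contention."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return active_backends / cpu_count
```

Graphing this ratio over time, rather than the raw count, makes the trend comparable across differently-sized boxes.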


#16 Andy Colson
andy@squeakycode.net
In reply to: David Kerr (#15)
Re: Monitoring number of backends

On 10/23/2013 11:07 AM, David Kerr wrote:

On Tue, Oct 22, 2013 at 12:41:58PM -0500, andy wrote:
- Hi all.
-
- My website is about to get a little more popular. I'm trying to add in
- some measurements to determine an upper limit of how many concurrent
- database connections I'm currently using.
-
- I've started running this:
-
- SELECT sum(numbackends) AS count, sum(xact_commit) as ttlcommit FROM
- pg_stat_database;
-
- Every 10 seconds or so. I don't think its what I want though. It seems
- way too small. I'm guessing that its not a measure of the previous 10
- seconds. Its a count of how many backends are in use at the exact
- moment I run the sql.
-
- Is there a cumulative count someplace?
-
- Thanks for your time,
-
- -Andy

You've gotten good info from the other guys on how to scale your DB,
but to answer your original question: I usually use this query:

select count(*) from pg_stat_activity where state <> 'idle';

That gives you the # of "active" connections to your database and is
something you want to try to get good metrics on.

Idle connections have some overhead but if Active > # of CPUs your performance
starts to degrade. Now, really that's pretty normal but, ideally, you need to
know what the ratio of Active Connections to # CPUs still gives you acceptable
performance. And that's really based on your app and hardware.

How often do you run that? Once a second? And graph it? I was doing
it every 10 seconds, but it doesn't give me a good view of the system.

-Andy


#17 David Kerr
dmk@mr-paradox.net
In reply to: Andy Colson (#16)
Re: Monitoring number of backends

On Wed, Oct 23, 2013 at 12:11:39PM -0500, andy wrote:
- On 10/23/2013 11:07 AM, David Kerr wrote:
- >On Tue, Oct 22, 2013 at 12:41:58PM -0500, andy wrote:
- >- Hi all.
- >-
- >- My website is about to get a little more popular. I'm trying to add in
- >- some measurements to determine an upper limit of how many concurrent
- >- database connections I'm currently using.
- >-
- >- I've started running this:
- >-
- >- SELECT sum(numbackends) AS count, sum(xact_commit) as ttlcommit FROM
- >- pg_stat_database;
- >-
- >- Every 10 seconds or so. I don't think its what I want though. It seems
- >- way too small. I'm guessing that its not a measure of the previous 10
- >- seconds. Its a count of how many backends are in use at the exact
- >- moment I run the sql.
- >-
- >- Is there a cumulative count someplace?
- >-
- >- Thanks for your time,
- >-
- >- -Andy
- >
- >You've gotten good info from the other guys on how to scale your're DB
- >but to answer you're original question. I usually use this query:
- >
- >select count(*) from pg_stat_activity where state <> 'idle';
- >
- >That gives you the # of "active" connections to your database and is
- >something you want to try to get good metrics on.
- >
- >Idle connections have some overhead but if Active > # of CPUs your
- >performance
- >starts to degrade. Now, really that's pretty normal but, ideally, you need
- >to
- >know what the ratio of Active Connections to # CPUs still gives you
- >acceptable
- >performance. And that's really based on your app and hardware.
- >
- >
-
- How often do you run that? Once a second? And graph it? I was doing
- it every 10 seconds, but it doesn't give me a good view of the system.

I actually have it as a munin module so it runs every few minutes.
If I'm actually doing a performance test or something I would run it
every second or every 5 / 10 seconds.

The knowledge is only really useful if you have a good trend built up, so you
know how your app responds to having a certain # of active connections.


#18 Marko Kreen
markokr@gmail.com
In reply to: Andy Colson (#13)
Re: Monitoring number of backends

On Tue, Oct 22, 2013 at 09:45:24PM -0500, Andy Colson wrote:

On 10/22/2013 12:59 PM, Stephen Frost wrote:

Andy,

* andy (andy@squeakycode.net) wrote:

My website is about to get a little more popular. I'm trying to add
in some measurements to determine an upper limit of how many
concurrent database connections I'm currently using.

PG is really *much* happier if you have only one backend per CPU in your
system. The way to get there is by using a connection pooler like
pg_bouncer and configuring it based on how many CPUs you have.

pg_bouncer can also provide stats for you.

Thanks,

Stephen

Ahh, bummer, man. PgBouncer doesn't work so well when you have lots of databases. I have about 90 databases, and the website could connect to any one of them on any request. (They are all about equally likely to be hit.)

In my pgbouncer.ini I set

[databases]
* =
[pgbouncer]
max_client_conn = 200
default_pool_size = 20

but that made each database its own pool, so 90*20 possible connections (but then capped at 200, I assume). Postgres has only a max_connections of 40, so I started getting "FATAL: sorry, too many clients already" errors.

I set max_client_conn = 40
and default_pool_size = 3,

so it should stop erroring out, but is this type of setup even going to help me? If pgbouncer has 40 connections open to 40 different databases, and then a request comes in for a db it doesn't have, it'll have to drop one and build a new connection, yes? Won't that be slower than just connecting right to postgres in the first place?

I need a Most-Recently-Used pool, not a per-database pool. Is there a way to do this with pgbouncer? (With a large number of databases)

PgBouncer does not do any cross-pool limiting, so you still can have
3*90 server connections open.

max_client_conn is meant as a rescue limit for when something is
completely broken, and should not be hit in normal operation. 40000 would
be a reasonable number...

With that many databases, you just need to accept that you need a few
connections to each db open; otherwise you won't get much win from
pooling. So Postgres max_connections should indeed be >= 3*90 for you.
And you can limit server connections via server_idle_timeout.
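Applied to the 90-database case, that advice might look like the fragment below. Only the 3-per-database pool size and the 40000 figure come from the thread; the timeout value is an assumed example:

```ini
; Sketch of the sizing advice for ~90 equally-hit databases.
[pgbouncer]
max_client_conn = 40000     ; rescue limit only; not meant to be hit
default_pool_size = 3       ; per-database pool => up to 3 * 90 = 270 servers
server_idle_timeout = 60    ; assumed value (seconds): reap idle server links
; Postgres then needs max_connections >= 270, plus a few reserved slots.
```

The idle timeout is what keeps 270 from being the steady state: pools for databases that go quiet shrink back on their own.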

--
marko


#19 Jeff Janes
jeff.janes@gmail.com
In reply to: Andy Colson (#1)
Re: Monitoring number of backends

On Tue, Oct 22, 2013 at 10:41 AM, andy <andy@squeakycode.net> wrote:

Hi all.

My website is about to get a little more popular. I'm trying to add in
some measurements to determine an upper limit of how many concurrent
database connections I'm currently using.

I've started running this:

SELECT sum(numbackends) AS count, sum(xact_commit) as ttlcommit FROM
pg_stat_database;

Every 10 seconds or so. I don't think its what I want though. It seems
way too small. I'm guessing that its not a measure of the previous 10
seconds. Its a count of how many backends are in use at the exact moment I
run the sql.

Is there a cumulative count someplace?

You can pull it from the log files if you change postgresql.conf to use:

log_connections = on
log_disconnections = on
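With those two settings on, the log gains one line per connect and per disconnect, and replaying those events reconstructs the exact peak concurrency with no sampling gaps. A sketch of the replay (illustrative Python; extracting the events from real log lines is omitted because the format depends on log_line_prefix):

```python
# Sketch: replay connect/disconnect events, in log order, to find the
# true peak number of simultaneous backends.  Parsing actual log lines
# is left out since their layout depends on log_line_prefix.

def peak_concurrency(events):
    """events: iterable of 'connect' / 'disconnect' strings in log order."""
    current = peak = 0
    for kind in events:
        current += 1 if kind == "connect" else -1
        if current > peak:
            peak = current
    return peak
```

Unlike polling pg_stat_database every 10 seconds, this catches even a connection that lived for a few milliseconds.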

Cheers,

Jeff

#20 Jeff Janes
jeff.janes@gmail.com
In reply to: John R Pierce (#6)
Re: Monitoring number of backends

On Tue, Oct 22, 2013 at 12:18 PM, John R Pierce <pierce@hogranch.com> wrote:

On 10/22/2013 11:25 AM, andy wrote:

Hum.. I had not thought of that. My current setup uses 40 max
connections, and I don't think I've ever hit it. I use apache and php, and
my db connections are not persistent.

that style of php programming, you're getting some HUGE overhead in
connect/disconnect per web page. putting pg_bouncer in the middle will
make a HUGE improvement, possibly a second per page load on a busy server.

My recent experience with mediawiki is that PHP is such a slow beast anyway
(even with APC) that connect/disconnect overhead is likely not to be
significant. But it would still be a good idea for him to learn pgbouncer,
in case his PHP code is much faster than mediawiki's, or he runs into
the spinlock contention inside postgresql that has been all the rage
lately. It just isn't the first place I would look anymore.

Cheers,

Jeff

#21 Jeff Janes
jeff.janes@gmail.com
In reply to: Andy Colson (#8)
#22 Andy Colson
andy@squeakycode.net
In reply to: Jeff Janes (#21)