How to cripple a postgres server
I've observed the following, and I wonder if anyone has seen it or has a
workaround, before I report it as a bug.
Doing the following will result in the load on the server reaching ~600,
and the server becoming very unresponsive. This seems to be a result
of the SIGUSR2 which is delivered for async notify to free space in the
event queue thingy (I'm sort of in the dark here).
1) Open many (I used 800) database connections and leave them idle.
2) run: while true; do vacuum {database} ; done
Wait. Observe db has stopped doing useful work.
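The two steps can be scripted. This is a minimal sketch: the helpers only build the command strings rather than execute them, and the database name "test", the connection count, and the sleep-pipe trick for holding a connection open are all assumptions, not details from the report.

```shell
#!/bin/sh
# Sketch of the reproduction steps above.  These helpers construct the
# commands; "test" as the database name and the sleep-pipe trick for
# keeping a connection open are illustrative assumptions.

# Command to open N idle connections: piping a long sleep into psql
# keeps each backend connected but issuing no queries.
idle_cmd() {
    echo "for i in \$(seq 1 $1); do sleep 3600 | psql $2 >/dev/null & done"
}

# Command for the vacuum loop from the original report.
vacuum_cmd() {
    echo "while true; do psql $1 -c 'VACUUM;'; done"
}

idle_cmd 800 test
vacuum_cmd test
```

Running the two printed commands in separate shells (against a scratch database, with max_connections raised accordingly) should reproduce the setup described above.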
This causes glitches of up to many minutes, during which the db fails to
respond in any way.
We noticed this on a production system, so it's fairly dire. I know one
workaround is to reduce the number of idle connections, but that will
make our system less responsive.
It looks like a bug to me - anyone got any ideas?
Stephen
Stephen Robert Norris <srn@commsecure.com.au> writes:
1) Open many (I used 800) database connections and leave them idle.
2) run: while true; do vacuum {database} ; done
Wait. Observe db has stopped doing useful work.
Is your machine capable of supporting 800 active backends in the absence
of any vacuum processing? I'm not sure that "we're driving our server
into swapping hell" qualifies as a Postgres bug ...
If you're seeing load peaks in excess of what would be observed with
800 active queries, then I would agree there's something to investigate
here.
regards, tom lane
On Tue, 2002-05-28 at 11:57, Tom Lane wrote:
Stephen Robert Norris <srn@commsecure.com.au> writes:
1) Open many (I used 800) database connections and leave them idle.
2) run: while true; do vacuum {database} ; done
Wait. Observe db has stopped doing useful work.
Is your machine capable of supporting 800 active backends in the absence
of any vacuum processing? I'm not sure that "we're driving our server
into swapping hell" qualifies as a Postgres bug ...
If you're seeing load peaks in excess of what would be observed with
800 active queries, then I would agree there's something to investigate
here.
regards, tom lane
Yep, indeed with 800 backends doing a query every second, nothing
happens.
The machine itself has 1GB of RAM, and uses no swap in the above
situation. Instead, system time goes to 99% of CPU. The machine is a
dual-CPU athlon 1900 (1.5GHz).
It _only_ happens with idle connections!
Stephen
Stephen Robert Norris <srn@commsecure.com.au> writes:
If you're seeing load peaks in excess of what would be observed with
800 active queries, then I would agree there's something to investigate
here.
Yep, indeed with 800 backends doing a query every second, nothing
happens.
The machine itself has 1GB of RAM, and uses no swap in the above
situation. Instead, system time goes to 99% of CPU. The machine is a
dual-CPU athlon 1900 (1.5GHz).
It _only_ happens with idle connections!
Hmm, you mean if the 800 other connections are *not* idle, you can
do VACUUMs with impunity? If so, I'd agree we got a bug ...
regards, tom lane
On Tue, 2002-05-28 at 12:44, Tom Lane wrote:
Stephen Robert Norris <srn@commsecure.com.au> writes:
If you're seeing load peaks in excess of what would be observed with
800 active queries, then I would agree there's something to investigate
here.
Yep, indeed with 800 backends doing a query every second, nothing
happens.
The machine itself has 1GB of RAM, and uses no swap in the above
situation. Instead, system time goes to 99% of CPU. The machine is a
dual-CPU athlon 1900 (1.5GHz).
It _only_ happens with idle connections!
Hmm, you mean if the 800 other connections are *not* idle, you can
do VACUUMs with impunity? If so, I'd agree we got a bug ...
regards, tom lane
Yes, that's what I'm saying.
If you put a sleep 5 into my while loop above, the problem still happens
(albeit after a longer time).
If you put a delay in, and also make sure that each of the 800
connections does a query (I'm using "select 1;"), then the problem never
happens.
Without the delay, you get a bit of load, but only up to about 10 or so,
rather than 600.
What seems to happen is that all the idle backends get woken up every
now and then to do _something_ (it seemed to coincide with getting the
SIGUSR2 to trigger an async_notify), and that shoots up the load. Making
sure all the backends have done something every now and then seems to
avoid the problem.
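The keepalive workaround described above (touching every connection with a trivial query) can be sketched with named pipes: one FIFO per psql, so a write into pipe i issues a query on existing connection i. The FIFO mechanism is an assumption; the original test program is not shown in the thread, and these helpers only build the command strings.

```shell
#!/bin/sh
# Sketch of the keepalive workaround: cycle through the open connections
# issuing "SELECT 1;" on each.  The FIFO-per-connection mechanism is an
# assumption, not the poster's actual test program.

# Build the command that opens N FIFO-backed connections to a database.
setup_cmd() {
    echo "for i in \$(seq 1 $1); do mkfifo /tmp/conn.\$i; tail -f /tmp/conn.\$i | psql $2 & done"
}

# Build the command that touches every connection once; run it
# periodically (e.g. from cron or a sleep loop) to keep backends active.
keepalive_cmd() {
    echo "for i in \$(seq 1 $1); do echo 'SELECT 1;' > /tmp/conn.\$i; done"
}

setup_cmd 800 test
keepalive_cmd 800
```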
I suspect the comment near the async_notify code explains the problem -
where it talks about idle backends having to be woken up to clear out
pending events...
We only notice it on our production system when it's got lots of idle db
connections.
Other people seem to have spotted it (I found references to similar
effects with Google) but nobody seems to have worked out what it is -
the other people typically had the problem with web servers which keep a
pool of db connections.
Just to reiterate; this machine never swaps, it can handle > 1000
queries per second and it doesn't seem to run out of resources...
Stephen
Stephen Robert Norris <srn@commsecure.com.au> writes:
On Tue, 2002-05-28 at 12:44, Tom Lane wrote:
Hmm, you mean if the 800 other connections are *not* idle, you can
do VACUUMs with impunity? If so, I'd agree we got a bug ...
Yes, that's what I'm saying.
Fascinating.
I suspect the comment near the async_notify code explains the problem -
where it talks about idle backends having to be woken up to clear out
pending events...
Well, we need to do that, but your report seems to suggest that that
code path isn't getting the job done completely --- my first guess is
that a new normal query has to arrive before a SIGUSR2'd backend is
completely happy again. Interesting.
You didn't specify: what PG version are you running, and on what
platform?
regards, tom lane
On Tue, 2002-05-28 at 13:21, Tom Lane wrote:
Stephen Robert Norris <srn@commsecure.com.au> writes:
On Tue, 2002-05-28 at 12:44, Tom Lane wrote:
Hmm, you mean if the 800 other connections are *not* idle, you can
do VACUUMs with impunity? If so, I'd agree we got a bug ...
Yes, that's what I'm saying.
Fascinating.
I suspect the comment near the async_notify code explains the problem -
where it talks about idle backends having to be woken up to clear out
pending events...
Well, we need to do that, but your report seems to suggest that that
code path isn't getting the job done completely --- my first guess is
that a new normal query has to arrive before a SIGUSR2'd backend is
completely happy again. Interesting.
You didn't specify: what PG version are you running, and on what
platform?
regards, tom lane
Sorry, forgot that.
PG 7.1 or 7.2 (both have the problem) on linux 2.2.x (x>=17) and 2.4.x
(4 <= x <= 18).
I've tried a variety of intel machines (and AMD machines) and managed to
reproduce the same problem.
To clarify my comments, I suspect the problem is that all 800 of the
backends get the SIGUSR2 at the same time, and all wake up, causing the
kernel scheduler to go mad trying to decide which one to schedule... If
the connections aren't idle, the queue never fills up enough to require
the signals...
Stephen
Stephen Robert Norris <srn@commsecure.com.au> writes:
To clarify my comments, I suspect the problem is that all 800 of the
backends get the SIGUSR2 at the same time, and all wake up,
That would indeed happen ...
causing the
kernel scheduler to go mad trying to decide which one to schedule... If
the connections aren't idle, the queue never fills up enough to require
the signals...
... but it doesn't follow that this is causing your problem. All 800
are going to get woken up at the same time anyway (or as close to the
same time as the postmaster can issue the signals, that is). Why does
it make a difference whether they have or soon will process a normal
query? If you are able to prevent the problem by issuing "select 1"
once per second on each connection, then for sure it's not a case of
whether they are busy at receipt of the signal or not. Most of 'em
will not be busy at any given instant, with a load as trivial as that.
Do you use LISTEN/NOTIFY at all? When was your pg_listener table
last vacuumed?
regards, tom lane
On Tue, 2002-05-28 at 13:42, Tom Lane wrote:
Stephen Robert Norris <srn@commsecure.com.au> writes:
To clarify my comments, I suspect the problem is that all 800 of the
backends get the SIGUSR2 at the same time, and all wake up,
That would indeed happen ...
causing the
kernel scheduler to go mad trying to decide which one to schedule... If
the connections aren't idle, the queue never fills up enough to require
the signals...
... but it doesn't follow that this is causing your problem. All 800
are going to get woken up at the same time anyway (or as close to the
same time as the postmaster can issue the signals, that is). Why does
it make a difference whether they have or soon will process a normal
query? If you are able to prevent the problem by issuing "select 1"
once per second on each connection, then for sure it's not a case of
whether they are busy at receipt of the signal or not. Most of 'em
will not be busy at any given instant, with a load as trivial as that.
Do you use LISTEN/NOTIFY at all? When was your pg_listener table
last vacuumed?
regards, tom lane
My reading of the code was that the signals didn't get delivered unless
the queue got too full, and that entries on the queue are created by the
vacuum (and other stuff) and processed when a backend does something,
thus the queue never gets too full.
My test program just cycles through each connection doing the select 1
on each in series.
No LISTEN/NOTIFY at all. I've been creating entirely new db setups to
test it (with initdb).
Stephen
Stephen Robert Norris <srn@commsecure.com.au> writes:
My reading of the code was that the signals didn't get delivered unless
the queue got too full, and that entries on the queue are created by the
vacuum (and other stuff) and processed when a backend does something,
thus the queue never gets too full.
Good point. And certainly the async-notify code (which scans through
pg_listener) is a lot more expensive than is needed just to respond to
an SInval-queue-full condition; that looks to be a quick hack that was
inserted without thought to performance. But I don't think we quite
understand this issue yet. If your system can support 800 simultaneous
useful queries then it shouldn't have a problem with 800 simultaneous
scans of an empty pg_listener table.
I'll ask again: is your system sized to support 800 *simultaneous*
user queries? (One query per second on 800 connections is hardly
the same thing.)
regards, tom lane
On Tue, 2002-05-28 at 13:57, Tom Lane wrote:
Stephen Robert Norris <srn@commsecure.com.au> writes:
My reading of the code was that the signals didn't get delivered unless
the queue got too full, and that entries on the queue are created by the
vacuum (and other stuff) and processed when a backend does something,
thus the queue never gets too full.
Good point. And certainly the async-notify code (which scans through
pg_listener) is a lot more expensive than is needed just to respond to
an SInval-queue-full condition; that looks to be a quick hack that was
inserted without thought to performance. But I don't think we quite
understand this issue yet. If your system can support 800 simultaneous
useful queries then it shouldn't have a problem with 800 simultaneous
scans of an empty pg_listener table.
I'll ask again: is your system sized to support 800 *simultaneous*
user queries? (One query per second on 800 connections is hardly
the same thing.)
regards, tom lane
I'm not sure; it can certainly do >1k queries/second through 800
simultaneous connections (about one query per connection per second),
but it's hard to find enough machines to load it up that much...
One big difference, though, is that with the vacuum problem, the CPU
used is almost all (99%) system time; loading up the db with lots of
queries increases user time mostly, with little system time...
In any event, it seems a bug that merely having connections open causes
this problem! They aren't even in transactions...
Stephen
Stephen Robert Norris <srn@commsecure.com.au> writes:
One big difference, though, is that with the vacuum problem, the CPU
used is almost all (99%) system time; loading up the db with lots of
queries increases user time mostly, with little system time...
Hmm, that's a curious point; leaves one wondering about possible kernel
bugs.
In any event, it seems a bug that merely having connections open causes
this problem! They aren't even in transactions...
If the problem is that you've launched far more backends than the system
can really support, I'd have no hesitation in writing it off as user
error. "Idle" processes are not without cost. But at this point
I can't tell whether that's the case, or whether you're looking at a
genuine performance bug in either Postgres or the kernel.
Can you run strace (or truss or kernel-call-tracer-of-your-choice) on
the postmaster, and also on one of the putatively idle backends, so
we can see some more data about what's happening?
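An invocation along these lines captures what's being asked for: -f follows children the postmaster forks, -tt timestamps each call, and -p attaches to a running process. The PIDs and output file names below are placeholders; the helper just builds the command.

```shell
#!/bin/sh
# Sketch of the requested tracing: strace the postmaster (following
# forked backends with -f) and one putatively idle backend by PID.
# PID values and output file names are placeholders.

# Build an strace command for a given PID and output file.
trace_cmd() {
    echo "strace -f -tt -o $2 -p $1"
}

# Postmaster plus all descendant backends:
trace_cmd 1234 postmaster.trace
# A single idle backend:
trace_cmd 5678 idle-backend.trace
```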
regards, tom lane
On Tue, 2002-05-28 at 14:24, Tom Lane wrote:
Stephen Robert Norris <srn@commsecure.com.au> writes:
One big difference, though, is that with the vacuum problem, the CPU
used is almost all (99%) system time; loading up the db with lots of
queries increases user time mostly, with little system time...
Hmm, that's a curious point; leaves one wondering about possible kernel
bugs.
In any event, it seems a bug that merely having connections open causes
this problem! They aren't even in transactions...
If the problem is that you've launched far more backends than the system
can really support, I'd have no hesitation in writing it off as user
error. "Idle" processes are not without cost. But at this point
I can't tell whether that's the case, or whether you're looking at a
genuine performance bug in either Postgres or the kernel.
Can you run strace (or truss or kernel-call-tracer-of-your-choice) on
the postmaster, and also on one of the putatively idle backends, so
we can see some more data about what's happening?
regards, tom lane
I've already strace'ed the idle backend, and I can see the SIGUSR2 being
delivered just before everything goes bad.
What resource would you think idle backends might be exhausting?
On the production system, the problem doesn't happen under load (of
about 60-80 non-trivial queries/second) but does happen when the system
is largely idle. The number of connections is exactly the same in both
cases...
Stephen
On Tue, 28 May 2002, Tom Lane wrote:
Stephen Robert Norris <srn@commsecure.com.au> writes:
One big difference, though, is that with the vacuum problem, the CPU
used is almost all (99%) system time; loading up the db with lots of
queries increases user time mostly, with little system time...
Hmm, that's a curious point; leaves one wondering about possible kernel
bugs.
If it turns out to be so, this would not be the first problem I've
heard about in the Linux scheduler. (It was notoriously bad for years.)
I'd suggest a good test would be to try this on a BSD machine and
see if the problem exists there, too. That will at least tell you
if it's Postgres or Linux.
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
On Tue, 2002-05-28 at 15:43, Curt Sampson wrote:
On Tue, 28 May 2002, Tom Lane wrote:
Stephen Robert Norris <srn@commsecure.com.au> writes:
One big difference, though, is that with the vacuum problem, the CPU
used is almost all (99%) system time; loading up the db with lots of
queries increases user time mostly, with little system time...
Hmm, that's a curious point; leaves one wondering about possible kernel
bugs.
If it turns out to be so, this would not be the first problem I've
heard about in the Linux scheduler. (It was notoriously bad for years.)
I'd suggest a good test would be to try this on a BSD machine and
see if the problem exists there, too. That will at least tell you
if it's Postgres or Linux.
cjs
--
Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC
Can someone with access to a BSD machine try this? It's pretty easy to
set up, a simple loop to open a few hundred connections and a 1-line
shell script. It doesn't seem to matter what's in the database...
I'm going to try the O(1) scheduler patch for the linux kernel and see
if that helps...
Stephen
On Tue, 2002-05-28 at 14:24, Tom Lane wrote:
Stephen Robert Norris <srn@commsecure.com.au> writes:
One big difference, though, is that with the vacuum problem, the CPU
used is almost all (99%) system time; loading up the db with lots of
queries increases user time mostly, with little system time...
Hmm, that's a curious point; leaves one wondering about possible kernel
bugs.
In any event, it seems a bug that merely having connections open causes
this problem! They aren't even in transactions...
If the problem is that you've launched far more backends than the system
can really support, I'd have no hesitation in writing it off as user
error. "Idle" processes are not without cost. But at this point
I can't tell whether that's the case, or whether you're looking at a
genuine performance bug in either Postgres or the kernel.
Can you run strace (or truss or kernel-call-tracer-of-your-choice) on
the postmaster, and also on one of the putatively idle backends, so
we can see some more data about what's happening?
regards, tom lane
I wonder if it's a problem with a second SIGUSR2 arriving before the
first is finished? It seems much easier to trigger the effect with more
rapid vacuums than with a delay (even accounting for the reduced number
of vacuums occurring).
Stephen
Can someone with access to a BSD machine try this? It's pretty easy to
set up, a simple loop to open a few hundred connections and a 1-line
shell script. It doesn't seem to matter what's in the database...
I'm going to try the O(1) scheduler patch for the linux kernel and see
if that helps...
Stephen
I'm willing, send over what you were using.. Also the schema and dump of any
database you were testing against (if it matters)..
I don't think I have a machine that I can get to tomorrow that can do 800
connections at the same time -- does a smaller number of connections produce
the same result on your machine(s)?
-Mitch
Hi Stephen,
Are you able to give some detailed technical specs of the hardware
you're running?
I consider yours to be a higher-end PostgreSQL server, and I'd like to
have a good practical understanding of what is required (hardware wise)
to put together a 1k-transaction/second PostgreSQL server.
:-)
Regards and best wishes,
Justin Clift
<snip>
I'm not sure; it can certainly do >1k queries/second through 800
simultaneous connections (about 1/connection second), but it's hard to
find enough machines to load it up that much...
One big difference, though, is that with the vacuum problem, the CPU
used is almost all (99%) system time; loading up the db with lots of
queries increases user time mostly, with little system time...
In any event, it seems a bug that merely having connections open causes
this problem! They aren't even in transactions...
Stephen
--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi
Stephen Robert Norris <srn@commsecure.com.au> writes:
I've already strace'ed the idle backend, and I can see the SIGUSR2 being
delivered just before everything goes bad.
Yes, but what happens after that?
If you don't see anything obvious by examining a single process, maybe
strace'ing the postmaster + all descendant processes would offer a
better viewpoint.
What resource would you think idle backends might be exhausting?
Difficult to say. I suspect your normal load doesn't have *all* the
backends trying to run queries at once. But in any case the SIGUSR2
event should only produce a momentary spike in load, AFAICS.
regards, tom lane
On Tue, 2002-05-28 at 23:29, Tom Lane wrote:
Stephen Robert Norris <srn@commsecure.com.au> writes:
I've already strace'ed the idle backend, and I can see the SIGUSR2 being
delivered just before everything goes bad.
Yes, but what happens after that?
The strace stops until I manually kill the connecting process - the
machine stops in general until then (vmstat 1 stops producing output,
shells stop responding ...). So who knows what happens :(
If you don't see anything obvious by examining a single process, maybe
strace'ing the postmaster + all descendant processes would offer a
better viewpoint.
What resource would you think idle backends might be exhausting?
Difficult to say. I suspect your normal load doesn't have *all* the
backends trying to run queries at once. But in any case the SIGUSR2
event should only produce a momentary spike in load, AFAICS.
regards, tom lane
I agree (about not having 800 simultaneous queries). Sometimes, the
SIGUSR2 does just create a very brief load spike (vmstat shows >500
processes on the run queue, but the next second everything is back to
normal and no unusual amount of CPU is consumed).
This sort of rules out (to me) a kernel problem, unless it's something
triggered at a specific number of processes (like 700 is bad, 699 is
OK).
Stephen