Startup death!
From time to time on postgres 7.2 or postgres 7.2.1 we hit a case where the
maximum number of postgres processes are all stuck in "startup" mode (as
"ps -fwwwwu postgres" shows), sharing all available CPU among themselves.
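A minimal sketch, in Python, of how the stuck backends could be counted from the ps listing mentioned above. It assumes the usual `ps -f` column layout (UID PID PPID C STIME TTY TIME CMD) and that a stuck backend carries the word "startup" in its command title. The function name and the sample process lines are hypothetical, not output from the affected machine.

```python
from typing import List

# Hypothetical sample in `ps -f` layout; real titles may differ by version.
SAMPLE_PS_OUTPUT = """\
UID        PID  PPID  C STIME TTY          TIME CMD
postgres  1201     1  0 09:00 ?        00:00:01 /usr/bin/postmaster -D /var/lib/pgsql/data
postgres  1342  1201 33 09:57 ?        00:02:10 postgres: web chandb 10.0.0.5 startup
postgres  1343  1201 33 09:57 ?        00:02:09 postgres: web chandb 10.0.0.5 startup
postgres  1350  1201  0 09:58 ?        00:00:00 postgres: web chandb 10.0.0.5 idle
"""


def stuck_in_startup(ps_output: str) -> List[int]:
    """Return PIDs whose command column contains the word 'startup'."""
    pids = []
    for line in ps_output.splitlines():
        # Split into the 7 leading columns plus the full command text.
        fields = line.split(None, 7)
        if len(fields) < 8 or not fields[1].isdigit():
            continue  # header line or something we cannot parse
        if "startup" in fields[7].split():
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    pids = stuck_in_startup(SAMPLE_PS_OUTPUT)
    print(f"{len(pids)} process(es) in startup: {pids}")
```

On a live system the same function could be fed the text captured from `ps -fwwwwu postgres`. Comparing the count against the configured connection limit would show whether every slot is occupied by a starting backend.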
The only cure is to do a shutdown (which doesn't work) and then kill -9 one
of the stuck-in-startup processes, upon which they all die and postgres
shuts down properly within seconds.
We then restart postgres and all is well.
The only extra info I have is that under 7.2 (not 7.2.1) after such
circumstances, if I then opened a psql session on that DB it would take
many seconds (perhaps 10) before psql gave me a prompt. If I open the DB to
many clients before that prompt appears, they all get stuck in startup
again; if I wait until after the prompt, they do not.
In contrast, under 7.2.1 the psql client gives the prompt right away, but
the first simple query (select * from channelregion; - a few hundred rows)
takes maybe 5 seconds.
Why are all these processes stuck in startup and taking as much cpu as they
can?
Sam
_____
Samuel Liddicott
"Sam Liddicott" <sam.liddicott@ananova.com> writes:
> Why are all these processes stuck in startup and taking as much cpu as they
> can?
You tell us. Attach to a few of them with gdb and get stack traces.
(It will help if you've built PG with --enable-debug.)
regards, tom lane
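One way to follow the gdb suggestion above on several processes at once, sketched in Python. It relies on gdb's `-p`, `-batch` and `-ex` options to attach, print a backtrace and detach without an interactive session. The helper names are hypothetical, and attaching normally requires running as the postgres user or root.

```python
import subprocess
from typing import Dict, Iterable, List


def gdb_backtrace_cmd(pid: int) -> List[str]:
    """Build a non-interactive gdb command that prints one backtrace."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -p attaches to the running process, -batch exits when done,
    # -ex runs the given gdb command ("bt" = backtrace).
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def collect_backtraces(pids: Iterable[int], timeout: float = 30.0) -> Dict[int, str]:
    """Run gdb against each PID and return its output keyed by PID.

    Not called automatically: it needs gdb installed and enough
    privileges to attach to the target processes.
    """
    traces = {}
    for pid in pids:
        result = subprocess.run(
            gdb_backtrace_cmd(pid),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        traces[pid] = result.stdout
    return traces


if __name__ == "__main__":
    # Only show the command; 1342 is a placeholder PID.
    print(" ".join(gdb_backtrace_cmd(1342)))
```

The traces are most readable when the server was built with --enable-debug, as noted above. Without debug symbols gdb may show only addresses or bare function names.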
Seems I had this same problem a while back with 7.2.1.
We had I/O problems. Our RAID controller driver was acting up. Upgrading
the i20 driver from Redhat finally and definitively solved the problem.
If you check your processlist, you will see that those "startup"
processes are in an Uninterruptible Sleep mode. We ended up having to
hard reboot the machine to shut down Postgresql. After about a week of
this we found out about the driver.
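A rough Python sketch of the process-list check described above. It assumes output from `ps -eo pid,stat,args`, where a STAT value beginning with "D" marks uninterruptible sleep (usually I/O). The function name and sample lines are hypothetical. A backend that is burning CPU would normally show "R" rather than "D", so this check may help tell a driver or disk stall apart from a busy loop.

```python
from typing import List

# Hypothetical sample in `ps -eo pid,stat,args` layout.
SAMPLE_PS_STAT = """\
  PID STAT COMMAND
 1201 S    /usr/bin/postmaster -D /var/lib/pgsql/data
 1342 D    postgres: web chandb 10.0.0.5 startup
 1343 R    postgres: web chandb 10.0.0.5 startup
 1400 D    [kjournald]
"""


def uninterruptible_pids(ps_output: str, name: str = "postgres") -> List[int]:
    """Return PIDs in state D whose command line mentions `name`."""
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # PID, STAT, full command line
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # header line or something we cannot parse
        pid, stat, args = fields
        if stat.startswith("D") and name in args:
            pids.append(int(pid))
    return pids


if __name__ == "__main__":
    print(uninterruptible_pids(SAMPLE_PS_STAT))
```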
I would love to hear what your solution was, but am almost sure it is
related to a disk i/o issue.
For others on the list... What does it mean when the Postgresql
processes are in startup mode? What are they supposed to be doing in that
mode?
- Ericson Smith
eric@did-it.com
-----Original Message-----
From: Ericson Smith [mailto:eric@did-it.com]
Sent: 18 July 2002 15:34
To: Tom Lane
Cc: Postgresql General Mailing List
Subject: Re: [GENERAL] Startup death!
> Seems I had this same problem a while back with 7.2.1
> We had I/O problems. Our RAID controller driver was acting up. Upgrading
> the i20 driver from Redhat finally and definitively solved the problem.
We're using redhat 7.3 with raid...
When did you get the i20 driver update? Did you have to say any
magic words? Is it part of any recent release? What version do you use
now?
For us, lsmod doesn't show any kind of i20.
We have /dev/hdi20, which is owned by the dev-3.3-4 package, but it also has
i21, i22 etc.
The descriptions of the installed packages don't mention i20.
We have an unused (no disks) Adaptec AIC7899, and the card we actually use
is a MegaRAID.
> I would love to hear what your solution was, but am almost sure it is
> related to a disk i/o issue.
When it next happens we will strace -p and gdb the processes to see what
they are doing.
> For others in the list... What does it mean when the Postgresql
> processes are in startup mode? What is it supposed to be doing in that
> mode?
yeah!
Sam
We got the i20 driver update from Adaptec's site, THEN updated RedHat's
kernel using their up2date utility.
Here are the steps:
1. Have your SCSI Raid driver disk ready
2. You need to reinstall RedHat in expert mode so it will *not load* the
default redhat driver for your RAID (this was part of the problem).
3. Insert the SCSI Raid driver when it prompts you
4. Install Linux as necessary
5. As soon as your install is finished, run rhn_register, and up2date to
download the latest kernels for your machine.
6. Install and run Postgres
These are the steps that we used with success.
- Ericson Smith
eric@did-it.com
Thank you very much, good advice here!
We will try this,
and may bug you personally (?) if we need clarification, as it doesn't seem
to be a postgres issue.
Sam