Startup death!

Started by Sam Liddicott over 23 years ago, 6 messages, general
#1 Sam Liddicott
sam.liddicott@ananova.com

From time to time, on postgres 7.2 or 7.2.1, we hit a state where the maximum
number of postgres processes are all stuck in "startup" mode (as
"ps -fwwwwu postgres" shows), sharing all available CPU among themselves.

The only cure is to do a shutdown (which doesn't work) and then
kill -9 one of the stuck-in-startup processes, upon which they all die and
it shuts down properly within seconds.

We then restart postgres and all is well.

The only extra info I have is that under 7.2 (not 7.2.1), after such
circumstances, if I then opened a psql process on that DB it would take
many seconds (perhaps 10) before psql gave me a prompt. If I open the DB to
many clients before this prompt appears, they all get stuck in startup again,
but if I wait until after the prompt, they do not.
In contrast, under 7.2.1 the psql client gives the prompt right away, but the
first simple query (select * from channelregion; a few hundred rows) takes
maybe 5 seconds the first time.

Why are all these processes stuck in startup and taking as much cpu as they
can?

Sam

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sam Liddicott (#1)
Re: Startup death!

"Sam Liddicott" <sam.liddicott@ananova.com> writes:

> Why are all these processes stuck in startup and taking as much cpu as they
> can?

You tell us. Attach to a few of them with gdb and get stack traces.
(It will help if you've built PG with --enable-debug.)
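
For anyone wanting to script the suggestion above, here is a minimal sketch of collecting a backtrace from each stuck backend. It assumes gdb is installed and that you have permission to attach; the function name and the PIDs are hypothetical and not from this thread. It only builds and prints the command lines, so nothing is attached until you run them yourself.

```python
from __future__ import annotations

import shlex


def gdb_backtrace_command(pid: int) -> list[str]:
    """Build a non-interactive gdb invocation that attaches to `pid`,
    prints one backtrace ("bt"), and exits (which detaches again)."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    # -batch: exit after running the commands; -ex: run one gdb command;
    # -p: attach to the running process with this PID.
    return ["gdb", "-batch", "-ex", "bt", "-p", str(pid)]


if __name__ == "__main__":
    # Hypothetical PIDs of backends that ps shows as "startup".
    for pid in (12345, 12346):
        print(shlex.join(gdb_backtrace_command(pid)))
```

Traces from two or three of the stuck processes are usually enough to see whether they are all waiting in the same place.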

regards, tom lane

#3 Ericson Smith
eric@did-it.com
In reply to: Tom Lane (#2)
Re: Startup death!

Seems I had this same problem a while back with 7.2.1

We had I/O problems. Our RAID controller driver was acting up. Upgrading
the i20 driver from Redhat finally and definitively solved the problem.

If you check your processlist, you will see that those "startup"
processes are in an Uninterruptible Sleep mode. We ended up having to
hard reboot the machine to shut down Postgresql. After about a week of
this we found out about the driver.
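
A quick way to check this is to look at the state column of ps for the "startup" processes: a leading D means uninterruptible sleep (typically waiting on I/O), while a leading R means the process is running or runnable and using CPU. The sketch below tallies those states. It assumes a ps that accepts "-u postgres -o pid=,stat=,args=" and that the backend's process title contains the word "startup"; the function name and the sample lines are made up for illustration.

```python
from collections import Counter


def startup_states(ps_output: str) -> Counter:
    """Count the leading state letter (R, S, D, ...) of every process
    whose command line contains the word "startup"."""
    counts = Counter()
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, stat, rest of the command line
        if len(parts) < 3:
            continue  # blank or malformed line
        _pid, stat, args = parts
        if "startup" in args.split():
            counts[stat[0]] += 1  # first letter of STAT is the base state
    return counts


if __name__ == "__main__":
    # Invented sample in the shape of: ps -u postgres -o pid=,stat=,args=
    sample = """\
  901 S    /usr/bin/postmaster -D /var/lib/pgsql/data
 1201 R    postgres: web mydb 10.0.0.5 startup
 1202 R    postgres: web mydb 10.0.0.6 startup
 1203 D    postgres: web mydb 10.0.0.7 startup
 1204 S    postgres: web mydb 10.0.0.8 idle
"""
    print(dict(startup_states(sample)))
```

Mostly D would point toward the disk or driver; mostly R would match the "taking all available CPU" symptom from the original post and suggests looking at what the backends are executing instead.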

I would love to hear what your solution was, but am almost sure it is
related to a disk i/o issue.

For others in the list... What does it mean when the Postgresql
processes are in startup mode? What is it supposed to be doing in that
mode?

- Ericson Smith
eric@did-it.com

#4 Sam Liddicott
sam.liddicott@ananova.com
In reply to: Ericson Smith (#3)
Re: Startup death!

-----Original Message-----
From: Ericson Smith [mailto:eric@did-it.com]
Sent: 18 July 2002 15:34
To: Tom Lane
Cc: Postgresql General Mailing List
Subject: Re: [GENERAL] Startup death!

> Seems I had this same problem a while back with 7.2.1
>
> We had I/O problems. Our RAID controller driver was acting up. Upgrading
> the i20 driver from Redhat finally and definitively solved the problem.

We're using Red Hat 7.3 with RAID.
When did you get the i20 driver update? Did you have to say any magic
words? Is it part of any recent release? What version do you use now?

For us, lsmod doesn't show any kind of i20.
We have /dev/hdi20, which is owned by the dev-3.3-4 package, but it also has
i21, i22 etc.
The descriptions of the installed packages don't mention i20 either.

We have an unused (no disks) Adaptec AIC7899, and the card we actually use
is a MegaRAID.

> I would love to hear what your solution was, but am almost sure it is
> related to a disk i/o issue.

When it next happens we will strace -p and gdb the processes to see what
they are doing.

> For others in the list... What does it mean when the Postgresql
> processes are in startup mode? What is it supposed to be doing in that
> mode?

yeah!

Sam

#5 Ericson Smith
eric@did-it.com
In reply to: Sam Liddicott (#4)
Re: Startup death!

We got the i20 driver update from Adaptec's site, THEN updated RedHat's
kernel using their up2date utility.

Here are the steps:

1. Have your SCSI Raid driver disk ready
2. You need to reinstall RedHat in expert mode so it will *not load* the
default redhat driver for your RAID (this was part of the problem).
3. Insert the SCSI Raid driver when it prompts you
4. Install Linux as necessary
5. As soon as your install is finished, run rhn_register, and up2date to
download the latest kernels for your machine.
6. Install and run Postgres

These are the steps that we used with success.

- Ericson Smith
eric@did-it.com

#6 Sam Liddicott
sam.liddicott@ananova.com
In reply to: Ericson Smith (#5)
Re: Startup death!

Thank you very much, good advice here!
We will try this, and may bug you personally (?) if we need clarification,
as it doesn't seem to be a postgres issue.

Sam
