postgres process got stuck in "notify interrupt waiting" status

Started by Aleksey Tsalolikhin over 13 years ago, 13 messages, general
#1Aleksey Tsalolikhin
atsaloli.tech@gmail.com

Hi.

We use LISTEN/NOTIFY quite a bit but today something unusual (bad) happened.

The number of processes waiting for a lock just started to go up, up, up.

I finally found that the object being locked was pg_listener, which
RhodiumToad on IRC kindly informed me is locked during LISTEN/NOTIFY. The
process that held the lock (in pg_locks it had granted = t) was shown
by ps in status "notify interrupt waiting" and had held the lock for
over half an hour. (Usually these notifications are very quick.)

The process would not respond to kill, so I kill -9'ed it.

The only reference I could find to a similar problem was at
http://archives.postgresql.org/pgsql-performance/2008-02/msg00345.php
which seemed to indicate a process should not be in this state for
very long.

We are on postgres 8.4.12.

I'd like to figure out what happened.

There is a web server that talks to this database server (amongst
other clients), and the client addr and port mapped to this web
server, but there was no process on the web server matching the port
number. That's when I decided to kill the postgres process.

Anything I should know or read up on? Any suggestions?

I'd like the system to be able to recover, and for the process to
terminate if the client is no longer around.

Best,
Aleksey
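
Editor's sketch of the diagnosis described above, for readers retracing it: the lock holder is the pg_locks row with granted = t, and everyone else on the same relation is waiting. The query text assumes the 8.4-era catalogs (pg_stat_activity.procpid and current_query, which later releases renamed to pid and query). The helper name split_holders_and_waiters and the sample PIDs are hypothetical and not taken from the original system.

```python
# Assumed query for PostgreSQL 8.4; run it with whatever client you normally use.
LOCKS_ON_PG_LISTENER_SQL = """
SELECT l.pid, l.mode, l.granted, a.current_query
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
LEFT JOIN pg_stat_activity a ON a.procpid = l.pid
WHERE l.locktype = 'relation'
  AND c.relname = 'pg_listener'
ORDER BY l.granted DESC, l.pid;
"""


def split_holders_and_waiters(rows):
    """Split pg_locks rows (dicts with 'pid' and 'granted') into two PID lists."""
    holders = [row["pid"] for row in rows if row["granted"]]
    waiters = [row["pid"] for row in rows if not row["granted"]]
    return holders, waiters


# Hypothetical rows, shaped like the result of the query above.
sample_rows = [
    {"pid": 4242, "granted": True},
    {"pid": 4301, "granted": False},
    {"pid": 4302, "granted": False},
]
holders, waiters = split_holders_and_waiters(sample_rows)
```

With the hypothetical rows above, holders contains the single PID that owns the lock and waiters lists the sessions queued behind it. The holder's PID is the one to look up in ps.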

#2Aleksey Tsalolikhin
atsaloli.tech@gmail.com
In reply to: Aleksey Tsalolikhin (#1)
Re: postgres process got stuck in "notify interrupt waiting" status

BTW, after I signalled TERM, the process status changed from

notify interrupt waiting

to

notify interrupt waiting waiting

which I thought looked kind of odd.

Then I signalled KILL.

Aleksey


#3John R Pierce
pierce@hogranch.com
In reply to: Aleksey Tsalolikhin (#2)
Re: postgres process got stuck in "notify interrupt waiting" status

On 09/04/12 7:09 PM, Aleksey Tsalolikhin wrote:

BTW, after I signalled TERM, the process status changed from

notify interrupt waiting

to

notify interrupt waiting waiting

which I thought looked kind of odd.

Then I signalled KILL.

Was this a client process or a postgres process? kill -9 on postgres
processes can easily trigger data corruption.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast

#4Aleksey Tsalolikhin
atsaloli.tech@gmail.com
In reply to: John R Pierce (#3)
Re: postgres process got stuck in "notify interrupt waiting" status

On Tue, Sep 4, 2012 at 7:21 PM, John R Pierce <pierce@hogranch.com> wrote:

On 09/04/12 7:09 PM, Aleksey Tsalolikhin wrote:

BTW, after I signalled TERM, the process status changed from

notify interrupt waiting

to

notify interrupt waiting waiting

which I thought looked kind of odd.

Then I signalled KILL.

Was this a client process or a postgres process? kill -9 on postgres
processes can easily trigger data corruption.

This was a postgres process. I certainly won't signal KILL to
postgres processes anymore; thanks for that warning, John.

Aleksey

#5Laurenz Albe
laurenz.albe@cybertec.at
In reply to: John R Pierce (#3)
Re: postgres process got stuck in "notify interrupt waiting" status

John R Pierce wrote:

Was this a client process or a postgres process? kill -9 on postgres
processes can easily trigger data corruption.

It definitely shouldn't cause data corruption, otherwise
PostgreSQL would not be crash safe.

Yours,
Laurenz Albe

#6Craig Ringer
craig@2ndquadrant.com
In reply to: John R Pierce (#3)
Re: postgres process got stuck in "notify interrupt waiting" status

On 09/05/2012 12:21 PM, John R Pierce wrote:

Was this a client process or a postgres process? kill -9 on postgres
processes can easily trigger data corruption.

It certainly shouldn't.

kill -9 of the postmaster, deletion of postmaster.pid, and re-starting
postgresql *might* but AFAIK even then you'll have to bypass the shared
memory lockout (unless you're on Windows).

--
Craig Ringer

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#6)
Re: postgres process got stuck in "notify interrupt waiting" status

Craig Ringer <ringerc@ringerc.id.au> writes:

On 09/05/2012 12:21 PM, John R Pierce wrote:

Was this a client process or a postgres process? kill -9 on postgres
processes can easily trigger data corruption.

It certainly shouldn't.

kill -9 of the postmaster, deletion of postmaster.pid, and re-starting
postgresql *might* but AFAIK even then you'll have to bypass the shared
memory lockout (unless you're on Windows).

Correction on that: manually deleting postmaster.pid *does* bypass the
shared memory lock. If there are still any live backends from the old
postmaster, you can get corruption as a result of this, because the old
backends and the new ones will be modifying the database independently.

This is why we recommend that you never delete postmaster.pid manually,
and certainly not as part of an automatic startup script.

Having said that, a kill -9 on an individual backend (*not* the
postmaster) should be safe enough, if you don't mind the fact that
it'll kill all your other sessions too.

regards, tom lane
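
Editor's sketch of the distinction drawn above between an individual backend and the postmaster. It assumes only that the first line of postmaster.pid in the data directory holds the postmaster's PID. The helper names read_postmaster_pid and pick_backend_signal are hypothetical, and refusing to signal the postmaster at all is a deliberately conservative choice for this example, not a PostgreSQL rule.

```python
import os
import signal


def read_postmaster_pid(data_dir):
    """Return the postmaster PID stored on the first line of postmaster.pid."""
    with open(os.path.join(data_dir, "postmaster.pid")) as pid_file:
        return int(pid_file.readline().strip())


def pick_backend_signal(target_pid, postmaster_pid, force=False):
    """Choose a signal for one backend; never pick one for the postmaster."""
    if target_pid == postmaster_pid:
        raise ValueError("target is the postmaster, not a backend")
    # SIGTERM asks the backend to exit cleanly. SIGKILL is the last resort and,
    # as described in this thread, makes the postmaster reset every other session.
    return signal.SIGKILL if force else signal.SIGTERM
```

A caller would pass the result to os.kill(target_pid, chosen_signal). The SQL function pg_terminate_backend(pid), available since 8.4, sends the same SIGTERM from inside a database session.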

#8Aleksey Tsalolikhin
atsaloli.tech@gmail.com
In reply to: Tom Lane (#7)
Re: postgres process got stuck in "notify interrupt waiting" status

On Wed, Sep 5, 2012 at 7:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Having said that, a kill -9 on an individual backend (*not* the
postmaster) should be safe enough, if you don't mind the fact that
it'll kill all your other sessions too.

Got it, thanks.

Why will it kill all your other sessions too? Isn't there a separate backend
process for each session?

Best,
-at

#9Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Aleksey Tsalolikhin (#8)
Re: postgres process got stuck in "notify interrupt waiting" status

Aleksey Tsalolikhin <atsaloli.tech@gmail.com> wrote:

Why will it kill all your other sessions too? Isn't there a
separate backend process for each session?

When stopped that abruptly, the process has no chance to clean up
its pending state in shared memory. A fresh copy of shared memory
is needed, so it is necessary to effectively do an immediate restart
on the whole PostgreSQL instance.

-Kevin

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Kevin Grittner (#9)
Re: postgres process got stuck in "notify interrupt waiting" status

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:

Aleksey Tsalolikhin <atsaloli.tech@gmail.com> wrote:

Why will it kill all your other sessions too? Isn't there a
separate backend process for each session?

When stopped that abruptly, the process has no chance to clean up
its pending state in shared memory. A fresh copy of shared memory
is needed, so it is necessary to effectively do an immediate restart
on the whole PostgreSQL instance.

Right. On seeing one child die unexpectedly, the postmaster forcibly
SIGQUITs all its other children and initiates a crash recovery sequence.
The reason for this is exactly that we can't trust the contents of
shared memory anymore. An example is that the dying backend may have
held some critical lock, which there is no way to release, so that every
other session will shortly be stuck anyway.

regards, tom lane
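
Editor's sketch of the behaviour described above, as a toy model rather than PostgreSQL code: a supervisor starts several children, one is killed with SIGKILL, and on seeing that abnormal exit the supervisor sends SIGQUIT to the rest. The function name simulate_crash_reset and the sleeping child command are hypothetical stand-ins, and the example assumes a Unix-like system.

```python
import signal
import subprocess
import sys


def simulate_crash_reset(n_children=3):
    """Kill one child with SIGKILL, then SIGQUIT the survivors; return exit codes."""
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    children = [subprocess.Popen(sleeper) for _ in range(n_children)]

    victim, survivors = children[0], children[1:]
    victim.send_signal(signal.SIGKILL)  # stand-in for "kill -9" on one backend
    victim.wait()

    # A negative return code means the child was terminated by that signal,
    # so it had no chance to clean up any state it shared with the others.
    if victim.returncode < 0:
        for child in survivors:
            child.send_signal(signal.SIGQUIT)  # force the rest to exit as well

    return [victim.returncode] + [child.wait() for child in survivors]
```

In the real server the next step is crash recovery and a fresh shared memory segment, which this toy model does not attempt to show.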

#11Aleksey Tsalolikhin
atsaloli.tech@gmail.com
In reply to: Tom Lane (#10)
Re: postgres process got stuck in "notify interrupt waiting" status

Got it, thanks, Kevin, Tom.

So what about this process being in "notify interrupt waiting
waiting" status after I SIGTERM'ed it? Is the double "waiting"
expected?

Aleksey

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Aleksey Tsalolikhin (#11)
Re: postgres process got stuck in "notify interrupt waiting" status

Aleksey Tsalolikhin <atsaloli.tech@gmail.com> writes:

So what about this process being in "notify interrupt waiting
waiting" status after I SIGTERM'ed it? Is the double "waiting"
expected?

That sounded a bit fishy to me too. But unless you can reproduce it in
something newer than 8.4.x, nobody's likely to take much of an interest.
The LISTEN/NOTIFY infrastructure got completely rewritten in 9.0, so
any bugs in the legacy version are probably just going to get benign
neglect at this point ... especially if we don't know how to reproduce
them.

regards, tom lane

#13Aleksey Tsalolikhin
atsaloli.tech@gmail.com
In reply to: Tom Lane (#12)
Re: postgres process got stuck in "notify interrupt waiting" status

On Wed, Sep 5, 2012 at 10:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

That sounded a bit fishy to me too. But unless you can reproduce it in
something newer than 8.4.x, nobody's likely to take much of an interest.
The LISTEN/NOTIFY infrastructure got completely rewritten in 9.0, so
any bugs in the legacy version are probably just going to get benign
neglect at this point ... especially if we don't know how to reproduce
them.

Got it, thanks, Tom! Will urge our shop to upgrade to 9.1.

Best,
-at