Postgres processes getting stuck (bug?)

Started by Ciprian Grigorasalmost 10 years ago7 messagesgeneral
Jump to latest
#1Ciprian Grigoras
ciprian.grigoras@vitals.com

Hi guys,

I had a question. We're running Postgresql 9.0.7 , and all of a sudden we
started seeing unexpected behavior. One table got locked after we ran one
simple INSERT on one test item.
Nothing else was running against that table as far as we know, and the
query is frozen there now. After some time we tried to terminate it
forcefully (pg_terminate_backend), ran that and the return of the command
is "true" but the query still stays on, shows up on pg_stat_activity etc.
Reading from the table is fine, but we believe any other inserts / updates
are not possible.

Restarting the Postgresql server is not an option at this moment, since
it's a production box and another high-demand database is running from that.

Has anyone seen a similar issue (maybe a bug in the 9.0 version ?) where
simple statements don't finish and get locked there and can't be killed by
the pg_terminate_backed ? What is the cause of this ?
Any reasonable way to find out more details on what caused this, how to
prevent it in the future, and how it can be fixed sensitively now ? Thanks !

just fyi, checking the data in the "pg_locks" (for that process that is
frozen), shows a bunch of rows all with the same virtualtransactionid. Only
one of them has the mode of "ExclusiveLock" (the only record there with a
locktype of "virtualxid"), a few have the "RowExclusiveLock" mode and the
vast majority have the "AccessShareLock" mode.

Thanks,
Ciprian

#2Bob Lunney
blunney@meetme.com
In reply to: Ciprian Grigoras (#1)
Re: Postgres processes getting stuck (bug?)

Check the wait state for the backend process. I’ve seen this happen when a process ran a per-row trigger that tried to connect to something else, and it filled the ip_conntrack table. So, kernel level wait along with a whole bunch of locks on the table in question.

Running pg_terminate_backend() didn’t work, as the signal queued behind the kernel wait. We had to bounce the database to get rid of the problem. Immediately afterwards we disabled the trigger.

HTH,

Bob Lunney

Show quoted text

On Apr 29, 2016, at 1:30 PM, Ciprian Grigoras <ciprian.grigoras@vitals.com> wrote:

Hi guys,

I had a question. We're running Postgresql 9.0.7 , and all of a sudden we started seeing unexpected behavior. One table got locked after we ran one simple INSERT on one test item.
Nothing else was running against that table as far as we know, and the query is frozen there now. After some time we tried to terminate it forcefully (pg_terminate_backend), ran that and the return of the command is "true" but the query still stays on, shows up on pg_stat_activity etc.
Reading from the table is fine, but we believe any other inserts / updates are not possible.

Restarting the Postgresql server is not an option at this moment, since it's a production box and another high-demand database is running from that.

Has anyone seen a similar issue (maybe a bug in the 9.0 version ?) where simple statements don't finish and get locked there and can't be killed by the pg_terminate_backed ? What is the cause of this ?
Any reasonable way to find out more details on what caused this, how to prevent it in the future, and how it can be fixed sensitively now ? Thanks !

just fyi, checking the data in the "pg_locks" (for that process that is frozen), shows a bunch of rows all with the same virtualtransactionid. Only one of them has the mode of "ExclusiveLock" (the only record there with a locktype of "virtualxid"), a few have the "RowExclusiveLock" mode and the vast majority have the "AccessShareLock" mode.

Thanks,
Ciprian

#3John R Pierce
pierce@hogranch.com
In reply to: Ciprian Grigoras (#1)
Re: Postgres processes getting stuck (bug?)

On 4/29/2016 10:30 AM, Ciprian Grigoras wrote:

We're running Postgresql 9.0.7...

do note, the 9.0 series got up to 9.0.23 before it was discontinued last
year. 9.0.7 was released in February 2012, 4+ years ago.

9.0.18 fixed some index corruption bugs in GIST indexes, 9.0.15 fixed a
bunch of other data corruption problems, 9.0.13 fixed yet more GiST
index problems. of course, every one of these incremental updates
fixed dozens of relatively obscure bugs, you'd need to read the release
notes for each version between 9.0.8 and 9.0.23 for the complete list.

--
john r pierce, recycling bits in santa cruz

#4Jerry Sievers
gsievers19@comcast.net
In reply to: Ciprian Grigoras (#1)
Re: Postgres processes getting stuck (bug?)

Ciprian Grigoras <ciprian.grigoras@vitals.com> writes:

Hi guys,

I had a question. We're running Postgresql 9.0.7 , and all of a sudden we started seeing unexpected behavior. One table got locked after we ran one simple INSERT on one
test item.
Nothing else was running against that table as far as we know, and the query is frozen there now. After some time we tried to terminate it forcefully
(pg_terminate_backend), ran that and the return of the command is "true" but the query still stays on, shows up on pg_stat_activity etc.
Reading from the table is fine, but we believe any other inserts / updates are not possible.

What do you get from strace -p $pid?

Restarting the Postgresql server is not an option at this moment, since it's a production box and another high-demand database is running from that.

Has anyone seen a similar issue (maybe a bug in the 9.0 version ?) where simple statements don't finish and get locked there and can't be killed by the
pg_terminate_backed ? What is the cause of this ?
Any reasonable way to find out more details on what caused this, how to prevent it in the future, and how it can be fixed sensitively now ? Thanks !

just fyi, checking the data in the "pg_locks" (for that process that is frozen), shows a bunch of rows all with the same virtualtransactionid. Only one of them has the
mode of "ExclusiveLock" (the only record there with a locktype of "virtualxid"), a few have the "RowExclusiveLock" mode and the vast majority have the "AccessShareLock"
mode.

Thanks,
Ciprian

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Ciprian Grigoras
ciprian.grigoras@vitals.com
In reply to: Jerry Sievers (#4)
Re: Postgres processes getting stuck (bug?)

Thanks everyone, it could be that the Linux server is problematic (it's
got a long uptime).
I ran the strace thing,

1. [root@box ~]# strace -p 20825
2. Process 20825 attached - interrupt to quit
3. connect(90, {sa_family=AF_FILE, path="/var/centrifydc/daemon2"}, 25

and it kept waiting there with minimal information, nothing else was
shown...

On Fri, Apr 29, 2016 at 2:33 PM, Jerry Sievers <gsievers19@comcast.net>
wrote:

Ciprian Grigoras <ciprian.grigoras@vitals.com> writes:

Hi guys,

I had a question. We're running Postgresql 9.0.7 , and all of a sudden

we started seeing unexpected behavior. One table got locked after we ran
one simple INSERT on one

test item.
Nothing else was running against that table as far as we know, and the

query is frozen there now. After some time we tried to terminate it
forcefully

(pg_terminate_backend), ran that and the return of the command is "true"

but the query still stays on, shows up on pg_stat_activity etc.

Reading from the table is fine, but we believe any other inserts /

updates are not possible.

What do you get from strace -p $pid?

Restarting the Postgresql server is not an option at this moment, since

it's a production box and another high-demand database is running from that.

Has anyone seen a similar issue (maybe a bug in the 9.0 version ?) where

simple statements don't finish and get locked there and can't be killed by
the

pg_terminate_backed ? What is the cause of this ?
Any reasonable way to find out more details on what caused this, how to

prevent it in the future, and how it can be fixed sensitively now ? Thanks !

just fyi, checking the data in the "pg_locks" (for that process that is

frozen), shows a bunch of rows all with the same virtualtransactionid. Only
one of them has the

mode of "ExclusiveLock" (the only record there with a locktype of

"virtualxid"), a few have the "RowExclusiveLock" mode and the vast majority
have the "AccessShareLock"

mode.

Thanks,
Ciprian

--
Jerry Sievers
Postgres DBA/Development Consulting
e: postgres.consulting@comcast.net
p: 312.241.7800

--

[image: Description: vitals_logo_150.png]

*Ciprian Grigoras*

*Software Engineer *
*Vitals* | 160 Chubb Avenue, Suite 301, Lyndhurst, NJ 07071, USA

P : 201.459.6275 | M : 845.517.8710

http://www.vitals.com

Attachments:

image008.pngimage/png; name=image008.pngDownload
#6Merlin Moncure
mmoncure@gmail.com
In reply to: John R Pierce (#3)
Re: Postgres processes getting stuck (bug?)

On Fri, Apr 29, 2016 at 1:09 PM, John R Pierce <pierce@hogranch.com> wrote:

On 4/29/2016 10:30 AM, Ciprian Grigoras wrote:

We're running Postgresql 9.0.7...

do note, the 9.0 series got up to 9.0.23 before it was discontinued last
year. 9.0.7 was released in February 2012, 4+ years ago.

9.0.18 fixed some index corruption bugs in GIST indexes, 9.0.15 fixed a
bunch of other data corruption problems, 9.0.13 fixed yet more GiST index
problems. of course, every one of these incremental updates fixed
dozens of relatively obscure bugs, you'd need to read the release notes for
each version between 9.0.8 and 9.0.23 for the complete list.

+1 this -- OP is running EOL version of postgres minus 16 or so bugfix
releases.

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#7Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Ciprian Grigoras (#1)
Re: Postgres processes getting stuck (bug?)

Ciprian Grigoras wrote:

Has anyone seen a similar issue (maybe a bug in the 9.0 version ?) where
simple statements don't finish and get locked there and can't be killed by
the pg_terminate_backed ? What is the cause of this ?

Interesting. I wonder if you just wrapped around pg_multixact and the
backend is stuck in the loop there. I have never heard of anyone with
this problem, and wraparound is supposed to work fine in 9.0. (As I
recall that code is there since 8.1 so it'd be very surprising that it
would have bugs there and not notice all this time).

Since you've been evidently running with this for years and never seen
this problem, I would be surprised if you ever see it again. Still, if
it does happen, please do grab a stack trace with GDB on the blocked
processes. Make sure to have debug symbols.

--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general