BUG #14588: Postgres process blocked on semop
The following bug has been logged on the website:
Bug reference: 14588
Logged by: Chenhong Liu
Email address: diabloneo@gmail.com
PostgreSQL version: 9.2.16
Operating system: CentOS 7.2 1511
Description:
We run a distributed application which storing data in pgsql. This
application is designed to do some route works during 00:00 to 6:00 every
day. I found that, postgres processes are all blocked on semop function,
which means they are waiting for sysv semaphores. I'm not sure if this is a
bug or we misconfigured pgsql and OS.
Techinque information:
30 hosts running the application.
3 hosts running the postgresql servers, one master and two host-stanby
servers.
pgsql config:
max_connections = 1000
shared_buffers = 512MB
OS kernel.sem
250 32000 32 128
During the daytime, there are about 300 postgres processes running, work as
expected. On 00:00 each day, there will be about 200 more postgresql
processes created, now totally about 500. And from that time, thoese working
processes are all blocked on semop function. I check this using pstack
command, and also check the output of ipcs -s [-i]. After this has happened,
all new connection will stay in state authentication.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
I add some log I collected for this problem. My them be helpful.
The attachements are two logs and one picture.
*sds_20170311_1124.bug.log*
It's log I collected every 5 minutes, it contains postgres processes list,
ipcs -s output and ipcs -s -u output. I delete most of the log, just keep
necessary pieces.
*ipcs.log *
It's the output of every semaphores set used by postgresql displayed with
ipcs -s -i command, you can find out many process were waited for a
semapthore. The number of lines which ncount == 1 is 469
*The pstack screenshot*
It shows two process's stack, one is pid 186397 who were in authentication,
and the other is 188832 who where in INSERT.
Forgot to mention, I can only recovery from this situation by restart
postgresql service. If I killed any process blocked with *kill -9 * command,
the service will restart.
On Sun, Mar 12, 2017 at 11:57 AM <diabloneo@gmail.com> wrote:
Show quoted text
The following bug has been logged on the website:
Bug reference: 14588
Logged by: Chenhong Liu
Email address: diabloneo@gmail.com
PostgreSQL version: 9.2.16
Operating system: CentOS 7.2 1511
Description:We run a distributed application which storing data in pgsql. This
application is designed to do some route works during 00:00 to 6:00 every
day. I found that, postgres processes are all blocked on semop function,
which means they are waiting for sysv semaphores. I'm not sure if this is a
bug or we misconfigured pgsql and OS.Techinque information:
30 hosts running the application.
3 hosts running the postgresql servers, one master and two host-stanby
servers.pgsql config:
max_connections = 1000
shared_buffers = 512MBOS kernel.sem
250 32000 32 128During the daytime, there are about 300 postgres processes running, work as
expected. On 00:00 each day, there will be about 200 more postgresql
processes created, now totally about 500. And from that time, thoese
working
processes are all blocked on semop function. I check this using pstack
command, and also check the output of ipcs -s [-i]. After this has
happened,
all new connection will stay in state authentication.--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
diabloneo@gmail.com writes:
During the daytime, there are about 300 postgres processes running, work as
expected. On 00:00 each day, there will be about 200 more postgresql
processes created, now totally about 500. And from that time, thoese working
processes are all blocked on semop function. I check this using pstack
command, and also check the output of ipcs -s [-i]. After this has happened,
all new connection will stay in state authentication.
And your authentication setup is ...?
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Content of my pg_hba.conf
local all all peer
host all all 0.0.0.0/0 md5
host replication demon_replicator 0.0.0.0/0 md5
diabloneo
On Sun, Mar 12, 2017 at 12:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
diabloneo@gmail.com writes:
During the daytime, there are about 300 postgres processes running, work
as
expected. On 00:00 each day, there will be about 200 more postgresql
processes created, now totally about 500. And from that time, thoeseworking
processes are all blocked on semop function. I check this using pstack
command, and also check the output of ipcs -s [-i]. After this hashappened,
all new connection will stay in state authentication.
And your authentication setup is ...?
regards, tom lane
Neo Liu <diabloneo@gmail.com> writes:
Content of my pg_hba.conf
local all all peer
host all all 0.0.0.0/0 md5
host replication demon_replicator 0.0.0.0/0 md5
Which of those would the stuck processes be using?
Also, can you attach to a few of the stuck processes and get stack traces?
https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Hi,
On 2017-03-12 04:30:52 +0000, Neo Liu wrote:
*The pstack screenshot*
It shows two process's stack, one is pid 186397 who were in authentication,
and the other is 188832 who where in INSERT.
This suggest you're having quite massive contention around
ProcArrayLock. You should consider updating to 9.6. Several releases
since 9.2 considerably improved scalability around this (especially
9.6).
Regards,
Andres
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Mon, Mar 13, 2017 at 5:03 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-03-12 04:30:52 +0000, Neo Liu wrote:
*The pstack screenshot*
It shows two process's stack, one is pid 186397 who were inauthentication,
and the other is 188832 who where in INSERT.
This suggest you're having quite massive contention around
ProcArrayLock. You should consider updating to 9.6. Several releases
since 9.2 considerably improved scalability around this (especially
9.6).
Regards,
Andres
Thanks, Andres
I think upgrading to newest version is a good way, but I can't perform it
on a production system before we doing sufficient testing.
Currently, I want to know if this is a bug of pgsql 9.2.16, and how can I
avoid this situation.
Thanks, diabloneo
On Mon, Mar 13, 2017 at 12:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Which of those would the stuck processes be using?
Also, can you attach to a few of the stuck processes and get stack traces?
https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
regards, tom lane
Hi, Tome
I can't get stack traces for you now. We already add a cron job to restart
the pgsql server every morning to solve the problem temporarily. It's a
production system in client's environment, I can't login into the system
now.
I hope the pstack output in early message can help you.
Thanks, diabloneo