BUG #14588: Postgres process blocked on semop

Started by Neo Liuabout 9 years ago8 messagesbugs
Jump to latest
#1Neo Liu
diabloneo@gmail.com

The following bug has been logged on the website:

Bug reference: 14588
Logged by: Chenhong Liu
Email address: diabloneo@gmail.com
PostgreSQL version: 9.2.16
Operating system: CentOS 7.2 1511
Description:

We run a distributed application which storing data in pgsql. This
application is designed to do some route works during 00:00 to 6:00 every
day. I found that, postgres processes are all blocked on semop function,
which means they are waiting for sysv semaphores. I'm not sure if this is a
bug or we misconfigured pgsql and OS.

Techinque information:

30 hosts running the application.
3 hosts running the postgresql servers, one master and two host-stanby
servers.

pgsql config:
max_connections = 1000
shared_buffers = 512MB

OS kernel.sem
250 32000 32 128

During the daytime, there are about 300 postgres processes running, work as
expected. On 00:00 each day, there will be about 200 more postgresql
processes created, now totally about 500. And from that time, thoese working
processes are all blocked on semop function. I check this using pstack
command, and also check the output of ipcs -s [-i]. After this has happened,
all new connection will stay in state authentication.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#2Neo Liu
diabloneo@gmail.com
In reply to: Neo Liu (#1)
Re: BUG #14588: Postgres process blocked on semop

I add some log I collected for this problem. My them be helpful.

The attachements are two logs and one picture.

*sds_20170311_1124.bug.log*
It's log I collected every 5 minutes, it contains postgres processes list,
ipcs -s output and ipcs -s -u output. I delete most of the log, just keep
necessary pieces.

*ipcs.log *
It's the output of every semaphores set used by postgresql displayed with
ipcs -s -i command, you can find out many process were waited for a
semapthore. The number of lines which ncount == 1 is 469

*The pstack screenshot*
It shows two process's stack, one is pid 186397 who were in authentication,
and the other is 188832 who where in INSERT.

Forgot to mention, I can only recovery from this situation by restart
postgresql service. If I killed any process blocked with *kill -9 * command,
the service will restart.

On Sun, Mar 12, 2017 at 11:57 AM <diabloneo@gmail.com> wrote:

Show quoted text

The following bug has been logged on the website:

Bug reference: 14588
Logged by: Chenhong Liu
Email address: diabloneo@gmail.com
PostgreSQL version: 9.2.16
Operating system: CentOS 7.2 1511
Description:

We run a distributed application which storing data in pgsql. This
application is designed to do some route works during 00:00 to 6:00 every
day. I found that, postgres processes are all blocked on semop function,
which means they are waiting for sysv semaphores. I'm not sure if this is a
bug or we misconfigured pgsql and OS.

Techinque information:

30 hosts running the application.
3 hosts running the postgresql servers, one master and two host-stanby
servers.

pgsql config:
max_connections = 1000
shared_buffers = 512MB

OS kernel.sem
250 32000 32 128

During the daytime, there are about 300 postgres processes running, work as
expected. On 00:00 each day, there will be about 200 more postgresql
processes created, now totally about 500. And from that time, thoese
working
processes are all blocked on semop function. I check this using pstack
command, and also check the output of ipcs -s [-i]. After this has
happened,
all new connection will stay in state authentication.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Attachments:

sds_20170311_1124.bug.logtext/x-log; charset=US-ASCII; name=sds_20170311_1124.bug.logDownload
ipcs.logtext/x-log; charset=US-ASCII; name=ipcs.logDownload
pstack Screenshot from 2017-03-10 10-49-54.pngimage/png; name="pstack Screenshot from 2017-03-10 10-49-54.png"Download
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neo Liu (#1)
Re: BUG #14588: Postgres process blocked on semop

diabloneo@gmail.com writes:

During the daytime, there are about 300 postgres processes running, work as
expected. On 00:00 each day, there will be about 200 more postgresql
processes created, now totally about 500. And from that time, thoese working
processes are all blocked on semop function. I check this using pstack
command, and also check the output of ipcs -s [-i]. After this has happened,
all new connection will stay in state authentication.

And your authentication setup is ...?

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#4Neo Liu
diabloneo@gmail.com
In reply to: Tom Lane (#3)
Re: BUG #14588: Postgres process blocked on semop

Content of my pg_hba.conf

local all all peer
host all all 0.0.0.0/0 md5
host replication demon_replicator 0.0.0.0/0 md5

diabloneo

On Sun, Mar 12, 2017 at 12:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

diabloneo@gmail.com writes:

During the daytime, there are about 300 postgres processes running, work

as

expected. On 00:00 each day, there will be about 200 more postgresql
processes created, now totally about 500. And from that time, thoese

working

processes are all blocked on semop function. I check this using pstack
command, and also check the output of ipcs -s [-i]. After this has

happened,

all new connection will stay in state authentication.

And your authentication setup is ...?

regards, tom lane

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Neo Liu (#4)
Re: BUG #14588: Postgres process blocked on semop

Neo Liu <diabloneo@gmail.com> writes:

Content of my pg_hba.conf

local all all peer
host all all 0.0.0.0/0 md5
host replication demon_replicator 0.0.0.0/0 md5

Which of those would the stuck processes be using?

Also, can you attach to a few of the stuck processes and get stack traces?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#6Andres Freund
andres@anarazel.de
In reply to: Neo Liu (#2)
Re: BUG #14588: Postgres process blocked on semop

Hi,

On 2017-03-12 04:30:52 +0000, Neo Liu wrote:

*The pstack screenshot*
It shows two process's stack, one is pid 186397 who were in authentication,
and the other is 188832 who where in INSERT.

This suggest you're having quite massive contention around
ProcArrayLock. You should consider updating to 9.6. Several releases
since 9.2 considerably improved scalability around this (especially
9.6).

Regards,

Andres

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

#7Neo Liu
diabloneo@gmail.com
In reply to: Andres Freund (#6)
Re: BUG #14588: Postgres process blocked on semop

On Mon, Mar 13, 2017 at 5:03 AM Andres Freund <andres@anarazel.de> wrote:

Hi,

On 2017-03-12 04:30:52 +0000, Neo Liu wrote:

*The pstack screenshot*
It shows two process's stack, one is pid 186397 who were in

authentication,

and the other is 188832 who where in INSERT.

This suggest you're having quite massive contention around
ProcArrayLock. You should consider updating to 9.6. Several releases
since 9.2 considerably improved scalability around this (especially
9.6).

Regards,

Andres

Thanks, Andres

I think upgrading to newest version is a good way, but I can't perform it
on a production system before we doing sufficient testing.

Currently, I want to know if this is a bug of pgsql 9.2.16, and how can I
avoid this situation.

Thanks, diabloneo

#8Neo Liu
diabloneo@gmail.com
In reply to: Tom Lane (#5)
Re: BUG #14588: Postgres process blocked on semop

On Mon, Mar 13, 2017 at 12:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Which of those would the stuck processes be using?

Also, can you attach to a few of the stuck processes and get stack traces?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

regards, tom lane

Hi, Tome

I can't get stack traces for you now. We already add a cron job to restart
the pgsql server every morning to solve the problem temporarily. It's a
production system in client's environment, I can't login into the system
now.

I hope the pstack output in early message can help you.

Thanks, diabloneo