BUG #18469: OOM occurs and backend processes are kept in Zombie state.

Started by PG Bug reporting form, about 2 years ago, 2 messages, bugs

noreply@postgresql.org

about 2 years ago

The following bug has been logged on the website:

Bug reference: 18469
Logged by: song yutao
Email address: 2986538596@qq.com
PostgreSQL version: 12.16
Operating system: Linux
Description:

I was running a heavy, continuous workload on a server running PostgreSQL
12.16. Memory consumption kept growing until the OS ran out of memory, and
the OOM killer killed some background connection processes that were using
too much memory. However, these processes were not cleaned up and remained
in zombie state. Meanwhile, the whole database appears to be stuck, and
connection attempts via psql time out.

Below is the status after OOM happened:
Ruby 7822 0.0 0.6 4485088 110940 ? May06 10:24 /usr/pgsql/bin/postmaster -D /var/lib/pgsql/data
Ruby 7874 0.3 0.0 0 0 Zs May06 33:30 [postmaster] <defunct>
Ruby 7893 0.0 0.0 0 0 Zs May06 3:34 [postmaster] <defunct>
Ruby 7919 0.0 0.0 70592 4344 Ss May06 3:27 postgres: stats collector
Ruby 9061 0.0 0.1 4485000 17836 ? Ss May06 3:19 postgres: walwriter
Ruby 9062 0.0 0.0 4486544 2428 ? Ss May06 0:03 postgres: autovacuum launcher
Ruby 9063 0.0 0.0 66364 992 ? Ss May06 1:27 postgres: archiver last was 00000002000002C5000000FB
Ruby 9064 0.0 0.0 4486384 3280 ? Ss May06 0:00 postgres: logical replication launcher
Ruby 14403 0.1 0.0 4487084 3788 ? Ss May06 18:53 postgres: walsender rdsRepl 192.168.13.78(43284) strean
Ruby 2170474 0.0 0.0 May11 0:05 [postmaster] <defunct>
Ruby 2170401 0.0 0.0 May11 0:05 [postmaster] <defunct>

I would like to know whether the postmaster process is stuck because of
these zombie processes.

Tom Lane

tgl@sss.pgh.pa.us

about 2 years ago

In reply to: PG Bug reporting form (#1)

Re: BUG #18469: OOM occurs and backend processes are kept in Zombie state.

PG Bug reporting form <noreply@postgresql.org> writes:

I was running a heavy, continuous workload on a server running PostgreSQL
12.16. Memory consumption kept growing until the OS ran out of memory, and
the OOM killer killed some background connection processes that were using
too much memory. However, these processes were not cleaned up and remained
in zombie state. Meanwhile, the whole database appears to be stuck, and
connection attempts via psql time out.

It sounds to me like the OOM killer decided to kill the postmaster
process, rather than the child process(es) that were actually eating
memory. That's *extremely* unhelpful behavior. There is some advice
in our manual about configuring your system to not do that.
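
For reference, the manual's "Linux Memory Overcommit" section suggests
setting vm.overcommit_memory to 2 and starting the postmaster with an
oom_score_adj of -1000 so the kernel does not pick it as a victim. A
minimal sketch of checking both settings through /proc follows. The
helper names (read_int, oom_findings, report) are made up for
illustration and are not part of PostgreSQL, and the checks only reflect
those two recommendations, not a complete audit.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a /proc file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def oom_findings(overcommit_memory, postmaster_oom_score_adj):
    """List deviations from the two settings the manual recommends.

    Both arguments are plain integers (or None when the value could not
    be read), so this function can be exercised without touching /proc.
    """
    findings = []
    if overcommit_memory is None:
        findings.append("could not read vm.overcommit_memory")
    elif overcommit_memory != 2:
        findings.append("vm.overcommit_memory is %d, not 2" % overcommit_memory)
    if postmaster_oom_score_adj is None:
        findings.append("could not read postmaster oom_score_adj")
    elif postmaster_oom_score_adj != -1000:
        findings.append(
            "postmaster oom_score_adj is %d, not -1000" % postmaster_oom_score_adj
        )
    return findings


def report(pid="self"):
    """Check the live system; pid should be the postmaster's PID."""
    overcommit = read_int("/proc/sys/vm/overcommit_memory")
    adj = read_int("/proc/%s/oom_score_adj" % pid)
    return oom_findings(overcommit, adj)
```

Calling report() with the PID of the running postmaster returns an empty
list when both settings match the recommendation. This only works on
Linux, and reading another user's process may require appropriate
privileges.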

Below is the status after OOM happened:
Ruby 7822 0.0 0.6 4485088 110940 ? May06 10:24 /usr/pgsql/bin/postmaster -D /var/lib/pgsql/data

It's not clear to me where this postmaster process came from,
but it appears to be younger than the other postgres-related
processes you're showing, so they are not its children.
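
One way to check parentage directly, rather than inferring it from start
times, is to read each process's state and parent PID from
/proc/<pid>/stat. A zombie stays around until its parent reaps it with
wait(), so grouping zombies by parent PID shows which process is failing
to reap them. The sketch below assumes a Linux /proc filesystem; the
function names are illustrative only.

```python
from pathlib import Path


def parse_stat(text):
    """Return (pid, comm, state, ppid) from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the first "(" and the last ")".
    """
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    fields = text[right + 1:].split()
    state = fields[0]       # e.g. "S" sleeping, "Z" zombie
    ppid = int(fields[1])   # parent process ID
    return pid, comm, state, ppid


def zombies_by_parent(proc_root="/proc"):
    """Map parent PID -> list of (pid, comm) for every zombie process."""
    result = {}
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue
        try:
            pid, comm, state, ppid = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were scanning
        if state == "Z":
            result.setdefault(ppid, []).append((pid, comm))
    return result
```

If the defunct postmaster entries show up under a parent PID other than
that of the running postmaster, they are not its children, which matches
the observation above.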

I'd manually nuke all of these processes and start fresh.

regards, tom lane