Out of memory: Kill process nnnn (postmaster) score nn or sacrifice child

Started by Vikas Sharma · about 7 years ago · 4 messages · general
#1 Vikas Sharma
shavikas@gmail.com

Hello All,

I have a 4-node PostgreSQL 9.6 cluster with streaming replication. Today we
encountered an Out of Memory error on the master, which resulted in all
postgres processes being restarted; the cluster then recovered by itself.
Please let me know the best way to diagnose this issue.

The error seen in the postgresql log:

2019-02-12 10:55:17 GMT LOG: terminating any other active server processes
2019-02-12 10:55:17 GMT WARNING: terminating connection because of crash
of another server process
2019-02-12 10:55:17 GMT DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2019-02-12 10:55:17 GMT HINT: In a moment you should be able to reconnect
to the database and repeat your command.
2019-02-12 10:55:17 GMT WARNING: terminating connection because of crash
of another server process
2019-02-12 10:55:17 GMT DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2019-02-12 10:55:17 GMT HINT: In a moment you should be able to reconnect
to the database and repeat your command.
2019-02-12 10:55:17 GMT WARNING: terminating connection because of crash
of another server process
-----

Error from dmesg on linux:
-----------------------------------
[4331093.885622] Out of memory: Kill process nnnnn (postmaster) score nn or
sacrifice child
[4331093.890225] Killed process nnnnn (postmaster) total-vm:18905944kB,
anon-rss:1747460kB, file-rss:4kB, shmem-rss:838220kB

Thanks & Best Regards
Vikas Sharma

#2 Adrian Klaver
adrian.klaver@aklaver.com
In reply to: Vikas Sharma (#1)
Re: Out of memory: Kill process nnnn (postmaster) score nn or sacrifice child

On 2/12/19 8:20 AM, Vikas Sharma wrote:

Hello All,

I have a 4-node PostgreSQL 9.6 cluster with streaming replication. Today we
encountered an Out of Memory error on the master, which resulted in all
postgres processes being restarted; the cluster then recovered by itself.
Please let me know the best way to diagnose this issue.

For a start, look back further in the Postgres log than the below. What
is shown below are the effects of the OOM killer. What you need to look
for is the statement that caused Postgres memory to increase to the
point that the OOM killer was invoked.

The error seen in the postgresql log:

2019-02-12 10:55:17 GMT LOG:  terminating any other active server processes
2019-02-12 10:55:17 GMT WARNING:  terminating connection because of
crash of another server process
2019-02-12 10:55:17 GMT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2019-02-12 10:55:17 GMT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2019-02-12 10:55:17 GMT WARNING:  terminating connection because of
crash of another server process
2019-02-12 10:55:17 GMT DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2019-02-12 10:55:17 GMT HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2019-02-12 10:55:17 GMT WARNING:  terminating connection because of
crash of another server process
-----

Error from dmesg on linux:
-----------------------------------
[4331093.885622] Out of memory: Kill process nnnnn (postmaster) score nn
or sacrifice child
[4331093.890225] Killed process nnnnn (postmaster) total-vm:18905944kB,
anon-rss:1747460kB, file-rss:4kB, shmem-rss:838220kB

Thanks & Best Regards
Vikas Sharma

--
Adrian Klaver
adrian.klaver@aklaver.com
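
Adrian's suggestion to look further back in the log could be sketched roughly as below. This is only an illustration, not a standard tool: the function name `lines_before_crash`, the 10-minute window, and the sample statement lines are all hypothetical. Only the timestamp prefix format and the "terminating" line come from the log posted in this thread. Statements also appear in the log only if statement logging (for example `log_min_duration_statement`) was enabled before the crash.

```python
from datetime import datetime, timedelta

def lines_before_crash(log_lines, crash_time, window_minutes=10):
    """Yield log lines stamped within window_minutes before crash_time."""
    start = crash_time - timedelta(minutes=window_minutes)
    in_window = False
    for line in log_lines:
        try:
            # Prefix format as in the posted log: "2019-02-12 10:55:17 GMT LOG: ..."
            stamp = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
            in_window = start <= stamp < crash_time
        except ValueError:
            pass  # continuation line: treated like the preceding stamped line
        if in_window:
            yield line.rstrip("\n")

# Hypothetical sample; only the last line is taken from the thread.
sample = [
    "2019-02-12 10:40:02 GMT LOG: (hypothetical earlier statement)\n",
    "2019-02-12 10:50:41 GMT LOG: (hypothetical long-running select)\n",
    "    (hypothetical continuation of that statement)\n",
    "2019-02-12 10:55:17 GMT LOG: terminating any other active server processes\n",
]
crash = datetime(2019, 2, 12, 10, 55, 17)

for entry in lines_before_crash(sample, crash):
    print(entry)
```

In practice the list would be replaced by an open log file, and the window widened until the memory-hungry statement shows up.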

#3 Vikas Sharma
shavikas@gmail.com
In reply to: Adrian Klaver (#2)
Re: Out of memory: Kill process nnnn (postmaster) score nn or sacrifice child

Thank you, Adrian, for the reply. I did check the postgres processes that
were running around the time the OOM killer was invoked: there were many
postgres processes with high CPU usage, executing long-running selects.
I am not sure how to interpret the memory terms that appear in the Linux
dmesg output or in /var/log/messages, but I can see that an out-of-memory
condition occurred and that the postmaster invoked the OOM killer.

Regards
Vikas Sharma
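
As a rough aid for reading the memory terms in the dmesg output, the sketch below pulls the counters out of the "Killed process" line quoted in this thread and converts them from kB to GiB. The helper names `parse_oom_kill_line` and `kb_to_gib` are made up for this example. As commonly understood, `total-vm` is the virtual address space of the process (not memory actually in use), `anon-rss` is resident private memory, `file-rss` is resident file-backed memory, and `shmem-rss` is resident shared memory, which for PostgreSQL would typically include the shared buffers that process has touched.

```python
import re

# The "Killed process" line from the original post, joined onto one line.
DMESG_LINE = (
    "[4331093.890225] Killed process nnnnn (postmaster) total-vm:18905944kB, "
    "anon-rss:1747460kB, file-rss:4kB, shmem-rss:838220kB"
)

def parse_oom_kill_line(line):
    """Return the memory counters found on a 'Killed process' line, in kB."""
    return {name: int(kb) for name, kb in re.findall(r"([a-z-]+):(\d+)kB", line)}

def kb_to_gib(kb):
    """Convert kB (1024 bytes) to GiB."""
    return kb / (1024 * 1024)

counters = parse_oom_kill_line(DMESG_LINE)
for name, kb in counters.items():
    print(f"{name:>10}: {kb:>10} kB = {kb_to_gib(kb):6.2f} GiB")
```

Read this way, the killed process had about 1.67 GiB of private resident memory and about 0.8 GiB of resident shared memory; the 18 GiB `total-vm` figure is address space, not consumption.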


#4 Andreas Kretschmer
andreas@a-kretschmer.de
In reply to: Vikas Sharma (#1)
Re: Out of memory: Kill process nnnn (postmaster) score nn or sacrifice child

On 12 February 2019 17:20:09 CET, Vikas Sharma <shavikas@gmail.com> wrote:

Hello All,

I have a 4-node PostgreSQL 9.6 cluster with streaming replication. Today we
encountered an Out of Memory error on the master, which resulted in all
postgres processes being restarted; the cluster then recovered by itself.
Please let me know the best way to diagnose this issue.

A typical reason for an OOM kill is a work_mem value that is too high.

Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company
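
To illustrate why a high work_mem can lead to an OOM kill: work_mem is a limit per sort or hash operation, not per server, so several backends each running a query with several such operations can multiply it many times over. The arithmetic below is a back-of-the-envelope sketch only. The function name and every input value are hypothetical, none of them come from the cluster discussed in this thread, and the formula ignores other memory consumers such as maintenance_work_mem and per-backend overhead.

```python
def worst_case_memory_mb(shared_buffers_mb, work_mem_mb, active_backends, ops_per_query):
    """Rough upper bound: shared buffers plus work_mem for every
    sort/hash operation of every concurrently active backend."""
    return shared_buffers_mb + work_mem_mb * active_backends * ops_per_query

# Hypothetical settings, chosen only to show the multiplication effect.
estimate_mb = worst_case_memory_mb(
    shared_buffers_mb=4096,  # assumed shared_buffers = 4GB
    work_mem_mb=64,          # assumed work_mem = 64MB
    active_backends=100,     # assumed concurrently running queries
    ops_per_query=3,         # assumed sort/hash operations per query
)
print(f"worst case: {estimate_mb} MB ({estimate_mb / 1024:.1f} GiB)")
```

With these assumed numbers the bound is 23296 MB, so a work_mem that looks modest on its own can still exceed the RAM of a typical server once many long-running selects are active at the same time.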