Preventing OOM kills

Started by Yang Zhang almost 15 years ago, 7 messages, general
#1 Yang Zhang
yanghatespam@gmail.com

PG tends to be picked on by the Linux OOM killer, so lately we've been
forcing the OOM killer to kill other processes first with this script:

while true; do
 for i in `pgrep postgres`; do
   echo -17 > /proc/$i/oom_adj
 done
 sleep 60
done

Is there a Better Way? Thanks in advance.

#2 Andrej Ricnik-Bay
andrej.groups@gmail.com
In reply to: Yang Zhang (#1)
Re: Preventing OOM kills

On 25 May 2011 12:32, Yang Zhang <yanghatespam@gmail.com> wrote:

PG tends to be picked on by the Linux OOM killer, so lately we've been
forcing the OOM killer to kill other processes first with this script:

while true; do
 for i in `pgrep postgres`; do
   echo -17 > /proc/$i/oom_adj
 done
 sleep 60
done

Is there a Better Way?  Thanks in advance.

Add more RAM? Look at tunables for other processes on
the machine? At the end of the day making the kernel shoot
anything out of despair shouldn't be the done thing.

Cheers,
Andrej

#3 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Andrej Ricnik-Bay (#2)
Re: Preventing OOM kills

On Tue, May 24, 2011 at 6:50 PM, Andrej <andrej.groups@gmail.com> wrote:

On 25 May 2011 12:32, Yang Zhang <yanghatespam@gmail.com> wrote:

PG tends to be picked on by the Linux OOM killer, so lately we've been
forcing the OOM killer to kill other processes first with this script:

while true; do
 for i in `pgrep postgres`; do
   echo -17 > /proc/$i/oom_adj
 done
 sleep 60
done

Is there a Better Way?  Thanks in advance.

Add more RAM?  Look at tunables for other processes on
the machine?  At the end of the day making the kernel shoot
anything out of despair shouldn't be the done thing.

I thought that setting vm.overcommit_memory=2 stopped the OOM killer.
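
For reference, vm.overcommit_memory takes three values: 0 (heuristic), 1 (always overcommit) and 2 (strict accounting). As far as the PostgreSQL docs describe it, mode 2 makes OOM kills much less likely rather than impossible. Below is a minimal shell sketch for checking what a machine is currently set to. The helper name describe_overcommit_mode is made up for illustration, and the sysctl lines are left as comments because they need root.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): map a vm.overcommit_memory
# value to a short human-readable description.
describe_overcommit_mode() {
    case "$1" in
        0) echo "heuristic overcommit (kernel default)" ;;
        1) echo "always overcommit" ;;
        2) echo "strict accounting (no overcommit)" ;;
        *) echo "unknown mode: $1" ;;
    esac
}

# Report the current mode when the Linux procfs entry is readable.
mode_file=/proc/sys/vm/overcommit_memory
if [ -r "$mode_file" ]; then
    describe_overcommit_mode "$(cat "$mode_file")"
fi

# To switch to strict accounting (as root):
#   sysctl -w vm.overcommit_memory=2
# To persist across reboots, add this line to /etc/sysctl.conf:
#   vm.overcommit_memory = 2
```

With mode 2 the commit limit is swap plus vm.overcommit_ratio percent of RAM, so the ratio usually needs a look at the same time.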

#4 Devrim GÜNDÜZ
devrim@gunduz.org
In reply to: Yang Zhang (#1)
Re: Preventing OOM kills

On Tue, 2011-05-24 at 17:32 -0700, Yang Zhang wrote:

PG tends to be picked on by the Linux OOM killer, so lately we've been
forcing the OOM killer to kill other processes first with this script:

while true; do
 for i in `pgrep postgres`; do
   echo -17 > /proc/$i/oom_adj
 done
 sleep 60
done

Is there a Better Way? Thanks in advance.

Why don't you start postmaster with this value? Here is what we do in
RPM init scripts.

PG_OOM_ADJ=-17
test x"$PG_OOM_ADJ" != x && echo "$PG_OOM_ADJ" > /proc/self/oom_adj
$SU -l postgres -c "$PGENGINE/postmaster -p '$PGPORT' -D '$PGDATA' ${PGOPTS} &" >> "$PGLOG" 2>&1 < /dev/null
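
On kernels from 2.6.36 on, /proc/<pid>/oom_adj is deprecated in favor of /proc/<pid>/oom_score_adj, which runs from -1000 (never kill) to 1000. Below is a rough sketch of the same init-script idea against the newer file. The function names are hypothetical, and the conversion follows the scaling the kernel is understood to apply to legacy oom_adj writes, so treat it as an assumption to verify on your kernel.

```shell
#!/bin/sh
# Hypothetical helper: translate a legacy oom_adj value (-17..15) into
# the oom_score_adj scale (-1000..1000).
oom_adj_to_score_adj() {
    if [ "$1" -eq 15 ]; then
        echo 1000                      # the maximum maps straight to the maximum
    else
        echo $(( $1 * 1000 / 17 ))     # -17 (disable) becomes -1000
    fi
}

# Hypothetical helper: write the adjustment for the current shell, so a
# process launched afterwards inherits it. Lowering the value needs root.
set_self_oom_score_adj() {
    score_file=/proc/self/oom_score_adj
    if [ -w "$score_file" ]; then
        echo "$1" > "$score_file"
    else
        echo "cannot write $score_file (not Linux, or too old a kernel)" >&2
        return 1
    fi
}

# In an init script, this would run just before starting the postmaster:
#   set_self_oom_score_adj "$(oom_adj_to_score_adj -17)"
```

Child processes inherit the value, so every backend ends up exempt as well. Later PostgreSQL releases document the PG_OOM_ADJUST_FILE and PG_OOM_ADJUST_VALUE environment variables for resetting the children; check the "Linux Memory Overcommit" section of the docs for your version.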

Regards,
--
Devrim GÜNDÜZ
Principal Systems Engineer @ EnterpriseDB: http://www.enterprisedb.com
PostgreSQL Danışmanı/Consultant, Red Hat Certified Engineer
Community: devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org Twitter: http://twitter.com/devrimgunduz

#5 John R Pierce
pierce@hogranch.com
In reply to: Andrej Ricnik-Bay (#2)
Re: Preventing OOM kills

On 05/24/11 5:50 PM, Andrej wrote:

Add more RAM? Look at tunables for other processes on
the machine? At the end of the day making the kernel shoot
anything out of despair shouldn't be the done thing.

somehow, 'real' unix has neither an OOM killer nor does it flat out die
under heavy loads, it just degrades gracefully. I've seen Solaris and
AIX and BSD servers happily chugging along with load factors in the
100s, significant portions of memory paging, etc, without completely
crumbling to a halt. Sometimes I wonder why Linux even pretends to
support virtual memory, as you sure don't want it to be paging.

--
john r pierce N 37, W 123
santa cruz ca mid-left coast

#6 Scott Marlowe
scott.marlowe@gmail.com
In reply to: John R Pierce (#5)
Re: Preventing OOM kills

On Tue, May 24, 2011 at 7:01 PM, John R Pierce <pierce@hogranch.com> wrote:

On 05/24/11 5:50 PM, Andrej wrote:

Add more RAM?  Look at tunables for other processes on
the machine?  At the end of the day making the kernel shoot
anything out of despair shouldn't be the done thing.

somehow, 'real' unix has neither an OOM killer nor does it flat out die under
heavy loads, it just degrades gracefully.  I've seen Solaris and AIX and BSD
servers happily chugging along with load factors in the 100s, significant
portions of memory paging, etc, without completely crumbling to a halt.
Sometimes I wonder why Linux even pretends to support virtual memory, as
you sure don't want it to be paging.

I've found that on servers with multiple drives and the page file
spread across them, Linux does pretty well when swapping out. Even
going pretty far back, when I had six 9G SCSI drives on an old Sparc 20
running RHEL with 256M of RAM, the swapping was quite speedy with 100M
or so on each drive.

#7 Marco Colombo
pgsql@esiway.net
In reply to: John R Pierce (#5)
Re: Preventing OOM kills

On 05/25/2011 03:01 AM, John R Pierce wrote:

On 05/24/11 5:50 PM, Andrej wrote:

Add more RAM? Look at tunables for other processes on
the machine? At the end of the day making the kernel shoot
anything out of despair shouldn't be the done thing.

somehow, 'real' unix has neither an OOM killer nor does it flat out die
under heavy loads, it just degrades gracefully. I've seen Solaris and
AIX and BSD servers happily chugging along with load factors in the
100s, significant portions of memory paging, etc, without completely
crumbling to a halt. Sometimes I wonder why Linux even pretends to
support virtual memory, as you sure don't want it to be paging.

http://developers.sun.com/solaris/articles/subprocess/subprocess.html

"Some operating systems (such as Linux, IBM AIX, and HP-UX) have a
feature called memory overcommit (also known as lazy swap allocation).
In a memory overcommit mode, malloc() does not reserve swap space and
always returns a non-NULL pointer, regardless of whether there is enough
VM on the system to support it or not.

The memory overcommit feature has advantages and disadvantages."

(the page goes on with some interesting info) [*]

It appears by your definition that neither Linux, AIX nor HP-UX are
'real' Unix. Oh, wait, FreeBSD overcommits, too, so can't be 'real' either.

/me wonders now what a 'real' Unix is. :) Must be something related to
'true' SysV derivatives. If memory serves me well, that's where the word
'thrashing' originated, right? Actually in my experience nothing
'thrashes' better than a SysV, Solaris included.

The solution to the OP's problem is to keep the system from reaching OOM
state in the first place. That is necessary even with overcommitting
turned off. PG not performing its job because malloc() keeps failing
isn't really a "solution".
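
One way to watch for that is to compare Committed_AS against CommitLimit in /proc/meminfo. Below is a small sketch with a made-up function name. Note that CommitLimit is only enforced when vm.overcommit_memory=2, so in the other modes the number is just an indicator, not a hard ceiling.

```shell
#!/bin/sh
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# CommitLimit minus Committed_AS, in kB.
commit_headroom_kb() {
    awk '/^CommitLimit:/  { limit = $2 }
         /^Committed_AS:/ { used  = $2 }
         END              { print limit - used }'
}

# Print the current headroom when running on Linux.
if [ -r /proc/meminfo ]; then
    echo "commit headroom: $(commit_headroom_kb < /proc/meminfo) kB"
fi
```

A negative result means more memory has been promised than the limit allows, which is normal under overcommit but a sign that the machine is running close to the edge.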

.TM.

[*] One missing piece is that overcommitting actually prevents or delays
OOM state. The article does mention "system memory can be used more
flexibly and efficiently" w/o really elaborating further. It means that,
given the same amount of memory (RAM+swap), a non-overcommitting system
reaches OOM well before an overcommitting one. Also it is rarely a
good idea, when running low on memory, to switch to an allocation policy
that is _less_ efficient, memory-wise.