Changes to Linux OOM killer in 2.6.36

Started by Greg Smith about 15 years ago, 4 messages
#1 Greg Smith
greg@2ndquadrant.com

Last month's new Linux kernel 2.6.36 includes a rewrite of the out of
memory killer:

http://lwn.net/Articles/391222/
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a63d83f427fbce97a6cea0db2e64b0eb8435cd10

The new "badness" method totals the task's RSS and swap as a percentage
of RAM, where the old one scored starting with the total memory used by
the process. I *think* that this is an improvement for PostgreSQL,
based on the sort of data I see with:

ps -o pid,rss,size,vsize,args -C postgres

But I haven't tested with one of the new kernels yet to be sure.
Something to look at next time I get in that bleeding edge kernel kind
of mood.
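
For a rough feel of what the new heuristic would say about a given process, the calculation described above can be sketched in shell. This is only an approximation under stated assumptions: it assumes the score is (RSS + swap) as a share of total memory on a 0 to 1000 scale, it ignores any other adjustments the kernel applies, and the function names approx_badness and proc_kb are made up for illustration.

```shell
#!/bin/sh
# Rough estimate of the new-style badness score: (RSS + swap) as a share
# of total memory. The 0..1000 scale is an assumption, not taken from the
# kernel source.
approx_badness() {
    # $1 = RSS in kB, $2 = swap in kB, $3 = total memory in kB
    echo $(( ($1 + $2) * 1000 / $3 ))
}

# Pull a "<Field>:   <n> kB" value out of a /proc-style file.
# Prints 0 when the field is absent (e.g. VmSwap on older kernels).
proc_kb() {
    # $1 = field name without the colon, $2 = file to read
    awk -v f="$1:" '$1 == f { v = $2 } END { print v + 0 }' "$2"
}

# Example, for a pid you have already looked up with ps:
#   approx_badness "$(proc_kb VmRSS /proc/$pid/status)" \
#                  "$(proc_kb VmSwap /proc/$pid/status)" \
#                  "$(proc_kb MemTotal /proc/meminfo)"
```

Comparing the result against /proc/<pid>/oom_score on a 2.6.36 box would show how far this simplification is from what the kernel really computes.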

One thing that's definitely changed is the interface used to control
turning off the OOM killer. There's a backward compatibility
translation right now that maps the current "-17" disable value the
PostgreSQL code sends to /proc/<pid>/oom_adj into the new units scale.
However, oom_adj is deprecated, scheduled for removal in August 2010:
http://www.mjmwired.net/kernel/Documentation/feature-removal-schedule.txt

So eventually, if the OOM disabling code is still necessary in
PostgreSQL, it will need to do this sort of thing instead:

echo -1000 > /proc/<pid>/oom_score_adj

I've seen kernel stuff get removed ahead of its announced timeline
before, for code-related reasons (when the compatibility bits were judged
too obtrusive to keep around anymore), but since this translation bit is
only a few lines of code I wouldn't expect that to happen here.
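
Code that has to run on both old and new kernels could pick the interface at runtime. A minimal sketch, assuming the only two control files of interest are oom_score_adj and oom_adj; the function name oom_exempt and its optional second argument (an alternate proc root, there only so the logic can be tried against a scratch directory) are hypothetical. Lowering either value normally needs root.

```shell
#!/bin/sh
# Exempt a process from the OOM killer, preferring the newer interface.
# $1 = pid, $2 = proc root (defaults to /proc; overridable for testing)
oom_exempt() {
    dir="${2:-/proc}/$1"
    if [ -e "$dir/oom_score_adj" ]; then
        echo -1000 > "$dir/oom_score_adj"   # new scale, 2.6.36 and later
    elif [ -e "$dir/oom_adj" ]; then
        echo -17 > "$dir/oom_adj"           # old interface, older kernels
    else
        echo "no OOM control file under $dir" >&2
        return 1
    fi
}

# Example (as root): oom_exempt 3391
```

Checking for oom_score_adj first means the deprecated file is only touched on kernels that have nothing else.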

I don't think it's worth doing anything to the database code until tests
on the newer kernel confirm whether this whole thing is even necessary
anymore. Wanted to pass along the info while I was staring at it
though. Thanks to Daniel Farina for pointing this out.

--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

#2 Kevin Grittner
Kevin.Grittner@wicourts.gov
In reply to: Greg Smith (#1)
Re: Changes to Linux OOM killer in 2.6.36

Greg Smith wrote:

oom_adj is deprecated, scheduled for removal in August 2010:

That surprised me so I checked the URL. I believe you have a typo
there and it's August, 2012.

-Kevin

#3 Alex Hunsaker
badalex@gmail.com
In reply to: Greg Smith (#1)
Re: Changes to Linux OOM killer in 2.6.36

On Thu, Nov 18, 2010 at 19:43, Greg Smith <greg@2ndquadrant.com> wrote:

Last month's new Linux kernel 2.6.36 includes a rewrite of the out of memory
killer:
http://lwn.net/Articles/391222/
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a63d83f427fbce97a6cea0db2e64b0eb8435cd10

Yeah, I've been following this somewhat closely...
Also of interest is the recent thread about reverting the new oom
(don't know if it will happen, but maybe they won't deprecate
oom_adj):
http://lkml.org/lkml/2010/11/14/5

The new "badness" method totals the task's RSS and swap as a percentage of
RAM, where the old one scored starting with the total memory used by the
process.  I *think* that this is an improvement for PostgreSQL, based on the
sort of data I see with:

Well, it seems to be an improvement. If I look at the oom_score on a
2.6.36 box running postgres I get:
$ cd /proc; for a in [0-9]*; do echo `cat $a/oom_score` $a \
    `perl -pe 's/\0.*$//' < $a/cmdline`; done | grep -v ^0 | sort -n | less
1 1309 supervising syslog-ng
1 1310 /usr/sbin/syslog-ng
1 1336 /usr/sbin/crond
1 1368 /usr/sbin/irqbalance
1 1485 /usr/sbin/ntpd
1 1495 /usr/local/bin/pgbouncer
1 1506 /sbin/agetty
....
1 3391 /var/lib/postgres/pgsql-9.0/bin/postgres
1 3393 postgres: writer process
1 3394 postgres: wal writer process
1 3395 postgres: autovacuum launcher process
1 3396 postgres: stats collector process
1 4110 postgres: joshua wopr [local] idle
2 4109 postgres: joshua wopr [local] idle

So in this case it should kill one of the backends *before* the
postmaster. Ignoring that backend... it looks like postmaster has the
same score as every other process on the system. It also has a
higher RSS than most, so I suspect it will still get killed first:
$ ps ax -o rss,pid,size,vsize,args | sort -n
...
2416 1680 588 46548 /usr/lib/postfix/master
2424 1696 640 46748 qmgr -l -t fifo -u
2956 3395 2416 244644 postgres: autovacuum launcher process
3116 2216 720 65464 sshd: alex [priv]
4096 3393 1088 243316 postgres: writer process
6592 4110 2516 246808 postgres: joshua wopr [local] idle
11756 3391 900 243128 /var/lib/postgres/pgsql-9.0/bin/postgres
32640 4109 9084 255564 postgres: joshua wopr [local] idle in transaction

So I think we will still need to protect the postmaster from OOM :(.
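
Rather than inferring the likely victim from RSS, the kernel's own ranking can be read back directly. A small sketch along the lines of the one-liner above, sorted so the likeliest victim comes first; the function name oom_ranking and its optional proc-root argument (only there so it can be pointed at a scratch directory) are hypothetical.

```shell
#!/bin/sh
# Print "oom_score pid command" for every process, highest score first.
# $1 = proc root (defaults to /proc; overridable for testing)
oom_ranking() {
    root="${1:-/proc}"
    for d in "$root"/[0-9]*; do
        # Skip entries that vanished or that we cannot read
        [ -r "$d/oom_score" ] || continue
        # cmdline is NUL-separated; turn the separators into spaces
        cmd=$(tr '\000' ' ' < "$d/cmdline" 2>/dev/null)
        echo "$(cat "$d/oom_score") ${d##*/} $cmd"
    done | sort -rn
}

# Example: oom_ranking | head
```

If the postmaster shows up near the top of that list on a loaded box, that would support keeping the protection in place.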

One thing that's definitely changed is the interface used to control turning
off the OOM killer.

Grr... Whatever happened to a stable userspace ABI?

I don't think it's worth doing anything to the database code until tests on
the newer kernel confirm whether this whole thing is even necessary anymore.

+1

#4 Greg Smith
greg@2ndquadrant.com
In reply to: Kevin Grittner (#2)
Re: Changes to Linux OOM killer in 2.6.36

Kevin Grittner wrote:

Greg Smith wrote:

oom_adj is deprecated, scheduled for removal in August 2010:

That surprised me so I checked the URL. I believe you have a typo
there and it's August, 2012.

This is why I include references, so that when the cold medicine hits me
in the middle of proofreading my message and I send it anyway, you aren't
misled. Yes, 2012, only a few months before doomsday. The approaching
end of the world then means any bugs left can be marked WONTFIX.
