Really out of memory?

Started by Ben almost 17 years ago · 12 messages · general
#1Ben
bench@silentmedia.com

I have a linux postgres server in the field. Its version is:

PostgreSQL 8.2.4 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)

(aka postgresql-8.2.4-1PGDG)

A few days ago, its log started showing this:

May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.
May 31 03:02:40 sfmelwss postgres[31490]: [1-1] ERROR: out of memory
May 31 03:02:40 sfmelwss postgres[31490]: [1-2] DETAIL: Failed on request of size 16777212.
May 31 03:05:40 sfmelwss postgres[31913]: [1-1] ERROR: out of memory
May 31 03:05:40 sfmelwss postgres[31913]: [1-2] DETAIL: Failed on request of size 16777212.

That seems pretty self-explanatory. But I'm not so sure, because SAR
says:

02:30:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
02:40:01 AM     13332   1003316     98.69    130448    198188   1034572     13996      1.33        32
02:50:01 AM     17116    999532     98.32    128708    196384   1034596     13972      1.33        44
03:00:01 AM     16372   1000276     98.39    129128    196388   1034596     13972      1.33        44
03:10:01 AM     17220    999428     98.31    128268    196828   1034736     13832      1.32       132
03:20:01 AM     14416   1002232     98.58    130464    197348   1035224     13344      1.27       152
03:30:01 AM     16292   1000356     98.40    127604    196684   1035700     12868      1.23       168

...which indicates there was still plenty of space left in swap. Now, I
realize I don't want to be actually using my swap, but I'm wondering if
the out of memory messages are a red herring. Should I be looking at
something else, like the number of processes, open files, or shared memory
segments?

FWIW, I have disabled the OOM killer (but not, as I understand it, my
swap space) by setting:
vm.overcommit_memory = 2
vm.overcommit_ratio = 100

#2Martijn van Oosterhout
kleptog@svana.org
In reply to: Ben (#1)
Re: Really out of memory?

On Tue, Jun 02, 2009 at 11:10:04AM -0700, Ben Chobot wrote:

I have a linux postgres server in the field. Its version is:

PostgreSQL 8.2.4 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)

(aka postgresql-8.2.4-1PGDG)

A few days ago, its log started showing this:

May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.

Add even more swap. By turning overcommit off you make the kernel
really pessimistic about how much memory is in use.

...which indicates there was still plenty of space left in swap. Now, I
realize I don't want to be actually using my swap, but I'm wondering if
the out of memory messages are a red herring. Should I be looking at
something else, like the number of processes, open files, or shared
memory segments?

You got as much swap as memory, try doubling it.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#3John R Pierce
pierce@hogranch.com
In reply to: Ben (#1)
Re: Really out of memory?

Ben Chobot wrote:

May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.

That's a 16MB request. Is that your work_mem size or something, by any chance?
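
As a quick check of the figure in the log, the failed request is four bytes short of exactly 16 MiB. A minimal sketch of the arithmetic follows; the variable names are illustrative only and nothing here comes from PostgreSQL itself.

```python
# Size reported in "Failed on request of size 16777212."
request_bytes = 16777212

MIB = 1024 * 1024
sixteen_mib = 16 * MIB  # 16777216 bytes

# How far the request is from an exact 16 MiB allocation
shortfall = sixteen_mib - request_bytes

print(f"16 MiB = {sixteen_mib} bytes")
print(f"request is {shortfall} bytes short of 16 MiB")
```

A request this close to a round 16 MiB suggests it is tied to a configured memory setting of that size rather than to an arbitrary allocation.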

02:30:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
02:40:01 AM     13332   1003316     98.69    130448    198188   1034572     13996      1.33        32

So you only have 13MB of memory free. You -do- have free swap, however.

Hey, is any ulimit in effect for the postgres process?

#4Ben
bench@silentmedia.com
In reply to: Martijn van Oosterhout (#2)
Re: Really out of memory?

On Tue, 2 Jun 2009, Martijn van Oosterhout wrote:

On Tue, Jun 02, 2009 at 11:10:04AM -0700, Ben Chobot wrote:

May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.

Add even more swap. By turning overcommit off you make the kernel
really pessimistic about how much memory is in use.

Is it so pessimistic that it won't try to swap out 16MB into almost 1GB of
free swap? That seems surprising to me.

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ben (#1)
Re: Really out of memory?

Ben Chobot <bench@silentmedia.com> writes:

May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.

So the kernel isn't letting PG have any more memory.

That seems pretty self-explanatory. But I'm not so sure, because SAR
says:
...
...which indicates there was still plenty of space left in swap.

Which the kernel isn't letting us use. Check the "ulimit" settings
that the postmaster is being started with. On a Linux box, any of
the -d, -m, or -v settings might cause this.

It's possible you are running out of 32-bit address space in the backend
process, but what seems more likely is that the per-process ulimit is
unreasonably small.

regards, tom lane
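
For readers who want to inspect the three limits named above from inside a process, here is a minimal sketch using Python's standard `resource` module. It reports the limits of the process that runs it, which it inherits from whatever launched it, so it only reflects the postmaster's limits if it is started the same way. The pairing of flags to constants follows the usual meaning of `ulimit -d`, `-m`, and `-v`; the helper names are hypothetical.

```python
import resource

# ulimit flags mentioned above, paired with the matching resource constants
ULIMIT_FLAGS = {
    "-d": resource.RLIMIT_DATA,  # data segment size
    "-m": resource.RLIMIT_RSS,   # resident set size
    "-v": resource.RLIMIT_AS,    # virtual address space
}


def fmt(value):
    """Render a raw limit, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def memory_limits():
    """Return {flag: (soft, hard)} for the calling process, values in bytes."""
    result = {}
    for flag, res in ULIMIT_FLAGS.items():
        soft, hard = resource.getrlimit(res)
        result[flag] = (fmt(soft), fmt(hard))
    return result


if __name__ == "__main__":
    for flag, (soft, hard) in memory_limits().items():
        print(f"ulimit {flag}: soft={soft} hard={hard}")
```

Note that `getrlimit` reports bytes, whereas the shell's `ulimit` prints these three limits in kbytes.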

#6Ben
bench@silentmedia.com
In reply to: John R Pierce (#3)
Re: Really out of memory?

On Tue, 2 Jun 2009, John R Pierce wrote:

Ben Chobot wrote:

May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.

That's a 16MB request. Is that your work_mem size or something, by any chance?

work_mem is 1MB, but maintenance_work_mem is 16MB. So it's probably
autovacuum kicking off most of these messages.

02:30:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
02:40:01 AM     13332   1003316     98.69    130448    198188   1034572     13996      1.33        32

So you only have 13MB of memory free. You -do- have free swap, however.

Hey, is any ulimit in effect for the postgres process?

Not that I can tell. There's nothing special in /etc/init.d/postgresql or
/etc/sysconfig/pgsql/postgresql, and ulimit -a shows:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 16127
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
max rt priority                 (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16127
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Is there a way to see what the limits are for a given pid? I don't see
anything obviously relevant in /proc/<pid>/....

#7Ben
bench@silentmedia.com
In reply to: Tom Lane (#5)
Re: Really out of memory?

On Tue, 2 Jun 2009, Tom Lane wrote:

It's possible you are running out of 32-bit address space in the backend
process, but what seems more likely is that the per-process ulimit is
unreasonably small.

I only have 1GB in the machine, and another 1GB of swap, so running out of
32-bit address space seems unlikely. Is there any way to rule it out?

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ben (#6)
Re: Really out of memory?

Ben Chobot <bench@silentmedia.com> writes:

Hey, is any ulimit in effect for the postgres process?

Not that I can tell. There's nothing special in /etc/init.d/postgresql or
/etc/sysconfig/pgsql/postgresql, and ulimit -a shows:

That tells you the limits for your interactive shell, but a daemon might
be started under some other set of limits.

Is there a way to see what the limits are for a given pid? I don't see
anything obviously relevant in /proc/<pid>/....

You don't have /proc/<pid>/limits ?

regards, tom lane
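
On kernels that do provide /proc/<pid>/limits, the file is a fixed-width table that is easy to read programmatically. The sketch below assumes the usual layout (a header row, then one limit per row, with columns separated by runs of spaces). The function names and the sample text are illustrative and are not taken from the server discussed in this thread.

```python
import re


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are separated by two or more spaces; names contain single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def read_limits(pid):
    """Read and parse the limits of a running process by pid."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_limits(f.read())


# Illustrative sample only, not real output from the machine in question
SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             unlimited            bytes
Max address space         unlimited            unlimited            bytes
"""

if __name__ == "__main__":
    for name, (soft, hard) in parse_limits(SAMPLE).items():
        print(f"{name}: soft={soft} hard={hard}")
```

Calling `read_limits` with the postmaster's pid would show the limits the daemon is actually running under, as opposed to those of an interactive shell.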

#9Ben
bench@silentmedia.com
In reply to: Tom Lane (#8)
Re: Really out of memory?

On Tue, 2 Jun 2009, Tom Lane wrote:

Is there a way to see what the limits are for a given pid? I don't see
anything obviously relevant in /proc/<pid>/....

You don't have /proc/<pid>/limits ?

Nope. I'd like to believe I would consider that "obviously relevant." :)

This server is running 2.6.20-1.2962.fc6, but should be upgraded to
2.6.26.8-57.fc8 in a month or two, which does provide that file. I was
hoping to not have to wait till then to understand what's going wrong
though.

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ben (#9)
Re: Really out of memory?

Ben Chobot <bench@silentmedia.com> writes:

On Tue, 2 Jun 2009, Tom Lane wrote:

You don't have /proc/<pid>/limits ?

Nope. I'd like to believe I would consider that "obviously relevant." :)

Next best thing I can think of is to stick "ulimit -a >/tmp/mylimits"
into the postgres initscript and restart.

If the initscript is starting postgres via "su -l", it might be better
to add the command in postgres' ~/.bashrc or some such place. You
have to consider the possibility that the su is changing the ulimit
environment.

regards, tom lane

#11Martijn van Oosterhout
kleptog@svana.org
In reply to: Ben (#4)
Re: Really out of memory?

On Tue, Jun 02, 2009 at 11:45:11AM -0700, Ben Chobot wrote:

On Tue, 2 Jun 2009, Martijn van Oosterhout wrote:

On Tue, Jun 02, 2009 at 11:10:04AM -0700, Ben Chobot wrote:

May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.

Add even more swap. By turning overcommit off you make the kernel
really pessimistic about how much memory is in use.

Is it so pessimistic that it won't try to swap out 16MB into almost 1GB
of free swap? That seems surprising to me.

It's got nothing to do with how much swap is in use. It's preventing
you from allocating memory that *hypothetically* might not be available
if every byte of allocated memory were actually used.

For example, on my desktop I have 1GB of RAM of which about 600MB is
free, yet there is 1.4GB committed. With overcommit off my machine
may not boot. As you can see, only 25% of committed memory is actually
needed, because lots of pages are blank or shared. Of course, all those
copies of libc are realistically always going to be shared, so it's a
good bet.

But with overcommit off you can see that you might want to have double
or triple the amount of swap to handle the hypothetical case.

I'm not saying this is necessarily the case for you, but it's the first
thing that came to mind and relatively easy to check.
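
One way to check this, sketched under the assumption that the kernel exposes the `CommitLimit` and `Committed_AS` fields in /proc/meminfo: compare the two. With vm.overcommit_memory = 2, allocations fail once committed address space would exceed the limit, regardless of how much RAM or swap is actually in use. The function names and the sample numbers below are hypothetical.

```python
def parse_commit(meminfo_text):
    """Return (CommitLimit, Committed_AS) in kB from /proc/meminfo contents."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("CommitLimit", "Committed_AS"):
            values[key] = int(rest.split()[0])  # first token is the kB figure
    return values["CommitLimit"], values["Committed_AS"]


def headroom_kb(meminfo_text):
    """kB of address space that can still be committed before requests fail."""
    limit, committed = parse_commit(meminfo_text)
    return limit - committed


# Hypothetical figures for illustration, not measured on any real machine
SAMPLE = """\
MemTotal:        1016648 kB
SwapTotal:       1048568 kB
CommitLimit:     2065216 kB
Committed_AS:    2060000 kB
"""

if __name__ == "__main__":
    print(f"headroom: {headroom_kb(SAMPLE)} kB")
    # On a live system, pass open("/proc/meminfo").read() instead of SAMPLE.
```

If the headroom is smaller than the failing request (about 16384 kB here), the allocation would be refused even though sar shows swap as nearly empty.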

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Please line up in a tree and maintain the heap invariant while
boarding. Thank you for flying nlogn airlines.

#12Ben
bench@silentmedia.com
In reply to: Martijn van Oosterhout (#11)
Re: Really out of memory?

On Tue, 2 Jun 2009, Martijn van Oosterhout wrote:

It's got nothing to do with how much swap is in use. It's preventing
you from allocating memory that *hypothetically* might not be available
if every byte of allocated memory were actually used.

For example, on my desktop I have 1GB of RAM of which about 600MB is
free, yet there is 1.4GB committed. With overcommit off my machine
may not boot. As you can see, only 25% of committed memory is actually
needed, because lots of pages are blank or shared. Of course, all those
copies of libc are realistically always going to be shared, so it's a
good bet.

But with overcommit off you can see that you might want to have double
or triple the amount of swap to handle the hypothetical case.

No, sorry, I don't see why I would need more swap when I've disabled
memory overcommit. As I understand it, the kernel should be able to
allocate (swap + (physical * overcommit_ratio)), which in my case is just
swap+physical, and it seems to not want to do that.
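
The formula above can be worked through with the 02:40:01 AM sar sample from the first message, taking total RAM as kbmemfree + kbmemused and total swap as kbswpfree + kbswpused. This is a sketch of the arithmetic only. The kernel's own figure may differ slightly (for example, it excludes huge pages), and the function name is hypothetical.

```python
def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio):
    """Commit limit in kB: swap + physical * overcommit_ratio / 100."""
    return swap_kb + ram_kb * overcommit_ratio // 100


# Totals reconstructed from the 02:40:01 AM sar row
ram_kb = 13332 + 1003316   # kbmemfree + kbmemused
swap_kb = 1034572 + 13996  # kbswpfree + kbswpused

limit_kb = commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=100)

print(f"RAM total:    {ram_kb} kB")
print(f"swap total:   {swap_kb} kB")
print(f"commit limit: {limit_kb} kB")
```

What the kernel compares against this limit is committed address space, not the resident memory or swap usage that sar reports, so the sar figures alone cannot show whether the limit was reached.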