Understand this error

Started by paulo matadr almost 17 years ago | 5 messages | general
#1 paulo matadr
saddoness@yahoo.com.br

Hi all,
my database went into recovery mode.
Analyzing my pg_log, I see this:

system logger process (PID 6517) was terminated by signal 9
background writer process (PID 6519) was terminated by signal 9
terminating any other active server processes

and the OS logs in var/logs show:

kernel: [<ffffffff800ba475>] out_of_memory+0x53/0x267
kernel: [<ffffffff8000f012>] __alloc_pages+0x229/0x2b2
kernel: [<ffffffff80031f4b>] read_swap_cache_async+0x45/0xd8
kernel: [<ffffffff800bf60c>] swapin_readahead+0x60/0xd3
kernel: [<ffffffff80008f3a>] __handle_mm_fault+0x952/0xdf2
kernel: [<ffffffff800127fd>] sock_def_readable+0x34/0x5f
kernel: [<ffffffff80251481>] unix_dgram_sendmsg+0x43d/0x4cf
kernel: [<ffffffff800645a5>] do_page_fault+0x4b8/0x81d
kernel: [<ffffffff80037264>] do_sock_write+0xc4/0xce
kernel: [<ffffffff8008630f>] dequeue_task+0x18/0x37
kernel: [<ffffffff80060ab8>] thread_return+0x0/0xea
kernel: [<ffffffff8005be1d>] error_exit+0x0/0x84
kernel: [<ffffffff8008bfb3>] do_syslog+0x173/0x3ae
kernel: [<ffffffff8008bf81>] do_syslog+0x141/0x3ae
kernel: [<ffffffff8009b666>] autoremove_wake_function+0x0/0x2e
kernel: [<ffffffff800f65fd>] kmsg_read+0x3a/0x44
kernel: [<ffffffff8000b212>] vfs_read+0xcb/0x171
kernel: [<ffffffff8001145c>] sys_read+0x45/0x6e
kernel: [<ffffffff8005b2c1>] tracesys+0xd1/0xdc

kernel: Free swap = 0kB
kernel: Total swap = 2031608kB
kernel: Free swap: 0kB
kernel: 4390912 pages of RAM
kernel: 280785 reserved pages
kernel: 10222 pages shared
kernel: 4 pages swap cached
kernel: Out of memory: Killed process 6519 (postmaster).
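
For scale, the kernel figures above can be converted to more familiar units. This is only a rough sketch: it assumes the usual 4 KiB page size of x86_64 Linux, which the log itself does not state, and the helper name is purely illustrative.

```python
PAGE_SIZE_KIB = 4  # assumption: 4 KiB pages, the common x86_64 default


def kib_to_gib(kib: int) -> float:
    """Convert kibibytes to gibibytes."""
    return kib / (1024 * 1024)


ram_pages = 4390912        # from "4390912 pages of RAM"
total_swap_kib = 2031608   # from "Total swap = 2031608kB"

ram_gib = kib_to_gib(ram_pages * PAGE_SIZE_KIB)
swap_gib = kib_to_gib(total_swap_kib)

print(f"RAM:  {ram_gib:.2f} GiB")
print(f"Swap: {swap_gib:.2f} GiB")
```

Under that assumption the machine has about 16.75 GiB of RAM and just under 2 GiB of swap, and the log shows the swap completely used (Free swap = 0kB).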

How can I prevent postgres from using all of the system's memory? Why does this happen?

Thanks for all
Paulo


#2 Scott Marlowe
scott.marlowe@gmail.com
In reply to: paulo matadr (#1)
Re: Understand this error

On Thu, Apr 30, 2009 at 7:00 AM, paulo matadr <saddoness@yahoo.com.br> wrote:

Hi all,
my database went into recovery mode.
Analyzing my pg_log, I see this:
system logger process (PID 6517) was terminated by signal 9
background writer process (PID 6519) was terminated by signal 9
terminating any other active server processes

Yeah, you're getting bitten by the OOM killer. What changes, if any,
have you made to the postgresql.conf file?

#3 Craig Ringer
craig@2ndquadrant.com
In reply to: paulo matadr (#1)
Re: Understand this error

paulo matadr wrote:

Hi all,
my database went into recovery mode.
Analyzing my pg_log, I see this:

system logger process (PID 6517) was terminated by signal 9
background writer process (PID 6519) was terminated by signal 9
terminating any other active server processes

You haven't told us what OS you are on. Based on the log below, though,
it looks like Linux.

`kill -l' on Linux tells us that signal 9 is SIGKILL, a hard kill. That
should only happen if (a) you send it with `kill -9' or `kill -KILL' or
(b) the machine runs out of memory while in overcommit mode (the
default) and the OOM killer picks PostgreSQL as the process to terminate
to free memory.
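
The same signal lookup can be done from Python rather than the shell. A small sketch, assuming a Linux or other POSIX system where signal 9 is defined; the function name is just for illustration.

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name for a signal number on this platform."""
    return signal.Signals(number).name


# On Linux, signal 9 is the uncatchable hard kill.
print(signal_name(9))
```

Because SIGKILL cannot be caught or ignored, the killed process gets no chance to shut down cleanly, which is why the server then restarts in recovery mode.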

You should NOT have your server in overcommit mode if you are running
PostgreSQL. See, in the PostgreSQL manual:

http://www.postgresql.org/docs/current/static/kernel-resources.html#AEN22235
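
To check which mode a machine is currently in, the setting can be read from /proc. A minimal sketch, assuming a Linux system that exposes /proc/sys/vm/overcommit_memory; the function name and the wording of the descriptions are illustrative, not taken from the manual.

```python
from pathlib import Path

# Meanings of the vm.overcommit_memory values on Linux.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def describe_overcommit(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Read the overcommit setting from `path` and describe it."""
    mode = int(Path(path).read_text().strip())
    return f"{mode}: {OVERCOMMIT_MODES.get(mode, 'unknown mode')}"


# Only query the live system if the file exists (i.e. we are on Linux).
proc_file = Path("/proc/sys/vm/overcommit_memory")
if proc_file.exists():
    print(describe_overcommit(str(proc_file)))
```

A result of 0 or 1 means the OOM killer can still be triggered by overcommitted allocations; 2 is the setting recommended for PostgreSQL in this thread.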

kernel: [<ffffffff800ba475>] out_of_memory+0x53/0x267

[snip]

kernel: Out of memory: Killed process 6519 (postmaster).

How can I prevent postgres from using all of the system's memory? Why does this happen?

Read the link in the PostgreSQL manual, above.

Note that it's not very likely that PostgreSQL was the process that used
up all your memory. It was just unlucky enough to be picked as the one
to be killed, because the OOM killer is terrible at estimating which
process is using the most memory when programs like PostgreSQL have
allocated large blocks of shared memory.

--
Craig Ringer

#4 Dennis Brakhane
brakhane@googlemail.com
In reply to: paulo matadr (#1)
Re: Understand this error

On Thu, Apr 30, 2009 at 3:00 PM, paulo matadr <saddoness@yahoo.com.br> wrote:

Hi all,
my database went into recovery mode.
Analyzing my pg_log, I see this:
system logger process (PID 6517) was terminated by signal 9
background writer process (PID 6519) was terminated by signal 9
terminating any other active server processes

You have been bitten by the OOM killer. It can lead to severe data loss if it
decides to kill the postmaster. To avoid this, you should always set
overcommit_memory to 2 (which turns overcommit off). See Section 17.4.3 here:

http://www.postgresql.org/docs/8.3/interactive/kernel-resources.html

You should *never* run a production database server in overcommit_memory mode!
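
With overcommit_memory set to 2, the kernel refuses allocations beyond a commit limit of roughly swap plus a percentage of RAM (vm.overcommit_ratio, 50 by default). A rough estimate for the machine in post #1, assuming 4 KiB pages and ignoring huge pages; the function name is hypothetical and the real value is reported as CommitLimit in /proc/meminfo.

```python
def commit_limit_kib(ram_kib: int, swap_kib: int, overcommit_ratio: int = 50) -> int:
    """Approximate CommitLimit for overcommit_memory=2, ignoring huge pages."""
    return swap_kib + ram_kib * overcommit_ratio // 100


ram_kib = 4390912 * 4      # "4390912 pages of RAM", assuming 4 KiB pages
swap_kib = 2031608         # "Total swap = 2031608kB"

limit = commit_limit_kib(ram_kib, swap_kib)
print(f"{limit} KiB (~{limit / 1024**2:.2f} GiB)")
```

If this estimate is right, the default ratio would cap committed memory at about 10.3 GiB on a machine with about 16.75 GiB of RAM, so vm.overcommit_ratio or the amount of swap may need adjusting when switching to mode 2.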

#5 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#3)
Re: Understand this error

Craig Ringer <craig@postnewspapers.com.au> writes:

Note that it's not very likely that PostgreSQL was the process that used
up all your memory. It was just unlucky enough to be picked as the one
to be killed, because the OOM killer is terrible at estimating which
process is using the most memory when programs like PostgreSQL have
allocated large blocks of shared memory.

It's worse than that: the OOM killer is broken by design, because it
intentionally picks on processes that have a lot of large children
--- without reference to the fact that a lot of the "largeness" might
be the same shared memory block.  So the postmaster process very often
looks like a good target to it, even though killing the postmaster will
in fact free a negligible amount of memory.

regards, tom lane
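
The double counting described in #5 can be illustrated with a toy model. This is not the kernel's code: the scoring rule (own size plus half of each child's size) and all the numbers below are assumptions chosen only to show how one shared block inflates the parent's score.

```python
from typing import Sequence


def toy_badness(own_mib: int, children_mib: Sequence[int]) -> int:
    """Toy score: own size plus half of each child's size (assumed rule)."""
    return own_mib + sum(child // 2 for child in children_mib)


shared_mib = 2048   # hypothetical shared memory block mapped by every process
private_mib = 10    # hypothetical private memory per process
backends = 50       # hypothetical number of child backends

# Each process appears to use the shared block plus its private memory.
per_process = shared_mib + private_mib
postmaster_score = toy_badness(per_process, [per_process] * backends)

# Memory actually in use: the shared block once, plus all private memory.
actually_used = shared_mib + private_mib * (backends + 1)

print(postmaster_score, actually_used)
```

In this model the postmaster scores 53508 while the whole process family really occupies only 2558 MiB, because the same 2048 MiB block is counted once per process. Killing the postmaster alone would release just its small private portion.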