PostgreSQL 7.3.4 gets killed by SIGKILL
I have this big table running on an old linux install (kernel 2.2.25).
I've COPYed some tcpip logs into a table created as such:
create table ipstats (time timestamp, src inet, dst inet, npackets int8,
nbytes int8);
Big:
select count(*) from ipstats;
count
----------
99173733
When I run two selects against that table multiple times, the
backend doing the selects gets killed by signal 9.
The select pair looks like:
select sum(nbytes) from ipstats where dst = '10.10.10.170';
select sum(nbytes) from ipstats where src = '10.10.10.170';
This is what the serverlog says:
LOG: server process (pid 20308) was terminated by signal 9
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing shared memory and
semaphores
FATAL: The database system is starting up
LOG: database system was interrupted at 2003-12-03 23:21:49 CET
FATAL: The database system is starting up
LOG: checkpoint record is at 3/9095BC20
LOG: redo record is at 3/9095BC20; undo record is at 0/0; shutdown TRUE
LOG: next transaction id: 8716399; next oid: 141842933
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: ReadRecord: record with zero length at 3/9095BC60
LOG: redo is not required
LOG: database system is ready
When I attach gdb to the process it doesn't help; it exits immediately
anyway. I believe this is because SIGKILL is "unstoppable": it can be
neither caught nor ignored...
Any ideas as to what to do?
Regards
Magnus
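[Editorial aside: a minimal shell demo of why gdb cannot help here. A
process can trap SIGTERM and keep running, but SIGKILL is delivered
without the process ever regaining control, so no handler or debugger
gets a say. Nothing here is specific to PostgreSQL:]

# A trapped SIGTERM runs the handler and the shell keeps going:
sh -c 'trap "echo caught TERM" TERM; kill -TERM $$; echo still here'
# SIGKILL cannot be trapped or caught; nothing after this ever runs:
sh -c 'kill -KILL $$; echo never printed'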
"Magnus Naeslund(t)" <mag@fbab.net> writes:
> I have this big table running on an old linux install (kernel 2.2.25).
> I've COPYed some tcpip logs into a table created as such:
Linux is probably killing your process because it (the kernel) is low
on memory. Unfortunately, this happens more often with older versions
of the kernel. Add more RAM/swap or figure out how to make your query
use less memory...
-Doug
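[One way to test Doug's theory: the kernel's OOM killer usually leaves
a trace in the kernel log (on kernels that log OOM kills at all), and
memory headroom can be watched while the query runs. A rough checklist,
assuming a typical syslog setup; the log path varies by distribution:]

# Look for OOM-killer messages in the kernel ring buffer and syslog:
dmesg | grep -i -e 'out of memory' -e 'killed process'
grep -i oom /var/log/messages
# Watch free memory and swap activity while the query runs:
free -m
vmstat 5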
Doug McNaught wrote:
"Magnus Naeslund(t)" <mag@fbab.net> writes:
I have this big table running on an old linux install (kernel 2.2.25).
I've COPYed some tcpip logs into a table created as such:Linux is probably killing your process because it (the kernel) is low
on memory. Unfortunately, this happens more often with older versions
of the kernel. Add more RAM/swap or figure out how to make your query
use less memory...-Doug
Well, this just isn't the case.
There is no printout in kernel logs/dmesg (as it would be if the kernel
killed it in an OOM situation).
I have 1 GB of RAM, and 1.5 GB of swap (swap never touched).
When running the query I have about 850 MB sitting in kernel cache, the
postgres process takes about 40 MB of memory, and the ipcs -m command
shows that PostgreSQL is taking 41508864 bytes of shared memory.
There is no sorting and there are no index lookups going on; the query
is simple.
I just had a power outage; I'll check whether it behaves better after
the reboot, but I doubt it.
Is it possible to somehow find out what process sent the KILL (or
whether it's the kernel)?
I find this very weird, to say the least...
Magnus
On Thu, 04 Dec 2003 03:35:49 +0100
"Magnus Naeslund(t)" <mag@fbab.net> wrote:
> Well, this just isn't the case.
> There is no printout in kernel logs/dmesg (as it would be if the
> kernel killed it in an OOM situation).
> I have 1 GB of RAM, and 1.5 GB of swap (swap never touched).
Do you have any system monitoring scripts that may be killing it
because it looks like a "runaway" process?
We've had this happen to us before. You tend to forget about things
like that.
--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/
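[Jeff's suggestion is worth checking mechanically. A rough sweep for
cron jobs or watchdog scripts that send kill -9, assuming the standard
cron locations; adjust paths for the distribution:]

# System-wide and per-user cron jobs:
cat /etc/crontab
ls /etc/cron.d /etc/cron.daily /etc/cron.hourly 2>/dev/null
crontab -l -u root; crontab -l -u postgres
# Any script in the usual places that sends SIGKILL by hand?
grep -r -e 'kill -9' -e 'kill -KILL' /etc/cron* /usr/local/bin 2>/dev/null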
"Magnus Naeslund(t)" <mag@fbab.net> writes:
> Doug McNaught wrote:
>> Linux is probably killing your process because it (the kernel) is low
>> on memory. Unfortunately, this happens more often with older versions
>> of the kernel. Add more RAM/swap or figure out how to make your query
>> use less memory...
>> -Doug
> Well, this just isn't the case.
> There is no printout in kernel logs/dmesg (as it would be if the
> kernel killed it in an OOM situation).
> I have 1 GB of RAM, and 1.5 GB of swap (swap never touched).
Ahh, that's an additional piece of information that you didn't supply
earlier. ;)
Though your system memory is ample, is it possible that you're hitting
a ulimit on the stack size or heap size or something? I'm not sure
what signal you'd get in such a case, though.
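[Doug's ulimit theory is easy to check. Because the startup scripts go
through "su - postgres", the limits that matter are the ones a full
login gives the postgres account, not those of an interactive root
shell:]

# Limits in the current shell:
ulimit -a
# Limits the server actually inherits via a login as postgres:
su - postgres -c 'ulimit -a'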
> Is it possible to somehow find out what process sent the KILL (or
> whether it's the kernel)?
Not that I know of, unless it's in a logfile somewhere. You could try
strace(8) on the backend running the query--that might give you some
more info.
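[For the strace idea, something like the following; 20308 is the dead
backend's pid from the server log, so a live pid would come from ps.
When the process dies, strace reports the fatal signal, although it
cannot name the sender either:]

# Attach to the running backend, follow children, log to a file:
strace -f -o /tmp/backend.trace -p 20308
# When the backend dies, the end of the trace shows the fatal
# signal, e.g. "+++ killed by SIGKILL +++".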
> I find this very weird, to say the least...
Yah. You might also consider running a more recent kernel, especially
with such a big machine. 2.2.X never did play that well with large
amounts of RAM...
-Doug
Jeff wrote:
> Do you have any system monitoring scripts that may be killing it
> because it looks like a "runaway" process?
> We've had this happen to us before. You tend to forget about things
> like that.
This got me thinking, and I rechecked all the possibilities.
It turned out that we had changed rlimit policies earlier, and the
"default" CPU time limits bled over to PostgreSQL because there was no
overriding entry for it in the PAM limits configuration.
Since the startup scripts use "su - postgres -c cmd", the server
effectively "logged in" and so picked up the new default CPU time
limits. (The kernel sends SIGXCPU when a process reaches its soft CPU
limit and SIGKILL when it exceeds the hard one, which explains the
untraceable signal 9.)
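[For reference, the fix amounts to giving the postgres account its own
entry in the PAM limits file so the site-wide default no longer
applies. An illustrative /etc/security/limits.conf fragment, not
Magnus's actual file; older pam_limits versions spell "unlimited" as
-1:]

# domain        type    item    value
postgres        soft    cpu     unlimited
postgres        hard    cpu     unlimited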
So it was only a bug in our own setup, and that's good :)
Magnus