Trying to understand postgres crash

Started by Hanns Hartmanover 14 years ago7 messagesgeneral
Jump to latest
#1Hanns Hartman
hwhartman@gmail.com

Hi,

I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
observed a postgres crash and the subsequent messages in my postgres
debug log.

2011-12-21 19:08:49 UTC - LOG: shutting down
2011-12-21 19:08:49 UTC - PANIC: concurrent transaction log activity
while database system is shutting down
2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423)
was terminated by signal 6: Aborted
2011-12-21 19:08:49 UTC - LOG: terminating any other active server processes

I'm having trouble understanding what could cause this. I've looked
through the postgres sources and I don't understand what is "shutting
down" since this happens on a live system where no stop has been
requested.

Also It happens very infrequently and I don't yet know how to reproduce it.

I'm happy to provide more information.

thanks in advance for the help!
-HH

#2Scott Marlowe
scott.marlowe@gmail.com
In reply to: Hanns Hartman (#1)
Re: Trying to understand postgres crash

On Wed, Dec 21, 2011 at 5:48 PM, Hanns Hartman <hwhartman@gmail.com> wrote:

Hi,

I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
observed a postgres crash and the subsequent messages in my postgres
debug log.

2011-12-21 19:08:49 UTC - LOG:  shutting down
2011-12-21 19:08:49 UTC -s PANIC:  concurrent transaction log activity
while database system is shutting down
2011-12-21 19:08:49 UTC - LOG:  background writer process (PID 1423)
was terminated by signal 6: Aborted
2011-12-21 19:08:49 UTC - LOG:  terminating any other active server processes

I'm having trouble understanding what could cause this.  I've looked
through the postgres sources and I don't understand what is "shutting
down" since this happens on a live system where no stop has been
requested.

It says that it's detecting more than one process writing to the logs.
Is there some way you're machine is launching a second postgresql
instance against the same data store? It's just a guess, I've never
seen this error before so It's a stab in the dark.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hanns Hartman (#1)
Re: Trying to understand postgres crash

Hanns Hartman <hwhartman@gmail.com> writes:

I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
observed a postgres crash and the subsequent messages in my postgres
debug log.

2011-12-21 19:08:49 UTC - LOG: shutting down
2011-12-21 19:08:49 UTC - PANIC: concurrent transaction log activity
while database system is shutting down
2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423)
was terminated by signal 6: Aborted
2011-12-21 19:08:49 UTC - LOG: terminating any other active server processes

Best guess is that something sent the background writer process a
SIGUSR2 signal. The first two messages indicate that it received such
a signal, tried to shut down, and panicked when it noticed that other
processes were still writing to xlog. The postmaster wouldn't (or at
least shouldn't, and hasn't been observed to) send such a signal while
it still has other live children, so I think you ought to look around
for other possible sources of a signal.

regards, tom lane

#4Hanns Hartman
hwhartman@gmail.com
In reply to: Scott Marlowe (#2)
Re: Trying to understand postgres crash

On Wed, Dec 21, 2011 at 8:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On Wed, Dec 21, 2011 at 5:48 PM, Hanns Hartman <hwhartman@gmail.com> wrote:

Hi,

I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
observed a postgres crash and the subsequent messages in my postgres
debug log.

2011-12-21 19:08:49 UTC - LOG:  shutting down
2011-12-21 19:08:49 UTC -s PANIC:  concurrent transaction log activity
while database system is shutting down
2011-12-21 19:08:49 UTC - LOG:  background writer process (PID 1423)
was terminated by signal 6: Aborted
2011-12-21 19:08:49 UTC - LOG:  terminating any other active server processes

I'm having trouble understanding what could cause this.  I've looked
through the postgres sources and I don't understand what is "shutting
down" since this happens on a live system where no stop has been
requested.

It says that it's detecting more than one process writing to the logs.
 Is there some way you're machine is launching a second postgresql
instance against the same data store? It's just a guess, I've never
seen this error before so It's a stab in the dark.

If I was running a second postgresql then I'd have some pretty huge
problems! ;-)
Thanks for the suggestion.
-HH

#5Hanns Hartman
hwhartman@gmail.com
In reply to: Tom Lane (#3)
Re: Trying to understand postgres crash

On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hanns Hartman <hwhartman@gmail.com> writes:

I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
observed a postgres crash and the subsequent messages in my postgres
debug log.

2011-12-21 19:08:49 UTC - LOG:  shutting down
2011-12-21 19:08:49 UTC - PANIC:  concurrent transaction log activity
while database system is shutting down
2011-12-21 19:08:49 UTC - LOG:  background writer process (PID 1423)
was terminated by signal 6: Aborted
2011-12-21 19:08:49 UTC - LOG:  terminating any other active server processes

Best guess is that something sent the background writer process a
SIGUSR2 signal.  The first two messages indicate that it received such
a signal, tried to shut down, and panicked when it noticed that other
processes were still writing to xlog.  The postmaster wouldn't (or at
least shouldn't, and hasn't been observed to) send such a signal while
it still has other live children, so I think you ought to look around
for other possible sources of a signal.

                       regards, tom lane

Thanks for the advice. I'll look around our code base.

One other thought, my current setup mainly interacts with postgresql
via libpq and very rarely via psql. I have four connections open to
postgresql, two of which are connected to one database and two
connected to another. Could that set up or some bad usage of libpq be
leading to this signal being generated?

thanks
-HH

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Hanns Hartman (#5)
Re: Trying to understand postgres crash

Hanns Hartman <hwhartman@gmail.com> writes:

On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Best guess is that something sent the background writer process a
SIGUSR2 signal.

One other thought, my current setup mainly interacts with postgresql
via libpq and very rarely via psql. I have four connections open to
postgresql, two of which are connected to one database and two
connected to another. Could that set up or some bad usage of libpq be
leading to this signal being generated?

Shouldn't. You're not running the client-side code as the postgres
user, are you? Barring a surprising kernel bug, only postgres-owned
or root-owned processes could successfully issue kill(SIGUSR2) against
the background writer process, so that set of processes is where you
need to look. I'd first try getting rid of any operations that are
running under the postgres account but don't really need to do so.

regards, tom lane

#7Hanns Hartman
hwhartman@gmail.com
In reply to: Tom Lane (#6)
Re: Trying to understand postgres crash

On Thu, Dec 22, 2011 at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Hanns Hartman <hwhartman@gmail.com> writes:

On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Best guess is that something sent the background writer process a
SIGUSR2 signal.

One other thought, my current setup mainly interacts with postgresql
via libpq and very rarely via psql.  I have four connections open to
postgresql, two of which are connected to one database and two
connected to another.  Could that set up or some bad usage of libpq be
leading to this signal being generated?

Shouldn't.  You're not running the client-side code as the postgres
user, are you?  Barring a surprising kernel bug, only postgres-owned
or root-owned processes could successfully issue kill(SIGUSR2) against
the background writer process, so that set of processes is where you
need to look.  I'd first try getting rid of any operations that are
running under the postgres account but don't really need to do so.

Nope the client side code is not running as the postgres user, but it
is running as root.
Unfortunately most daemons in our system run as root. Also a quick
check or our code base yielded no usages of kill with SIGUSR2. We do
however use kill with other signals.

I will continue digging, but thanks a lot for the help! At least I
can narrow down my search a bit ;-)
-HH