Trying to understand postgres crash
Hi,
I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
observed a postgres crash and the subsequent messages in my postgres
debug log.
2011-12-21 19:08:49 UTC - LOG: shutting down
2011-12-21 19:08:49 UTC - PANIC: concurrent transaction log activity
while database system is shutting down
2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423)
was terminated by signal 6: Aborted
2011-12-21 19:08:49 UTC - LOG: terminating any other active server processes
I'm having trouble understanding what could cause this. I've looked
through the postgres sources and I don't understand what is "shutting
down" since this happens on a live system where no stop has been
requested.
Also It happens very infrequently and I don't yet know how to reproduce it.
I'm happy to provide more information.
thanks in advance for the help!
-HH
On Wed, Dec 21, 2011 at 5:48 PM, Hanns Hartman <hwhartman@gmail.com> wrote:
Hi,
I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
observed a postgres crash and the subsequent messages in my postgres
debug log.2011-12-21 19:08:49 UTC - LOG: shutting down
2011-12-21 19:08:49 UTC -s PANIC: concurrent transaction log activity
while database system is shutting down
2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423)
was terminated by signal 6: Aborted
2011-12-21 19:08:49 UTC - LOG: terminating any other active server processesI'm having trouble understanding what could cause this. I've looked
through the postgres sources and I don't understand what is "shutting
down" since this happens on a live system where no stop has been
requested.
It says that it's detecting more than one process writing to the logs.
Is there some way you're machine is launching a second postgresql
instance against the same data store? It's just a guess, I've never
seen this error before so It's a stab in the dark.
Hanns Hartman <hwhartman@gmail.com> writes:
I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
observed a postgres crash and the subsequent messages in my postgres
debug log.
2011-12-21 19:08:49 UTC - LOG: shutting down
2011-12-21 19:08:49 UTC - PANIC: concurrent transaction log activity
while database system is shutting down
2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423)
was terminated by signal 6: Aborted
2011-12-21 19:08:49 UTC - LOG: terminating any other active server processes
Best guess is that something sent the background writer process a
SIGUSR2 signal. The first two messages indicate that it received such
a signal, tried to shut down, and panicked when it noticed that other
processes were still writing to xlog. The postmaster wouldn't (or at
least shouldn't, and hasn't been observed to) send such a signal while
it still has other live children, so I think you ought to look around
for other possible sources of a signal.
regards, tom lane
On Wed, Dec 21, 2011 at 8:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Wed, Dec 21, 2011 at 5:48 PM, Hanns Hartman <hwhartman@gmail.com> wrote:
Hi,
I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
observed a postgres crash and the subsequent messages in my postgres
debug log.2011-12-21 19:08:49 UTC - LOG: shutting down
2011-12-21 19:08:49 UTC -s PANIC: concurrent transaction log activity
while database system is shutting down
2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423)
was terminated by signal 6: Aborted
2011-12-21 19:08:49 UTC - LOG: terminating any other active server processesI'm having trouble understanding what could cause this. I've looked
through the postgres sources and I don't understand what is "shutting
down" since this happens on a live system where no stop has been
requested.It says that it's detecting more than one process writing to the logs.
Is there some way you're machine is launching a second postgresql
instance against the same data store? It's just a guess, I've never
seen this error before so It's a stab in the dark.
If I was running a second postgresql then I'd have some pretty huge
problems! ;-)
Thanks for the suggestion.
-HH
On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hanns Hartman <hwhartman@gmail.com> writes:
I'm running postgres 8.3.17 on a redhat linux mips derivative and I've
observed a postgres crash and the subsequent messages in my postgres
debug log.2011-12-21 19:08:49 UTC - LOG: shutting down
2011-12-21 19:08:49 UTC - PANIC: concurrent transaction log activity
while database system is shutting down
2011-12-21 19:08:49 UTC - LOG: background writer process (PID 1423)
was terminated by signal 6: Aborted
2011-12-21 19:08:49 UTC - LOG: terminating any other active server processesBest guess is that something sent the background writer process a
SIGUSR2 signal. The first two messages indicate that it received such
a signal, tried to shut down, and panicked when it noticed that other
processes were still writing to xlog. The postmaster wouldn't (or at
least shouldn't, and hasn't been observed to) send such a signal while
it still has other live children, so I think you ought to look around
for other possible sources of a signal.regards, tom lane
Thanks for the advice. I'll look around our code base.
One other thought, my current setup mainly interacts with postgresql
via libpq and very rarely via psql. I have four connections open to
postgresql, two of which are connected to one database and two
connected to another. Could that set up or some bad usage of libpq be
leading to this signal being generated?
thanks
-HH
Hanns Hartman <hwhartman@gmail.com> writes:
On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Best guess is that something sent the background writer process a
SIGUSR2 signal.
One other thought, my current setup mainly interacts with postgresql
via libpq and very rarely via psql. I have four connections open to
postgresql, two of which are connected to one database and two
connected to another. Could that set up or some bad usage of libpq be
leading to this signal being generated?
Shouldn't. You're not running the client-side code as the postgres
user, are you? Barring a surprising kernel bug, only postgres-owned
or root-owned processes could successfully issue kill(SIGUSR2) against
the background writer process, so that set of processes is where you
need to look. I'd first try getting rid of any operations that are
running under the postgres account but don't really need to do so.
regards, tom lane
On Thu, Dec 22, 2011 at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hanns Hartman <hwhartman@gmail.com> writes:
On Thu, Dec 22, 2011 at 7:29 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Best guess is that something sent the background writer process a
SIGUSR2 signal.One other thought, my current setup mainly interacts with postgresql
via libpq and very rarely via psql. I have four connections open to
postgresql, two of which are connected to one database and two
connected to another. Could that set up or some bad usage of libpq be
leading to this signal being generated?Shouldn't. You're not running the client-side code as the postgres
user, are you? Barring a surprising kernel bug, only postgres-owned
or root-owned processes could successfully issue kill(SIGUSR2) against
the background writer process, so that set of processes is where you
need to look. I'd first try getting rid of any operations that are
running under the postgres account but don't really need to do so.
Nope the client side code is not running as the postgres user, but it
is running as root.
Unfortunately most daemons in our system run as root. Also a quick
check or our code base yielded no usages of kill with SIGUSR2. We do
however use kill with other signals.
I will continue digging, but thanks a lot for the help! At least I
can narrow down my search a bit ;-)
-HH