dblink crash on PPC
Something odd is happening on buildfarm member wombat, a PPC970MP box
running Gentoo. We're getting dblink test failures. On the one I looked
at more closely I saw this:
[4ddf2c59.7aec:153] LOG: disconnection: session time: 0:00:00.444 user=markwkm database=contrib_regression host=[local]
and then:
[4ddf2c4e.79d4:2] LOG: server process (PID 31468) was terminated by signal 11: Segmentation fault
[4ddf2c4e.79d4:3] LOG: terminating any other active server processes
which makes it look like something is failing badly in the backend cleanup code. (7aec = hex(31468))
We don't seem to have a backtrace, which is sad.
This seems to be happening on the 9.0 branch too.
I wonder what it could be?
cheers
andrew
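The "7aec = hex(31468)" step works because the bracketed prefix on each log line appears to be log_line_prefix's %c session ID, which is the process start time and PID in hex, followed by %l, the per-process log line number. Assuming a prefix of the form "[%c:%l]", a minimal Python sketch like the one below could decode those IDs and tie each "terminated by signal" report back to the last line the crashed backend logged. The helper names (session_pid, crashed_backends) are hypothetical and not part of PostgreSQL or the buildfarm client.

```python
import re

# Assumed log_line_prefix: "[%c:%l]". %c is "<start time>.<PID>" in hex and
# %l is the per-process log line number.
PREFIX = re.compile(r"^\[([0-9a-f]+)\.([0-9a-f]+):(\d+)\]")
CRASH = re.compile(r"server process \(PID (\d+)\) was terminated by signal (\d+)")


def session_pid(line):
    """Decode (start_time, pid, line_no) from a log line's prefix, or None."""
    m = PREFIX.match(line)
    if m is None:
        return None
    start_hex, pid_hex, line_no = m.groups()
    return int(start_hex, 16), int(pid_hex, 16), int(line_no)


def crashed_backends(lines):
    """Map each PID reported as killed by a signal to (signal, last line it logged)."""
    last_seen = {}  # PID -> most recent line carrying that PID in its prefix
    crashed = {}
    for line in lines:
        decoded = session_pid(line)
        if decoded is not None:
            last_seen[decoded[1]] = line
        m = CRASH.search(line)
        if m:
            pid, sig = int(m.group(1)), int(m.group(2))
            # Snapshot at crash time, in case the PID is reused later.
            crashed[pid] = (sig, last_seen.get(pid))
    return crashed


if __name__ == "__main__":
    # The three lines quoted in the report (not necessarily adjacent in the real log).
    log = [
        "[4ddf2c59.7aec:153] LOG: disconnection: session time: 0:00:00.444 "
        "user=markwkm database=contrib_regression host=[local]",
        "[4ddf2c4e.79d4:2] LOG: server process (PID 31468) was terminated by signal 11: Segmentation fault",
        "[4ddf2c4e.79d4:3] LOG: terminating any other active server processes",
    ]
    for pid, (sig, last) in crashed_backends(log).items():
        print(pid, sig, last)
```

Fed those three lines, crashed_backends maps PID 31468 to signal 11 and the disconnection line, which is the same correlation made by hand above.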
On Fri, May 27, 2011 at 8:44 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
> Something odd is happening on buildfarm member wombat, a PPC970MP box
> running Gentoo. We're getting dblink test failures. On the one I looked at
> more closely I saw this:
> [4ddf2c59.7aec:153] LOG: disconnection: session time: 0:00:00.444 user=markwkm database=contrib_regression host=[local]
> and then:
> [4ddf2c4e.79d4:2] LOG: server process (PID 31468) was terminated by signal 11: Segmentation fault
> [4ddf2c4e.79d4:3] LOG: terminating any other active server processes
> which makes it look like something is failing badly in the backend cleanup
> code. (7aec = hex(31468))
> We don't seem to have a backtrace, which is sad.
> This seems to be happening on the 9.0 branch too.
> I wonder what it could be?
Around when did it start failing?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote:
> Andrew Dunstan <andrew@dunslane.net> wrote:
>> Something odd is happening on buildfarm member wombat, a PPC970MP
>> box running Gentoo. We're getting dblink test failures. On the
>> one I looked at more closely I saw this:
>> [4ddf2c59.7aec:153] LOG: disconnection: session time: 0:00:00.444 user=markwkm database=contrib_regression host=[local]
>> and then:
>> [4ddf2c4e.79d4:2] LOG: server process (PID 31468) was terminated by signal 11: Segmentation fault
>> [4ddf2c4e.79d4:3] LOG: terminating any other active server processes
>> which makes it look like something is failing badly in the
>> backend cleanup code. (7aec = hex(31468))
>> We don't seem to have a backtrace, which is sad.
>> This seems to be happening on the 9.0 branch too.
>> I wonder what it could be?
> Around when did it start failing?
According to the buildfarm logs the first failure was roughly 1 day
10 hours 40 minutes before this post.
Keep in mind that PPC is a platform with weak memory ordering....
-Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> Around when did it start failing?
> According to the buildfarm logs the first failure was roughly 1 day
> 10 hours 40 minutes before this post.
See
http://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=wombat&br=HEAD
The problem here is that wombat has been offline for about a month
before that, so it could have broken anytime in the past month.
It's also not unlikely that the hiatus signals a change in the
underlying hardware or software, which might have been the real
cause. (Mark?)
> Keep in mind that PPC is a platform with weak memory ordering....
grebe, which is also a PPC64 machine, isn't showing the bug. And I just
failed to reproduce the problem on a RHEL6 PPC64 box. About to go try
it on RHEL5, which has a gcc version much closer to what wombat says
it's using, but I'm not very hopeful about that. I think the more
likely thing to be keeping in mind is that Gentoo is a platform with
poor quality control.
regards, tom lane
I wrote:
> grebe, which is also a PPC64 machine, isn't showing the bug. And I just
> failed to reproduce the problem on a RHEL6 PPC64 box. About to go try
> it on RHEL5, which has a gcc version much closer to what wombat says
> it's using, but I'm not very hopeful about that.
Nope, no luck there either. It's going to be hard to make any progress
on this without investigation on wombat itself.
regards, tom lane
On 11-05-27 12:35 PM, Tom Lane wrote:
> grebe, which is also a PPC64 machine, isn't showing the bug. And I just
> failed to reproduce the problem on a RHEL6 PPC64 box. About to go try
> it on RHEL5, which has a gcc version much closer to what wombat says
> it's using, but I'm not very hopeful about that. I think the more
> likely thing to be keeping in mind is that Gentoo is a platform with
> poor quality control.
> regards, tom lane
As another data point, the dblink regression tests work fine for me on a
PPC32 debian (squeeze, gcc 4.4.5) based system.
On Fri, May 27, 2011 at 10:06 AM, Steve Singer <ssinger@ca.afilias.info> wrote:
> As another data point, the dblink regression tests work fine for me on a
> PPC32 debian (squeeze, gcc 4.4.5) based system.
Given that it's dblink my guess is that it's picking up the wrong
version of libpq somehow.
--
greg
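One way to test the wrong-libpq theory on a Linux box such as wombat would be to check which libpq files a backend actually has mapped after dblink is loaded. The sketch below is an illustration under stated assumptions, not something run in this thread: it reads /proc/<pid>/maps (Linux-specific) and lists mapped paths whose file name starts with "libpq". The helper names (mapped_libraries, libpq_copies) are hypothetical.

```python
import os


# Hypothetical helpers; Linux-only, since they rely on /proc/<pid>/maps.
def mapped_libraries(maps_text, prefix="libpq"):
    """Return sorted distinct mapped file paths whose basename starts with prefix."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:  # anonymous mappings have no pathname
            path = fields[5].strip()
            if os.path.basename(path).startswith(prefix):
                paths.add(path)
    return sorted(paths)


def libpq_copies(pid):
    """List the libpq files mapped into process pid."""
    with open(f"/proc/{pid}/maps") as f:
        return mapped_libraries(f.read())
```

Called with the PID of a backend that has already run a dblink query, libpq_copies would normally return a single path; two different paths would support the idea that a second copy of libpq is being picked up.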
Greg Stark <gsstark@mit.edu> writes:
> On Fri, May 27, 2011 at 10:06 AM, Steve Singer <ssinger@ca.afilias.info> wrote:
>> As another data point, the dblink regression tests work fine for me on a
>> PPC32 debian (squeeze, gcc 4.4.5) based system.
> Given that it's dblink my guess is that it's picking up the wrong
> version of libpq somehow.
Maybe, but then why does the test only crash during backend exit, and
not while it's exercising dblink?
regards, tom lane