Re: BUG #5862: Postgres dumps core upon a connection attempt

Started by Kevin Grittner about 15 years ago | 12 messages | bugs
#1 Kevin Grittner
Kevin.Grittner@wicourts.gov

"Matt Zinicola" wrote:

PostgreSQL version: 9.0.3
Operating system: Linux (Fedora 14, kernel 2.6.35-10-74), 64-bit
Description: Postgres dumps core upon a connection attempt
Details:

A simple compile from source and install (as per usual) on Fedora
14 yielded crashes of client applications attempting to connect.

I first observed this with Archiveopteryx. As a sanity check, I
then attempted a connection with psql itself, which also crashed.

Please let me know if further information is needed.

Build options? Error messages? Contents of log files? Backtrace
from the core file you mentioned?

-Kevin

#2 Matt Zinicola
matt@zinicola.com
In reply to: Kevin Grittner (#1)
Re: BUG #5862: Postgres dumps core upon a connection attempt

Apologies for lack of detail. Although I've been using Postgres for
years, this is the first time I've had such an issue.

Build options were only --with-perl and --with-python

Below is the output when two different applications attempt to connect
to my 9.0.3 server (note, the second is psql itself):

[root@infinity postgres]# /etc/init.d/archiveopteryx start
Starting Archiveopteryx: aox: Couldn't connect to PostgreSQL. (on backend 1)
/etc/init.d/archiveopteryx: line 24: 4240 Segmentation fault /usr/local/archiveopteryx/bin/aox start
done.

[postgres@infinity scripts]$ psql template1
Segmentation fault (core dumped)

Kevin suggested doing a 'make check'. I did so, and it ended with the
following:

mkdir ./testtablespace
./pg_regress --inputdir=. --dlpath=. --multibyte=SQL_ASCII --temp-install=./tmp_check --top-builddir=../../.. --schedule=./parallel_schedule
make[2]: *** [check] Segmentation fault (core dumped)
make[2]: Leaving directory `/usr/local/src/postgresql-9.0.3/src/test/regress'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/usr/local/src/postgresql-9.0.3/src/test'
make: *** [check] Error 2

Also, my server doesn't seem to be logging anything, either (although
I'm using the same configuration and start script as 9.0.2)

Lastly, I don't see any 'core' files in the places I would expect.
If/when I find them, I can send along.

- Matt
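
When a shell reports "core dumped" but no core file turns up, two common causes on Linux are a core size limit of zero and a kernel core_pattern that pipes dumps to a helper program (on Fedora this is often ABRT) instead of writing them to the working directory. The following is only a rough sketch for checking both; describe_core_pattern is a made-up helper name, not a standard tool, and nothing in the thread confirms which cause applied here.

```shell
# Hypothetical helper: explain where a given core_pattern value sends dumps.
describe_core_pattern() {
    case $1 in
        "|"*) echo "piped to program: ${1#?}" ;;
        /*)   echo "written to absolute path pattern: $1" ;;
        *)    echo "written relative to the working directory of the crashing process: $1" ;;
    esac
}

# Current soft limit on core size for this shell; 0 disables core files.
ulimit -c

# On Linux, show where the kernel is configured to send core dumps.
if [ -r /proc/sys/kernel/core_pattern ]; then
    describe_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi

# To lift the limit for this shell session before reproducing the crash:
# ulimit -c unlimited
```

If the pattern is piped to a program, the core is usually stored wherever that program keeps its data rather than next to the binary.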


#3 Craig Ringer
craig@2ndquadrant.com
In reply to: Matt Zinicola (#2)
Re: BUG #5862: Postgres dumps core upon a connection attempt

On 03/02/11 09:53, Matt Zinicola wrote:

Apologies for lack of detail. Although I've been using Postgres for
years, this is the first time I've had such an issue.

Build options were only --with-perl and --with-python

Below is the output when two different applications attempt to connect
to my 9.0.3 server (note, the second is psql itself):

[root@infinity postgres]# /etc/init.d/archiveopteryx start
Starting Archiveopteryx: aox: Couldn't connect to PostgreSQL. (on backend 1)
/etc/init.d/archiveopteryx: line 24: 4240 Segmentation fault /usr/local/archiveopteryx/bin/aox start
done.

[postgres@infinity scripts]$ psql template1
Segmentation fault (core dumped)

OK, so it's not the PostgreSQL backend that's crashing, it's psql.

You almost certainly have conflicting libraries lurking around
somewhere, so psql was built against one libpq but lands up getting
linked to another at runtime.

--
System & Network Administrator
POST Newspapers
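
One way to test the conflicting-library theory is to ask the dynamic linker which libpq a binary actually resolves to. This is a minimal sketch, assuming a glibc system where ldd prints lines of the form "name => path (address)". The helper name resolved_path is invented for illustration, and the psql location is the default source-install prefix that appears elsewhere in this thread.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the path that
# the library whose name matches $1 resolves to.
resolved_path() {
    awk -v lib="$1" '$1 ~ lib && $2 == "=>" { print $3 }'
}

psql_bin=/usr/local/pgsql/bin/psql   # assumed install location
if [ -x "$psql_bin" ]; then
    # Which libpq will the runtime linker pick for this psql?
    ldd "$psql_bin" | resolved_path libpq
fi

# Every libpq copy on the root filesystem, for comparison (can be slow):
# find / -xdev -name 'libpq.so*' 2>/dev/null
```

If the printed path is not under the prefix the binary was built for, the build and runtime libraries differ.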

#4 Matt Zinicola
matt@zinicola.com
In reply to: Craig Ringer (#3)
Re: BUG #5862: Postgres dumps core upon a connection attempt

Hrm.

I did see the Fedora stashed copies of libpq.so.5 and libpq.so.5.2 in
/usr/lib64. I looked everywhere on the system for libpq.so*, and saw that the
only remaining copies were those in my source directory... so I re-built
9.0.3. A 'make check' still died in the same place within the regression
tests. I did a 'make install' anyhow. I cleaned out my data directory and
attempted a new initdb with 9.0.3. That seg faulted as well:

[postgres@infinity local]$ /usr/local/pgsql/bin/initdb -D /data/postgres
Segmentation fault (core dumped)

Any other suggestions?
- Matt


#5 Craig Ringer
craig@2ndquadrant.com
In reply to: Matt Zinicola (#4)
Re: BUG #5862: Postgres dumps core upon a connection attempt

On 03/02/11 10:33, Matt Zinicola wrote:

Hrm.

I did see the Fedora stashed copies of libpq.so.5 and libpq.so.5.2 in
/usr/lib64. I looked everywhere on the system for libpq.so*, and saw that the
only remaining copies were those in my source directory... so I re-built
9.0.3. A 'make check' still died in the same place within the regression
tests. I did a 'make install' anyhow. I cleaned out my data directory and
attempted a new initdb with 9.0.3. That seg faulted as well:

[postgres@infinity local]$ /usr/local/pgsql/bin/initdb -D /data/postgres
Segmentation fault (core dumped)

What does:

ldd /usr/local/pgsql/bin/initdb

say?

--
System & Network Administrator
POST Newspapers

#6 Craig Ringer
craig@2ndquadrant.com
In reply to: Kevin Grittner (#1)
Re: BUG #5862: Postgres dumps core upon a connection attempt

On 03/02/11 11:11, Matt Zinicola wrote:

OK, it doesn't seem to be a simple problem of linking to the wrong
library then. psql is linking to the correct libpq. initdb isn't linking
to anything much at all, but still crashes for no apparent reason.
Something else may be going on. Please supply the full command line you
used to "./configure" when compiling postgres. If you're not sure what
it was, you can find it in the top of "config.log" in your compile
directory.

Is there any chance you can get us a backtrace of one of the crashing
programs? Try this:

gdb --args psql

Once it loads, it'll drop you to a

(gdb)

prompt. Enter "run" then press enter.

(gdb) run

Psql will then load for a while, crash, and drop you back to a (gdb)
prompt after printing out a message like:

Program received signal SIGSEGV, Segmentation fault.

Enter the "bt" command at the (gdb) prompt and press enter.

(gdb) bt

... then copy and paste everything from "gdb --args psql" through to the
end of the output printed by "bt", put it on http://pastebin.com/ and
send a link to that in your reply email here.

I've created a sample to give you the idea, by starting psql then
intentionally crashing it by sending it a manual SIGSEGV. See:

http://pastebin.com/b8D9i2tb

--
System & Network Administrator
POST Newspapers
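
The interactive gdb steps described above can also be run in one go. This is a sketch, assuming gdb is installed: -batch, -ex and --args are standard gdb options, while the wrapper name backtrace_of is invented here.

```shell
# Hypothetical wrapper: run a program under gdb non-interactively and print
# a backtrace if it stops on a signal such as SIGSEGV.
backtrace_of() {
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found on PATH" >&2
        return 127
    fi
    # "run" starts the program; "bt" prints the stack once it has stopped.
    gdb -batch -ex run -ex bt --args "$@"
}

# Example (commented out so that pasting this block does not start psql):
# backtrace_of psql template1 > psql-backtrace.txt 2>&1
```

The backtrace is far more useful when the binaries were built with debugging symbols, as done later in this thread with --enable-debug.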

#7 Matt Zinicola
matt@zinicola.com
In reply to: Craig Ringer (#6)
Re: BUG #5862: Postgres dumps core upon a connection attempt

As far as the configure options -- Originally, they were merely --with-perl and
--with-python, but just to rule out problems there, I've since just been going
with a straight compile (no additional options).

I will get the backtrace, etc. within the next hour or so. Thanks!

- Matt


#8 Craig Ringer
craig@2ndquadrant.com
In reply to: Kevin Grittner (#1)
Re: BUG #5862: Postgres dumps core upon a connection attempt

On 02/03/2011 11:15 PM, Matt Zinicola wrote:

I re-compiled with '--enable-debug' and got the symbols. The pastebin is at
http://pastebin.com/xMhEHFdT

That's really interesting. It's getting a NULL path pointer when - I
think - it tries to determine the location of the executables.

Presumably this is something bizarre in your environment - but I have no
idea what it might be. Maybe someone else reading will have an idea.

(Please reply-to-all on further messages so the -bugs list sees things)

--
Craig Ringer

#9 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Craig Ringer (#8)
Re: BUG #5862: Postgres dumps core upon a connection attempt

Craig Ringer <craig@postnewspapers.com.au> writes:

On 02/03/2011 11:15 PM, Matt Zinicola wrote:

I re-compiled with '--enable-debug' and got the symbols. The pastebin is at
http://pastebin.com/xMhEHFdT

That's really interesting. It's getting a NULL path pointer when - I
think - it tries to determine the location of the executables.

Hmm ... gdb is evidently lying to us to some extent, because some of
those variables can't possibly be NULL, and control wouldn't have got
to where it says if others of them were. However, it seems clear that
it's dying while trying to determine the actual location of the initdb
executable. Are there any symlinks involved in the path
/usr/local/pgsql/bin/initdb ? Is that located on an unusual filesystem?

regards, tom lane
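
Both questions (symlinks in the path, and the filesystem type) can be answered from a shell. This is a minimal sketch, assuming a POSIX shell, readlink, and GNU df. symlinks_in_path is a made-up helper, and it assumes the path contains no glob characters.

```shell
# Hypothetical helper: print every component of an absolute path that is a
# symbolic link, together with its target.
symlinks_in_path() {
    path=$1
    current=""
    old_ifs=$IFS
    IFS=/
    set -- $path          # split the path on "/" into positional parameters
    IFS=$old_ifs
    for part in "$@"; do
        [ -n "$part" ] || continue
        current="$current/$part"
        if [ -L "$current" ]; then
            echo "$current -> $(readlink "$current")"
        fi
    done
}

target=/usr/local/pgsql/bin/initdb   # path from the thread
if [ -e "$target" ]; then
    symlinks_in_path "$target"
    df -T "$target"    # GNU df: the Type column shows the filesystem
fi
```

No output from symlinks_in_path means no component of the path is a symlink.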

#10 Matt Zinicola
matt@zinicola.com
In reply to: Tom Lane (#9)
Re: BUG #5862: Postgres dumps core upon a connection attempt

On Thu, 2011-02-03 at 22:23 -0500, Tom Lane wrote:

Craig Ringer <craig@postnewspapers.com.au> writes:

On 02/03/2011 11:15 PM, Matt Zinicola wrote:

I re-compiled with '--enable-debug' and got the symbols. The pastebin is at
http://pastebin.com/xMhEHFdT

That's really interesting. It's getting a NULL path pointer when - I
think - it tries to determine the location of the executables.

Hmm ... gdb is evidently lying to us to some extent, because some of
those variables can't possibly be NULL, and control wouldn't have got
to where it says if others of them were. However, it seems clear that
it's dying while trying to determine the actual location of the initdb
executable. Are there any symlinks involved in the path
/usr/local/pgsql/bin/initdb ? Is that located on an unusual filesystem?

regards, tom lane

It wasn't an unusual filesystem (other than being within a logical
volume). Nothing out of the ordinary -- a local ext3 filesystem. I
did a clean re-install of Fedora from scratch, and boom! Postgres
compiled and installed just fine.

Two interesting tidbits here (perhaps of note) -- 1) Against my
judgment, I had been using Fedora's upgrade process the last two times I
updated (from F12 to F13, and from F13 to F14). I wonder if that
botched something in my environment and 2) Nothing else on the system
seemed to have trouble (at least up until that point in time). I
suspect it was definitely something underneath Postgres, as when I
deleted everything in /usr/local/pgsql and my cluster (/data/postgres)
and started anew, it still had the very same problem. Just some
additional points of info.

In any case... a "clean" install of Fedora 14 did not yield this
problem, so feel free to close this issue if/when you feel appropriate.

Thanks to everyone that was lending assistance. It's much appreciated.

- Matt

#11 Mark Kirkwood
mark.kirkwood@catalyst.net.nz
In reply to: Craig Ringer (#8)
Re: BUG #5862: Postgres dumps core upon a connection attempt

On 04/02/11 15:11, Craig Ringer wrote:

On 02/03/2011 11:15 PM, Matt Zinicola wrote:

I re-compiled with '--enable-debug' and got the symbols. The
pastebin is at
http://pastebin.com/xMhEHFdT

That's really interesting. It's getting a NULL path pointer when - I
think - it tries to determine the location of the executables.

Presumably this is something bizarre in your environment - but I have
no idea what it might be. Maybe someone else reading will have an idea.

(Coming in too late, but...)

I'd be interested to see what happens if you do:

$ export PATH=/usr/local/pgsql/bin:$PATH
$ export LD_LIBRARY_PATH=/usr/local/pgsql/lib
$ initdb -D /data/postgres
$ pg_ctl -D /data/postgres start;
$ psql

I'm guessing that there are older libraries or binaries earlier in your
various env paths, and these are tripping up postgres.

Cheers

Mark
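
Before changing PATH, it can help to see whether more than one copy of a program is reachable through it. A small sketch, assuming a POSIX shell; all_on_path is an invented name (bash users can get similar output from the builtin `type -a`).

```shell
# Hypothetical helper: list every executable file called $1 found in the
# directories of $PATH, in lookup order (the first one listed wins).
all_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.     # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

all_on_path psql
all_on_path initdb
```

More than one line for the same program suggests that an older binary may be shadowing, or shadowed by, the freshly built one.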

#12 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matt Zinicola (#10)
Re: BUG #5862: Postgres dumps core upon a connection attempt

Matthew Zinicola <matt@zinicola.com> writes:

It wasn't an unusual filesystem (other than being within a logical
volume). Nothing out of the ordinary -- a local ext3 filesystem. I
did a clean re-install of Fedora from scratch, and boom! Postgres
compiled and installed just fine.

Two interesting tidbits here (perhaps of note) -- 1) Against my
judgment, I had been using Fedora's upgrade process the last two times I
updated (from F12 to F13, and from F13 to F14). I wonder if that
botched something in my environment and 2) Nothing else on the system
seemed to have trouble (at least up until that point in time).

Hmm. Given that you couldn't reproduce it on a clean system, I'd have
to agree that it sounds like something was a bit wacko about the
upgraded system. One does hear of people having trouble with that
process from time to time.

regards, tom lane