Possible alpha5 SR bug

Started by Jeff Davis about 16 years ago | 3 messages | bugs
#1 Jeff Davis
pgsql@j-davis.com

During the testing day organized a week ago, Quinn
Weaver ran into what looks like a problem. I attached the log output at
the end of this email. Note that he was running a Mac, but replicating
from a Linux machine (both 64-bit). I know this is not a supported
configuration, but a segfault seems like a problem anyway.

Quinn helpfully provided a tarball of his data directory here:

http://fairpath.com/QuinnPgBug.tar.gz

and described his machine:

"My machine is a Mac with an Intel Core 2 Duo processor (64-bit)
running Mac OS X 10.6.3. It has 2 GB of RAM, which should be plenty
for the config we used."

I was trying to sort this bug out somewhat before posting, but we
weren't able to reproduce it (it happened near the end of testing, and
people were leaving), and I didn't have much chance to investigate in
the last week.

Regards,
Jeff Davis

postgres@tao:/usr/local/pgsql-9.0alpha5-build1/data/data9.0$ ../../bin/postmaster -D /usr/local/pgsql-9.0alpha5-build1/data/data9.0
LOG: database system was interrupted; last known up at 2010-04-03 16:55:20 PDT
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG: entering standby mode
LOG: redo starts at 0/BC0000B8
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG: unexpected pageaddr 0/9B000000 in log file 0, segment 189, offset 0
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
FATAL: could not load library "/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so": dlopen(/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so, 10): Library not loaded: /usr/local/pgsql/lib/libpq.5.dylib
Referenced from: /usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so
Reason: no suitable image found. Did find:
/Users/quinn/lib/libpq.5.dylib: stat() failed with errno=13
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG: unexpected pageaddr 0/9B000000 in log file 0, segment 189, offset 0
Warning: Identity file /root/replicationkey not accessible: No such file or directory.
Could not create directory '/var/empty/.ssh'.
ssh_askpass: exec(/opt/local/libexec/ssh-askpass): No such file or directory
Host key verification failed.
LOG: WAL receiver process (PID 1011) was terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
postgres@tao:/usr/local/pgsql-9.0alpha5-build1/data/data9.0$ fg
bash: fg: current: no such job

#2 Fujii Masao
masao.fujii@gmail.com
In reply to: Jeff Davis (#1)
Re: Possible alpha5 SR bug

Thanks for the test and report!

On Tue, Apr 13, 2010 at 1:36 PM, Jeff Davis <pgsql@j-davis.com> wrote:

FATAL:  could not load library "/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so": dlopen(/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so, 10): Library not loaded: /usr/local/pgsql/lib/libpq.5.dylib
         Referenced from: /usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so
         Reason: no suitable image found.  Did find:
               /Users/quinn/lib/libpq.5.dylib: stat() failed with errno=13

Seems to have failed in loading libpq.5.dylib. I guess that "errno=13" means
"permission denied". Please ensure that the permission is appropriate.

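As a side note on the errno reading above: 13 is EACCES on both Linux and Mac OS X, and strerror() reports it as "Permission denied". The following is a minimal sketch for checking what stat() reports for a given path. The helper names stat_errno and describe_errno are hypothetical and used only for illustration; nothing here is taken from the PostgreSQL sources or from the dynamic loader.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 0 if stat() succeeds on the path,
 * otherwise the errno value that stat() set.
 */
int stat_errno(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) == 0)
        return 0;
    return errno;
}

/* Hypothetical helper: human-readable text for an errno value. */
const char *describe_errno(int err)
{
    return strerror(err);
}
```

Calling stat_errno() on the library path from the log, as the same user that runs the postmaster, would show whether the failure is EACCES (a directory or file permission problem) or something else such as ENOENT.
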
LOG:  WAL receiver process (PID 1011) was terminated by signal 11: Segmentation fault

Oops! I guess that this happened because walrcv_disconnect() was called in
WalRcvDie() even though libpqwalreceiver.so couldn't be loaded. The attached
patch would fix the problem.
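
The patch itself is not reproduced in this thread, so the following is only a sketch of the general pattern the diagnosis suggests: a function pointer that a dynamically loaded module is supposed to fill in stays NULL when the load fails, and the cleanup path must check it before calling through it. All names here (disconnect_hook, install_hook, cleanup_on_exit, fake_disconnect) are hypothetical stand-ins, not the actual walreceiver code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type that a loadable module would provide. */
typedef void (*disconnect_fn)(void);

/* Stays NULL if the module was never loaded successfully. */
disconnect_fn disconnect_hook = NULL;

/* Counts calls to the stand-in disconnect function below. */
int disconnect_calls = 0;

/* Stand-in for the module's real disconnect routine. */
void fake_disconnect(void)
{
    disconnect_calls++;
}

/* What a successfully loaded module would do at init time. */
void install_hook(disconnect_fn fn)
{
    assert(fn != NULL);
    disconnect_hook = fn;
}

/*
 * Cleanup path: call through the pointer only when it was set.
 * Calling a NULL function pointer would crash the process.
 * Returns 1 if the hook was invoked, 0 if it was skipped.
 */
int cleanup_on_exit(void)
{
    if (disconnect_hook == NULL)
        return 0;
    disconnect_hook();
    return 1;
}
```

With the guard in place, a process whose module failed to load exits through the normal error path instead of dying on a segmentation fault during cleanup.
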

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

walrcv_segv_v1.patch (application/octet-stream; +3 -3)

#3 Magnus Hagander
magnus@hagander.net
In reply to: Fujii Masao (#2)
Re: Possible alpha5 SR bug

On Tue, Apr 13, 2010 at 08:22, Fujii Masao <masao.fujii@gmail.com> wrote:

Thanks for the test and report!

On Tue, Apr 13, 2010 at 1:36 PM, Jeff Davis <pgsql@j-davis.com> wrote:

FATAL:  could not load library "/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so": dlopen(/usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so, 10): Library not loaded: /usr/local/pgsql/lib/libpq.5.dylib
         Referenced from: /usr/local/pgsql-9.0alpha5-build1/lib/libpqwalreceiver.so
         Reason: no suitable image found.  Did find:
               /Users/quinn/lib/libpq.5.dylib: stat() failed with errno=13

Seems to have failed in loading libpq.5.dylib. I guess that "errno=13" means
"permission denied". Please ensure that the permission is appropriate.

LOG:  WAL receiver process (PID 1011) was terminated by signal 11: Segmentation fault

Oops! I guess that this happened because walrcv_disconnect() was called in
WalRcvDie() even though libpqwalreceiver.so couldn't be loaded. The attached
patch would fix the problem.

Applied, thanks.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/