pgcrypto related backend crash on solaris 10/x86_64

Started by Stefan Kaltenbrunner over 18 years ago, 8 messages
#1Stefan Kaltenbrunner
stefan@kaltenbrunner.cc

I brought clownfish back onto the buildfarm (I'm still a bit dubious about
the unexplained failures, which look like VMware emulation bugs, but this
one seems to be easily reproducible) and enabled --with-openssl after the
recent openssl/pgcrypto related fixes, but I'm still getting a backend
crash during the pgcrypto regression tests:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clownfish&dt=2007-09-09%2012:14:50

backtrace looks like:

program terminated by signal SEGV (no mapping at the fault address)
0xfffffd7fff241b61: AES_encrypt+0x0241: xorq (%r15,%rdx,8),%rbx
(dbx) where
=>[1] AES_encrypt(0x5, 0x39dc9a7a, 0xf560e7b50e, 0x90ca350d49,
0xf560e7b50ea90dfb, 0x6b6b6b6b), at 0xfffffd7fff241b61
[2]: 0x0(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x0

Stefan

#2Marko Kreen
markokr@gmail.com
In reply to: Stefan Kaltenbrunner (#1)
Re: pgcrypto related backend crash on solaris 10/x86_64

On 9/9/07, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:

I brought clownfish back onto the buildfarm (I'm still a bit dubious about
the unexplained failures, which look like VMware emulation bugs, but this
one seems to be easily reproducible) and enabled --with-openssl after the
recent openssl/pgcrypto related fixes, but I'm still getting a backend
crash during the pgcrypto regression tests:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clownfish&dt=2007-09-09%2012:14:50

backtrace looks like:

program terminated by signal SEGV (no mapping at the fault address)
0xfffffd7fff241b61: AES_encrypt+0x0241: xorq (%r15,%rdx,8),%rbx
(dbx) where
=>[1] AES_encrypt(0x5, 0x39dc9a7a, 0xf560e7b50e, 0x90ca350d49,
0xf560e7b50ea90dfb, 0x6b6b6b6b), at 0xfffffd7fff241b61
[2] 0x0(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x0

This is crashing because of the crippled OpenSSL on some versions
of Solaris. Zdenek Kotala posted a workaround for that; I am
cleaning it up but have not found the time to finalize it.

I'll try to post v03 of Zdenek's patch ASAP.

--
marko

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Marko Kreen (#2)
Re: pgcrypto related backend crash on solaris 10/x86_64

"Marko Kreen" <markokr@gmail.com> writes:

On 9/9/07, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:

I brought clownfish back onto the buildfarm (I'm still a bit dubious about
the unexplained failures, which look like VMware emulation bugs, but this
one seems to be easily reproducible) and enabled --with-openssl after the
recent openssl/pgcrypto related fixes, but I'm still getting a backend
crash during the pgcrypto regression tests:

This is crashing because of the crippled OpenSSL on some versions
of Solaris. Zdenek Kotala posted a workaround for that; I am
cleaning it up but have not found the time to finalize it.

But clownfish was working fine up through Aug 2, and the only change in
pgcrypto since then could hardly have introduced this failure:
http://archives.postgresql.org/pgsql-committers/2007-08/msg00306.php

So I think there's more to it than Marko's explanation. Maybe clownfish
now has a different OpenSSL version installed than before?

regards, tom lane

#4Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Tom Lane (#3)
Re: pgcrypto related backend crash on solaris 10/x86_64

Tom Lane wrote:

"Marko Kreen" <markokr@gmail.com> writes:

On 9/9/07, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:

I brought clownfish back onto the buildfarm (I'm still a bit dubious about
the unexplained failures, which look like VMware emulation bugs, but this
one seems to be easily reproducible) and enabled --with-openssl after the
recent openssl/pgcrypto related fixes, but I'm still getting a backend
crash during the pgcrypto regression tests:

This is crashing because of the crippled OpenSSL on some versions
of Solaris. Zdenek Kotala posted a workaround for that; I am
cleaning it up but have not found the time to finalize it.

But clownfish was working fine up through Aug 2, and the only change in
pgcrypto since then could hardly have introduced this failure:
http://archives.postgresql.org/pgsql-committers/2007-08/msg00306.php

So I think there's more to it than Marko's explanation. Maybe clownfish
now has a different OpenSSL version installed than before?

No, clownfish was not building with openssl before because of that
"crippled openssl" issue. I was under the assumption that the above
commit actually incorporated the complete fix from Zdenek, so I
added it back again only to find that it is still not working ...

Stefan

#5Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Marko Kreen (#2)
Re: pgcrypto related backend crash on solaris 10/x86_64

Marko Kreen wrote:

On 9/9/07, Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:

I brought clownfish back onto the buildfarm (I'm still a bit dubious about
the unexplained failures, which look like VMware emulation bugs, but this
one seems to be easily reproducible) and enabled --with-openssl after the
recent openssl/pgcrypto related fixes, but I'm still getting a backend
crash during the pgcrypto regression tests:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=clownfish&dt=2007-09-09%2012:14:50

backtrace looks like:

program terminated by signal SEGV (no mapping at the fault address)
0xfffffd7fff241b61: AES_encrypt+0x0241: xorq (%r15,%rdx,8),%rbx
(dbx) where
=>[1] AES_encrypt(0x5, 0x39dc9a7a, 0xf560e7b50e, 0x90ca350d49,
0xf560e7b50ea90dfb, 0x6b6b6b6b), at 0xfffffd7fff241b61
[2] 0x0(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x0

This is crashing because of the crippled OpenSSL on some versions
of Solaris. Zdenek Kotala posted a workaround for that; I am
cleaning it up but have not found the time to finalize it.

I'll try to post v03 of Zdenek's patch ASAP.

However, I guess there will still be a problem with the regression tests,
because pgcrypto will report an error when the user tries to use a
stronger cipher, and that generates a diff between expected and real output.

I don't know if it is possible to select different expected output based on
a test of whether strong crypto is installed or not. Maybe some magic in
Makefile/configure. The test should be:

# ldd /usr/postgres/8.2/lib/pgcrypto.so | grep libcrypto_extra
# libcrypto_extra.so.0.9.8 => (file not found)

If the output contains (file not found), the library is not installed or
is not in the path (/usr/sfw/lib).
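
The ldd check described above could be scripted roughly as follows. This is only a sketch and is not part of any posted patch. The function name check_strong_crypto and the three status words it prints are made up for illustration. It reads ldd output on stdin, so it can be tried without a Solaris box.

```shell
#!/bin/sh
# Hypothetical helper (name and status words are assumptions, not from
# the thread). Reads `ldd` output on stdin and classifies the state of
# libcrypto_extra:
#   present    - the library is resolved to a path            (status 0)
#   missing    - the loader reports "(file not found)"        (status 1)
#   not-linked - the object does not reference it at all      (status 2)
check_strong_crypto() {
    # Keep only the ldd lines that mention libcrypto_extra.
    line=$(grep 'libcrypto_extra') || { echo "not-linked"; return 2; }
    case "$line" in
        *"file not found"*) echo "missing"; return 1 ;;
        *)                  echo "present"; return 0 ;;
    esac
}
```

With the path from the message above, it would be used as
`ldd /usr/postgres/8.2/lib/pgcrypto.so | check_strong_crypto`.
A Makefile or configure test could then branch on the printed word or on the exit status.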

Zdenek

#6Marko Kreen
markokr@gmail.com
In reply to: Zdenek Kotala (#5)
Re: pgcrypto related backend crash on solaris 10/x86_64

On 9/11/07, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:

Marko Kreen wrote:

This is crashing because of the crippled OpenSSL on some versions
of Solaris. Zdenek Kotala posted a workaround for that; I am
cleaning it up but have not found the time to finalize it.

I'll try to post v03 of Zdenek's patch ASAP.

However, I guess there will still be a problem with the regression tests,
because pgcrypto will report an error when the user tries to use a
stronger cipher, and that generates a diff between expected and real output.

I don't know if it is possible to select different expected output based on
a test of whether strong crypto is installed or not. Maybe some magic in
Makefile/configure. The test should be:

# ldd /usr/postgres/8.2/lib/pgcrypto.so | grep libcrypto_extra
# libcrypto_extra.so.0.9.8 => (file not found)

If the output contains (file not found), the library is not installed or
is not in the path (/usr/sfw/lib).

Failing regression tests are fine - it is good if the user can
easily see that the OS is broken.

--
marko

#7Zdenek Kotala
Zdenek.Kotala@Sun.COM
In reply to: Marko Kreen (#6)
Re: pgcrypto related backend crash on solaris 10/x86_64

Marko Kreen wrote:

On 9/11/07, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:

Marko Kreen wrote:

This is crashing because of the crippled OpenSSL on some versions
of Solaris. Zdenek Kotala posted a workaround for that; I am
cleaning it up but have not found the time to finalize it.

I'll try to post v03 of Zdenek's patch ASAP.

However, I guess there will still be a problem with the regression tests,
because pgcrypto will report an error when the user tries to use a
stronger cipher, and that generates a diff between expected and real output.

I don't know if it is possible to select different expected output based on
a test of whether strong crypto is installed or not. Maybe some magic in
Makefile/configure. The test should be:

# ldd /usr/postgres/8.2/lib/pgcrypto.so | grep libcrypto_extra
# libcrypto_extra.so.0.9.8 => (file not found)

If the output contains (file not found), the library is not installed or
is not in the path (/usr/sfw/lib).

Failing regression tests are fine - it is good if the user can
easily see that the OS is broken.

But if a build machine keeps complaining about this problem, we can easily
overlook other problems. There are two possible solutions: 1) modify the
regression test, or 2) recommend installing the crypto package on all
affected build machines.

Anyway, I plan to add a mention to the Solaris FAQ once we have the
final patch. I also think it would be good to mention this in the pgcrypto
README, or to add a comment to the regression test's expected output file,
which will be visible in regression.diffs.

Zdenek

#8Stefan Kaltenbrunner
stefan@kaltenbrunner.cc
In reply to: Zdenek Kotala (#7)
Re: pgcrypto related backend crash on solaris 10/x86_64

Zdenek Kotala wrote:

Marko Kreen wrote:

On 9/11/07, Zdenek Kotala <Zdenek.Kotala@sun.com> wrote:

Marko Kreen wrote:

This is crashing because of the crippled OpenSSL on some versions
of Solaris. Zdenek Kotala posted a workaround for that; I am
cleaning it up but have not found the time to finalize it.

I'll try to post v03 of Zdenek's patch ASAP.

However, I guess there will still be a problem with the regression tests,
because pgcrypto will report an error when the user tries to use a
stronger cipher, and that generates a diff between expected and real output.

I don't know if it is possible to select different expected output based on
a test of whether strong crypto is installed or not. Maybe some magic in
Makefile/configure. The test should be:

# ldd /usr/postgres/8.2/lib/pgcrypto.so | grep libcrypto_extra
# libcrypto_extra.so.0.9.8 => (file not found)

If the output contains (file not found), the library is not installed or
is not in the path (/usr/sfw/lib).

Failing regression tests are fine - it is good if the user can
easily see that the OS is broken.

But if a build machine keeps complaining about this problem, we can easily
overlook other problems. There are two possible solutions: 1) modify the
regression test, or 2) recommend installing the crypto package on all
affected build machines.

Anyway, I plan to add a mention to the Solaris FAQ once we have the
final patch. I also think it would be good to mention this in the pgcrypto
README, or to add a comment to the regression test's expected output file,
which will be visible in regression.diffs.

Well, in my opinion we should simply fail the regression test (not crash
like we do now) when we have to deal with such a crippled openssl
installation. Adding information about that issue to the Solaris FAQ also
seems like a good thing.

Stefan