Bug in ecpg lib ?

Started by Leif Jensen · almost 17 years ago · 9 messages · general
#1 Leif Jensen
leif@crysberg.dk

Hi guys,

I'm using PostgreSQL in a server project that uses many forks and many threads in each forked process.

Almost every time I do a pthread_cancel() I get a SIGSEGV. I then linked libmudflapth into my program to catch the problem sooner, and it now reports either 'invalid pointer' or 'double free or corruption' when a thread is cancelled. Typically I have 2 database connections open before any of the threads are created. I am pretty sure that I'm only using 1 connection in any 1 thread, i.e. only 2 of the threads are doing database access, each using its own allocated connection.

After the main thread has done a pthread_cancel() I get a "mudflapth dump" with the following trace back (the abort comes from the mudflapth lib when detecting the bad pointer):

#0 0xffffe405 in __kernel_vsyscall ()
#1 0xf7ca2335 in raise () from /lib32/libc.so.6
#2 0xf7ca3cb1 in abort () from /lib32/libc.so.6
#3 0xf7cdb6ec in ?? () from /lib32/libc.so.6
#4 0xf7ce71ab in free () from /lib32/libc.so.6
#5 0xf7dec061 in free (buf=0x87ed138) at ../../../libmudflap/mf-hooks1.c:241
#6 0xf7ef2b5c in ecpg_sqlca_key_destructor () from /lib32/libecpg.so.6
#7 0xf7dcebb0 in __nptl_deallocate_tsd () from /lib32/libpthread.so.0
#8 0xf7dcf509 in start_thread () from /lib32/libpthread.so.0
#9 0xf7d5008e in clone () from /lib32/libc.so.6

Looking in the ecpg_sqlca_key_destructor(), it seems to me that the sqlca can be deallocated several times !? (I'm not too much into the Postgres code including ecpg, so that is a novice point of view.)

I have tried both pgsql-8.3.5 and pgsql-8.4rc1, with exactly the same result, on many different Linux systems, mainly Slackware 10.2 and Ubuntu 7. On all systems I configured and compiled Postgres with this configure line:

./configure --prefix=/usr/local/Packages/pgsql-8.3.5 --with-openssl --enable-thread-safety

Please help,

Leif

#2 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Leif Jensen (#1)
Re: Bug in ecpg lib ?

leif@crysberg.dk wrote:

I'm using PostgreSQL in a server project that uses many
forks and many threads in each forked process.

Almost every time I do a pthread_cancel() I get a SIGSEGV.
I have then linked the libmudflapth into my program to catch
the problem sooner and now that reports either 'invalid
pointer' or 'double free or corruption' when a thread is
cancelled. Typically I have 2 database connections open
before any of the threads are created. I am pretty sure that
I'm only using 1 connection in any 1 thread, i.e. only 2 of
the threads are doing database access, each using its own
allocated connection.

After the main thread has done a pthread_cancel() I get a
"mudflapth dump" with the following trace back (the abort
comes from the mudflapth lib when detecting the bad pointer):

#0 0xffffe405 in __kernel_vsyscall ()
#1 0xf7ca2335 in raise () from /lib32/libc.so.6
#2 0xf7ca3cb1 in abort () from /lib32/libc.so.6
#3 0xf7cdb6ec in ?? () from /lib32/libc.so.6
#4 0xf7ce71ab in free () from /lib32/libc.so.6
#5 0xf7dec061 in free (buf=0x87ed138) at ../../../libmudflap/mf-hooks1.c:241
#6 0xf7ef2b5c in ecpg_sqlca_key_destructor () from /lib32/libecpg.so.6
#7 0xf7dcebb0 in __nptl_deallocate_tsd () from /lib32/libpthread.so.0
#8 0xf7dcf509 in start_thread () from /lib32/libpthread.so.0
#9 0xf7d5008e in clone () from /lib32/libc.so.6

Looking in the ecpg_sqlca_key_destructor(), it seems to me
that the sqlca can be deallocated several times !? (I'm not
too much into the Postgres code including ecpg, so that is a
novice point of view.)

I have tried both pgsql-8.3.5 and pgsql-8.4rc1, with
exactly the same result, on many different Linux
systems, mainly Slackware 10.2 and Ubuntu 7. I have on all
systems configured and compiled Postgres with this configure line:

./configure --prefix=/usr/local/Packages/pgsql-8.3.5
--with-openssl --enable-thread-safety

Could you create a small sample program that reproduces the bug?

That would make it easier for me or somebody else to do something about it.

Yours,
Laurenz Albe

#3 Leif Jensen
leif@crysberg.dk
In reply to: Laurenz Albe (#2)
Re: Bug in ecpg lib ?

Hi Laurenz,

Thanks for the suggestion. It sure wasn't easy, but I should have done that right away. It turned out not to be in the ecpg module, but somewhere in my own code (of course ;-) ). At least I haven't been able to reproduce it in a simple example and I haven't figured out where in my own code yet either.

Leif


#4 Leif Jensen
leif@crysberg.dk
In reply to: Leif Jensen (#3)
Re: Bug in ecpg lib ?

Hi Laurenz,

I have now generated a rather small example where I experience the problem, attached. It is linked with the mudflapth library using the commands below. You may have to change the DBNAME and DBUSER. The delay just before the pthread_cancel(), i.e. sleep(10), is rather critical for the problem to appear and you might have to change it to something less. On some very slow machines I wasn't able to reproduce the problem.

$ ecpg crashex.pgc
$ /usr/local/Packages/gcc-4.4.0/bin/gcc -O0 -c -fmudflap -fmudflapth -fomit-frame-pointer -B/usr/local/Packages/gcc-4.4.0/bin/ -Wwrite-strings -std=gnu89 -ggdb -fPIC -Wall -I/usr/local/Packages/pgsql/include -I./Modules -I./ -o crashex.o crashex.c
$ /usr/local/Packages/gcc-4.4.0/bin/gcc -O0 -B/usr/local/Packages/gcc-4.4.0/bin/ -Wl -o crashex crashex.o -L/usr/local/Packages/pgsql/lib -lecpg -lpq -lmudflapth -lpthread -ldl

And this is the output from running the program:

leif$ LD_LIBRARY_PATH=/usr/local/Packages/gcc-4.4.0/lib/ ./crashex
Couldn't open somename@localhost:5432
2+2=0.
*** glibc detected *** /home/leif/tmp/crashex: free(): invalid pointer: 0x081f3958 ***
======= Backtrace: =========
/lib32/libc.so.6[0xf7c30615]
/lib32/libc.so.6(cfree+0x90)[0xf7c34080]
/usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0(__real_free+0x3f1)[0xf7d39061]
/lib32/libecpg.so.6[0xf7e3fb5c]
/lib32/libpthread.so.0[0xf7d1bbb0]
/lib32/libpthread.so.0[0xf7d1c509]
/lib32/libc.so.6(clone+0x5e)[0xf7c9d08e]
======= Memory map: ========
08048000-0804a000 r-xp 00000000 08:0a 1671173 /home/leif/tmp/crashex
0804a000-0804b000 rwxp 00001000 08:0a 1671173 /home/leif/tmp/crashex
0804b000-081f8000 rwxp 0804b000 00:00 0 [heap]
f71eb000-f71ec000 ---p f71eb000 00:00 0
f71ec000-f79ec000 rwxp f71ec000 00:00 0
f79ec000-f79f5000 r-xp 00000000 08:01 97934 /lib32/libnss_files-2.7.so
f79f5000-f79f7000 rwxp 00008000 08:01 97934 /lib32/libnss_files-2.7.so
f79f7000-f79ff000 r-xp 00000000 08:01 97936 /lib32/libnss_nis-2.7.so
f79ff000-f7a01000 rwxp 00007000 08:01 97936 /lib32/libnss_nis-2.7.so
f7a01000-f7a15000 r-xp 00000000 08:01 97931 /lib32/libnsl-2.7.so
f7a15000-f7a17000 rwxp 00013000 08:01 97931 /lib32/libnsl-2.7.so
f7a17000-f7a19000 rwxp f7a17000 00:00 0
f7a19000-f7a20000 r-xp 00000000 08:01 97932 /lib32/libnss_compat-2.7.so
f7a20000-f7a22000 rwxp 00006000 08:01 97932 /lib32/libnss_compat-2.7.so
f7a22000-f7a2c000 r-xp 00000000 08:08 227374 /usr/lib32/libgcc_s.so.1
f7a2c000-f7a2d000 rwxp 0000a000 08:08 227374 /usr/lib32/libgcc_s.so.1
f7a2d000-f7a2f000 rwxp f7a2d000 00:00 0
f7a2f000-f7a38000 r-xp 00000000 08:01 97927 /lib32/libcrypt-2.7.so
f7a38000-f7a3a000 rwxp 00008000 08:01 97927 /lib32/libcrypt-2.7.so
f7a3a000-f7a61000 rwxp f7a3a000 00:00 0
f7a61000-f7b4a000 r-xp 00000000 08:01 98015 /lib32/libcrypto.so.0.9.7
f7b4a000-f7b5c000 rwxp 000e8000 08:01 98015 /lib32/libcrypto.so.0.9.7
f7b5c000-f7b5f000 rwxp f7b5c000 00:00 0
f7b5f000-f7b8d000 r-xp 00000000 08:01 98021 /lib32/libssl.so.0.9.7
f7b8d000-f7b90000 rwxp 0002d000 08:01 98021 /lib32/libssl.so.0.9.7
f7b90000-f7bb3000 r-xp 00000000 08:01 97929 /lib32/libm-2.7.so
f7bb3000-f7bb5000 rwxp 00023000 08:01 97929 /lib32/libm-2.7.so
f7bb5000-f7bb6000 rwxp f7bb5000 00:00 0
f7bb6000-f7bc2000 r-xp 00000000 08:01 97971 /lib32/libpgtypes.so.3.0
f7bc2000-f7bc4000 rwxp 0000c000 08:01 97971 /lib32/libpgtypes.so.3.0
f7bc4000-f7d0d000 r-xp 00000000 08:01 97925 /lib32/libc-2.7.so
f7d0d000-f7d0e000 r-xp 00149000 08:01 97925 /lib32/libc-2.7.so
f7d0e000-f7d10000 rwxp 0014a000 08:01 97925 /lib32/libc-2.7.so
f7d10000-f7d13000 rwxp f7d10000 00:00 0
f7d13000-f7d15000 r-xp 00000000 08:01 97928 /lib32/libdl-2.7.so
f7d15000-f7d17000 rwxp 00001000 08:01 97928 /lib32/libdl-2.7.so
f7d17000-f7d2b000 r-xp 00000000 08:01 97939 /lib32/libpthread-2.7.so
f7d2b000-f7d2d000 rwxp 00013000 08:01 97939 /lib32/libpthread-2.7.so
f7d2d000-f7d2f000 rwxp f7d2d000 00:00 0
f7d2f000-f7d49000 r-xp 00000000 08:09 934128 /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0.0.0
f7d49000-f7d4c000 rwxp 0001a000 08:09 934128 /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0.0.0
f7d4c000-f7e1b000 rwxp f7d4c000 00:00 0
f7e1b000-f7e35000 r-xp 00000000 08:01 98012 /lib32/libpq.so.5.0
f7e35000-f7e36000 rwxp 0001a000 08:01 98012 /lib32/libpq.so.5.0
f7e36000-f7e44000 r-xp 00000000 08:01 97969 /lib32/libecpg.so.6.0
f7e44000-f7f05000 rwxp 0000d000 08:01 97969 /lib32/libecpg.so.6.0
f7f18000-f7f1b000 rwxp f7f18000 00:00 0
f7f1b000-f7f38000 r-xp 00000000 08:01 97922 /lib32/ld-2.7.so
f7f38000-f7f3a000 rwxp 0001c000 08:01 97922 /lib32/ld-2.7.so
fff3f000-fff59000 rwxp 7ffffffe5000 00:00 0 [stack]
ffffe000-fffff000 r-xp ffffe000 00:00 0 [vdso]
Aborted (core dumped)

leif$ gdb ~/tmp/crashex core.30920
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu"...

warning: Can't read pathname for load map: Input/output error.
Reading symbols from /lib32/libecpg.so.6...done.
Loaded symbols for /lib32/libecpg.so.6
Reading symbols from /lib32/libpq.so.5...done.
Loaded symbols for /lib32/libpq.so.5
Reading symbols from /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0...done.
Loaded symbols for /usr/local/Packages/gcc-4.4.0/lib/libmudflapth.so.0
Reading symbols from /lib32/libpthread.so.0...done.
Loaded symbols for /lib32/libpthread.so.0
Reading symbols from /lib32/libdl.so.2...done.
Loaded symbols for /lib32/libdl.so.2
Reading symbols from /lib32/libc.so.6...done.
Loaded symbols for /lib32/libc.so.6
Reading symbols from /lib32/libpgtypes.so.3...done.
Loaded symbols for /lib32/libpgtypes.so.3
Reading symbols from /lib32/libm.so.6...done.
Loaded symbols for /lib32/libm.so.6
Reading symbols from /lib32/libssl.so.0...done.
Loaded symbols for /lib32/libssl.so.0
Reading symbols from /lib32/libcrypto.so.0...done.
Loaded symbols for /lib32/libcrypto.so.0
Reading symbols from /lib32/libcrypt.so.1...done.
Loaded symbols for /lib32/libcrypt.so.1
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /usr/lib32/libgcc_s.so.1...done.
Loaded symbols for /usr/lib32/libgcc_s.so.1
Reading symbols from /lib32/libnss_compat.so.2...done.
Loaded symbols for /lib32/libnss_compat.so.2
Reading symbols from /lib32/libnsl.so.1...done.
Loaded symbols for /lib32/libnsl.so.1
Reading symbols from /lib32/libnss_nis.so.2...done.
Loaded symbols for /lib32/libnss_nis.so.2
Reading symbols from /lib32/libnss_files.so.2...done.
Loaded symbols for /lib32/libnss_files.so.2

warning: Lowest section in system-supplied DSO at 0xffffe000 is .hash at ffffe0b4
Program terminated with signal 6, Aborted.
[New process 30922]
[New process 30920]
#0 0xffffe405 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe405 in __kernel_vsyscall ()
#1 0xf7bef335 in raise () from /lib32/libc.so.6
#2 0xf7bf0cb1 in abort () from /lib32/libc.so.6
#3 0xf7c286ec in ?? () from /lib32/libc.so.6
#4 0xf7c30615 in ?? () from /lib32/libc.so.6
#5 0xf7c34080 in free () from /lib32/libc.so.6
#6 0xf7d39061 in free (buf=0x81f3958) at ../../../libmudflap/mf-hooks1.c:241
#7 0xf7e3fb5c in ecpg_sqlca_key_destructor () from /lib32/libecpg.so.6
#8 0xf7d1bbb0 in __nptl_deallocate_tsd () from /lib32/libpthread.so.0
#9 0xf7d1c509 in start_thread () from /lib32/libpthread.so.0
#10 0xf7c9d08e in clone () from /lib32/libc.so.6
(gdb)

As you might have noticed, this particular run is on a 64bit architecture (Ubuntu 8.04) and the crashex program is generated on a 32bit machine with gcc-4.4.0. I have tried with PostgreSQL version 8.3.5 and 8.3.7. All give the same result, though the specific program addresses of course might differ from system to system.

Please help,

Leif


Attachments:

crashex.pgc (application/octet-stream)
#5 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Leif Jensen (#4)
Re: Bug in ecpg lib ?

leif@crysberg.dk wrote:

I have now generated a rather small example where I
experience the problem, attached. It is linked with the
mudflapth library using the commands below. You may have to
change the DBNAME and DBUSER. The delay just before the
pthread_cancel(), i.e. sleep(10), is rather critical for the
problem to appear and you might have to change it to
something less. On some very slow machines I wasn't able to
produce the problem.

[...]

And this is the output from running the program:

leif$ LD_LIBRARY_PATH=/usr/local/Packages/gcc-4.4.0/lib/ ./crashex
Couldn't open somename@localhost:5432
2+2=0.
*** glibc detected *** /home/leif/tmp/crashex: free():
invalid pointer: 0x081f3958 ***

[...]

Aborted (core dumped)

leif$ gdb ~/tmp/crashex core.30920

[...]

#0 0xffffe405 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe405 in __kernel_vsyscall ()
#1 0xf7bef335 in raise () from /lib32/libc.so.6
#2 0xf7bf0cb1 in abort () from /lib32/libc.so.6
#3 0xf7c286ec in ?? () from /lib32/libc.so.6
#4 0xf7c30615 in ?? () from /lib32/libc.so.6
#5 0xf7c34080 in free () from /lib32/libc.so.6
#6 0xf7d39061 in free (buf=0x81f3958) at
../../../libmudflap/mf-hooks1.c:241
#7 0xf7e3fb5c in ecpg_sqlca_key_destructor () from
/lib32/libecpg.so.6
#8 0xf7d1bbb0 in __nptl_deallocate_tsd () from /lib32/libpthread.so.0
#9 0xf7d1c509 in start_thread () from /lib32/libpthread.so.0
#10 0xf7c9d08e in clone () from /lib32/libc.so.6
(gdb)

I ran your sample with gdb against PostgreSQL 8.4, and
ecpg_sqlca_key_destructor() was called only once, for a valid pointer,
one that was previously allocated with malloc().
Could you check if ecpg_sqlca_key_destructor() is called more than once if
you run the sample?

Are you aware that in your sample run the connection attempt failed?
It does not matter, ecpg should do the right thing anyway.

What I notice about your program is that you connect to the database
in the main thread, then start a new thread and use the connection in that
new thread.

I don't know, but I'd expect that since ecpg keeps a thread-specific
sqlca, this could cause problems. Indeed I find with the debugger that in
your sample sqlca is allocated and initialized twice, once when the
database connection is attempted, and once when the SQL statement is run.

I think that the "good" way to do it would be:
- start a thread
- connect to the database
- do work
- disconnect from the database
- terminate the thread
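
A minimal sketch of that sequence, using plain pthreads only. Everything named here (struct fake_conn, fake_connect(), fake_disconnect(), run_lifecycle_demo()) is a hypothetical placeholder for illustration and not part of ECPG; a real program would issue its connect and disconnect statements where the placeholders are. The sketch assumes POSIX unnamed semaphores are available, as on Linux. The disconnect step is registered as a cleanup handler so that it also runs if the thread is cancelled:

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

/* Hypothetical stand-in for a database connection; not an ECPG type. */
struct fake_conn {
    int is_open;
};

static struct fake_conn the_conn;   /* owned by the worker thread */
static sem_t worker_ready;          /* posted once the worker has connected */

/* Placeholder for "connect to the database". */
static struct fake_conn *fake_connect(void)
{
    the_conn.is_open = 1;
    return &the_conn;
}

/* Cleanup handler; placeholder for "disconnect from the database". */
static void fake_disconnect(void *arg)
{
    struct fake_conn *c = arg;
    c->is_open = 0;
}

static void *lifecycle_worker(void *unused)
{
    struct fake_conn *c;
    int tick;

    (void) unused;
    c = fake_connect();
    /* From here on, both cancellation and normal exit run fake_disconnect. */
    pthread_cleanup_push(fake_disconnect, c);
    sem_post(&worker_ready);
    for (tick = 0; tick < 3600; tick++)
        sleep(1);                   /* "do work"; sleep() is a cancellation point */
    pthread_cleanup_pop(1);         /* normal path: pop and run the handler */
    return NULL;
}

/* Starts the worker, cancels it, and returns 1 if the thread was
 * cancelled and its connection was closed by the cleanup handler. */
static int run_lifecycle_demo(void)
{
    pthread_t tid;
    void *result = NULL;

    if (sem_init(&worker_ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, lifecycle_worker, NULL) != 0)
        return -1;
    sem_wait(&worker_ready);        /* worker has connected */
    assert(the_conn.is_open == 1);
    pthread_cancel(tid);
    pthread_join(tid, &result);
    sem_destroy(&worker_ready);
    return result == PTHREAD_CANCELED && the_conn.is_open == 0;
}
```

Calling run_lifecycle_demo() from main() and linking with -lpthread exercises the cancel path.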

Maybe somebody who knows more about ecpg can say if what you are doing
should work or not.

Yours,
Laurenz Albe

#6 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Laurenz Albe (#5)
Re: Bug in ecpg lib ?

I wrote:

What I notice about your program is that you connect to the database
in the main thread, then start a new thread and use the connection in that
new thread.

I don't know, but I'd expect that since ecpg keeps a thread-specific
sqlca, this could cause problems. Indeed I find with the debugger that in
your sample sqlca is allocated and initialized twice, once when the
database connection is attempted, and once when the SQL statement is run.

I think that the "good" way to do it would be:
- start a thread
- connect to the database
- do work
- disconnect from the database
- terminate the thread

I thought some more about that, and it is obviously nonsense.
Why shouldn't you use a connection object in a different thread?

I'll try to come up with some more findings to help figure out
what's going on.

Yours,
Laurenz Albe

#7 Leif Jensen
leif@crysberg.dk
In reply to: Laurenz Albe (#6)
Re: Bug in ecpg lib ?

Hi Laurenz,

Thank you for your effort. I appreciate it very much.

I have been trying to figure this thing out myself too, setting breakpoints and single-stepping my way through some of the ecpg code, but without much clarification (more that I learned new things about pthread). I have been trying to figure out whether this is a real problem or more a mudflapth "mis-judgement". Also, on most machines (the faster ones) mudflap complains about either "invalid pointer in free()" or "double free() or corruption". I haven't been able to verify this yet. Specifically, on one (slower) machine I have seen this mudflapth complaint only once, though I have both run and debugged the program on it many times.

Are you sure what you suggest is nonsense ? In the light of the sqlca struct being "local" to each thread ? I tried to put the open and close connection within the thread, but I was still able to get the mudflap complaint. Theoretically, I guess one could use just 1 connection for all db access in all threads just having them enclosed within pthread_mutex_[un]lock()s !? (Not what I do, though.)
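
A minimal sketch of that single-connection idea, assuming plain pthreads. struct shared_conn, run_statement() and run_shared_conn_demo() are hypothetical names for illustration, and the counter merely stands in for a real database call; nothing here is ECPG API:

```c
#include <pthread.h>

/* Hypothetical stand-in for one database connection shared by all threads. */
struct shared_conn {
    pthread_mutex_t lock;       /* serializes all use of the connection */
    long statements_run;        /* placeholder for real database work */
};

static struct shared_conn conn = { PTHREAD_MUTEX_INITIALIZER, 0 };

/* Every database access goes through the lock. */
static void run_statement(struct shared_conn *c)
{
    pthread_mutex_lock(&c->lock);
    c->statements_run++;        /* the real statement would be executed here */
    pthread_mutex_unlock(&c->lock);
}

static void *db_worker(void *arg)
{
    int i;

    for (i = 0; i < 1000; i++)
        run_statement(arg);
    return NULL;
}

/* Runs 4 workers against the one connection and returns the total
 * number of statements executed, or -1 if a thread could not start. */
static long run_shared_conn_demo(void)
{
    pthread_t workers[4];
    int i;

    conn.statements_run = 0;
    for (i = 0; i < 4; i++)
        if (pthread_create(&workers[i], NULL, db_worker, &conn) != 0)
            return -1;
    for (i = 0; i < 4; i++)
        pthread_join(workers[i], NULL);
    return conn.statements_run;
}
```

If such threads can be cancelled, any cancellation point inside the locked region would also need a cleanup handler (pthread_cleanup_push) that releases the mutex; the sketch has none inside the lock.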

And regarding your previous mail: Yes, I know that my example does not make the connection but still does the select ... It doesn't matter, however: even if it does make a connection, it still aborts.
And yes, I am aware that I open the connection in the "main thread" and use it in another. This is the way the real daemon program was designed.

Once again, thank you,

Leif


#8 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: Leif Jensen (#7)
Re: Bug in ecpg lib ?

lj@crysberg.dk wrote:

I have been trying to figure this thing out myself too,
breakpointing and single stepping my way through some of the
ecpg code, but without much clarification. (More that I
learned new things about pthread). I have been trying to
figure out whether this is a real thing or more a mudflapth
"mis-judgement". Also on most (the faster ones) machines
mudflap complains either about "invalid pointer in free()" or
"double free() or corruption". I haven't been able to verify
this yet. Specifically on one (slower) machine, I have only
seen this mudflapth complaint once, though I have been both
running and debugging it on that many times.

Are you sure what you suggest is nonsense ? In the light
of the sqlca struct being "local" to each thread ? I tried to
put the open and close connection within the thread, but I
was still able to get the mudflap complaint. Theoretically, I
guess one could use just 1 connection for all db access in
all threads just having them enclosed within
pthread_mutex_[un]lock()s !? (Not what I do, though.)

The sqlca is local to each thread, but that should not be a problem.
On closer scrutiny of the source, it works like this:

Whenever a thread performs an SQL operation, it will allocate
an sqlca in its thread-specific data area (TSD) in the ECPG function
ECPGget_sqlca(). When the thread exits or is cancelled, the
sqlca is freed by pthread by calling the ECPG function
ecpg_sqlca_key_destructor(). pthread makes sure that each
destructor function is only called once per thread.

So when several threads use a connection, there will be
several sqlca's around, but that should not matter as they get
freed when the thread exits.
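
The mechanism just described can be sketched with plain pthreads. This is an illustration under assumptions, not the ECPG source: struct status_area, get_area(), area_destructor() and run_cancel_demo() are hypothetical names, and the sketch assumes POSIX unnamed semaphores as on Linux. Each thread lazily allocates its own area under a shared key, and the key's destructor frees it once when the thread is cancelled:

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for a per-thread status area; not struct sqlca_t. */
struct status_area {
    long code;
};

static pthread_key_t area_key;
static pthread_once_t area_key_once = PTHREAD_ONCE_INIT;
static int destructor_calls;    /* read only after pthread_join() */
static sem_t area_ready;        /* posted once the worker owns an area */

/* Called by pthread with the exiting thread's non-NULL value for area_key. */
static void area_destructor(void *arg)
{
    destructor_calls++;
    free(arg);
}

static void area_key_init(void)
{
    pthread_key_create(&area_key, area_destructor);
}

/* Returns the calling thread's area, allocating it on first use. */
static struct status_area *get_area(void)
{
    struct status_area *area;

    pthread_once(&area_key_once, area_key_init);
    area = pthread_getspecific(area_key);
    if (area == NULL)
    {
        area = malloc(sizeof *area);
        assert(area != NULL);
        area->code = 0;
        pthread_setspecific(area_key, area);
    }
    return area;
}

static void *area_worker(void *unused)
{
    (void) unused;
    get_area()->code = 1;       /* first use allocates this thread's area */
    sem_post(&area_ready);
    for (;;)
        sleep(1);               /* cancellation point */
    return NULL;
}

/* Cancels one worker and returns how often the destructor ran for it. */
static int run_cancel_demo(void)
{
    pthread_t tid;

    get_area();                 /* the main thread gets a separate area */
    destructor_calls = 0;
    if (sem_init(&area_ready, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, area_worker, NULL) != 0)
        return -1;
    sem_wait(&area_ready);
    pthread_cancel(tid);
    pthread_join(tid, NULL);
    sem_destroy(&area_ready);
    return destructor_calls;
}
```

The main thread's area is untouched by the worker's cancellation, since each thread has its own value under the key.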

After some experiments, I would say that mudflap's complaint
is a mistake.

I've compiled your program against a debug-enabled PostgreSQL 8.4.0 with

$ ecpg crashex

$ gcc -Wall -O0 -g -o crashex crashex.c -I /magwien/postgres-8.4.0/include \
-L/magwien/postgres-8.4.0/lib -lecpg -Wl,-rpath,/magwien/postgres-8.4.0/lib

and run a gdb session:

$ gdb
GNU gdb Red Hat Linux (6.3.0.0-1.138.el3rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".

Set the program to be debugged:

(gdb) file crashex
Reading symbols from /home/laurenz/ecpg/crashex...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".

This is where the source of libecpg is:

(gdb) dir /home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib
Source directories searched: /home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib:$cdir:$cwd

Start the program (main thread):

(gdb) break main
Breakpoint 1 at 0x804892c: file crashex.pgc, line 54.
(gdb) run
Starting program: /home/laurenz/ecpg/crashex
[Thread debugging using libthread_db enabled]
[New Thread -1218572160 (LWP 29290)]
[Switching to Thread -1218572160 (LWP 29290)]

Breakpoint 1, main (argc=1, argv=0xbfffce44) at crashex.pgc:54
54 PerformTask( 25 );
(gdb) delete
Delete all breakpoints? (y or n) y

Set breakpoint #2 in the function where sqlca is freed:

(gdb) break ecpg_sqlca_key_destructor
Breakpoint 2 at 0x457a27: file misc.c, line 124.
(gdb) list misc.c:124
119
120 #ifdef ENABLE_THREAD_SAFETY
121 static void
122 ecpg_sqlca_key_destructor(void *arg)
123 {
124 free(arg); /* sqlca structure allocated in ECPGget_sqlca */
125 }
126
127 static void
128 ecpg_sqlca_key_init(void)

Set breakpoint #3 where a new sqlca is allocated in ECPGget_sqlca():

(gdb) break misc.c:147
Breakpoint 3 at 0x457ad2: file misc.c, line 147.
(gdb) list misc.c:134,misc.c:149
134 struct sqlca_t *
135 ECPGget_sqlca(void)
136 {
137 #ifdef ENABLE_THREAD_SAFETY
138 struct sqlca_t *sqlca;
139
140 pthread_once(&sqlca_key_once, ecpg_sqlca_key_init);
141
142 sqlca = pthread_getspecific(sqlca_key);
143 if (sqlca == NULL)
144 {
145 sqlca = malloc(sizeof(struct sqlca_t));
146 ecpg_init_sqlca(sqlca);
147 pthread_setspecific(sqlca_key, sqlca);
148 }
149 return (sqlca);
(gdb) cont
Continuing.

Breakpoint #3 is hit when the main thread allocates an sqlca during connect:

Breakpoint 3, ECPGget_sqlca () at misc.c:147
147 pthread_setspecific(sqlca_key, sqlca);
(gdb) where
#0 ECPGget_sqlca () at misc.c:147
#1 0x00456d57 in ECPGconnect (lineno=41, c=0, name=0x9bf2008 "test@localhost:1238",
user=0x8048a31 "laureny", passwd=0x0, connection_name=0x8048a14 "dbConn", autocommit=0)
at connect.c:270
#2 0x080488a3 in PerformTask (TaskId=25) at crashex.pgc:41
#3 0x08048936 in main (argc=1, argv=0xbfffce44) at crashex.pgc:54

This is the address of the main thread's sqlca:

(gdb) print sqlca
$1 = (struct sqlca_t *) 0x9bf2028
(gdb) cont
Continuing.
[New Thread 27225008 (LWP 29343)]
[Switching to Thread 27225008 (LWP 29343)]

Breakpoint #3 is hit again when the new thread allocates its sqlca when it executes the SELECT statement:

Breakpoint 3, ECPGget_sqlca () at misc.c:147
147 pthread_setspecific(sqlca_key, sqlca);
(gdb) where
#0 ECPGget_sqlca () at misc.c:147
#1 0x004579aa in ecpg_init (con=0x0, connection_name=0x8048a14 "dbConn", lineno=22) at misc.c:107
#2 0x00451a97 in ECPGdo (lineno=22, compat=0, force_indicator=1,
connection_name=0x8048a14 "dbConn", questionmarks=0 '\0', st=0, query=0x8048a1b "select 2 + 2")
at execute.c:1470
#3 0x080487f7 in Work () at crashex.pgc:22
#4 0x00c8cdd8 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003e5fca in clone () from /lib/tls/libc.so.6

This is the address of the new thread's sqlca:

(gdb) print sqlca
$2 = (struct sqlca_t *) 0x9c16ee8
(gdb) cont
Continuing.
2+2=0.

Breakpoint #2 is hit when the new thread is canceled:

Breakpoint 2, ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
124 free(arg); /* sqlca structure allocated in ECPGget_sqlca */
(gdb) where
#0 ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
#1 0x00c8d799 in deallocate_tsd () from /lib/tls/libpthread.so.0
#2 0x00c8cde6 in start_thread () from /lib/tls/libpthread.so.0
#3 0x003e5fca in clone () from /lib/tls/libc.so.6

The freed pointer is the sqlca of the new thread:

(gdb) print arg
$3 = (void *) 0x9c16ee8

And the program terminates with no problems.

(gdb) cont
Continuing.
[Thread 27225008 (zombie) exited]

Program exited normally.
(gdb) quit

This all looks just like it should, doesn't it?

Yours,
Laurenz Albe

#9 Leif Jensen
leif@crysberg.dk
In reply to: Laurenz Albe (#8)
Re: Bug in ecpg lib ?

Hello Laurenz,

Thank you for your very thorough walk through the 'ecpg use' of threads with respect to the sqlca. It was very clear and specific. I reproduced what you did almost exactly, and could then also play around with things to see what happens 'if'... I have learned much about threads and ecpg, which I'm sure will be very helpful. I'm afraid I have to agree with you that it must be a mudflap flop ;-) ... unfortunately, because that puts me back at the real problem in the larger program and how to track down that error.

I'm pleased that it wasn't an ecpg bug, and I know now not to use mudflap for tracking my problem.

Thanks for your big effort on this,

Leif

----- "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:


lj@crysberg.dk wrote:

I have been trying to figure this thing out myself too,
breakpointing and single stepping my way through some of the
ecpg code, but without much clarification. (More that I
learned new things about pthread). I have been trying to
figure out whether this is a real thing or more a mudflapth
"mis-judgement". Also on most (the faster ones) machines
mudflap complains either about "invalid pointer in free()" or
"double free() or corruption". I haven't been able to verify
this yet. Specifically on one (slower) machine, I have only
seen this mudflapth complaint once, though I have been both
running and debugging it on that many times.

Are you sure what you suggest is nonsense ? In the light
of the sqlca struct being "local" to each thread ? I tried to
put the open and close connection within the thread, but I
was still able to get the mudflap complaint. Theoretically, I
guess one could use just 1 connection for all db access in
all threads just having them enclosed within
pthread_mutex_[un]lock()s !? (Not what I do, though.)

The sqlca is local to each thread, but that should not be a problem.
On closer scrutiny of the source, it works like this:

Whenever a thread performs an SQL operation, it will allocate
an sqlca in its thread-specific data area (TSD) in the ECPG function
ECPGget_sqlca(). When the thread exits or is cancelled, the
sqlca is freed by pthread by calling the ECPG function
ecpg_sqlca_key_destructor(). pthread makes sure that each
destructor function is only called once per thread.

So when several threads use a connection, there will be
several sqlca's around, but that should not matter as they get
freed when the thread exits.
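
To illustrate the mechanism described above, here is a minimal sketch. It is not the ECPG source: struct fake_sqlca, get_area(), area_destructor() and run_tsd_demo() are hypothetical names made up for this example, and only the pthread calls are real. Each worker thread lazily allocates its own area, is then cancelled, and the sketch counts how often the destructor runs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_THREADS 16

/* Hypothetical stand-in for ECPG's struct sqlca_t. */
struct fake_sqlca { int code; };

static pthread_key_t   area_key;
static pthread_once_t  area_key_once = PTHREAD_ONCE_INIT;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cv = PTHREAD_COND_INITIALIZER;
static int ready = 0;             /* workers that have allocated their area */
static int destructor_calls = 0;  /* how often the destructor has run */

/* Run by pthread when a thread holding a non-NULL value exits or is cancelled. */
static void area_destructor(void *arg)
{
    pthread_mutex_lock(&lock);
    destructor_calls++;
    pthread_mutex_unlock(&lock);
    free(arg);
}

static void area_key_init(void)
{
    pthread_key_create(&area_key, area_destructor);
}

/* Lazily allocate the calling thread's private area on first use. */
static struct fake_sqlca *get_area(void)
{
    struct fake_sqlca *area;

    pthread_once(&area_key_once, area_key_init);
    area = pthread_getspecific(area_key);
    if (area == NULL)
    {
        area = malloc(sizeof *area);
        if (area == NULL)
            abort();
        area->code = 0;
        pthread_setspecific(area_key, area);
    }
    return area;
}

static void *worker(void *unused)
{
    (void) unused;
    get_area();                   /* the first call allocates this thread's area */
    pthread_mutex_lock(&lock);
    ready++;
    pthread_cond_signal(&ready_cv);
    pthread_mutex_unlock(&lock);
    for (;;)
        sleep(1);                 /* sleep() is a cancellation point */
    return NULL;                  /* not reached */
}

/* Start nthreads workers, cancel them all, return the number of destructor calls. */
int run_tsd_demo(int nthreads)
{
    pthread_t tid[MAX_THREADS];
    void *result;
    int i, rc, calls;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_mutex_lock(&lock);
    ready = 0;
    destructor_calls = 0;
    pthread_mutex_unlock(&lock);

    for (i = 0; i < nthreads; i++)
    {
        rc = pthread_create(&tid[i], NULL, worker, NULL);
        assert(rc == 0);
    }

    /* Wait until every worker owns an area before cancelling anything. */
    pthread_mutex_lock(&lock);
    while (ready < nthreads)
        pthread_cond_wait(&ready_cv, &lock);
    pthread_mutex_unlock(&lock);

    for (i = 0; i < nthreads; i++)
        pthread_cancel(tid[i]);
    for (i = 0; i < nthreads; i++)
    {
        rc = pthread_join(tid[i], &result);
        assert(rc == 0 && result == PTHREAD_CANCELED);
    }

    pthread_mutex_lock(&lock);
    calls = destructor_calls;
    pthread_mutex_unlock(&lock);
    return calls;
}
```

Built with gcc -pthread and called from a main() as run_tsd_demo(4), this should return 4: one destructor call per cancelled thread, each freeing only that thread's own area.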

After some experiments, I would say that mudflap's complaint
is a mistake.

I've compiled your program against a debug-enabled PostgreSQL 8.4.0
with

$ ecpg crashex

$ gcc -Wall -O0 -g -o crashex crashex.c \
-I /magwien/postgres-8.4.0/include \
-L/magwien/postgres-8.4.0/lib -lecpg \
-Wl,-rpath,/magwien/postgres-8.4.0/lib

and run a gdb session:

$ gdb
GNU gdb Red Hat Linux (6.3.0.0-1.138.el3rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu".

Set the program to be debugged:

(gdb) file crashex
Reading symbols from /home/laurenz/ecpg/crashex...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".

This is where the source of libecpg is:

(gdb) dir /home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib
Source directories searched:
/home/laurenz/rpmbuild/BUILD/postgresql-8.4.0/src/interfaces/ecpg/ecpglib:$cdir:$cwd

Start the program (main thread):

(gdb) break main
Breakpoint 1 at 0x804892c: file crashex.pgc, line 54.
(gdb) run
Starting program: /home/laurenz/ecpg/crashex
[Thread debugging using libthread_db enabled]
[New Thread -1218572160 (LWP 29290)]
[Switching to Thread -1218572160 (LWP 29290)]

Breakpoint 1, main (argc=1, argv=0xbfffce44) at crashex.pgc:54
54 PerformTask( 25 );
(gdb) delete
Delete all breakpoints? (y or n) y

Set breakpoint #2 in the function where sqlca is freed:

(gdb) break ecpg_sqlca_key_destructor
Breakpoint 2 at 0x457a27: file misc.c, line 124.
(gdb) list misc.c:124
119
120 #ifdef ENABLE_THREAD_SAFETY
121 static void
122 ecpg_sqlca_key_destructor(void *arg)
123 {
124 free(arg); /* sqlca structure allocated in ECPGget_sqlca */
125 }
126
127 static void
128 ecpg_sqlca_key_init(void)

Set breakpoint #3 where a new sqlca is allocated in
ECPGget_sqlca():

(gdb) break misc.c:147
Breakpoint 3 at 0x457ad2: file misc.c, line 147.
(gdb) list misc.c:134,misc.c:149
134 struct sqlca_t *
135 ECPGget_sqlca(void)
136 {
137 #ifdef ENABLE_THREAD_SAFETY
138 struct sqlca_t *sqlca;
139
140 pthread_once(&sqlca_key_once, ecpg_sqlca_key_init);
141
142 sqlca = pthread_getspecific(sqlca_key);
143 if (sqlca == NULL)
144 {
145 sqlca = malloc(sizeof(struct sqlca_t));
146 ecpg_init_sqlca(sqlca);
147 pthread_setspecific(sqlca_key, sqlca);
148 }
149 return (sqlca);
(gdb) cont
Continuing.

Breakpoint #3 is hit when the main thread allocates an sqlca during
connect:

Breakpoint 3, ECPGget_sqlca () at misc.c:147
147 pthread_setspecific(sqlca_key, sqlca);
(gdb) where
#0 ECPGget_sqlca () at misc.c:147
#1 0x00456d57 in ECPGconnect (lineno=41, c=0, name=0x9bf2008 "test@localhost:1238",
user=0x8048a31 "laureny", passwd=0x0, connection_name=0x8048a14 "dbConn", autocommit=0)
at connect.c:270
#2 0x080488a3 in PerformTask (TaskId=25) at crashex.pgc:41
#3 0x08048936 in main (argc=1, argv=0xbfffce44) at crashex.pgc:54

This is the address of the main thread's sqlca:

(gdb) print sqlca
$1 = (struct sqlca_t *) 0x9bf2028
(gdb) cont
Continuing.
[New Thread 27225008 (LWP 29343)]
[Switching to Thread 27225008 (LWP 29343)]

Breakpoint #3 is hit again when the new thread allocates its sqlca
when it executes the SELECT statement:

Breakpoint 3, ECPGget_sqlca () at misc.c:147
147 pthread_setspecific(sqlca_key, sqlca);
(gdb) where
#0 ECPGget_sqlca () at misc.c:147
#1 0x004579aa in ecpg_init (con=0x0, connection_name=0x8048a14 "dbConn", lineno=22) at misc.c:107
#2 0x00451a97 in ECPGdo (lineno=22, compat=0, force_indicator=1,
connection_name=0x8048a14 "dbConn", questionmarks=0 '\0', st=0, query=0x8048a1b "select 2 + 2")
at execute.c:1470
#3 0x080487f7 in Work () at crashex.pgc:22
#4 0x00c8cdd8 in start_thread () from /lib/tls/libpthread.so.0
#5 0x003e5fca in clone () from /lib/tls/libc.so.6

This is the address of the new thread's sqlca:

(gdb) print sqlca
$2 = (struct sqlca_t *) 0x9c16ee8
(gdb) cont
Continuing.
2+2=0.

Breakpoint #2 is hit when the new thread is canceled:

Breakpoint 2, ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
124 free(arg); /* sqlca structure allocated in ECPGget_sqlca */
(gdb) where
#0 ecpg_sqlca_key_destructor (arg=0x9c16ee8) at misc.c:124
#1 0x00c8d799 in deallocate_tsd () from /lib/tls/libpthread.so.0
#2 0x00c8cde6 in start_thread () from /lib/tls/libpthread.so.0
#3 0x003e5fca in clone () from /lib/tls/libc.so.6

The freed pointer is the sqlca of the new thread:

(gdb) print arg
$3 = (void *) 0x9c16ee8

And the program terminates with no problems.

(gdb) cont
Continuing.
[Thread 27225008 (zombie) exited]

Program exited normally.
(gdb) quit

This all looks just like it should, doesn't it?

Yours,
Laurenz Albe