ECPG still having thread problems on Linux
Hi all, it looks like Lee's ECPG (and libpq) thread-safety patches
have been applied, and configure --with-threads is also added. I
have been doing some testing.
On FreeBSD 4.8, the attached sample app runs without a problem.
However, I still encounter a threading problem on Linux (RedHat 7.3).
I have done the following:
1) cvs update
2) ./configure --with-threads && make && su -c "make install"
3) compiled cn.pgc as follows:
a) ecpg -t cn.pgc
b) gcc -I/usr/local/pgsql/include -L/usr/local/pgsql/lib \
-lecpg -lpgtypes -pthread cn.c
4) ./a.out - one thread runs to completion (inserts 5 records),
the other hangs (manages one insert, then blocks forever)
Using gdb, I attached to the thread that has locked up, and the backtrace
looks like this:
(gdb) backtrace
#0 0x420e0187 in poll () from /lib/i686/libc.so.6
#1 0x4007d8cc in pqSocketPoll () from /usr/local/pgsql/lib/libpq.so.3
#2 0x4007d7ed in pqSocketCheck () from /usr/local/pgsql/lib/libpq.so.3
#3 0x4007d71f in pqWaitTimed () from /usr/local/pgsql/lib/libpq.so.3
#4 0x4007d6f5 in pqWait () from /usr/local/pgsql/lib/libpq.so.3
#5 0x4007bb53 in PQgetResult () from /usr/local/pgsql/lib/libpq.so.3
#6 0x4007bcbb in PQexec () from /usr/local/pgsql/lib/libpq.so.3
#7 0x40026d81 in ECPGexecute () from /usr/local/pgsql/lib/libecpg.so.4
#8 0x4002724c in ECPGdo () from /usr/local/pgsql/lib/libecpg.so.4
#9 0x08048927 in ins2 ()
#10 0x40043faf in pthread_start_thread () from /lib/i686/libpthread.so.0
Can anyone shed some light on why the behaviour differs between these two
platforms?
Also, perhaps someone out there with access to a different Linux setup
(maybe a more recent build than RedHat 7.3, or a different distro) could try
this app themselves to help verify if this is something that's stuffed on
that release. I think I can rule out this problem being a quirk of my
particular setup, as 3 different machines (all running RH7.3) give identical
results.
Build env:
Linux 2.4.18-3
gcc version 2.96 20000731 (Red Hat Linux 7.3 2.96-113)
Regards, Philip Yarra.
Philip, both your SELECTs are using the same database connection (and
it's undefined which one it is) without any locking. You need to add
"AT clauses" to specify an explicit connection. See attached diff.
However, I've not tried it... I'll try and get some time!
L.
Attachment: cn.pgc.diff (text/plain)
*** cn.pgc 2003-06-25 10:29:55.000000000 +0100
--- cn.pgc.new 2003-06-25 10:29:45.000000000 +0100
***************
*** 36,46 ****
EXEC SQL END DECLARE SECTION;
EXEC SQL WHENEVER sqlerror sqlprint;
EXEC SQL CONNECT TO :cs AS test1;
! EXEC SQL SET AUTOCOMMIT TO ON;
for (i = 0; i < 5; i++)
{
printf("thread1 inserting\n");
! EXEC SQL INSERT INTO foo VALUES(:bar);
printf("==>thread1 insert done\n");
}
EXEC SQL DISCONNECT test1;
--- 36,46 ----
EXEC SQL END DECLARE SECTION;
EXEC SQL WHENEVER sqlerror sqlprint;
EXEC SQL CONNECT TO :cs AS test1;
! EXEC SQL AT test1 SET AUTOCOMMIT TO ON;
for (i = 0; i < 5; i++)
{
printf("thread1 inserting\n");
! EXEC SQL AT test1 INSERT INTO foo VALUES(:bar);
printf("==>thread1 insert done\n");
}
EXEC SQL DISCONNECT test1;
***************
*** 57,67 ****
EXEC SQL END DECLARE SECTION;
EXEC SQL WHENEVER sqlerror sqlprint;
EXEC SQL CONNECT TO :cs AS test2;
! EXEC SQL SET AUTOCOMMIT TO ON;
for (i = 0; i < 5; i++)
{
printf("thread2 inserting\n");
! EXEC SQL INSERT INTO foo VALUES(:bar);
printf("==>thread2 insert done\n");
}
EXEC SQL DISCONNECT test2;
--- 57,67 ----
EXEC SQL END DECLARE SECTION;
EXEC SQL WHENEVER sqlerror sqlprint;
EXEC SQL CONNECT TO :cs AS test2;
! EXEC SQL AT test2 SET AUTOCOMMIT TO ON;
for (i = 0; i < 5; i++)
{
printf("thread2 inserting\n");
! EXEC SQL AT test2 INSERT INTO foo VALUES(:bar);
printf("==>thread2 insert done\n");
}
EXEC SQL DISCONNECT test2;
On Wed, 25 Jun 2003 07:35 pm, Lee Kindness wrote:
> Philip, both your SELECTs are using the same database connection (and
> it's undefined which one it is) without any locking. You need to add
> "AT clauses" to specify an explicit connection. See attached diff.
Ah, that'd be it. I spent some time debugging last night, and I'd realised the
problem lay in the fact that the preproc was outputting NULL as the
connection name, but was unsure why. Your changes allowed both threads to
complete their inserts, which is great news for us!
I'll add that "AT" clause to my list of updates for the documentation - it
might be important. It's kinda.... absent... from the manual.
I might also add a section on using pthreads with ECPG, since people porting
from Informix or Sybase might require such info up front.
> However, I've not tried it... I'll try and get some time!
That'd be great if you could... there appears to still be a problem occurring
at "EXEC SQL DISCONNECT con_name". I'll look into it tonight if I can.
All this does kinda raise the interesting question of why it worked at all on
FreeBSD... probably different scheduling and blind luck, I suppose.
Thanks for the response - I'm a happy man. By 7.4, we should be able to start
porting our apps to Postgres in earnest.
Regards, Philip.
On Fri, Jun 27, 2003 at 10:45:46AM +1000, Philip Yarra wrote:
> ECPGget_connection, both of which share a mutex. Would it be okay if we did
> the following:
> ...
As you know I have never tried using threads, so feel free to go ahead
and change this. Either commit to cvs or send me a patch.
Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De
ICQ: 179140304, AIM: michaelmeskes, Jabber: meskes@jabber.org
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
On Thu, 26 Jun 2003 11:19 am, Philip Yarra wrote:
> there appears to still be a problem
> occurring at "EXEC SQL DISCONNECT con_name". I'll look into it tonight if I
> can.
I did some more poking around last night, and believe I have found the issue:
RedHat Linux 7.3 (the only distro I have access to currently) ships with a
fairly challenged pthreads implementation. The default mutex type (which you
get from PTHREAD_MUTEX_INITIALIZER) is, according to the man page,
PTHREAD_MUTEX_FAST_NP which is not a recursive mutex. If a thread owns a
mutex and attempts to lock the mutex again, it will hang.
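A minimal sketch of the relocking behaviour described above. It assumes a pthreads library that provides pthread_mutexattr_settype() and PTHREAD_MUTEX_ERRORCHECK (the UNIX98 interface, so possibly not every platform discussed in this thread); the function name relock_result is made up for illustration. An error-checking mutex is used so that the second lock attempt reports the problem instead of hanging, which is what a default "fast" mutex may do.

```c
/* Ask for the UNIX98/XSI pthread extensions before any system header. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <errno.h>
#include <pthread.h>

/*
 * Lock the same mutex twice from one thread and return the result of
 * the second pthread_mutex_lock() call.
 *
 * Assumption: an error-checking mutex is used here purely so the second
 * call returns an error code. With a default (non-recursive) mutex the
 * same call may block forever.
 */
int relock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int second;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&m);            /* first lock succeeds */
    second = pthread_mutex_lock(&m);   /* same thread, same mutex */

    pthread_mutex_unlock(&m);          /* release the one lock we hold */
    pthread_mutex_destroy(&m);
    return second;
}
```

On a conforming implementation the second call is expected to fail with EDEADLK, which makes this kind of mutex handy for finding accidental recursive locking while debugging.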
By replacing PTHREAD_MUTEX_INITIALIZER with PTHREAD_MUTEX_RECURSIVE_NP for the
two mutexes that are used recursively (debug_mutex and connections_mutex) I
got my sample app to work flawlessly on Linux RedHat 7.3
Sadly, the _NP suffix is used to indicate non-portable, so of course my
FreeBSD box steadfastly refused to compile it. Darn.
The correct way to do this appears to be:
pthread_mutexattr_t mattr;
pthread_mutexattr_init(&mattr);
pthread_mutexattr_settype(&mattr, PTHREAD_MUTEX_RECURSIVE);
/* mattr is then passed to pthread_mutex_init() */
(will verify this against FreeBSD when I get home, and Tru64 man page
indicates support for this too, so I'll test that later). It won't work on
RedHat Linux 7.3... I guess something like:
#ifdef DODGY_PTHREADS
#define PTHREAD_MUTEX_RECURSIVE PTHREAD_MUTEX_RECURSIVE_NP
#endif
might do it... if we could detect the problem during configure. How is this
sort of detection handled in other cases (such as long long, etc)?
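For reference, the attribute-based approach mentioned above could be written out roughly as follows. This is a sketch, not what ECPG does: the helper name init_recursive_mutex is hypothetical, and it assumes the platform supports pthread_mutexattr_settype() with PTHREAD_MUTEX_RECURSIVE. Note that the attribute object is a plain variable that must be initialised before use, not an uninitialised pointer.

```c
/* Ask for the UNIX98/XSI pthread extensions before any system header. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <pthread.h>

/*
 * Initialise *m as a recursive mutex.
 * Returns 0 on success, or the pthreads error code of the step that failed.
 * (Hypothetical helper name, for illustration only.)
 */
int init_recursive_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

Unlike a static initializer, this has to run before any thread touches the mutex, so a library would need some one-time initialisation hook to call it from.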
The other solution I can think of is to eradicate the two recursive locks I
found.
One is simple: ECPGlog calls ECPGdebug, which share debug_mutex - it ought to
be okay to use different mutexes for each of these functions (there's a risk
someone might call ECPGdebug while someone else is running through ECPGlog,
but I think it is less likely, since it is a debug mechanism.)
The second recursive lock I found is ECPGdisconnect calling
ECPGget_connection, both of which share a mutex. Would it be okay if we did
the following:
ECPGdisconnect() still locks connections_mutex, but calls
ECPGget_connection_nr() instead of ECPGget_connection()
ECPGget_connection() becomes a locking wrapper, which locks connections_mutex
then calls ECPGget_connection_nr()
ECPGget_connection_nr() is a non-locking function which implements what
ECPGget_connection() currently does.
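The split proposed above (a locking wrapper around a non-locking worker) might look something like this in outline. Everything here is a stand-in: the struct, the fixed-size table and the names get_conn, get_conn_nr, add_conn and drop_conn are hypothetical and only mirror the shape of the proposal, not the real ECPG connection code.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONN 4

/* Hypothetical stand-in for a connection list entry. */
struct conn { const char *name; int open; };

static struct conn conns[MAX_CONN];
static pthread_mutex_t conns_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Non-locking lookup: the caller must already hold conns_mutex. */
static struct conn *get_conn_nr(const char *name)
{
    for (size_t i = 0; i < MAX_CONN; i++)
        if (conns[i].open && strcmp(conns[i].name, name) == 0)
            return &conns[i];
    return NULL;
}

/* Locking wrapper for callers that do not hold the mutex. */
struct conn *get_conn(const char *name)
{
    struct conn *c;

    pthread_mutex_lock(&conns_mutex);
    c = get_conn_nr(name);
    pthread_mutex_unlock(&conns_mutex);
    return c;
}

/* Register a name (must outlive the entry); returns 1 on success, 0 if full. */
int add_conn(const char *name)
{
    int added = 0;

    pthread_mutex_lock(&conns_mutex);
    for (size_t i = 0; i < MAX_CONN && !added; i++) {
        if (!conns[i].open) {
            conns[i].name = name;
            conns[i].open = 1;
            added = 1;
        }
    }
    pthread_mutex_unlock(&conns_mutex);
    return added;
}

/*
 * Takes the mutex once and uses the non-locking lookup, so the mutex is
 * never locked twice by the same thread. Returns 1 if an entry was dropped.
 */
int drop_conn(const char *name)
{
    int found = 0;

    pthread_mutex_lock(&conns_mutex);
    struct conn *c = get_conn_nr(name);
    if (c != NULL) {
        c->open = 0;
        found = 1;
    }
    pthread_mutex_unlock(&conns_mutex);
    return found;
}
```

The design point is that only one layer ever takes the mutex, so a plain non-recursive mutex is sufficient; the cost is that every caller of the non-locking function has to get the locking discipline right.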
I'm not sure if this sort of thing is okay (and there may be other recursive
locking scenarios that I haven't exercised yet).
What approach should I take? I'm leaning towards eradicating recursive locks,
unless someone has a good reason not to.
> All this does kinda raise the interesting question of why it worked at all
> on FreeBSD... probably different scheduling and blind luck, I suppose.
FreeBSD 4.8 must have PTHREAD_MUTEX_RECURSIVE as default mutex type. I'm a bit
concerned about FreeBSD 4.2 though - I noticed (before I blew it away in
favour of 4.8) that its pthreads implementation came from a package called
linuxthreads.tgz - it might have inherited the same problematic behaviour.
Could someone with access to or knowledge of FreeBSD 4.2 check what the
default mutex type is there?
Regards, Philip.
I can just see the ad for 7.3's pthreads implementation
"Fast mutexes: zero to deadlock in 6.9 milliseconds!"
According to POSIX 1003.1c-1995, no such mutex-altering function exists.
pthread_mutexattr_get/settype(...) functions are defined by X/Open XSH5
(Unix98). I would suggest writing a wrapper for OSs that don't
implement recursive locks (it's easy enough to make your own
implementation- just check pthread_self() before deciding whether to
lock the mutex- potentially again). Either that or the recursive locks
can be eliminated.
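A rough sketch of such a wrapper, under stated assumptions: the type and function names (rlock, rlock_init, rlock_lock, rlock_unlock) are invented for illustration, and this variant uses a guard mutex plus a condition variable rather than reading the owner field without synchronisation, so that the ownership check itself is free of data races. It uses only plain POSIX.1c calls.

```c
#include <pthread.h>

/* Hand-rolled recursive lock (hypothetical names, illustration only). */
typedef struct {
    pthread_mutex_t guard;     /* protects owner and depth */
    pthread_cond_t  released;  /* signalled when depth drops to 0 */
    pthread_t       owner;     /* meaningful only while depth > 0 */
    int             depth;     /* how many times the owner has locked */
} rlock;

void rlock_init(rlock *l)
{
    pthread_mutex_init(&l->guard, NULL);
    pthread_cond_init(&l->released, NULL);
    l->depth = 0;
}

void rlock_lock(rlock *l)
{
    pthread_t self = pthread_self();

    pthread_mutex_lock(&l->guard);
    if (l->depth > 0 && pthread_equal(l->owner, self)) {
        l->depth++;                      /* re-entry by the current owner */
    } else {
        while (l->depth > 0)             /* held by another thread: wait */
            pthread_cond_wait(&l->released, &l->guard);
        l->owner = self;
        l->depth = 1;
    }
    pthread_mutex_unlock(&l->guard);
}

/*
 * Undo one rlock_lock() call. Assumes the caller is the current owner.
 * Returns the remaining depth (0 means the lock is now free).
 */
int rlock_unlock(rlock *l)
{
    int left;

    pthread_mutex_lock(&l->guard);
    left = --l->depth;
    if (left == 0)
        pthread_cond_signal(&l->released);
    pthread_mutex_unlock(&l->guard);
    return left;
}
```

This costs an extra mutex round trip per lock and unlock compared with a native recursive mutex, which is one more argument for simply removing the recursive call paths where that is practical.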
Just for the record, OS X, Solaris 5.8, FreeBSD 4.8, and LinuxThreads
support the UNIX98 version, so perhaps this isn't so important after
all.
AgentM
agentm@cmu.edu
On Fri, 27 Jun 2003 11:58 am, AgentM wrote:
> According to POSIX 1003.1c-1995, no such mutex-altering function exists.
Thanks for the info - useful to know.
> lock the mutex- potentially again). Either that or the recursive locks
> can be eliminated.
Avoiding recursive locks is my preference - the only two I have found ought to
be easy to avoid.
> Just for the record, OS X, Solaris 5.8, FreeBSD 4.8, and LinuxThreads
> support the UNIX98 version, so perhaps this isn't so important after
> all.
Add Tru64 (aka OSF1, aka DEC Unix) to that list. Just checked it.
Regards, Philip.
BSD/OS supports:
The pthreads library conforms to IEEE Std1003.1c
(``POSIX'').
How is that different from UNIX98?
---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Fri, 27 Jun 2003 12:16 pm, Bruce Momjian wrote:
> BSD/OS supports:
> The pthreads library conforms to IEEE Std1003.1c
> (``POSIX'').
> How is that different from UNIX98?
Just checked up on this: apparently version "g" of the standard does contain
such manipulation functions... and Tru64's man page for
pthread_mutexattr_settype claims:
Interfaces documented on this reference page conform to industry standards
as follows:
IEEE Std 1003.1c-1995, POSIX System Application Program Interface
Of course, they might be lying.
Anyway, hopefully I can just avoid these recursive locks, and avoid finding
out who supports what.
Regards, Philip.