Mixing threaded and non-threaded

Started by Steve Atkins almost 22 years ago, 17 messages
#1Steve Atkins
steve@blighty.com

(I hope this is -hackers appropriate - feel free to point me elsewhere)

I'm using 7.4.1 as the backend to several applications. Until recently,
I've been developing solely single-threaded applications.

I just rebuilt postgresql with --enable-thread-safety, to work with
some multi-threaded code.

When I rebuilt libpq to use threads, I started seeing a bunch of weird
failures in many of the older applications. The change in libpq meant
that libpthread was being dynamically linked into the non-thread-aware
applications, leading to some mutex deadlocks in their signal
handlers, hanging those applications.

There doesn't seem to be any tidy way to build and use both threaded
and non-threaded libpq on the same system (LD_LIBRARY_PATH hacks
aren't really viable for distributed code). Is there something I'm
missing?

(If it's relevant, the OS in question is RedHat Linux, but I'm
maintaining the same suite of apps on several other architectures.)

Cheers,
Steve

#2Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Steve Atkins (#1)
Re: Mixing threaded and non-threaded

Steve Atkins wrote:

(I hope this is -hackers appropriate - feel free to point me elsewhere)

I'm using 7.4.1 as the backend to several applications. Until recently,
I've been developing solely single-threaded applications.

I just rebuilt postgresql with --enable-thread-safety, to work with
some multi-threaded code.

When I rebuilt libpq to use threads, I started seeing a bunch of weird
failures in many of the older applications. The change in libpq meant
that libpthread was being dynamically linked into the non-thread-aware
applications, leading to some mutex deadlocks in their signal
handlers, hanging those applications.

There doesn't seem to be any tidy way to build and use both threaded
and non-threaded libpq on the same system (LD_LIBRARY_PATH hacks
aren't really viable for distributed code). Is there something I'm
missing?

No, there is not. We could compile two versions, and have you specify
the threaded version only when you want it, but only some operating
systems have that distinction, so then we would have two identical
libraries on some platforms, and different ones on others, and that
seemed pretty confusing. Of course, we can always revisit this.

(If it's relevant, the OS in question is RedHat Linux, but I'm
maintaining the same suite of apps on several other architectures.)

This is interesting. I had not considered that libpq's calls to
libpthread would cause problems. In fact, libpq shouldn't be doing
anything special with pthread except for a few calls used in
port/thread.c. However, the issue we always were worried about was that
linking against libpthread would cause some unexpected thread calls in
the application, and it looks like that is exactly what you are seeing.
In fact, it sounds like it is the calls to allow synchronous signals to
be delivered to the thread that generated them that might be the new
change you are seeing.

My guess is that creating applications against the non-thread libpq and
then replacing it with a threaded libpq is your problem. I guess the
question is whether you would like to have two libpq's and have to
decide at link time if you wanted threading, or just have one libpq and
make sure you recompile if you change the threading behavior of the
library. We considered the latter to be clearer.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#3Steve Atkins
steve@blighty.com
In reply to: Bruce Momjian (#2)
Re: Mixing threaded and non-threaded

On Fri, Jan 23, 2004 at 10:03:30PM -0500, Bruce Momjian wrote:

Steve Atkins wrote:

When I rebuilt libpq to use threads, I started seeing a bunch of weird
failures in many of the older applications. The change in libpq meant
that libpthread was being dynamically linked into the non-thread-aware
applications, leading to some mutex deadlocks in their signal
handlers, hanging those applications.

There doesn't seem to be any tidy way to build and use both threaded
and non-threaded libpq on the same system (LD_LIBRARY_PATH hacks
aren't really viable for distributed code). Is there something I'm
missing?

No, there is not. We could compile two versions, and have you specify
the threaded version only when you want it, but only some operating
systems have that distinction, so then we would have two identical
libraries on some platforms, and different ones on others, and that
seemed pretty confusing. Of course, we can always revisit this.

(If it's relevant, the OS in question is RedHat Linux, but I'm
maintaining the same suite of apps on several other architectures.)

This is interesting. I had not considered that libpq's calls to
libpthread would cause problems. In fact, libpq shouldn't be doing
anything special with pthread except for a few calls used in
port/thread.c.

Yes, libpq's actual use of pthread seems pretty harmless.

However, the issue we always were worried about was that
linking against libpthread would cause some unexpected thread calls in
the application, and it looks like that is exactly what you are seeing.
In fact, it sounds like it is the calls to allow synchronous signals to
be delivered to the thread that generated them that might be the new
change you are seeing.

Exactly that, yes.

My guess is that creating applications against the non-thread libpq and
then replacing it with a threaded libpq is your problem.

Yes. It seems to make no difference whether the application is rebuilt
or not. It's pulling libpthread into a non-thread-aware application
that's the problem.

The only fix that would allow the non-threaded application to work
with a thread-safe libpq would be to rewrite it to be a threaded
application with a single active thread.

I guess the
question is whether you would like to have two libpq's and have to
decide at link time if you wanted threading, or just have one libpq and
make sure you recompile if you change the threading behavior of the
library. We considered the latter to be clearer.

Recompiling doesn't necessarily help unless the application is also
rewritten. Also, if there are dozens of non-threaded applications
using libpq on a system (possibly installed via rpms or equivalent)
then replacing the system libpq could break something else.

For now I'm just building and distributing two different libpqs and
choosing between them with rpath hacks (yes, renaming one of them
might be easier, but I'm specifying rpath explicitly anyway for other
reasons). That seems to be working just fine for me.

If there are multiple applications on the system using PostgreSQL we
really don't want to break some of them if libpq is rebuilt to support
a new one. Probably worth a mention in the documentation at least.

Cheers,
Steve

#4Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Steve Atkins (#3)
Re: Mixing threaded and non-threaded

Steve Atkins wrote:

My guess is that creating applications against the non-thread libpq and
then replacing it with a threaded libpq is your problem.

Yes. It seems to make no difference whether the application is rebuilt
or not. It's pulling libpthread into a non-thread-aware application
that's the problem.

The only fix that would allow the non-threaded application to work
with a thread-safe libpq would be to rewrite it to be a threaded
application with a single active thread.

Woh, as far as I know, any application should run fine with -lpthread,
threaded or not. What OS are you on? This is the first I have heard of
this problem.

I guess the
question is whether you would like to have two libpq's and have to
decide at link time if you wanted threading, or just have one libpq and
make sure you recompile if you change the threading behavior of the
library. We considered the latter to be clearer.

Recompiling doesn't necessarily help unless the application is also
rewritten. Also, if there are dozens of non-threaded applications
using libpq on a system (possibly installed via rpms or equivalent)
then replacing the system libpq could break something else.

Why? How would you rewrite it?

#5Steve Atkins
steve@blighty.com
In reply to: Bruce Momjian (#4)
Re: Mixing threaded and non-threaded

On Tue, Jan 27, 2004 at 02:07:44PM -0500, Bruce Momjian wrote:

Steve Atkins wrote:

My guess is that creating applications against the non-thread libpq and
then replacing it with a threaded libpq is your problem.

Yes. It seems to make no difference whether the application is rebuilt
or not. It's pulling libpthread into a non-thread-aware application
that's the problem.

The only fix that would allow the non-threaded application to work
with a thread-safe libpq would be to rewrite it to be a threaded
application with a single active thread.

Woh, as far as I know, any application should run fine with -lpthread,
threaded or not. What OS are you on? This is the first I have heard of
this problem.

Linux/i386, RedHat 7.something, gcc 2.96. Not my favorite
configuration, but nothing particularly odd.

I guess the
question is whether you would like to have two libpq's and have to
decide at link time if you wanted threading, or just have one libpq and
make sure you recompile if you change the threading behavior of the
library. We considered the latter to be clearer.

Recompiling doesn't necessarily help unless the application is also
rewritten. Also, if there are dozens of non-threaded applications
using libpq on a system (possibly installed via rpms or equivalent)
then replacing the system libpq could break something else.

Why? How would you rewrite it?

No idea. I've not looked at exactly what's going on, yet.

It's perfectly possible that the problem I'm seeing is actually a bug
in the underlying code - but it's been used in heavy production use
for two years without pthread, and deadlocked immediately when built
with pthread, so it's the sort of bug that could be elsewhere.

It's a very complex application, so I'd really need to reduce it to
a test case to narrow it down.

A hint, though, might be that it's a multiprocess application with a
single master process that controls dozens of child processes. When the
master shuts down it asks all the children to shut down, and then it
deadlocks in the SIGCHLD handler.

I'll burrow a bit deeper when I get some time.

Cheers,
Steve

#6Scott Lamb
slamb@slamb.org
In reply to: Steve Atkins (#5)
Re: Mixing threaded and non-threaded

On Jan 27, 2004, at 1:16 PM, Steve Atkins wrote:

A hint, though, might be that it's a multiprocess application with a
single master process that controls dozens of child processes. When the
master shuts down it asks all the children to shut down, and then it
deadlocks in the SIGCHLD handler.

It's not safe to do anything interesting in a SIGCHLD handler, unless
you have pretty severe restrictions on when the signal can arrive. Take
a look at
<http://www.opengroup.org/onlinepubs/007904975/functions/xsh_chap02_04.html>.
It contains a list of all the async signal-safe
functions in SUSv3. It's a pretty short list. Notably absent are
pthread_mutex_*() and malloc() (and anything that uses them).

Scott Lamb

#7Manfred Spraul
manfred@colorfullife.com
In reply to: Bruce Momjian (#4)
Re: Mixing threaded and non-threaded

Bruce Momjian wrote:

Woh, as far as I know, any application should run fine with -lpthread,
threaded or not. What OS are you on? This is the first I have heard of
this problem.

Perhaps we should try to figure out how other packages handle
multithreaded/singlethreaded libraries? I'm looking at openssl right
now, and openssl never links against libpthread: The caller is
responsible for registering the locking primitives.
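
For illustration, the caller-registers-the-locks pattern can be sketched as below. This is not OpenSSL's actual API; mylib_set_locking, mylib_next_id and struct app_lock are hypothetical names, and the application-side callbacks are a counting stand-in rather than a real mutex.

```c
#include <stddef.h>

/* Hypothetical callback type: the application supplies the locking. */
typedef void (*mylib_lock_fn)(void *ctx);

static mylib_lock_fn lock_cb = NULL;
static mylib_lock_fn unlock_cb = NULL;
static void *lock_ctx = NULL;
static int next_id = 0;             /* shared state the callbacks protect */

/* Register locking callbacks; passing NULL for both turns locking off. */
void mylib_set_locking(mylib_lock_fn lock, mylib_lock_fn unlock, void *ctx)
{
    lock_cb = lock;
    unlock_cb = unlock;
    lock_ctx = ctx;
}

/* Hand out increasing ids, taking the caller's lock if one is registered. */
int mylib_next_id(void)
{
    int id;

    if (lock_cb)
        lock_cb(lock_ctx);
    id = ++next_id;
    if (unlock_cb)
        unlock_cb(lock_ctx);
    return id;
}

/* Application side. This stand-in only counts acquisitions; a threaded
 * program would call pthread_mutex_lock()/pthread_mutex_unlock() here. */
struct app_lock { int held; int acquisitions; };

static void app_lock_acquire(void *ctx)
{
    struct app_lock *l = ctx;
    l->held = 1;
    l->acquisitions++;
}

static void app_lock_release(void *ctx)
{
    struct app_lock *l = ctx;
    l->held = 0;
}
```

The library part never names a pthread symbol, so it does not pull libpthread into a single-threaded program. The cost is that a threaded caller that forgets to register callbacks gets no locking at all.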

--
Manfred

#8Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Manfred Spraul (#7)
Re: Mixing threaded and non-threaded

Manfred Spraul wrote:

Bruce Momjian wrote:

Woh, as far as I know, any application should run fine with -lpthread,
threaded or not. What OS are you on? This is the first I have heard of
this problem.

Perhaps we should try to figure out how other packages handle
multithreaded/singlethreaded libraries? I'm looking at openssl right
now, and openssl never links against libpthread: The caller is
responsible for registering the locking primitives.

We perhaps don't need -lpthread for creating libpq, but only for ecpg.
However, now that we have used thread locking for SIGPIPE, we are now
calling pthread from libpq, but only in 7.5.

However, I still don't understand why the user is seeing a problem and
what rewrite he thinks is necessary for his application because pthread
is linked in.

#9Scott Lamb
slamb@slamb.org
In reply to: Bruce Momjian (#8)
Re: Mixing threaded and non-threaded

On Jan 30, 2004, at 3:18 AM, Bruce Momjian wrote:

Manfred Spraul wrote:

Bruce Momjian wrote:

Woh, as far as I know, any application should run fine with -lpthread,
threaded or not. What OS are you on? This is the first I have heard of
this problem.

Perhaps we should try to figure out how other packages handle
multithreaded/singlethreaded libraries? I'm looking at openssl right
now, and openssl never links against libpthread: The caller is
responsible for registering the locking primitives.

Some other libraries, such as boost, always link against -lpthread when
it is present.

I don't think OpenSSL's example is a good one to follow. It's way too
easy to forget to do that, and then your application is broken. You'll
have weird crashes that will be hard to figure out. I think OpenSSL was
made such because pthreads was not so common back in the day; they
probably wanted to support other threading APIs. That's unnecessary
now.

Another reason might be to avoid the expense of locks when they are
unnecessary. But also, I think that is not as necessary as it once was,
particularly with modern systems like Linux+NPTL having locks cost
virtually nothing when there is no contention.

We perhaps don't need -lpthread for creating libpq, but only for ecpg.
However, now that we have used thread locking for SIGPIPE, we are now
calling pthread from libpq, but only in 7.5.

However, I still don't understand why the user is seeing a problem and
what rewrite he thinks is necessary for his application because pthread
is linked in.

I'm 99% certain that any application will work with -lpthread on RedHat
Linux. And 95% certain that's true on _any_ platform. There's no
pthread_init() or anything; the distinction he was describing between a
non-threaded application and a threaded application with only one
thread doesn't exist as far as I know.

And he mentioned that the deadlocks are occurring in a SIGCHLD handler.
Since so few functions are async signal-safe (I doubt anything in libpq
is), the code in question was broken before; the extra locking is just
making it more obvious.

Speaking of async signal-safe functions, pthread_getspecific() isn't
specified to be (and thus PQinSend() and thus
sigpipe_handler_ignore_send()). It's probably okay, but libpq is
technically using undefined behavior according to SUSv3.

Scott Lamb

#10Steve Atkins
steve@blighty.com
In reply to: Scott Lamb (#9)
Re: Mixing threaded and non-threaded

On Fri, Jan 30, 2004 at 11:10:49AM -0600, Scott Lamb wrote:

On Jan 30, 2004, at 3:18 AM, Bruce Momjian wrote:

Manfred Spraul wrote:

Bruce Momjian wrote:

Woh, as far as I know, any application should run fine with -lpthread,
threaded or not. What OS are you on? This is the first I have heard of
this problem.

Perhaps we should try to figure out how other packages handle
multithreaded/singlethreaded libraries? I'm looking at openssl right
now, and openssl never links against libpthread: The caller is
responsible for registering the locking primitives.

I don't think changing the linking approach is a good thing. But a
mention in the documentation might be.

We perhaps don't need -lpthread for creating libpq, but only for ecpg.
However, now that we have used thread locking for SIGPIPE, we are now
calling pthread from libpq, but only in 7.5.

However, I still don't understand why the user is seeing a problem and
what rewrite he thinks is necessary for his application because pthread
is linked in.

I suspect the rewrite needed is to avoid doing Bad Things in the signal
handler.

I'm 99% certain that any application will work with -lpthread on RedHat
Linux. And 95% certain that's true on _any_ platform. There's no
pthread_init() or anything; the distinction he was describing between a
non-threaded application and a threaded application with only one
thread doesn't exist as far as I know.

That may be true for any correctly written application, but it's
certainly not true for any application. The distinction is, at the
very least, that some system calls are wrapped with mutexes.

And he mentioned that the deadlocks are occurring in a SIGCHLD handler.
Since so few functions are async signal-safe (I doubt anything in libpq
is), the code in question was broken before; the extra locking is just
making it more obvious.

I tend to agree. However, while it may have been broken before, it
worked flawlessly in multiple production environments on several
different operating systems for several years when not linked with
pthread.

Cheers,
Steve

#11Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Scott Lamb (#9)
Re: Mixing threaded and non-threaded

Scott Lamb wrote:

Speaking of async signal-safe functions, pthread_getspecific() isn't
specified to be (and thus PQinSend() and thus
sigpipe_handler_ignore_send()). It's probably okay, but libpq is
technically using undefined behavior according to SUSv3.

Yikes. I never suspected pthread_getspecific() would not be signal safe
because it is already thread safe, but I see the point that it is called
in the current thread. Any ideas how to fix this?

#12Scott Lamb
slamb@slamb.org
In reply to: Bruce Momjian (#11)
Re: Mixing threaded and non-threaded

Bruce Momjian wrote:

Scott Lamb wrote:

Speaking of async signal-safe functions, pthread_getspecific() isn't
specified to be (and thus PQinSend() and thus
sigpipe_handler_ignore_send()). It's probably okay, but libpq is
technically using undefined behavior according to SUSv3.

Yikes. I never suspected pthread_getspecific() would not be signal safe
because it is already thread safe, but I see the point that it is called
in the current thread. Any ideas how to fix this?

A few ideas.

When I ran into a similar situation in my own code, my approach was to just
add a comment to make the assumption explicit. It's quite possible the
standard is just overly conservative. Some specific platforms -
<http://www.qnx.com/developer/docs/qnx_6.1_docs/neutrino/lib_ref/p/pthread_getspecific.html>
- mark it as being async signal-safe.

Searching for "pthread_getspecific signal" on google groups turns up a
bunch of other people who have run into this same problem. One person
notes that it's definitely not safe on LinuxThreads if you use
sigaltstack().

If your platform has SA_SIGINFO, you could - in theory - use the
ucontext argument to see if that thread is in a PostgreSQL operation.
But I doubt that's portable.

You could just do a pthread_sigmask() before and after the
pthread_setspecific() to guarantee that no SIGPIPE will arrive on that
thread in that time. I think it's pretty safe to assume that as long as
you're not doing a pthread_[gs]etspecific() on that same pthread_key_t,
it's safe.

There's one thread function that is guaranteed to be async signal-safe -
sem_post(). (Though apparently older LinuxThreads on x86 fails to meet
this assumption.) I'm not quite sure what you could do with that, but
apparently there's something or they wouldn't have gone to the effort of
making it so.

Scott

#13Scott Lamb
slamb@slamb.org
In reply to: Scott Lamb (#12)
Re: Mixing threaded and non-threaded

Scott Lamb wrote:

You could just do a pthread_sigmask() before and after the
pthread_setspecific() to guarantee that no SIGPIPE will arrive on that
thread in that time. I think it's pretty safe to assume that as long as
you're not doing a pthread_[gs]etspecific() on that same pthread_key_t,
it's safe.

Actually, thinking about this a bit more, that might not even be
necessary. Is SIGPIPE-via-(read|write) synchronous or asynchronous?
(I.e., is the SIGPIPE guaranteed to arrive during the offending system
call?) I was thinking not, but maybe yes. I can't seem to find a
straight answer. A lot of documents seem to confuse thread-directed and
synchronous, when they're not quite the same thing. SIGALRM-via-alarm()
is thread-directed but obviously asynchronous.

#14Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Scott Lamb (#12)
Re: Mixing threaded and non-threaded

Scott Lamb wrote:

You could just do a pthread_sigmask() before and after the
pthread_setspecific() to guarantee that no SIGPIPE will arrive on that
thread in that time. I think it's pretty safe to assume that as long as
you're not doing a pthread_[gs]etspecific() on that same pthread_key_t,
it's safe.

I call pthread_setspecific() in the SIGPIPE handler. How does
pthread_sigmask() help me at that point?

#15Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Scott Lamb (#13)
Re: Mixing threaded and non-threaded

Scott Lamb wrote:

Scott Lamb wrote:

You could just do a pthread_sigmask() before and after the
pthread_setspecific() to guarantee that no SIGPIPE will arrive on that
thread in that time. I think it's pretty safe to assume that as long as
you're not doing a pthread_[gs]etspecific() on that same pthread_key_t,
it's safe.

Actually, thinking about this a bit more, that might not even be
necessary. Is SIGPIPE-via-(read|write) synchronous or asynchronous?
(I.e., is the SIGPIPE guaranteed to arrive during the offending system
call?) I was thinking not, but maybe yes. I can't seem to find a
straight answer. A lot of documents seem to confuse thread-directed and
synchronous, when they're not quite the same thing. SIGALRM-via-alarm()
is thread-directed but obviously asynchronous.

SIGPIPE is a synchronous signal that is called during the read() in
libpq. I am not sure what thread-directed is.

#16Scott Lamb
slamb@slamb.org
In reply to: Bruce Momjian (#15)
Re: Mixing threaded and non-threaded

On Jan 30, 2004, at 4:53 PM, Bruce Momjian wrote:

Actually, thinking about this a bit more, that might not even be
necessary. Is SIGPIPE-via-(read|write) synchronous or asynchronous?
(I.e., is the SIGPIPE guaranteed to arrive during the offending system
call?) I was thinking not, but maybe yes. I can't seem to find a
straight answer. A lot of documents seem to confuse thread-directed and
synchronous, when they're not quite the same thing. SIGALRM-via-alarm()
is thread-directed but obviously asynchronous.

SIGPIPE is a synchronous signal that is called during the read() in
libpq. I am not sure what thread-directed is.

Ahh, then the usage in libpq is safe; sorry for the false alarm. The
concerns about signal safety are really only for async signals, as the
behavior is undefined only when one async signal-unsafe function is
called from a signal interrupting another:

"In the presence of signals, all functions defined by this volume of
IEEE Std 1003.1-2001 shall behave as defined when called from or
interrupted by a signal-catching function, with a single exception:
when a signal interrupts an unsafe function and the signal-catching
function calls an unsafe function, the behavior is undefined."

thread-directed, by the way, simply means that the signal is directed
at a specific thread, not just some thread in the process that doesn't
have it masked. It's the difference between kill() and pthread_kill().
AFAIK, all synchronous signals are thread-directed, but not all
thread-directed signals are synchronous.

Here the signal is synchronous, so the signal is guaranteed to happen
at a safe point (during the read()), so there's no problem.

Thanks,
Scott Lamb

#17Bruce Momjian
pgman@candle.pha.pa.us
In reply to: Scott Lamb (#16)
Re: Mixing threaded and non-threaded

OK, thanks.

---------------------------------------------------------------------------

Scott Lamb wrote:

On Jan 30, 2004, at 4:53 PM, Bruce Momjian wrote:

Actually, thinking about this a bit more, that might not even be
necessary. Is SIGPIPE-via-(read|write) synchronous or asynchronous?
(I.e., is the SIGPIPE guaranteed to arrive during the offending system
call?) I was thinking not, but maybe yes. I can't seem to find a
straight answer. A lot of documents seem to confuse thread-directed and
synchronous, when they're not quite the same thing. SIGALRM-via-alarm()
is thread-directed but obviously asynchronous.

SIGPIPE is a synchronous signal that is called during the read() in
libpq. I am not sure what thread-directed is.

Ahh, then the usage in libpq is safe; sorry for the false alarm. The
concerns about signal safety are really only for async signals, as the
behavior is undefined only when one async signal-unsafe function is
called from a signal interrupting another:

"In the presence of signals, all functions defined by this volume of
IEEE Std 1003.1-2001 shall behave as defined when called from or
interrupted by a signal-catching function, with a single exception:
when a signal interrupts an unsafe function and the signal-catching
function calls an unsafe function, the behavior is undefined."

thread-directed, by the way, simply means that the signal is directed
at a specific thread, not just some thread in the process that doesn't
have it masked. It's the difference between kill() and pthread_kill().
AFAIK, all synchronous signals are thread-directed, but not all
thread-directed signals are synchronous.

Here the signal is synchronous, so the signal is guaranteed to happen
at a safe point (during the read()), so there's no problem.

Thanks,
Scott Lamb
