Thread-unsafe coding in ecpg

Started by Tom Laneabout 7 years ago37 messageshackers
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

I've found that a couple of different OpenBSD 6.4 machines fall over
badly in the ecpg regression tests, with output like

test sql/parser ... ok
test thread/thread ... stdout stderr FAILED (test process was terminated by signal 6: Abort trap)
test thread/thread_implicit ... stdout FAILED (test process was terminated by signal 10: Bus error)
test thread/prep ... ok (test process was terminated by signal 10: Bus error)
test thread/alloc ... stderr FAILED (test process was terminated by signal 6: Abort trap)
test thread/descriptor ... ok

It's somewhat variable as to which tests fail, but it's always thread
tests. Examining the core dumps shows traces like

#0 thrkill () at -:3
#1 0x00000c04f427dd6e in _libc_abort () at /usr/src/lib/libc/stdlib/abort.c:51
#2 0x00000c04f425f7e9 in wrterror (d=Variable "d" is not available.
)
at /usr/src/lib/libc/stdlib/malloc.c:291
#3 0x00000c04f42628fb in find_chunknum (d=Variable "d" is not available.
)
at /usr/src/lib/libc/stdlib/malloc.c:1043
#4 0x00000c04f425fe23 in ofree (argpool=Variable "argpool" is not available.
)
at /usr/src/lib/libc/stdlib/malloc.c:1359
#5 0x00000c04f425f8ec in free (ptr=0xc04df0e26e0)
at /usr/src/lib/libc/stdlib/malloc.c:1419
#6 0x00000c04f427ec83 in freegl (oldgl=0xc05a022d080)
at /usr/src/lib/libc/locale/setlocale.c:32
#7 0x00000c04f427eb49 in _libc_setlocale (category=4,
locname=0xc059b605180 "C") at /usr/src/lib/libc/locale/setlocale.c:177
#8 0x00000c0531a6f955 in ecpg_do_epilogue (stmt=0xc0587bb0c00)
at execute.c:1986
#9 0x00000c0531a6fa65 in ecpg_do (lineno=Variable "lineno" is not available.
) at execute.c:2018
#10 0x00000c0531a6fb31 in ECPGdo (lineno=Variable "lineno" is not available.
) at execute.c:2037
#11 0x00000c02a9f00b19 in test_thread (arg=Variable "arg" is not available.
) at thread.pgc:131
#12 0x00000c04b180b26e in _rthread_start (v=Variable "v" is not available.
)
at /usr/src/lib/librthread/rthread.c:96
#13 0x00000c04f42ba77b in __tfork_thread ()
at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#14 0x0000000000000000 in ?? ()

or this one:

#0 _libc_strlcpy (dst=0x2c61e133ee0 "C",
src=0xdfdfdfdfdfdfdfdf <Address 0xdfdfdfdfdfdfdfdf out of bounds>,
dsize=256) at /usr/src/lib/libc/string/strlcpy.c:36
#1 0x000002c61de99a71 in _libc_setlocale (category=4, locname=0x0)
from /usr/lib/libc.so.92.5
#2 0x000002c5ae2693a8 in ecpg_do_prologue (lineno=59, compat=0,
force_indicator=1, connection_name=0x0, questionmarks=false,
statement_type=ECPGst_execute, query=0x2c333701418 "i",
args=0x2c61a96f6d0, stmt_out=0x2c61a96f5e0) at execute.c:1776
#3 0x000002c5ae269a20 in ecpg_do (lineno=Variable "lineno" is not available.
) at execute.c:2001
#4 0x000002c5ae269b31 in ECPGdo (lineno=Variable "lineno" is not available.
) at execute.c:2037
#5 0x000002c333600b47 in fn (arg=Variable "arg" is not available.
) at prep.pgc:59
#6 0x000002c56a00b26e in _rthread_start (v=Variable "v" is not available.
)
at /usr/src/lib/librthread/rthread.c:96
#7 0x000002c61ded577b in __tfork_thread ()
at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:75
#8 0x0000000000000000 in ?? ()

The common denominator is always a call to setlocale(), and
that generally is calling malloc or free or some other libc
function that is unhappy. When output appears on stderr,
it's usually free complaining about double-frees or some such.

So my conclusion is that this version of setlocale() has some
thread-safety issues. I was all set to go file a bug report
when I noticed this in the POSIX spec for setlocale:

The setlocale() function need not be thread-safe.

as well as this:

The global locale established using setlocale() shall only be used
in threads for which no current locale has been set using
uselocale() or whose current locale has been set to the global
locale using uselocale(LC_GLOBAL_LOCALE).

IOW, not only is setlocale() not necessarily thread-safe itself,
but using it to change locales in a multithread program is seriously
unsafe because of concurrent effects on other threads.

Therefore, it's plain crazy for ecpg to be calling setlocale() inside
threaded code. It looks to me like what ecpg is doing is trying to defend
itself against non-C LC_NUMERIC settings, which is laudable, but this
implementation of that is totally unsafe.

Don't know what's the best way out of this. The simplest thing would
be to just remove that code and document that you'd better run ecpg
in LC_NUMERIC locale, but it'd be nice if we could do better.

regards, tom lane

#2Michael Meskes
meskes@postgresql.org
In reply to: Tom Lane (#1)
Re: Thread-unsafe coding in ecpg

So my conclusion is that this version of setlocale() has some
thread-safety issues. I was all set to go file a bug report
when I noticed this in the POSIX spec for setlocale:

The setlocale() function need not be thread-safe.

as well as this:

The global locale established using setlocale() shall only be
used
in threads for which no current locale has been set using
uselocale() or whose current locale has been set to the global
locale using uselocale(LC_GLOBAL_LOCALE).

This one was new to me.

IOW, not only is setlocale() not necessarily thread-safe itself,
but using it to change locales in a multithread program is seriously
unsafe because of concurrent effects on other threads.

Agreed.

Therefore, it's plain crazy for ecpg to be calling setlocale() inside
threaded code. It looks to me like what ecpg is doing is trying to
defend
itself against non-C LC_NUMERIC settings, which is laudable, but this
implementation of that is totally unsafe.

Don't know what's the best way out of this. The simplest thing would
be to just remove that code and document that you'd better run ecpg
in LC_NUMERIC locale, but it'd be nice if we could do better.

How about attached patch? According to my manpages it should only
affect the calling threat. I only tested it on my own system so far.
Could you please have a look and/or test on other systems?

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL

Attachments:

ecpg_locale.difftext/x-patch; charset=UTF-8; name=ecpg_locale.diffDownload+13-6
#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Meskes (#2)
Re: Thread-unsafe coding in ecpg

Michael Meskes <meskes@postgresql.org> writes:

IOW, not only is setlocale() not necessarily thread-safe itself,
but using it to change locales in a multithread program is seriously
unsafe because of concurrent effects on other threads.

Agreed.

How about attached patch? According to my manpages it should only
affect the calling threat. I only tested it on my own system so far.
Could you please have a look and/or test on other systems?

Yeah, I was wondering about uselocale() myself. We cannot assume it's
available everywhere, but it should fix the problem where available.
On machines that don't have it, we could either

(a) have ecpg do nothing, and just hope you're not using a dangerous
locale; or

(b) consider the platform not thread-safe, forcing people to specify
--disable-thread-safety to build.

While (b) has more theoretical purity, I'm inclined to think it
doesn't really improve anybody's life compared to (a), because
--disable-thread-safety doesn't actually stop anyone from using
libpq or ecpglib in threaded environments. It just makes it
more likely to fail when they do.

The OpenBSD 6.4 platform where I found this problem has uselocale
(but the man page notes they only added it as of 6.2). I can test
out the patch there, but I think the interesting questions are all
about what to do on platforms without the function.

regards, tom lane

#4Michael Meskes
meskes@postgresql.org
In reply to: Tom Lane (#3)
Re: Thread-unsafe coding in ecpg

While (b) has more theoretical purity, I'm inclined to think it
doesn't really improve anybody's life compared to (a), because
--disable-thread-safety doesn't actually stop anyone from using
libpq or ecpglib in threaded environments. It just makes it
more likely to fail when they do.

The question is, what do we do on those platforms? Use setlocale() or
fallback to (a) and document that ecpg has to run in a C locale?

We could also rewrite the parsing of numbers to not be locale
dependent.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Meskes (#4)
Re: Thread-unsafe coding in ecpg

Michael Meskes <meskes@postgresql.org> writes:

While (b) has more theoretical purity, I'm inclined to think it
doesn't really improve anybody's life compared to (a), because
--disable-thread-safety doesn't actually stop anyone from using
libpq or ecpglib in threaded environments. It just makes it
more likely to fail when they do.

The question is, what do we do on those platforms? Use setlocale() or
fallback to (a) and document that ecpg has to run in a C locale?

No, we shouldn't use setlocale(), because it clearly is hazardous
even on platforms where it doesn't fail outright. I don't see
anything so wrong with just documenting the hazard. The situation
isn't noticeably more dangerous than simple use of the C library;
sscanf, strtod, etc are all likely to do surprising things when
LC_NUMERIC isn't C.

We could also rewrite the parsing of numbers to not be locale
dependent.

Perhaps, but that seems like a giant undertaking. I'm not excited
about duplicating strtod(), for instance.

regards, tom lane

#6Andrew Gierth
andrew@tao11.riddles.org.uk
In reply to: Michael Meskes (#2)
Re: Thread-unsafe coding in ecpg

"Michael" == Michael Meskes <meskes@postgresql.org> writes:

Therefore, it's plain crazy for ecpg to be calling setlocale()
inside threaded code. It looks to me like what ecpg is doing is
trying to defend itself against non-C LC_NUMERIC settings, which is
laudable, but this implementation of that is totally unsafe.

Don't know what's the best way out of this. The simplest thing would
be to just remove that code and document that you'd better run ecpg
in LC_NUMERIC locale, but it'd be nice if we could do better.

Would it help if we had non-locale-aware functions for both
floating-point output _and_ input? i.e. import a known-working strtod()
(allowing us to remove all the hacks that have grown up around it, for
special-case input and wonky error handling) with locale functionality
removed.

--
Andrew (irc:RhodiumToad)

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Gierth (#6)
Re: Thread-unsafe coding in ecpg

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

Would it help if we had non-locale-aware functions for both
floating-point output _and_ input? i.e. import a known-working strtod()
(allowing us to remove all the hacks that have grown up around it, for
special-case input and wonky error handling) with locale functionality
removed.

Dunno, is there such a thing as a platform-independent strtod()?
I'd have thought that, for instance, typical implementations would
be pretty much in bed with the details of IEEE float format ---
your example where strtof() is different from (float) strtod()
makes it hard to believe that it can be written without assumptions
about the hardware's float format.

(Note that this concern is independent of whether we adopt the Ryu
code, which IIUC also depends on IEEE floats. Our answer for anyone
wanting to run on non-IEEE hardware can be to #ifdef out Ryu and use
the existing float output code. But doing the equivalent thing on the
input side wouldn't solve ecpg's problem.)

regards, tom lane

#8Michael Meskes
meskes@postgresql.org
In reply to: Tom Lane (#5)
Re: Thread-unsafe coding in ecpg

The question is, what do we do on those platforms? Use setlocale()
or
fallback to (a) and document that ecpg has to run in a C locale?

No, we shouldn't use setlocale(), because it clearly is hazardous
even on platforms where it doesn't fail outright. I don't see
anything so wrong with just documenting the hazard. The situation

Actually I meant using setlocale() and documenting that it must not be
used with threads, or document it must not be used with locales?

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Meskes (#8)
Re: Thread-unsafe coding in ecpg

Michael Meskes <meskes@postgresql.org> writes:

No, we shouldn't use setlocale(), because it clearly is hazardous
even on platforms where it doesn't fail outright. I don't see
anything so wrong with just documenting the hazard. The situation

Actually I meant using setlocale() and documenting that it must not be
used with threads, or document it must not be used with locales?

I tend to think that has more downside than upside, in situations where
people don't read the manual closely and try to do it anyway.

First, there's the probable crash if setlocale() is thread-unsafe.
(Though the lack of previous reports suggests that on most platforms,
it isn't.)

Second, if the program is indeed trying to run with non-C LC_NUMERIC,
using setlocale() will have unsynchronized, hard-to-debug side effects
on other threads. Not using it will have no downside at all if ecpg
isn't trying to read numeric data, while if it does do so, the failures
will be reproducible and easy to understand/debug.

Admittedly, removing the setlocale() call will be a net negative for
single-threaded applications, which are likely the majority. But
I don't know any safe way to tell whether the app is multi threaded.

On the third hand, the lack of previous reports suggests that maybe
this whole thing is seldom a problem in practice. Maybe we should
just use uselocale() where available and otherwise hope it's OK
to keep on doing what we were doing.

regards, tom lane

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#9)
Re: Thread-unsafe coding in ecpg

I wrote:

On the third hand, the lack of previous reports suggests that maybe
this whole thing is seldom a problem in practice. Maybe we should
just use uselocale() where available and otherwise hope it's OK
to keep on doing what we were doing.

If we go with that approach, I think we need to adapt the patch
as attached. I autoconfiscated it and fixed a portability problem
(it didn't compile on macOS, which has these decls in <xlocale.h>).

I've verified that this fixes the problem I was seeing on OpenBSD 6.4.
I've not bothered to test on a platform lacking uselocale() --- I
think it's clear by inspection that the patch doesn't change anything
in that case.

Not sure if we need to document this or not. On platforms with
uselocale(), it should fix the problem without any need for user
attention. On platforms without, there's no change, and given
the lack of previous complaints I'm not sure it's really an issue.

regards, tom lane

Attachments:

ecpg_locale-2.patchtext/x-diff; charset=us-ascii; name=ecpg_locale-2.patchDownload+45-5
#11Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Tom Lane (#10)
RE: Thread-unsafe coding in ecpg

On Windows, _configthreadlocale() enables us to restrict the effect of setlocale() only to the calling thread. We can call it in ecpg_do_prolog/epilog().

https://docs.microsoft.com/en-us/cpp/parallel/multithreading-and-locales?view=vs-2017

Regards
Takayuki Tsunakawa

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tsunakawa, Takayuki (#11)
Re: Thread-unsafe coding in ecpg

"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:

On Windows, _configthreadlocale() enables us to restrict the effect of setlocale() only to the calling thread. We can call it in ecpg_do_prolog/epilog().

How far back does that exist?

regards, tom lane

#13Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Tom Lane (#12)
RE: Thread-unsafe coding in ecpg

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:

On Windows, _configthreadlocale() enables us to restrict the effect of

setlocale() only to the calling thread. We can call it in
ecpg_do_prolog/epilog().

How far back does that exist?

I couldn't find the relevant doc, but I've just confirmed I can use it with Visual Studio 2008 on Win7, which is my oldest combination at hand. VS 2008 is already past its EOL, and the support for Win7 will end next year, so the combination is practically enough.

Regards
Takayuki Tsunakawa

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tsunakawa, Takayuki (#13)
Re: Thread-unsafe coding in ecpg

"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

How far back does that exist?

I couldn't find the relevant doc, but I've just confirmed I can use it with Visual Studio 2008 on Win7, which is my oldest combination at hand. VS 2008 is already past its EOL, and the support for Win7 will end next year, so the combination is practically enough.

Hm. Well, I suppose we can figure that the buildfarm should tell us
if there's anything too old that we still care about.

So like this ...

regards, tom lane

Attachments:

ecpg_locale-3.patchtext/x-diff; charset=us-ascii; name=ecpg_locale-3.patchDownload+61-5
#15Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Tom Lane (#14)
RE: Thread-unsafe coding in ecpg

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

Hm. Well, I suppose we can figure that the buildfarm should tell us if
there's anything too old that we still care about.

So like this ...

How quick! Thank you. I've reviewed the code for both Unix and Windows, and it looks OK. I haven't built the patch, but expect the buildfarm to do the test.

Regards
Takayuki Tsunakawa

#16Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tsunakawa, Takayuki (#15)
Re: Thread-unsafe coding in ecpg

"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

So like this ...

How quick! Thank you. I've reviewed the code for both Unix and Windows, and it looks OK. I haven't built the patch, but expect the buildfarm to do the test.

Thanks for reviewing! I've pushed this now (to HEAD only for the moment),
we'll see what the buildfarm thinks.

BTW, I found another spot in descriptor.c where ecpglib is using
setlocale() for the same purpose. Perhaps that one's not reachable
in threaded apps, but I didn't see any obvious reason to think so,
so I changed it too.

regards, tom lane

#17Michael Meskes
meskes@postgresql.org
In reply to: Tom Lane (#16)
Re: Thread-unsafe coding in ecpg

Thanks for reviewing! I've pushed this now (to HEAD only for the
moment),
we'll see what the buildfarm thinks.

BTW, I found another spot in descriptor.c where ecpglib is using
setlocale() for the same purpose. Perhaps that one's not reachable
in threaded apps, but I didn't see any obvious reason to think so,
so I changed it too.

Thanks Tom.

Michael
--
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL

#18Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#16)
Re: Thread-unsafe coding in ecpg

On 2019-01-21 12:09:30 -0500, Tom Lane wrote:

"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:

From: Tom Lane [mailto:tgl@sss.pgh.pa.us]

So like this ...

How quick! Thank you. I've reviewed the code for both Unix and Windows, and it looks OK. I haven't built the patch, but expect the buildfarm to do the test.

Thanks for reviewing! I've pushed this now (to HEAD only for the moment),
we'll see what the buildfarm thinks.

BTW, I found another spot in descriptor.c where ecpglib is using
setlocale() for the same purpose. Perhaps that one's not reachable
in threaded apps, but I didn't see any obvious reason to think so,
so I changed it too.

Seems jacana might not have like this change?

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&amp;dt=2019-01-21%2019%3A01%3A28

Greetings,

Andres Freund

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#18)
Re: Thread-unsafe coding in ecpg

Andres Freund <andres@anarazel.de> writes:

Seems jacana might not have like this change?
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&amp;dt=2019-01-21%2019%3A01%3A28

Hmm. So mingw doesn't provide access to _configthreadlocale().
That's unfortunate, at least if we think that mingw is still a viable
production platform, because it means we can't make ecpg thread-safe
on that platform.

Is there a newer version of mingw that does have this functionality?
I'm not sure whether to install a version check or just assume that
it's never there.

regards, tom lane

#20Joshua D. Drake
jd@commandprompt.com
In reply to: Tom Lane (#19)
Re: Thread-unsafe coding in ecpg

On 1/21/19 12:05 PM, Tom Lane wrote:

Andres Freund <andres@anarazel.de> writes:

Seems jacana might not have like this change?
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&amp;dt=2019-01-21%2019%3A01%3A28

Hmm. So mingw doesn't provide access to _configthreadlocale().
That's unfortunate, at least if we think that mingw is still a viable
production platform, because it means we can't make ecpg thread-safe
on that platform.

Is there a newer version of mingw that does have this functionality?
I'm not sure whether to install a version check or just assume that
it's never there.

Apparently this can be done with thee 64bit version:

https://stackoverflow.com/questions/33647271/how-to-use-configthreadlocale-in-mingw

JD

regards, tom lane

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn and Network: https://postgresconf.org
*** A fault and talent of mine is to tell it exactly how it is. ***

#21Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#19)
#22Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joshua D. Drake (#20)
#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#21)
#24Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#22)
#25Tsunakawa, Takayuki
tsunakawa.takay@jp.fujitsu.com
In reply to: Tom Lane (#16)
#26Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#24)
#27Andres Freund
andres@anarazel.de
In reply to: Andrew Dunstan (#26)
#28Andrew Dunstan
andrew@dunslane.net
In reply to: Andrew Dunstan (#26)
#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#28)
#30Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#29)
#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#30)
#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#31)
#33Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#32)
#34Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#32)
#35Michael Meskes
meskes@postgresql.org
In reply to: Tom Lane (#32)
#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Meskes (#35)
#37Michael Meskes
meskes@postgresql.org
In reply to: Tom Lane (#36)