ERRORDATA_STACK_SIZE panic crashes on Windows
Have any Windows-using hackers tried to look into the reports of
$SUBJECT on 8.3? We have two fresh reports:
http://archives.postgresql.org/pgsql-bugs/2008-05/msg00106.php
http://archives.postgresql.org/pgsql-bugs/2008-05/msg00109.php
and this isn't the first time we've heard of it.
I spent some time chasing the theory that the conversion from
UTF8 to the client encoding was failing; but if that's the case
it should be reproducible on non-Windows machines, and I didn't
have any luck with that.
What I'm currently thinking is that maybe gettext() isn't on the
same page as us concerning what encoding it's supposed to produce
its output in. This is reinforced by the mention of changing
lc_messages in the first report above. We had had some discussions
of trying to adjust the LC_XXX values to ensure that they all
represent the same encoding choice, but I don't believe that got done.
It might also be significant that both complainants are using UTF8
database encoding; IIRC that has some weird status in the Windows
locale world.
I am also toying with the idea of disabling gettext usage once we
get past two or so levels of error nesting, in order to prevent the
recursion panic in this type of scenario --- but it would be real
nice to know for certain what is happening before we try to fix it.
regards, tom lane
I wrote:
Have any Windows-using hackers tried to look into the reports of
$SUBJECT on 8.3? We have two fresh reports:
http://archives.postgresql.org/pgsql-bugs/2008-05/msg00106.php
http://archives.postgresql.org/pgsql-bugs/2008-05/msg00109.php
and this isn't the first time we've heard of it.
And now we have another report:
http://archives.postgresql.org/pgsql-bugs/2008-05/msg00159.php
for which the complainant was kind enough to supply the requested
details, and they seem to confirm my previous theory:
What I'm currently thinking is that maybe gettext() isn't on the
same page as us concerning what encoding it's supposed to produce
its output in.
After digging around in the source code a bit, the reason why (and why
it's only seen on Windows) became blindingly obvious :-(. On Windows
we specifically allow UTF-8 database encoding to be selected regardless
of the system locale settings. This is (AFAIK) safe for the basic
locale-dependent stuff we do, because that all goes through special
UTF-16-using code paths. But it leaves gettext utterly misinformed
about what it is supposed to do.
Fortunately there is a way to tell gettext what to do, and accordingly
I propose the attached patch. I am not in a position to test it
however. Would somebody replicate the failure and confirm this
fixes it?
regards, tom lane
Tom Lane wrote:
I wrote:
Have any Windows-using hackers tried to look into the reports of
$SUBJECT on 8.3? We have two fresh reports:
http://archives.postgresql.org/pgsql-bugs/2008-05/msg00106.php
http://archives.postgresql.org/pgsql-bugs/2008-05/msg00109.php
and this isn't the first time we've heard of it.And now we have another report:
http://archives.postgresql.org/pgsql-bugs/2008-05/msg00159.php
for which the complainant was kind enough to supply the requested
details, and they seem to confirm my previous theory:What I'm currently thinking is that maybe gettext() isn't on the
same page as us concerning what encoding it's supposed to produce
its output in.After digging around in the source code a bit, the reason why (and why
it's only seen on Windows) became blindingly obvious :-(. On Windows
we specifically allow UTF-8 database encoding to be selected
regardless of the system locale settings. This is (AFAIK) safe for
the basic locale-dependent stuff we do, because that all goes through
special UTF-16-using code paths. But it leaves gettext utterly
misinformed about what it is supposed to do.Fortunately there is a way to tell gettext what to do, and accordingly
I propose the attached patch. I am not in a position to test it
however. Would somebody replicate the failure and confirm this
fixes it?
After some work, I've managed to reproduce it in my test environment for
Swedish, and I can confirm that the patch fixes the issue.
Just for kicks, I've applied this patch so you, so you get to be on the
receiving side of that ;-)
//Magnus
//Magnus
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
Fortunately there is a way to tell gettext what to do, and accordingly
I propose the attached patch. I am not in a position to test it
however. Would somebody replicate the failure and confirm this
fixes it?
After some work, I've managed to reproduce it in my test environment for
Swedish, and I can confirm that the patch fixes the issue.
Thanks.
Just for kicks, I've applied this patch so you, so you get to be on the
receiving side of that ;-)
No objection here.
I noticed that you applied the patch to 8.2 as well. It should be
harmless enough, but we weren't having the problem in 8.2 were we?
Or am I just confused?
regards, tom lane
Tom Lane wrote:
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
Fortunately there is a way to tell gettext what to do, and accordingly
I propose the attached patch. I am not in a position to test it
however. Would somebody replicate the failure and confirm this
fixes it?After some work, I've managed to reproduce it in my test environment for
Swedish, and I can confirm that the patch fixes the issue.Thanks.
Just for kicks, I've applied this patch so you, so you get to be on the
receiving side of that ;-)No objection here.
I noticed that you applied the patch to 8.2 as well. It should be
harmless enough, but we weren't having the problem in 8.2 were we?
Or am I just confused?
I am seeing a compile falure after this patch on BSD/OS 4.3.1. The
failure is during link of the backend binary:
-lssl -lcrypto -lgetopt -ldl -lutil -lm -o postgres
utils/mb/mbutils.o: In function `SetDatabaseEncoding':
utils/mb/mbutils.o(.text+0xbbc): undefined reference to `bind_textdomain_codeset'
gmake: *** [postgres] Error 1
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
I am seeing a compile falure after this patch on BSD/OS 4.3.1. The
failure is during link of the backend binary:
-lssl -lcrypto -lgetopt -ldl -lutil -lm -o postgres
utils/mb/mbutils.o: In function `SetDatabaseEncoding':
utils/mb/mbutils.o(.text+0xbbc): undefined reference to `bind_textdomain_codeset'
gmake: *** [postgres] Error 1
Hm, I assume you used --enable-nls ... why isn't libintl mentioned
in the link?
regards, tom lane
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
I am seeing a compile falure after this patch on BSD/OS 4.3.1. The
failure is during link of the backend binary:-lssl -lcrypto -lgetopt -ldl -lutil -lm -o postgres
utils/mb/mbutils.o: In function `SetDatabaseEncoding':
utils/mb/mbutils.o(.text+0xbbc): undefined reference to `bind_textdomain_codeset'
gmake: *** [postgres] Error 1Hm, I assume you used --enable-nls ... why isn't libintl mentioned
in the link?
It was cut off --- the libraries are:
../../src/port/libpgport_srv.a -lintl -lssl -lcrypto -lgetopt -ldl
-lutil -lm -o postgres
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
Tom Lane wrote:
Hm, I assume you used --enable-nls ... why isn't libintl mentioned
in the link?
It was cut off --- the libraries are:
../../src/port/libpgport_srv.a -lintl -lssl -lcrypto -lgetopt -ldl
-lutil -lm -o postgres
OK, so your version of libintl doesn't have bind_textdomain_codeset?
Is that functionality missing altogether, or just named differently?
regards, tom lane
I wrote:
OK, so your version of libintl doesn't have bind_textdomain_codeset?
Some digging in the gettext changelog suggests that
bind_textdomain_codeset was added in gettext-0.10.36, released
2001-03-29.
We can either add a configure test or say that we don't support
such old versions of gettext ...
regards, tom lane
We can either add a configure test or say that we don't support
such old versions of gettext ...
We don't support seems like a very reasonable response considering the
age.
Sincerely,
Joshua D. Drake
Show quoted text
regards, tom lane
Tom Lane wrote:
I wrote:
OK, so your version of libintl doesn't have bind_textdomain_codeset?
Some digging in the gettext changelog suggests that
bind_textdomain_codeset was added in gettext-0.10.36, released
2001-03-29.We can either add a configure test or say that we don't support
such old versions of gettext ...
Or we could just #ifdef the whole thing to win32, since it's not
really needed on other platforms, pushing that decision to later...
(when that version of gettext will be even more obsolete)
//Magnus
Tom Lane wrote:
Just for kicks, I've applied this patch so you, so you get to be on
the receiving side of that ;-)No objection here.
I noticed that you applied the patch to 8.2 as well. It should be
harmless enough, but we weren't having the problem in 8.2 were we?
Or am I just confused?
I think the underlying problem is still there, it's just that it
doesn't manifest itself in a crash here. I figured it couldn't hurt to
fix it in that case.
//Magnus
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
We can either add a configure test or say that we don't support
such old versions of gettext ...
Or we could just #ifdef the whole thing to win32, since it's not
really needed on other platforms, pushing that decision to later...
(when that version of gettext will be even more obsolete)
That would work for the moment, but we're almost certainly going to
have to insist on bind_textdomain_codeset being available eventually;
AFAICS there's no hope of multi-locale/multi-encoding support without it.
I was considering either:
1. Add a probe for bind_textdomain_codeset to configure, and
conditionalize the new patch on HAVE_BIND_TEXTDOMAIN_CODESET.
2. Adjust the AC_SEARCH_LIBS call to probe for bind_textdomain_codeset()
instead of gettext() as it does now. This would have the effect of
rejecting pre-0.10.36 versions of the library.
Magnus' suggestion gives a third possibility.
I notice that the PGAC_CHECK_GETTEXT macro already contains the comment
dnl FIXME: We should probably check for version >=0.10.36.
So depending on what that's about, there might be some other good
reasons to go with choice #2. Peter, it appears you put that comment in
when you first added the macro, on 2001-06-02. Do you remember why?
regards, tom lane
Tom Lane wrote:
Magnus Hagander <magnus@hagander.net> writes:
Tom Lane wrote:
We can either add a configure test or say that we don't support
such old versions of gettext ...Or we could just #ifdef the whole thing to win32, since it's not
really needed on other platforms, pushing that decision to later...
(when that version of gettext will be even more obsolete)That would work for the moment, but we're almost certainly going to
have to insist on bind_textdomain_codeset being available eventually;
AFAICS there's no hope of multi-locale/multi-encoding support without
it.
Yes, that's why I said it would only push the decision to the future.
Perhaps doing this #ifdef would be a good idea for back-branches, and
then we look at one of the other solutions for 8.4?
I was considering either:
1. Add a probe for bind_textdomain_codeset to configure, and
conditionalize the new patch on HAVE_BIND_TEXTDOMAIN_CODESET.2. Adjust the AC_SEARCH_LIBS call to probe for
bind_textdomain_codeset() instead of gettext() as it does now. This
would have the effect of rejecting pre-0.10.36 versions of the
library.
Depending on the buildfarm issue it may be that the software is antique
enough that almost only Bruce runs such an old version. If so, I think
#2 is just fine (except in back branches, of course)
Magnus' suggestion gives a third possibility.
I notice that the PGAC_CHECK_GETTEXT macro already contains the
comment dnl FIXME: We should probably check for version >=0.10.36.
So depending on what that's about, there might be some other good
reasons to go with choice #2. Peter, it appears you put that comment
in when you first added the macro, on 2001-06-02. Do you remember
why?
Could it possibly be for this very reason?
//Magnus
Magnus Hagander wrote:
2. Adjust the AC_SEARCH_LIBS call to probe for
bind_textdomain_codeset() instead of gettext() as it does now. This
would have the effect of rejecting pre-0.10.36 versions of the
library.Depending on the buildfarm issue it may be that the software is antique
enough that almost only Bruce runs such an old version. If so, I think
#2 is just fine (except in back branches, of course)
You don't have to fix it just for me --- I can remove --enable-nls; the
big question is who else is going to hit this. I think the buildfarm
has safe-enough checking for 8.4, but I am concerned about the back
branches where testing isn't as complete.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
Magnus Hagander wrote:
2. Adjust the AC_SEARCH_LIBS call to probe for
bind_textdomain_codeset() instead of gettext() as it does now.
This would have the effect of rejecting pre-0.10.36 versions of
the library.Depending on the buildfarm issue it may be that the software is
antique enough that almost only Bruce runs such an old version. If
so, I think #2 is just fine (except in back branches, of course)You don't have to fix it just for me --- I can remove --enable-nls;
the big question is who else is going to hit this. I think the
buildfarm has safe-enough checking for 8.4, but I am concerned about
the back branches where testing isn't as complete.
This is why I'm suggesting the #ifdef WIN32 for back branches.
//Magnus
On Tue, May 27, 2008 at 8:05 PM, Bruce Momjian <bruce@momjian.us> wrote:
You don't have to fix it just for me --- I can remove --enable-nls; the
big question is who else is going to hit this. I think the buildfarm
has safe-enough checking for 8.4, but I am concerned about the back
branches where testing isn't as complete.
We'll find out in a few hours.
My guess is that anyone happy to be running such an old version of
gettext is probably running old versions of everything, including PG.
Your case is obviously a little different.
--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
Tom Lane wrote:
I notice that the PGAC_CHECK_GETTEXT macro already contains the comment
��������dnl FIXME: We should probably check for version >=0.10.36.
So depending on what that's about, there might be some other good
reasons to go with choice #2. �Peter, it appears you put that comment in
when you first added the macro, on 2001-06-02. �Do you remember why?
I think that was the first version that worked sanely in general.
Peter Eisentraut <peter_e@gmx.net> writes:
Tom Lane wrote:
I notice that the PGAC_CHECK_GETTEXT macro already contains the comment
dnl FIXME: We should probably check for version >=0.10.36.
So depending on what that's about, there might be some other good
reasons to go with choice #2. Peter, it appears you put that comment in
when you first added the macro, on 2001-06-02. Do you remember why?
I think that was the first version that worked sanely in general.
Hmm. Bruce, what gettext version are you running exactly, and how much
have you ever tested the localization behavior with it?
regards, tom lane
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
Tom Lane wrote:
I notice that the PGAC_CHECK_GETTEXT macro already contains the comment
dnl FIXME: We should probably check for version >=0.10.36.
So depending on what that's about, there might be some other good
reasons to go with choice #2. Peter, it appears you put that comment in
when you first added the macro, on 2001-06-02. Do you remember why?I think that was the first version that worked sanely in general.
Hmm. Bruce, what gettext version are you running exactly, and how much
have you ever tested the localization behavior with it?
Uh, I can't seem to find the libintl version --- my bet is that it was
installed as part of another package but I am not sure which one.
I have never used it to test localization support so turning it off is
fine --- I just enabled it to catch compile failures.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes:
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
I think that was the first version that worked sanely in general.
Hmm. Bruce, what gettext version are you running exactly, and how much
have you ever tested the localization behavior with it?
Uh, I can't seem to find the libintl version --- my bet is that it was
installed as part of another package but I am not sure which one.
I have never used it to test localization support so turning it off is
fine --- I just enabled it to catch compile failures.
I dug through the gettext changelogs a bit more. There were *three
years* and a lot of fixes between 0.10.35 and 0.10.36; the one that
is most directly relevant to us is
* Locales which differ only in the character encoding, for example ja_JP and
ja_JP.UTF-8, can now share the same message catalogs. gettext converts
the messages to the appropriate character encoding on the fly.
Since we supply only one message catalog per language, it's probably
fair to say that gettext versions without this ability are broken for
our purposes.
So I'm thinking that the correct fix is to require
bind_textdomain_codeset to be present when --enable-nls is requested.
I will go make that happen in the versions that have the new patch
(ie, 8.2 and up).
regards, tom lane