"could not adopt C locale" failure at startup on Windows
We've seen several reports of $SUBJECT lately. I have no idea what's
going on, but it occurred to me that it could be informative to tweak
init_locale() so that it reports which category failed to be set.
Any objections to squeezing that into today's releases?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2015-06-09 11:20:06 -0400, Tom Lane wrote:
We've seen several reports of $SUBJECT lately. I have no idea what's
going on, but it occurred to me that it could be informative to tweak
init_locale() so that it reports which category failed to be set.
Any objections to squeezing that into today's releases?
No objection, +1. Seems fairly low risk.
The error seems odd. The only even remotely related change between 9.4.1
and .2 seems to be ca325941. Could also be a build environment change.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
On 2015-06-09 11:20:06 -0400, Tom Lane wrote:
We've seen several reports of $SUBJECT lately. I have no idea what's
going on, but it occurred to me that it could be informative to tweak
init_locale() so that it reports which category failed to be set.
Any objections to squeezing that into today's releases?
No objection, +1. Seems fairly low risk.
The error seems odd. The only even remotely related change between 9.4.1
and .2 seems to be ca325941. Could also be a build environment change.
Yeah, my first instinct was to blame ca325941 as well, but I don't think
any of that code executes during init_locale(). Also,
/messages/by-id/20150326040321.2492.24716@wrigleys.postgresql.org
says it's been seen in 9.4.1. Of course, it might be premature to assume
that report had an identical cause to the later ones.
What I plan to do is this:
static void
init_locale(const char *categoryname, int category, const char *locale)
{
if (pg_perm_setlocale(category, locale) == NULL &&
pg_perm_setlocale(category, "C") == NULL)
elog(FATAL, "could not adopt either \"%s\" locale or C locale for %s", locale, categoryname);
}
with the obvious changes to the call sites to pass the string names of
the categories not just their numeric codes. (We could just log the
numbers, but it'd be much harder to interpret.) This might be overkill,
but when you don't know what you're looking for, every little bit helps ...
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jun 09, 2015 at 12:24:02PM -0400, Tom Lane wrote:
Andres Freund <andres@anarazel.de> writes:
The error seems odd. The only even remotely related change between 9.4.1
and .2 seems to be ca325941. Could also be a build environment change.Yeah, my first instinct was to blame ca325941 as well, but I don't think
any of that code executes during init_locale(). Also,
/messages/by-id/20150326040321.2492.24716@wrigleys.postgresql.org
says it's been seen in 9.4.1.
The return value check and error message were new in 9.4.1. I suspect the
underlying problem has been present far longer, undetected.
I can reproduce this with "initdb --locale=C",
postgresql-9.4.3-1-windows-binaries.zip (32-bit), Windows 7 x64, and the
Windows ANSI code page set to CP936. (Choose "Chinese (Simplified, PRC)" in
Control Panel -> Region and Language -> Administrative -> Language for
non-Unicode programs.) It is neither necessary nor sufficient to change
Control Panel -> Region and Language -> Formats -> Format. Binaries from
postgresql-9.4.3-1-windows-x64-binaries.zip do not exhibit the problem. Note
that CP936 is a PG_ENCODING_IS_CLIENT_ONLY() encoding.
What I plan to do is this:
static void
init_locale(const char *categoryname, int category, const char *locale)
{
if (pg_perm_setlocale(category, locale) == NULL &&
pg_perm_setlocale(category, "C") == NULL)
elog(FATAL, "could not adopt either \"%s\" locale or C locale for %s", locale, categoryname);
}
Good to have.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Noah Misch <noah@leadboat.com> writes:
On Tue, Jun 09, 2015 at 12:24:02PM -0400, Tom Lane wrote:
Yeah, my first instinct was to blame ca325941 as well, but I don't think
any of that code executes during init_locale(). Also,
/messages/by-id/20150326040321.2492.24716@wrigleys.postgresql.org
says it's been seen in 9.4.1.
The return value check and error message were new in 9.4.1. I suspect the
underlying problem has been present far longer, undetected.
Oooh ... I'd forgotten that 6fdba8ceb was so recent. Agreed, what we are
seeing is probably a situation that's been there for a long time, but we
were ignoring the failure up to now (and, evidently, it wasn't really
creating any problems).
I can reproduce this with "initdb --locale=C",
postgresql-9.4.3-1-windows-binaries.zip (32-bit), Windows 7 x64, and the
Windows ANSI code page set to CP936. (Choose "Chinese (Simplified, PRC)" in
Control Panel -> Region and Language -> Administrative -> Language for
non-Unicode programs.) It is neither necessary nor sufficient to change
Control Panel -> Region and Language -> Formats -> Format. Binaries from
postgresql-9.4.3-1-windows-x64-binaries.zip do not exhibit the problem. Note
that CP936 is a PG_ENCODING_IS_CLIENT_ONLY() encoding.
Hm. I could understand getting encoding difficulties in that environment,
but it's hard to see why they'd manifest like this. Can you trace through
pg_perm_setlocale and figure out why it's reporting failure?
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jun 10, 2015 at 10:09:38AM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
I can reproduce this with "initdb --locale=C",
postgresql-9.4.3-1-windows-binaries.zip (32-bit), Windows 7 x64, and the
Windows ANSI code page set to CP936. (Choose "Chinese (Simplified, PRC)" in
Control Panel -> Region and Language -> Administrative -> Language for
non-Unicode programs.) It is neither necessary nor sufficient to change
Control Panel -> Region and Language -> Formats -> Format. Binaries from
postgresql-9.4.3-1-windows-x64-binaries.zip do not exhibit the problem. Note
that CP936 is a PG_ENCODING_IS_CLIENT_ONLY() encoding.Hm. I could understand getting encoding difficulties in that environment,
but it's hard to see why they'd manifest like this. Can you trace through
pg_perm_setlocale and figure out why it's reporting failure?
A faster test is to set LC_CTYPE=C in the environment and run "postgres
--version". The root cause is a bug my commit 5f538ad introduced at the start
of the 9.4 cycle. pg_perm_setlocale() now calls pg_bind_textdomain_codeset(),
which calls setlocale(LC_CTYPE, NULL). POSIX permits that to clobber all
previous setlocale() return values, which it did here[1]It does so in 32-bit "release" (non-debug), NLS builds done under Visual Studio 2012 and Visual Studio 2013. The official binaries used VS2013. The symptoms are slightly different under VS2012. I did not test earlier versions. Debug builds and 64-bit builds were unaffected.. The ensuing
putenv("LC_CTYPE=<garbage bytes>") at the end of pg_perm_setlocale() fails
under Windows ANSI code page 936, because the garbage bytes often aren't a
valid CP936 string. I would expect the same symptom on other multibyte
Windows locales.
While Windows was the bellwether, harm potential is greater on non-Windows
systems. pg_perm_setlocale() sets the LC_CTYPE environment variable to help
PL/Perl avoid clobbering the process locale; see plperl_init_interp()
comments. However, that function has bespoke code for Windows, on which
setting the environment variable doesn't help. I don't know which other
platforms invalidate previous setlocale() return values on setlocale(LC_CTYPE,
NULL). Therefore, I propose committing the attached diagnostic patch and
reverting it after about one buildfarm cycle. It will make affected
configurations fail hard, and then I'll have a notion about the prevalence of
damage to expect in the field.
The actual fix is trivial, attached second. This is for back-patch to 9.4.
[1]: It does so in 32-bit "release" (non-debug), NLS builds done under Visual Studio 2012 and Visual Studio 2013. The official binaries used VS2013. The symptoms are slightly different under VS2012. I did not test earlier versions. Debug builds and 64-bit builds were unaffected.
Studio 2012 and Visual Studio 2013. The official binaries used VS2013. The
symptoms are slightly different under VS2012. I did not test earlier
versions. Debug builds and 64-bit builds were unaffected.
Attachments:
pg_perm_setlocale-clobber-v1.patchtext/plain; charset=us-asciiDownload
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 4be735e..84215e0 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -183,6 +183,12 @@ pg_perm_setlocale(int category, const char *locale)
*/
if (category == LC_CTYPE)
{
+ static char save_lc_ctype[LC_ENV_BUFSIZE];
+
+ /* copy setlocale() return value before callee invokes it again */
+ strlcpy(save_lc_ctype, result, sizeof(save_lc_ctype));
+ result = save_lc_ctype;
+
#ifdef ENABLE_NLS
SetMessageEncoding(pg_bind_textdomain_codeset(textdomain(NULL)));
#else
diagnose-setlocale-clobber-v1.patchtext/plain; charset=us-asciiDownload
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 4be735e..d33081b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -58,6 +58,7 @@
#include "catalog/pg_collation.h"
#include "catalog/pg_control.h"
#include "mb/pg_wchar.h"
+#include "utils/builtins.h"
#include "utils/hsearch.h"
#include "utils/memutils.h"
#include "utils/pg_locale.h"
@@ -148,6 +149,7 @@ pg_perm_setlocale(int category, const char *locale)
char *result;
const char *envvar;
char *envbuf;
+ char orig_result[LC_ENV_BUFSIZE];
#ifndef WIN32
result = setlocale(category, locale);
@@ -173,6 +175,7 @@ pg_perm_setlocale(int category, const char *locale)
if (result == NULL)
return result; /* fall out immediately on failure */
+ strlcpy(orig_result, result, sizeof(orig_result));
/*
* Use the right encoding in translated messages. Under ENABLE_NLS, let
@@ -231,6 +234,15 @@ pg_perm_setlocale(int category, const char *locale)
}
snprintf(envbuf, LC_ENV_BUFSIZE - 1, "%s=%s", envvar, result);
+ if (strcmp(orig_result, result) != 0)
+ {
+ char hex[2 * LC_ENV_BUFSIZE + 1];
+
+ hex_encode(result, Min(1 + strlen(result), LC_ENV_BUFSIZE), hex);
+ hex[sizeof(hex) - 1] = '\0';
+ elog(FATAL, "setlocale() result %s clobbered to 0x%s",
+ orig_result, hex);
+ }
if (putenv(envbuf))
return NULL;
Noah Misch <noah@leadboat.com> writes:
On Wed, Jun 10, 2015 at 10:09:38AM -0400, Tom Lane wrote:
Hm. I could understand getting encoding difficulties in that environment,
but it's hard to see why they'd manifest like this. Can you trace through
pg_perm_setlocale and figure out why it's reporting failure?
A faster test is to set LC_CTYPE=C in the environment and run "postgres
--version". The root cause is a bug my commit 5f538ad introduced at the start
of the 9.4 cycle. pg_perm_setlocale() now calls pg_bind_textdomain_codeset(),
which calls setlocale(LC_CTYPE, NULL). POSIX permits that to clobber all
previous setlocale() return values, which it did here[1].
Ah-hah.
While Windows was the bellwether, harm potential is greater on non-Windows
systems. pg_perm_setlocale() sets the LC_CTYPE environment variable to help
PL/Perl avoid clobbering the process locale; see plperl_init_interp()
comments. However, that function has bespoke code for Windows, on which
setting the environment variable doesn't help. I don't know which other
platforms invalidate previous setlocale() return values on setlocale(LC_CTYPE,
NULL). Therefore, I propose committing the attached diagnostic patch and
reverting it after about one buildfarm cycle. It will make affected
configurations fail hard, and then I'll have a notion about the prevalence of
damage to expect in the field.
I doubt this will teach us anything; if any buildfarm systems were
exhibiting the issue, they'd have been failing all along, no? This should
break at least the bootstrap/initdb case on any affected system.
The actual fix is trivial, attached second. This is for back-patch to 9.4.
Looks sane to me.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Noah Misch <noah@leadboat.com> writes:
A faster test is to set LC_CTYPE=C in the environment and run "postgres
--version". The root cause is a bug my commit 5f538ad introduced at the start
of the 9.4 cycle. pg_perm_setlocale() now calls pg_bind_textdomain_codeset(),
which calls setlocale(LC_CTYPE, NULL). POSIX permits that to clobber all
previous setlocale() return values, which it did here[1].
After further thought, ISTM that this bug is evidence that 5f538ad
was badly designed, and the proposed fix has blinkers on. If
pg_bind_textdomain_codeset() is looking at the locale environment,
we should not be calling it from inside pg_perm_setlocale() at all,
but from higher level code *after* we're done setting up the whole libc
locale environment --- thus, after the unsetenv("LC_ALL") call in main.c,
and somewhere near the bottom of CheckMyDatabase() in postinit.c.
It's mere chance that the order of calls to pg_perm_setlocale() is
such that the code works now; and it's not hard to imagine future
changes in gettext, or reordering of our own code, that would break it.
So I think having to duplicate that call is a reasonable price to pay
for not having surprising entanglements in what should be a very thin
wrapper around setlocale(3).
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 15, 2015 at 08:47:12AM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
While Windows was the bellwether, harm potential is greater on non-Windows
systems. pg_perm_setlocale() sets the LC_CTYPE environment variable to help
PL/Perl avoid clobbering the process locale; see plperl_init_interp()
comments. However, that function has bespoke code for Windows, on which
setting the environment variable doesn't help. I don't know which other
platforms invalidate previous setlocale() return values on setlocale(LC_CTYPE,
NULL). Therefore, I propose committing the attached diagnostic patch and
reverting it after about one buildfarm cycle. It will make affected
configurations fail hard, and then I'll have a notion about the prevalence of
damage to expect in the field.I doubt this will teach us anything; if any buildfarm systems were
exhibiting the issue, they'd have been failing all along, no?
No; most systems let environment variables carry arbitrary strings of non-nul
bytes, so they don't see $SUBJECT. I want to probe for all systems that are
currently issuing putenv("LC_CTYPE=<garbage>"), not just the ones where a
picky putenv() illuminates it.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Jun 15, 2015 at 12:37:43PM -0400, Tom Lane wrote:
After further thought, ISTM that this bug is evidence that 5f538ad
was badly designed, and the proposed fix has blinkers on. If
pg_bind_textdomain_codeset() is looking at the locale environment,
we should not be calling it from inside pg_perm_setlocale() at all,
but from higher level code *after* we're done setting up the whole libc
locale environment --- thus, after the unsetenv("LC_ALL") call in main.c,
and somewhere near the bottom of CheckMyDatabase() in postinit.c.
It's mere chance that the order of calls to pg_perm_setlocale() is
such that the code works now; and it's not hard to imagine future
changes in gettext, or reordering of our own code, that would break it.
pg_bind_textdomain_codeset() looks at one piece of the locale environment,
namely setlocale(LC_CTYPE, NULL), so the order of pg_perm_setlocale() calls
does not affect it. There's nothing acutely bad about the alternative you
identify here, but it's no better equipped to forestall mistakes. Moving the
call later would slightly expand the window during which translated messages
are garbled.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Noah Misch <noah@leadboat.com> writes:
On Mon, Jun 15, 2015 at 12:37:43PM -0400, Tom Lane wrote:
It's mere chance that the order of calls to pg_perm_setlocale() is
such that the code works now; and it's not hard to imagine future
changes in gettext, or reordering of our own code, that would break it.
pg_bind_textdomain_codeset() looks at one piece of the locale environment,
namely setlocale(LC_CTYPE, NULL), so the order of pg_perm_setlocale() calls
does not affect it.
Well, my point is that that is a larger assumption about the behavior of
pg_bind_textdomain_codeset() than I think we ought to make, even if it
happens to be true today.
There's nothing acutely bad about the alternative you
identify here, but it's no better equipped to forestall mistakes. Moving the
call later would slightly expand the window during which translated messages
are garbled.
I'm not exactly sure that they wouldn't be garbled anyway during the
interval where we're setting these things up. Until DatabaseEncoding,
ClientEncoding, and gettext's internal notions are all in sync, we
are at risk of that type of issue no matter what.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jun 17, 2015 at 01:43:55PM -0400, Tom Lane wrote:
Noah Misch <noah@leadboat.com> writes:
On Mon, Jun 15, 2015 at 12:37:43PM -0400, Tom Lane wrote:
It's mere chance that the order of calls to pg_perm_setlocale() is
such that the code works now; and it's not hard to imagine future
changes in gettext, or reordering of our own code, that would break it.pg_bind_textdomain_codeset() looks at one piece of the locale environment,
namely setlocale(LC_CTYPE, NULL), so the order of pg_perm_setlocale() calls
does not affect it.Well, my point is that that is a larger assumption about the behavior of
pg_bind_textdomain_codeset() than I think we ought to make, even if it
happens to be true today.
Perhaps it's just me, but I can envision changes of similar plausibility that
break under each approach and not the other. Without a way to distinguish on
that basis, I'm left shrugging about a proposal to switch. For that matter,
if pg_bind_textdomain_codeset() starts to act on other facets of the locale,
that's liable to be a bug independent of our choice here. However locale
facets conflict, we expect LC_CTYPE to control the message codeset.
There's nothing acutely bad about the alternative you
identify here, but it's no better equipped to forestall mistakes. Moving the
call later would slightly expand the window during which translated messages
are garbled.I'm not exactly sure that they wouldn't be garbled anyway during the
interval where we're setting these things up. Until DatabaseEncoding,
ClientEncoding, and gettext's internal notions are all in sync, we
are at risk of that type of issue no matter what.
True; the window exists and is small enough both ways. This is merely one
more reason to fix the bug without fixing what ain't broke.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Noah Misch <noah@leadboat.com> writes:
On Wed, Jun 17, 2015 at 01:43:55PM -0400, Tom Lane wrote:
I'm not exactly sure that they wouldn't be garbled anyway during the
interval where we're setting these things up. Until DatabaseEncoding,
ClientEncoding, and gettext's internal notions are all in sync, we
are at risk of that type of issue no matter what.
True; the window exists and is small enough both ways. This is merely one
more reason to fix the bug without fixing what ain't broke.
[ shrug... ] I'm not planning to waste a whole lot more breath on this
topic, but to my mind the current definition of pg_perm_setlocale() *is*
broke. Previously, that function only interacted with the standard libc
locale facilities. Now it's also entangled with gettext(), as well as
SetMessageEncoding and GetDatabaseEncoding. IMO that extra cross-module
entanglement is the exact reason we have a bug here. It's a layering
violation and I think we should get rid of it.
regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers