localization problem (and solution)

Started by Manuel Sugawara, about 20 years ago, 24 messages
#1Manuel Sugawara
masm@fciencias.unam.mx

Here is a test case for a previously reported bug (see
http://archives.postgresql.org/pgsql-general/2005-11/msg01235.php):

initdb using es_MX.ISO-8859-1, start postgres using es_MX.UTF-8 and
execute:

create procedural language plperl;
create or replace function foo() returns int as 'return 1' language 'plperl';
create table persona (nombre text check (nombre ~ '^[[:upper:]][[:lower:]]*([-''. [:alpha:]]+)?$'::text));
copy persona (nombre) from stdin;
José
\.

It will error out saying:

ERROR: new row for relation "persona" violates check constraint "persona_nombre_check"
CONTEXT: COPY persona, line 1: "José"

Commenting out the creation of the plperl function (or moving it after the
copy command) makes this script run without errors. Applying this patch also
solves the problem:

*** src/backend/access/transam/xlog.c~	2005-11-22 12:23:05.000000000 -0600
--- src/backend/access/transam/xlog.c	2005-12-19 20:34:22.000000000 -0600
***************
*** 3626,3631 ****
--- 3626,3632 ----
  					   " which is not recognized by setlocale().",
  					   ControlFile->lc_collate),
  			 errhint("It looks like you need to initdb or install locale support.")));
+         setenv("LC_COLLATE", ControlFile->lc_collate, 1);
  	if (setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL)
  		ereport(FATAL,
  			(errmsg("database files are incompatible with operating system"),
***************
*** 3633,3638 ****
--- 3634,3640 ----
  				  " which is not recognized by setlocale().",
  				  ControlFile->lc_ctype),
  			 errhint("It looks like you need to initdb or install locale support.")));
+         setenv("LC_CTYPE", ControlFile->lc_ctype, 1);

/* Make the fixed locale settings visible as GUC variables, too */
SetConfigOption("lc_collate", ControlFile->lc_collate,

Some fprintf's around the regex code show that someone is replacing
the localization parameters with those found in the environment, at least
for the LC_CTYPE and LC_COLLATE categories, and plperl seems to be the
culprit. Needless to say, this bug might lead to index corruption
besides other problems. It also explains some very weird (and very
difficult to reproduce) anomalies I have seen.

Regards,
Manuel.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Manuel Sugawara (#1)
Re: localization problem (and solution)

Manuel Sugawara <masm@fciencias.unam.mx> writes:

Some fprintf's around the regex code show that someone is replacing
the localization parameters with those found in the environment, at least
for the LC_CTYPE and LC_COLLATE categories, and plperl seems to be the
culprit.

Indeed. Please file a bug with the Perl people asking what right
libperl has to fool with the localization environment of its host
application.

(Your proposed fix seems entirely useless ... maybe we could fix it
by resetting the LC_FOO variables after every call to libperl, but
I bet that would break libperl instead.)
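
The "reset after every call" idea mentioned in the parenthesis could look roughly like the sketch below. It is only an illustration: call_with_locale_guard and misbehaving_library_call are hypothetical names, not PostgreSQL or libperl functions, and as the parenthesis says, restoring the locale behind the library's back might well upset the library itself.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a library call that resets the process
 * locale from the environment, as libperl is suspected of doing. */
void misbehaving_library_call(void)
{
    setlocale(LC_ALL, "");
}

/* Run fn(), then restore whatever locale was active beforehand.
 * Returns 0 on success, -1 if the locale could not be saved or restored. */
int call_with_locale_guard(void (*fn)(void))
{
    const char *current;
    char *saved;
    size_t len;
    int rc;

    assert(fn != NULL);

    /* Query without changing; the result may be overwritten by later
     * setlocale() calls, so keep a private copy. */
    current = setlocale(LC_ALL, NULL);
    if (current == NULL)
        return -1;
    len = strlen(current) + 1;
    saved = malloc(len);
    if (saved == NULL)
        return -1;
    memcpy(saved, current, len);

    fn();

    /* Passing the saved string back restores every category. */
    rc = (setlocale(LC_ALL, saved) != NULL) ? 0 : -1;
    free(saved);
    return rc;
}
```

Calling call_with_locale_guard(misbehaving_library_call) after setlocale(LC_ALL, "C") should leave the program in the "C" locale, whatever the environment says.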

regards, tom lane

#3Manuel Sugawara
masm@fciencias.unam.mx
In reply to: Tom Lane (#2)
Re: localization problem (and solution)

Tom Lane <tgl@sss.pgh.pa.us> writes:

(Your proposed fix seems entirely useless ...

While there are reasons to argue that it's Perl's fault, IMO an
environment that reflects the current state of the host program is a
good compromise, and behaving consistently with the environment is also
a good compromise for libperl (I think some applications of libperl would
get really upset if the library broke this compromise.)

Regards,
Manuel.

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Manuel Sugawara (#3)
Re: localization problem (and solution)

Manuel Sugawara <masm@fciencias.unam.mx> writes:

While there are reasons to argue that it's Perl's fault, IMO an
environment that reflects the current state of the host program is a
good compromise, and behaving consistently with the environment is also
a good compromise for libperl (I think some applications of libperl would
get really upset if the library broke this compromise.)

I looked into this a bit more, and it seems the issue is that libperl
will do
setlocale(LC_ALL, "");
the first time any locale-related Perl function is invoked. To defend
ourselves against that, we'd have to set more environment variables than
just LC_COLLATE and LC_CTYPE.

What I'm thinking about is:
* during startup, putenv("LC_ALL=C") and unsetenv any other LC_ variables
that may be lurking, except LC_MESSAGES.
* copy LC_COLLATE and LC_CTYPE into the environment when we get them
from pg_control, as Manuel suggested.
* in locale_messages_assign(), set the environment variable on all
platforms not just Windows.

You could still break the backend by doing setlocale explicitly in
plperlu functions, but that's why it's an untrusted language ...

Comments?

regards, tom lane

#5Andreas Seltenreich
andreas+pg@gate450.dyndns.org
In reply to: Tom Lane (#4)
Re: localization problem (and solution)

Tom Lane writes:

I looked into this a bit more, and it seems the issue is that libperl
will do
setlocale(LC_ALL, "");
the first time any locale-related Perl function is invoked. To defend
ourselves against that, we'd have to set more environment variables than
just LC_COLLATE and LC_CTYPE.

What I'm thinking about is:
* during startup, putenv("LC_ALL=C") and unsetenv any other LC_ variables
that may be lurking, except LC_MESSAGES.
* copy LC_COLLATE and LC_CTYPE into the environment when we get them
from pg_control, as Manuel suggested.

I'm afraid having LC_ALL in the environment at this time would still
do the wrong thing on setlocale(LC_ALL, ""); since a LC_ALL
environment variable overrides the other categories. Maybe setting
LANG instead would be a better choice?
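
The precedence in question can be illustrated with a small sketch that only reads the environment and reports which value setlocale(category, "") would consult under the POSIX rules: LC_ALL first, then the category's own variable, then LANG. The function names are hypothetical, and the final "C" fallback is an assumption, since the default when nothing is set is implementation-defined.

```c
#define _POSIX_C_SOURCE 200809L  /* only needed if you also call setenv()/unsetenv() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of an environment variable, treating "" as unset
 * (POSIX ignores empty locale variables). */
const char *nonempty_env(const char *name)
{
    const char *value = getenv(name);

    return (value != NULL && value[0] != '\0') ? value : NULL;
}

/* Which locale name would setlocale(<category>, "") look up?
 * category_var is the variable for that category, e.g. "LC_CTYPE". */
const char *effective_locale_name(const char *category_var)
{
    const char *value;

    assert(category_var != NULL);

    if ((value = nonempty_env("LC_ALL")) != NULL)
        return value;               /* overrides every category */
    if ((value = nonempty_env(category_var)) != NULL)
        return value;               /* per-category setting */
    if ((value = nonempty_env("LANG")) != NULL)
        return value;               /* lowest-priority default */
    return "C";                     /* assumption: implementation default */
}
```

With LC_ALL=C and LC_CTYPE=es_MX.UTF-8 both set, effective_locale_name("LC_CTYPE") returns "C", which is why leaving LC_ALL in the environment would defeat the per-category variables. With only LANG=C and LC_CTYPE=es_MX.UTF-8 set, it returns "es_MX.UTF-8".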

regards,
Andreas

#6Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andreas Seltenreich (#5)
Re: localization problem (and solution)

Andreas Seltenreich <andreas+pg@gate450.dyndns.org> writes:

I'm afraid having LC_ALL in the environment at this time would still
do the wrong thing on setlocale(LC_ALL, ""); since a LC_ALL
environment variable overrides the other categories.

Doh, of course, I was misremembering the precedence. So we need
LANG=C
LC_ALL unset (probably LANGUAGE too, for glibc)
others as stated
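
Put together, the revised plan might be sketched as below. This is not the change that was later committed, only a reading of the list above: pin_locale_environment is a hypothetical name, the set of "other" LC_ variables is assumed to be the standard LC_MONETARY, LC_NUMERIC and LC_TIME, and the two arguments stand for the values read from pg_control.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv()/unsetenv() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Make the environment agree with the locale the server has fixed,
 * so that a later setlocale(LC_ALL, "") cannot change LC_COLLATE or
 * LC_CTYPE. LC_MESSAGES is deliberately left alone.
 * Returns 0 on success, -1 on failure. */
int pin_locale_environment(const char *lc_collate, const char *lc_ctype)
{
    /* Variables that would override or compete with our settings. */
    static const char *const to_unset[] = {
        "LC_ALL", "LANGUAGE", "LC_MONETARY", "LC_NUMERIC", "LC_TIME"
    };
    size_t i;

    assert(lc_collate != NULL && lc_ctype != NULL);

    for (i = 0; i < sizeof(to_unset) / sizeof(to_unset[0]); i++)
        if (unsetenv(to_unset[i]) != 0)
            return -1;

    /* LANG has the lowest precedence, so it is a safe default. */
    if (setenv("LANG", "C", 1) != 0)
        return -1;
    if (setenv("LC_COLLATE", lc_collate, 1) != 0)
        return -1;
    if (setenv("LC_CTYPE", lc_ctype, 1) != 0)
        return -1;
    return 0;
}
```

For the test case at the top of the thread, the call would be pin_locale_environment("es_MX.ISO-8859-1", "es_MX.ISO-8859-1").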

regards, tom lane

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#6)
Re: localization problem (and solution)

"Andrew Dunstan" <andrew@dunslane.net> writes:

We need to test any solution carefully on Windows, which deals with locales
very differently from *nix, and where we still have some known locale issues

Right, of course. I was thinking that this change might actually bring
the Windows and Unix code closer together --- at least for LC_MESSAGES
it seems it would do so.

If I prepare a patch, do you want to test it on Windows before it goes
in, or is it easier just to commit and then test CVS tip?

regards, tom lane

#8Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#6)
Re: localization problem (and solution)

Tom Lane said:

Andreas Seltenreich <andreas+pg@gate450.dyndns.org> writes:

I'm afraid having LC_ALL in the environment at this time would still
do the wrong thing on setlocale(LC_ALL, ""); since a LC_ALL
environment variable overrides the other categories.

Doh, of course, I was misremembering the precedence. So we need
LANG=C
LC_ALL unset (probably LANGUAGE too, for glibc)
others as stated

We need to test any solution carefully on Windows, which deals with locales
very differently from *nix, and where we still have some known locale issues
(see recent discussion).

I wonder if the complained of behaviour is triggered by our recent changes
to support utf8 in pl/perl?

cheers

andrew

#9Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#7)
Re: localization problem (and solution)

Tom Lane said:

"Andrew Dunstan" <andrew@dunslane.net> writes:

We need to test any solution carefully on Windows, which deals with
locales very differently from *nix, and where we still have some known
locale issues

Right, of course. I was thinking that this change might actually bring
the Windows and Unix code closer together --- at least for LC_MESSAGES
it seems it would do so.

If I prepare a patch, do you want to test it on Windows before it goes
in, or is it easier just to commit and then test CVS tip?

Can't do anything for cvs tip until the md5 mess is fixed.

I don't have much time to spare for testing till at least next week - maybe
someone else does.

cheers

andrew

#10Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#8)
Re: localization problem (and solution)

"Andrew Dunstan" <andrew@dunslane.net> writes:

We need to test any solution carefully on Windows, which deals with locales
very differently from *nix, and where we still have some known locale issues
(see recent discussion).

I've committed a proposed change in HEAD --- would you check out the
Windows behavior at your convenience? If it seems to work, I'll
back-patch, but let's test first.

regards, tom lane

#11Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#10)
Re: localization problem (and solution)

Tom Lane wrote:

"Andrew Dunstan" <andrew@dunslane.net> writes:

We need to test any solution carefully on Windows, which deals with locales
very differently from *nix, and where we still have some known locale issues
(see recent discussion).

I've committed a proposed change in HEAD --- would you check out the
Windows behavior at your convenience? If it seems to work, I'll
back-patch, but let's test first.

Will try. Not quite sure how, though. Any suggestions?

cheers

andrew

#12Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#11)
Re: localization problem (and solution)

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

I've committed a proposed change in HEAD --- would you check out the
Windows behavior at your convenience? If it seems to work, I'll
back-patch, but let's test first.

Will try. Not quite sure how, though. Any suggestions?

Well, one thing to try is whether you can reproduce the plperl-induced
breakage I posted this morning on Windows; and if so whether the patch
fixes it.

Also, what were those "known locale issues" you were referring to?

regards, tom lane

#13Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#12)
Re: localization problem (and solution)

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

Tom Lane wrote:

I've committed a proposed change in HEAD --- would you check out the
Windows behavior at your convenience? If it seems to work, I'll
back-patch, but let's test first.

Will try. Not quite sure how, though. Any suggestions?

Well, one thing to try is whether you can reproduce the plperl-induced
breakage I posted this morning on Windows; and if so whether the patch
fixes it.

We have a build failure to fix first:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52

Also, what were those "known locale issues" you were referring to?

The issue is that if I set my machine's locale to Turkish or French,
say, it doesn't matter what locale I set during initdb or in
postgresql.conf, the server's log messages always seem to come out in
the machine's locale.

cheers

andrew

#14Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#13)
Re: localization problem (and solution)

Andrew Dunstan <andrew@dunslane.net> writes:

We have a build failure to fix first:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52

Weird. It seems to be choking on linking to check_function_bodies,
but plpgsql does that exactly the same way, and there's no problem
there. I wonder whether all those warnings in the perl header files
mean anything ...

The issue is that if I set my machine's locale to Turkish or French,
say, it doesn't matter what locale I set during initdb or in
postgresql.conf, the server's log messages always seem to come out in
the machine's locale.

Is this possibly related to the fact that we don't even try to do
setlocale() for LC_MESSAGES?

regards, tom lane

#15Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#14)
Re: localization problem (and solution)

Tom Lane said:

Andrew Dunstan <andrew@dunslane.net> writes:

We have a build failure to fix first:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52

Weird. It seems to be choking on linking to check_function_bodies, but
plpgsql does that exactly the same way, and there's no problem there.
I wonder whether all those warnings in the perl header files mean
anything ...

We always get those - see
http://www.pgbuildfarm.org/cgi-bin/show_stage_log.pl?nm=loris&dt=2005-12-23%2019%3A56%3A12&stg=make
for example. One day when I get time I want to clean them up.

The issue is that if I set my machine's locale to Turkish or French,
say, it doesn't matter what locale I set during initdb or in
postgresql.conf, the server's log messages always seem to come out in
the machine's locale.

Is this possibly related to the fact that we don't even try to do
setlocale() for LC_MESSAGES?

We can't on Windows - it doesn't define LC_MESSAGES. But libintl does some
stuff, I believe.
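
A rough sketch of how the two cases might be handled together, following the earlier suggestion to set the environment variable on all platforms: call setlocale() only where LC_MESSAGES exists, and export the variable regardless so that libintl can pick it up. assign_messages_locale is a hypothetical name, not the actual locale_messages_assign() code, and setenv() is POSIX, so a Windows build would need a platform-specific replacement.

```c
#define _POSIX_C_SOURCE 200809L  /* for setenv() */
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Apply a message locale. Returns 0 on success, -1 on failure. */
int assign_messages_locale(const char *value)
{
    assert(value != NULL);

#ifdef LC_MESSAGES
    /* Platforms that define the category: validate and apply it. */
    if (setlocale(LC_MESSAGES, value) == NULL)
        return -1;
#endif

    /* Everywhere: keep the environment in step with the setting. */
    return (setenv("LC_MESSAGES", value, 1) == 0) ? 0 : -1;
}
```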

cheers

andrew

#16Magnus Hagander
mha@sollentuna.net
In reply to: Andrew Dunstan (#15)
Re: localization problem (and solution)

The issue is that if I set my machine's locale to Turkish or
French, say, it doesn't matter what locale I set during
initdb or in postgresql.conf, the server's log messages
always seem to come out in the machine's locale.

Does this happen only for those locales? And how specifically do you set
the locale?

I just installed to verify, and my server comes up in English no problem,
even though my locale is set to Swedish. The client tools (psql, for
example) come up in Swedish, so it's definitely the Swedish locale. And by
doing "set LANG=en" before I start psql, it comes up in English just
fine.

//Magnus

#17Magnus Hagander
mha@sollentuna.net
In reply to: Magnus Hagander (#16)
Re: localization problem (and solution)

The issue is that if I set my machine's locale to Turkish or French,
say, it doesn't matter what locale I set during initdb or in
postgresql.conf, the server's log messages always seem to come out in
the machine's locale.

Does this happen only for those locales? And how specifically
do you set the locale?

I just installed to verify, and my server comes up in English
no problem, even though my locale is set to Swedish. The
client tools (psql, for example) come up in Swedish, so it's
definitely the Swedish locale. And by doing "set LANG=en" before
I start psql, it comes up in English just fine.

I should probably say this is 8.1.1, not cvs head, but I don't recall
any changes around this.

//Magnus

#18Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#14)
Re: localization problem (and solution)

Tom Lane wrote:

Andrew Dunstan <andrew@dunslane.net> writes:

We have a build failure to fix first:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-12-29%2000:44:52

Weird. It seems to be choking on linking to check_function_bodies,
but plpgsql does that exactly the same way, and there's no problem
there. I wonder whether all those warnings in the perl header files
mean anything ...

I have committed a fix - the perl headers were mangling DLLIMPORT so I
moved the declaration above the perl includes.

I would also like to add -Wno-comment to the CFLAGS for win32/gcc, to
suppress at least some of those warnings.

cheers

andrew

#19Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#18)
Re: localization problem (and solution)

Andrew Dunstan <andrew@dunslane.net> writes:

I would also like to add -Wno-comment to the CFLAGS for win32/gcc, to
suppress at least some of those warnings.

Why don't you complain to the Perl people, instead? The fact that no
such warnings occur on Unix Perl installations makes these seem pretty
suspicious.

regards, tom lane

#20Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#18)
Re: localization problem (and solution)

Andrew Dunstan <andrew@dunslane.net> writes:

I have committed a fix - the perl headers were mangling DLLIMPORT so I
moved the declaration above the perl includes.

BTW, probably a cleaner answer is to put check_function_bodies into some
header file instead of having an "extern" in the PLs' .c files. I was
thinking about that yesterday, but couldn't decide where was a good
place to put it.

regards, tom lane

#21Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#19)
Re: localization problem (and solution)

Tom Lane said:

Andrew Dunstan <andrew@dunslane.net> writes:

I would also like to add -Wno-comment to the CFLAGS for win32/gcc, to
suppress at least some of those warnings.

Why don't you complain to the Perl people, instead? The fact that no
such warnings occur on Unix Perl installations makes these seem pretty
suspicious.

Well, it's probably not even the Perl people - perl's config_h.SH seems to
do the right thing and put a space between the second / and *, so that the
compiler won't complain, so it could be ActiveState's doing. Maybe I'll just
make a tiny script to fix config.h in my perl distro.

There is a more serious problem, though, in these warnings. Perl is
apparently trying to hijack the *printf functions, just as libintl tries to
do. There's a #define we can set to inhibit that, and I think we should.
That would leave 2 lots of warnings to fix - one about uid_t/gid_t and one
about isnan.

cheers

andrew

#22Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#20)
Re: localization problem (and solution)

Tom Lane said:

Andrew Dunstan <andrew@dunslane.net> writes:

I have committed a fix - the perl headers were mangling DLLIMPORT so I
moved the declaration above the perl includes.

BTW, probably a cleaner answer is to put check_function_bodies into
some header file instead of having an "extern" in the PLs' .c files. I
was thinking about that yesterday, but couldn't decide where was a good
place to put it.

miscadmin.h ?

cheers

andrew

#23Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andrew Dunstan (#22)
Re: localization problem (and solution)

"Andrew Dunstan" <andrew@dunslane.net> writes:

Tom Lane said:

BTW, probably a cleaner answer is to put check_function_bodies into
some header file instead of having an "extern" in the PLs' .c files. I
was thinking about that yesterday, but couldn't decide where was a good
place to put it.

miscadmin.h ?

Ugh :-( I was thinking about pg_proc.h, because the variable itself is
in pg_proc.c, but that seems pretty ugly too. Another possibility is to
move the variable someplace else...

regards, tom lane

#24Andrew Dunstan
andrew@dunslane.net
In reply to: Tom Lane (#23)
Re: localization problem (and solution)

Tom Lane said:

"Andrew Dunstan" <andrew@dunslane.net> writes:

Tom Lane said:

BTW, probably a cleaner answer is to put check_function_bodies into
some header file instead of having an "extern" in the PLs' .c files.
I was thinking about that yesterday, but couldn't decide where was a
good place to put it.

miscadmin.h ?

Ugh :-( I was thinking about pg_proc.h, because the variable itself is
in pg_proc.c, but that seems pretty ugly too. Another possibility is
to move the variable someplace else...

I trust whatever choice you make.

cheers

andrew