BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

Started by PG Bug reporting formalmost 3 years ago38 messageshackersbugs
Jump to latest
#1PG Bug reporting form
noreply@postgresql.org
hackersbugs

The following bug has been logged on the website:

Bug reference: 17946
Logged by: Guido Brugnara
Email address: gdo@leader.it
PostgreSQL version: 12.15
Operating system: Ubuntu 20.04
Description:

After upgrading an application using Postgresql from version 10 to 12,
fields of type "money" are no longer generated with the € symbol but with
$.
I identified the problem that occurs when making use of functions with
"LANGUAGE plperl," see with the following queries to be executed in order:
# from shell ...
sudo su -c psql\ postgres postgres <<'__SQL__';
SET lc_monetary TO 'C';
SELECT 12.34::money AS price;
SET lc_monetary TO 'it_IT.UTF-8';
SELECT 12.34::money AS price;
SET lc_monetary TO 'en_GB.UTF-8';
SELECT 12.34::money AS price;
CREATE EXTENSION plperl;
SET lc_monetary TO 'C';
SELECT 12.34::money AS price;
DO LANGUAGE 'plperl' $$ my $rv = spi_exec_query(q{SELECT 12.34::money AS
price;}, 1);elog(NOTICE, $rv->{rows}[0]->{price});$$;
SET lc_monetary TO 'it_IT.UTF-8';
SELECT 12.34::money AS price;
DO LANGUAGE 'plperl' $$ my $rv = spi_exec_query(q{SELECT 12.34::money AS
price;}, 1);elog(NOTICE, $rv->{rows}[0]->{price});$$;
SET lc_monetary TO 'en_GB.UTF-8';
SELECT 12.34::money AS price;
DO LANGUAGE 'plperl' $$ my $rv = spi_exec_query(q{SELECT 12.34::money AS
price;}, 1);elog(NOTICE, $rv->{rows}[0]->{price});$$;
__SQL__
#end.

The first three SELECTs generate content with the currencies Dollar, Euro &
Pound, as expected, while the last three only with Dollar.
It would appear that after first DO LANGUAGE 'plper' call, LC_MONETARY even
if it is varied, has no effect in subsequent queries.
Any suggestions?

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

PG Bug reporting form <noreply@postgresql.org> writes:

After upgrading an application using Postgresql from version 10 to 12,
fields of type "money" are no longer generated with the € symbol but with
$.

Hmm, seems to work for me:

$ psql
psql (12.15)
Type "help" for help.

postgres=# SET lc_monetary TO 'en_GB.UTF-8';
SET
postgres=# SELECT 12.34::money AS price;
price
--------
£12.34
(1 row)

postgres=# DO LANGUAGE 'plperl' $$ my $rv = spi_exec_query(q{SELECT 12.34::money AS
price;}, 1);elog(NOTICE, $rv->{rows}[0]->{price});$$;
NOTICE: £12.34
DO
postgres=# SET lc_monetary TO 'it_IT.UTF-8';
SET
postgres=# SELECT 12.34::money AS price;
price
---------
€ 12,34
(1 row)

postgres=# DO LANGUAGE 'plperl' $$ my $rv = spi_exec_query(q{SELECT 12.34::money AS
price;}, 1);elog(NOTICE, $rv->{rows}[0]->{price});$$;
NOTICE: € 12,34
DO

IIRC, we've seen trouble in the past with some versions of libperl
clobbering the host application's locale settings. Maybe you
have a plperl.on_init or plperl.on_plperl_init action that is
causing that to happen? In any case, I'd call it a Perl bug not
a Postgres bug.

regards, tom lane

#3Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Tom Lane (#2)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 25/05/2023 15:33, Tom Lane wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

After upgrading an application using Postgresql from version 10 to 12,
fields of type "money" are no longer generated with the € symbol but with
$.

Hmm, seems to work for me:

I can reproduce this:

psql (16beta1)
Type "help" for help.

postgres=# DO LANGUAGE 'plperl' $$ elog(NOTICE, 'foo') $$;
NOTICE: foo
DO
postgres=# SET lc_monetary TO 'en_GB.UTF-8';
SET
postgres=# SELECT 12.34::money AS price;
price
--------
$12.34
(1 row)

If I don't call the plperl function, it works as expected:

sql (16beta1)
Type "help" for help.

postgres=# SET lc_monetary TO 'en_GB.UTF-8';
SET
postgres=# SELECT 12.34::money AS price;
price
--------
£12.34
(1 row)

I should note that 'en_GB.UTF-8' is the default locale in my system, and
that's what I used in initdb. I don't know if it makes a difference.

IIRC, we've seen trouble in the past with some versions of libperl
clobbering the host application's locale settings. Maybe you
have a plperl.on_init or plperl.on_plperl_init action that is
causing that to happen? In any case, I'd call it a Perl bug not
a Postgres bug

I did some debugging, initializing the perl interpreter calls uselocale():

#0 __GI___uselocale (newloc=0x7f9f47ff0940 <_nl_C_locobj>) at
./locale/uselocale.c:31
#1 0x00007f9f373bd069 in ?? () from
/usr/lib/x86_64-linux-gnu/libperl.so.5.36
#2 0x00007f9f373bce74 in ?? () from
/usr/lib/x86_64-linux-gnu/libperl.so.5.36
#3 0x00007f9f373bfc15 in Perl_init_i18nl10n () from
/usr/lib/x86_64-linux-gnu/libperl.so.5.36
#4 0x00007f9f48b74cfb in plperl_init_interp () at plperl.c:809
#5 0x00007f9f48b78adc in _PG_init () at plperl.c:483
#6 0x000055c98b8e9b63 in internal_load_library (libname=0x55c98bebaf90
"/home/heikki/pgsql.fsmfork/lib/plperl.so") at dfmgr.c:289
#7 0x000055c98b8ea1c2 in load_external_function
(filename=filename@entry=0x55c98bebb1c0 "$libdir/plperl",
funcname=funcname@entry=0x55c98beba378 "plperl_inline_handler",
signalNotFound=signalNotFound@entry=true,
filehandle=filehandle@entry=0x7ffd20942b48) at dfmgr.c:116
#8 0x000055c98b8ea864 in fmgr_info_C_lang (functionId=129304,
procedureTuple=0x7f9f4778ccb8, finfo=0x7ffd20942bf0) at fmgr.c:386
#9 fmgr_info_cxt_security (functionId=129304, finfo=0x7ffd20942bf0,
mcxt=<optimized out>, ignore_security=<optimized out>) at fmgr.c:246
#10 0x000055c98b8eba72 in fmgr_info (finfo=0x7ffd20942bf0,
functionId=<optimized out>) at fmgr.c:129
#11 OidFunctionCall1Coll (functionId=<optimized out>,
collation=collation@entry=0, arg1=94324124262840) at fmgr.c:1386
#12 0x000055c98b5e1385 in ExecuteDoStmt
(pstate=pstate@entry=0x55c98beba0b0, stmt=stmt@entry=0x55c98be90858,
atomic=atomic@entry=false) at functioncmds.c:2144
#13 0x000055c98b7c24ce in standard_ProcessUtility (pstmt=0x55c98be908e0,
queryString=0x55c98be8fd50 "DO LANGUAGE 'plperl' $$ elog(NOTICE, 'foo')
$$;", readOnlyTree=<optimized out>,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55c98be90b80, qc=0x7ffd20942f30) at utility.c:714
#14 0x000055c98b7c0d9f in PortalRunUtility
(portal=portal@entry=0x55c98bf0b710, pstmt=pstmt@entry=0x55c98be908e0,
isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=0x55c98be90b80,
qc=0x7ffd20942f30) at pquery.c:1158
#15 0x000055c98b7c0ecb in PortalRunMulti
(portal=portal@entry=0x55c98bf0b710, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false,
dest=dest@entry=0x55c98be90b80,
altdest=altdest@entry=0x55c98be90b80, qc=qc@entry=0x7ffd20942f30)
at pquery.c:1322
#16 0x000055c98b7c139d in PortalRun (portal=portal@entry=0x55c98bf0b710,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true,
dest=dest@entry=0x55c98be90b80,
altdest=altdest@entry=0x55c98be90b80, qc=0x7ffd20942f30) at pquery.c:791
#17 0x000055c98b7bd85d in exec_simple_query (query_string=0x55c98be8fd50
"DO LANGUAGE 'plperl' $$ elog(NOTICE, 'foo') $$;") at postgres.c:1274
#18 0x000055c98b7bf978 in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at postgres.c:4632
#19 0x000055c98b73f743 in BackendRun (port=<optimized out>,
port=<optimized out>) at postmaster.c:4461
#20 BackendStartup (port=<optimized out>) at postmaster.c:4189
#21 ServerLoop () at postmaster.c:1779
#22 0x000055c98b74077a in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x55c98be88fc0) at postmaster.c:1463
#23 0x000055c98b4a96be in main (argc=3, argv=0x55c98be88fc0) at main.c:198

I think the uselocale() call renders ineffective the setlocale() calls
that we make later. Maybe we should replace our setlocale() calls with
uselocale(), too.

--
Heikki Linnakangas
Neon (https://neon.tech)

#4Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#3)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On Mon Jun 5, 2023 at 11:00 AM CDT, Heikki Linnakangas wrote:

On 25/05/2023 15:33, Tom Lane wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

After upgrading an application using Postgresql from version 10 to 12,
fields of type "money" are no longer generated with the € symbol but with
$.

Hmm, seems to work for me:

I can reproduce this:

psql (16beta1)
Type "help" for help.

postgres=# DO LANGUAGE 'plperl' $$ elog(NOTICE, 'foo') $$;
NOTICE: foo
DO
postgres=# SET lc_monetary TO 'en_GB.UTF-8';
SET
postgres=# SELECT 12.34::money AS price;
price
--------
$12.34
(1 row)

If I don't call the plperl function, it works as expected:

sql (16beta1)
Type "help" for help.

postgres=# SET lc_monetary TO 'en_GB.UTF-8';
SET
postgres=# SELECT 12.34::money AS price;
price
--------
£12.34
(1 row)

I should note that 'en_GB.UTF-8' is the default locale in my system, and
that's what I used in initdb. I don't know if it makes a difference.

I am looking into this bug. I have also reproduced it.

--
Tristan Partin
Neon (https://neon.tech)

#5Joe Conway
mail@joeconway.com
In reply to: Tristan Partin (#4)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 6/9/23 11:31, Tristan Partin wrote:

On Mon Jun 5, 2023 at 11:00 AM CDT, Heikki Linnakangas wrote:

On 25/05/2023 15:33, Tom Lane wrote:

PG Bug reporting form <noreply@postgresql.org> writes:

After upgrading an application using Postgresql from version 10 to 12,
fields of type "money" are no longer generated with the € symbol but with
$.

Hmm, seems to work for me:

I can reproduce this:

psql (16beta1)
Type "help" for help.

postgres=# DO LANGUAGE 'plperl' $$ elog(NOTICE, 'foo') $$;
NOTICE: foo
DO
postgres=# SET lc_monetary TO 'en_GB.UTF-8';
SET
postgres=# SELECT 12.34::money AS price;
price
--------
$12.34
(1 row)

If I don't call the plperl function, it works as expected:

sql (16beta1)
Type "help" for help.

postgres=# SET lc_monetary TO 'en_GB.UTF-8';
SET
postgres=# SELECT 12.34::money AS price;
price
--------
£12.34
(1 row)

I should note that 'en_GB.UTF-8' is the default locale in my system, and
that's what I used in initdb. I don't know if it makes a difference.

I am looking into this bug. I have also reproduced it.

It reproduces for me on both pg16beta1 and pg10. I wonder if it isn't a
behavior change in libperl itself. It seems that merely doing "load
'plperl';" is enough to cause the issue as long as it is done prior to
doing "SET lc_monetary TO 'en_GB.UTF-8'; SELECT 12.34::money AS price;".
When done in the opposite order the problem does not occur.

8<------------------------------
# On pg10 with perl v5.34.0
# note that on my system
# LC_NUMERIC=""
# LC_ALL=""
# LANG="en_US.UTF-8"
#
# this works correctly
psql nmx << EOF
SET lc_monetary TO 'en_GB.UTF-8';
SELECT 12.34::money AS price;
load 'plperl';
SELECT 12.34::money AS price;
EOF
SET
price
--------
£12.34
(1 row)

LOAD
price
--------
£12.34
(1 row)

# this does not
psql nmx << EOF
SET lc_monetary TO 'en_GB.UTF-8';
load 'plperl';
SELECT 12.34::money AS price;
EOF
SET
LOAD
price
--------
$12.34
(1 row)
8<------------------------------

Since I am also seeing this on pg10, I wonder if it is a change in
perl.I found this[1]"locale changes in 5.19.1 break LC_NUMERIC handling" https://github.com/Perl/perl5/issues/13089 -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com:

"What did change is that perl space code no
longer pays attention to the LC_NUMERIC
category outside 'use locale'. This is the way
it has always worked, AFAIK, for LC_COLLATE
and, mostly, LC_CTYPE, and for some uses of
LC_NUMERIC."

[1]: "locale changes in 5.19.1 break LC_NUMERIC handling" https://github.com/Perl/perl5/issues/13089 -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
handling"
https://github.com/Perl/perl5/issues/13089
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#6Joe Conway
mail@joeconway.com
In reply to: Joe Conway (#5)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 6/9/23 15:05, Joe Conway wrote:

I wonder if it isn't a behavior change in libperl itself. It seems
that merely doing "load 'plperl';" is enough to cause the issue

I can reproduce with a simple test program by linking libperl:

8<-------- test.c ----------------
#include <locale.h>
#include <stdio.h>

#define off64_t __off64_t
#include <EXTERN.h>
#include <perl.h>

int
main(int argc, char *argv[])
{
struct lconv *extlconv;
#ifdef WITH_PERL
PerlInterpreter *plperl;
plperl = perl_alloc();
perl_construct(plperl);
#endif
setlocale(LC_MONETARY, "en_GB.UTF-8");
extlconv = localeconv();
printf("currency symbol = \"%s\"\n",
extlconv->currency_symbol);
return 0;
}
8<-------- test.c ----------------

Adjust the perl paths to suit:

8<------------------------
gcc -O0 -ggdb3 -o test \
-I /usr/lib64/perl5/CORE \
-lperl \
test.c

./test
currency symbol = "£"

gcc -O0 -ggdb3 -o test \
-I /usr/lib64/perl5/CORE \
-lperl -DWITH_PERL \
test.c

./test
currency symbol = "$"
8<------------------------

It happens because somehow loading libperl prevents localeconv() from
returning the correct values, even though libperl only seems to call
"setlocale(LC_ALL, NULL)" which ought not change anything.

8<------------------------
gdb ./test

Reading symbols from ./test...
(gdb) b setlocale
Breakpoint 1 at 0x10f0
(gdb) r
Starting program: /opt/src/pgsql-

Breakpoint 1, __GI_setlocale (category=6, locale=0x0) at
./locale/setlocale.c:218
218 ./locale/setlocale.c: No such file or directory.
(gdb) bt
#0 __GI_setlocale (category=6, locale=0x0) at ./locale/setlocale.c:218
#1 0x00007ffff7d96b97 in Perl_init_i18nl10n () from
/lib/x86_64-linux-gnu/libperl.so.5.34
#2 0x0000555555555225 in main (argc=1, argv=0x7fffffffe1d8) at test.c:18
(gdb) c
Continuing.

Breakpoint 1, __GI_setlocale (category=4, locale=0x55555555602e
"en_GB.UTF-8") at ./locale/setlocale.c:218
218 in ./locale/setlocale.c
(gdb) bt
#0 __GI_setlocale (category=4, locale=0x55555555602e "en_GB.UTF-8") at
./locale/setlocale.c:218
#1 0x0000555555555239 in main (argc=1, argv=0x7fffffffe1d8) at test.c:20

main (argc=1, argv=0x7fffffffe1d8) at test.c:21
21 extlconv = localeconv();
(gdb)
22 printf("currency symbol = \"%s\"\n",
(gdb)
currency symbol = "$"
24 return 0;
(gdb)
8<------------------------

Will continue to dig in the morning.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#7Joe Conway
mail@joeconway.com
In reply to: Joe Conway (#6)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 6/9/23 22:10, Joe Conway wrote:

On 6/9/23 15:05, Joe Conway wrote:

I wonder if it isn't a behavior change in libperl itself. It seems
that merely doing "load 'plperl';" is enough to cause the issue

I can reproduce with a simple test program by linking libperl:

8<-------- test.c ----------------

A bit more spelunking leads me to the following observations and
conclusions:

1/ On RHEL7 with perl v5.16.3 the problem does not occur

2/ On RHEL9 with perl v5.32.1 the problem does occur

3/ The difference in behavior is triggered by the newer perl doing a
bunch of newlocale/uselocale calls not done by the older perl, combined
with a glibc behavior which seems surprising at best.

From localeinfo.h in glibc source tree:
8<------------------------
/* This fetches the thread-local locale_t pointer, either one set with
uselocale or &_nl_global_locale. */
#define _NL_CURRENT_LOCALE (__libc_tsd_get (locale_t, LOCALE))
8<------------------------

4/ I successfully tested a fix in the simplified reproducer program sent
earlier. It amounts to adding:

8<------------------------
uselocale(LC_GLOBAL_LOCALE);
8<------------------------

prior to calling

8<------------------------
extlconv = localeconv();
8<------------------------

5/ The attached fixes the issue for me on pg10 and passes check-world.

Comments?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

20230610-plperl_locale_issue-000.patchtext/x-patch; charset=UTF-8; name=20230610-plperl_locale_issue-000.patchDownload+13-0
#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#7)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

Joe Conway <mail@joeconway.com> writes:

5/ The attached fixes the issue for me on pg10 and passes check-world.
Comments?

The call in PGLC_localeconv seems *very* oddly placed. Why not
do that before it does any other locale calls? Otherwise you don't
really have reason to believe you're saving the appropriate
values to restore later.

regards, tom lane

#9Joe Conway
mail@joeconway.com
In reply to: Tom Lane (#8)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 6/10/23 14:42, Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

5/ The attached fixes the issue for me on pg10 and passes check-world.
Comments?

The call in PGLC_localeconv seems *very* oddly placed. Why not
do that before it does any other locale calls? Otherwise you don't
really have reason to believe you're saving the appropriate
values to restore later.

As far as I can tell it really only affects localeconv(), so I tried to
place it close to those. But I am fine with moving it up.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#10Joe Conway
mail@joeconway.com
In reply to: Joe Conway (#9)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 6/10/23 15:07, Joe Conway wrote:

On 6/10/23 14:42, Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

5/ The attached fixes the issue for me on pg10 and passes check-world.
Comments?

The call in PGLC_localeconv seems *very* oddly placed. Why not
do that before it does any other locale calls? Otherwise you don't
really have reason to believe you're saving the appropriate
values to restore later.

As far as I can tell it really only affects localeconv(), so I tried to
place it close to those. But I am fine with moving it up.

This version is against pg16 (rather than pg10), moves up that hunk,
mentions localeconv() in the comment as the reason for the call, and
fixes some whitespace sloppiness. I will plan to apply to all supported
branches.

Better?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

20230610-plperl_locale_issue-001.patchtext/x-patch; charset=UTF-8; name=20230610-plperl_locale_issue-001.patchDownload+13-0
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Joe Conway (#10)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

Joe Conway <mail@joeconway.com> writes:

This version is against pg16 (rather than pg10), moves up that hunk,
mentions localeconv() in the comment as the reason for the call, and
fixes some whitespace sloppiness. I will plan to apply to all supported
branches.

Better?

WFM.

regards, tom lane

#12Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Joe Conway (#10)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 10/06/2023 22:28, Joe Conway wrote:

On 6/10/23 15:07, Joe Conway wrote:

On 6/10/23 14:42, Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

5/ The attached fixes the issue for me on pg10 and passes check-world.
Comments?

The call in PGLC_localeconv seems *very* oddly placed. Why not
do that before it does any other locale calls? Otherwise you don't
really have reason to believe you're saving the appropriate
values to restore later.

As far as I can tell it really only affects localeconv(), so I tried to
place it close to those. But I am fine with moving it up.

This version is against pg16 (rather than pg10), moves up that hunk,
mentions localeconv() in the comment as the reason for the call, and
fixes some whitespace sloppiness. I will plan to apply to all supported
branches.

Better?

The man page for uselocale(LC_GLOBAL_LOCALE) says: "The calling thread's
current locale is set to the global locale determined by setlocale(3)."
Does that undo the effect of calling uselocale() previously, so if you
later call setlocale(), the new locale takes effect in the thread too?
Or is it equivalent to "uselocale(LC_ALL, setlocale(NULL))", so that it
sets the thread's locale to the current global locale, but later
setlocale() calls have no effect on it?

In any case, this still doesn't feel like the right place. We have many
more setlocale() calls. Shouldn't we sprinkle them all with
uselocale(LC_GLOBAL_LOCALE)? cache_locale_time() for example. Or rather,
all the places where we use any functions that depend on the current locale.

How about we replace all setlocale() calls with uselocale()?

Shouldn't we restore the old thread-specific locale after the calls? I'm
not sure why libperl calls uselocale(), but we are now overwriting the
locale that it sets. We have a few other uselocale() calls in
pg_locale.c, and we take care to restore the old locale in those.

There are a few uselocale() calls in ecpg, and they are protected by
HAVE_USELOCALE. Interestingly, the calls in pg_locale.c are not, but
they are protected by HAVE_LOCALE_T. Seems a little inconsistent.

--
Heikki Linnakangas
Neon (https://neon.tech)

#13Joe Conway
mail@joeconway.com
In reply to: Heikki Linnakangas (#12)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

(moving to hackers)

On 6/12/23 05:13, Heikki Linnakangas wrote:

On 10/06/2023 22:28, Joe Conway wrote:

On 6/10/23 15:07, Joe Conway wrote:

On 6/10/23 14:42, Tom Lane wrote:

Joe Conway <mail@joeconway.com> writes:

5/ The attached fixes the issue for me on pg10 and passes check-world.
Comments?

The call in PGLC_localeconv seems *very* oddly placed. Why not
do that before it does any other locale calls? Otherwise you don't
really have reason to believe you're saving the appropriate
values to restore later.

As far as I can tell it really only affects localeconv(), so I tried to
place it close to those. But I am fine with moving it up.

This version is against pg16 (rather than pg10), moves up that hunk,
mentions localeconv() in the comment as the reason for the call, and
fixes some whitespace sloppiness. I will plan to apply to all supported
branches.

Better?

The man page for uselocale(LC_GLOBAL_LOCALE) says: "The calling thread's
current locale is set to the global locale determined by setlocale(3)."
Does that undo the effect of calling uselocale() previously, so if you
later call setlocale(), the new locale takes effect in the thread too?
Or is it equivalent to "uselocale(LC_ALL, setlocale(NULL))", so that it
sets the thread's locale to the current global locale, but later
setlocale() calls have no effect on it?

setlocale() changes the global locale, but uselocale() changes the
locale that is currently active, as I understand it.

Also note that uselocale man page says "Unlike setlocale(3), uselocale()
does not allow selective replacement of individual locale categories.
To employ a locale that differs in only a few categories from the
current locale, use calls to duplocale(3) and newlocale(3) to obtain a
locale object equivalent to the current locale and modify the desired
categories in that object."

In any case, this still doesn't feel like the right place. We have many
more setlocale() calls. Shouldn't we sprinkle them all with
uselocale(LC_GLOBAL_LOCALE)? cache_locale_time() for example. Or rather,
all the places where we use any functions that depend on the current locale.

How about we replace all setlocale() calls with uselocale()?

I don't see us backpatching something that invasive. It might be the
right thing to do for pg17, or even pg16, but I think that is a
different discussion

Shouldn't we restore the old thread-specific locale after the calls? I'm
not sure why libperl calls uselocale(), but we are now overwriting the
locale that it sets.

That is a good question. Though arguably perl is doing the wrong thing
by not resetting the global locale when it is being used embedded.

We have a few other uselocale() calls in pg_locale.c, and we take
care to restore the old locale in those.

I think as long as we are relying on setlocale rather than uselocale in
general (see above), the global locale is where we want things left.

There are a few uselocale() calls in ecpg, and they are protected by
HAVE_USELOCALE. Interestingly, the calls in pg_locale.c are not, but
they are protected by HAVE_LOCALE_T. Seems a little inconsistent.

Possibly something we should clean up, but I think that is separate from
this fix.

In general I think we have 2 or possibly three distinct things here:

1/ how do we fix the misbehavior reported due to libperl in existing
stable branches

2/ what makes most sense going forward (and does that mean pg16 or pg17)

3/ misc code cleanups

I was mostly trying to concentrate on #1, but 2 & 3 are worthy of
discussion.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#14Joe Conway
mail@joeconway.com
In reply to: Joe Conway (#13)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 6/12/23 10:44, Joe Conway wrote:

1/ how do we fix the misbehavior reported due to libperl in existing
stable branches

<snip>

I was mostly trying to concentrate on #1, but 2 & 3 are worthy of
discussion.

Hmm, browsing through the perl source I came across a reference to this
(from https://perldoc.perl.org/perllocale):

---------------
PERL_SKIP_LOCALE_INIT

This environment variable, available starting in Perl v5.20, if set
(to any value), tells Perl to not use the rest of the environment
variables to initialize with. Instead, Perl uses whatever the current
locale settings are. This is particularly useful in embedded
environments, see "Using embedded Perl with POSIX locales" in perlembed.
---------------

Seems we ought to be using that.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#15Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#12)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On Mon Jun 12, 2023 at 4:13 AM CDT, Heikki Linnakangas wrote:

There are a few uselocale() calls in ecpg, and they are protected by
HAVE_USELOCALE. Interestingly, the calls in pg_locale.c are not, but
they are protected by HAVE_LOCALE_T. Seems a little inconsistent.

Patch is attached. CC-ing hackers.

--
Tristan Partin
Neon (https://neon.tech)

Attachments:

v1-0001-Make-uselocale-protection-more-consistent.patchtext/x-patch; charset=utf-8; name=v1-0001-Make-uselocale-protection-more-consistent.patchDownload+8-9
#16Joe Conway
mail@joeconway.com
In reply to: Joe Conway (#14)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 6/12/23 17:28, Joe Conway wrote:

On 6/12/23 10:44, Joe Conway wrote:

1/ how do we fix the misbehavior reported due to libperl in existing
stable branches

<snip>

I was mostly trying to concentrate on #1, but 2 & 3 are worthy of
discussion.

Hmm, browsing through the perl source I came across a reference to this
(from https://perldoc.perl.org/perllocale):

---------------
PERL_SKIP_LOCALE_INIT

This environment variable, available starting in Perl v5.20, if set
(to any value), tells Perl to not use the rest of the environment
variables to initialize with. Instead, Perl uses whatever the current
locale settings are. This is particularly useful in embedded
environments, see "Using embedded Perl with POSIX locales" in perlembed.
---------------

Seems we ought to be using that.

Turns out that that does nothing useful as far as I can tell.

So I am back to proposing the attached against pg16beta1, to be
backpatched to pg11.

Since much of the discussion happened on pgsql-bugs, the background
summary for hackers is this:

When plperl is first loaded, the init function eventually works its way
to calling Perl_init_i18nl10n(). In versions of perl >= 5.20, that ends
up at S_emulate_setlocale() which does a series of uselocale() calls.
For reference, RHEL 7 is perl 5.16.3 while RHEL 9 is perl 5.32.1. Older
versions of perl do not have this behavior.

The problem with uselocale() is that it changes the active locale away
from the default global locale. Subsequent uses of setlocale() affect
the global locale, but if that is not the active locale, it does not
control the results of locale dependent functions such as localeconv(),
which is what we depend on in PGLC_localeconv().

The result is illustrated in this example:
8<------------
psql test
psql (16beta1)
Type "help" for help.

test=# show lc_monetary;
lc_monetary
-------------
en_GB.UTF-8
(1 row)

test=# SELECT 12.34::money AS price;
price
--------
£12.34
(1 row)

test=# \q
8<------------
psql test
psql (16beta1)
Type "help" for help.

test=# load 'plperl';
LOAD
test=# show lc_monetary;
lc_monetary
-------------
en_GB.UTF-8
(1 row)

test=# SELECT 12.34::money AS price;
price
--------
$12.34
(1 row)
8<------------

Notice that merely loading plperl makes the currency symbol wrong.

I have proposed a targeted fix that I believe is safe to backpatch --
attached.

IIUC, Tom was +1, but Heikki was looking for a more general solution.

My issue with the more general solution is that it will likely be too
invasive to backpatch, and at the moment at least, there are no other
confirmed bugs related to all of this (even if the current code is more
fragile than we would prefer).

I would like to commit this to all supported branches in the next few
days, unless there are other suggestions or objections.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

20230610-plperl_locale_issue-002.patchtext/x-patch; charset=UTF-8; name=20230610-plperl_locale_issue-002.patchDownload+13-0
#17Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Joe Conway (#16)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 18/06/2023 21:27, Joe Conway wrote:

I have proposed a targeted fix that I believe is safe to backpatch --
attached.

IIUC, Tom was +1, but Heikki was looking for a more general solution.

My issue with the more general solution is that it will likely be too
invasive to backpatch, and at the moment at least, there are no other
confirmed bugs related to all of this (even if the current code is more
fragile than we would prefer).

Ok, I agree switching to uselocale() everywhere is too much to
backpatch. We should consider it for master though.

With the patch you're proposing, do we now have a coding rule that you
must call "uselocale(LC_GLOBAL_LOCALE)" before every and any call to
setlocale()? If so, you missed a few spots: pg_perm_setlocale,
pg_bind_textdomain_codeset, and cache_locale_time.

The current locale affects a lot of other things than localeconv()
calls. For example, LC_MESSAGES affects all strerror() calls. Do we need
to call "uselocale(LC_GLOBAL_LOCALE)" before all possible strerror()
calls too?

I think we should call "uselocale(LC_GLOBAL_LOCALE)" immediately after
returning from the perl interpreter, instead of before setlocale()
calls, if we want all Postgres code to run with the global locale. Not
sure how much performance overhead that would have.

I just found out about perl's "switch_to_global_locale" function
(https://perldoc.perl.org/perlapi#switch_to_global_locale). Should we
use that?

Testing the patch, I bumped into this:

postgres=# create or replace function finnish_to_number() returns
numeric as $$ select to_number('1,23', '9D99'); $$ language sql set
lc_numeric to 'fi_FI.utf8';
CREATE FUNCTION
postgres=# DO LANGUAGE 'plperlu' $$
use POSIX qw(setlocale LC_NUMERIC);
use locale;

setlocale LC_NUMERIC, "fi_FI.utf8";

$n = 5/2; # Assign numeric 2.5 to $n

spi_exec_query('SELECT finnish_to_number()');

$a = " $n"; # Locale-dependent conversion to string
elog(NOTICE, "half five is $n"); # Locale-dependent output
$$;
NOTICE: half five is 2,5
DO
postgres=# select to_char(now(), 'Day');
WARNING: could not determine encoding for locale "en_GB.UTF-8": codeset
is "ANSI_X3.4-1968"
to_char
-----------
Tuesday
(1 row)

--
Heikki Linnakangas
Neon (https://neon.tech)

#18Joe Conway
mail@joeconway.com
In reply to: Heikki Linnakangas (#17)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 6/19/23 19:30, Heikki Linnakangas wrote:

On 18/06/2023 21:27, Joe Conway wrote:

I have proposed a targeted fix that I believe is safe to backpatch --
attached.

IIUC, Tom was +1, but Heikki was looking for a more general solution.

My issue with the more general solution is that it will likely be too
invasive to backpatch, and at the moment at least, there are no other
confirmed bugs related to all of this (even if the current code is more
fragile than we would prefer).

Ok, I agree switching to uselocale() everywhere is too much to
backpatch. We should consider it for master though.

Makes sense

With the patch you're proposing, do we now have a coding rule that you
must call "uselocale(LC_GLOBAL_LOCALE)" before every and any call to
setlocale()? If so, you missed a few spots: pg_perm_setlocale,
pg_bind_textdomain_codeset, and cache_locale_time.

Well I was not proposing such a rule (trying to stay narrowly focused on
the demonstrated issue) but I suppose it might make sense. Anywhere we
use setlocale() we are depending on subsequent locale operations to use
the global locale. And uselocale(LC_GLOBAL_LOCALE) itself looks like it
ought to be pretty cheap.

The current locale affects a lot of other things than localeconv()
calls. For example, LC_MESSAGES affects all strerror() calls. Do we need
to call "uselocale(LC_GLOBAL_LOCALE)" before all possible strerror()
calls too?

That seems heavy handed

I think we should call "uselocale(LC_GLOBAL_LOCALE)" immediately after
returning from the perl interpreter, instead of before setlocale()
calls, if we want all Postgres code to run with the global locale. Not
sure how much performance overhead that would have.

I don't see how that is practical, or at least it does not really
address the issue. I think any loaded shared library could cause the
same problem by running newlocale() + uselocale() on init. Perhaps I
should go test that theory though.

I just found out about perl's "switch_to_global_locale" function
(https://perldoc.perl.org/perlapi#switch_to_global_locale). Should we
use that?

Maybe, although it does not seem to exist on the older perl version on
RHEL7. And same comment as above -- while it might solve the problem
with libperl, it doesn't address similar problems with other loaded
shared libraries.

Testing the patch, I bumped into this:

postgres=# create or replace function finnish_to_number() returns
numeric as $$ select to_number('1,23', '9D99'); $$ language sql set
lc_numeric to 'fi_FI.utf8';
CREATE FUNCTION
postgres=# DO LANGUAGE 'plperlu' $$
use POSIX qw(setlocale LC_NUMERIC);
use locale;

setlocale LC_NUMERIC, "fi_FI.utf8";

$n = 5/2; # Assign numeric 2.5 to $n

spi_exec_query('SELECT finnish_to_number()');

$a = " $n"; # Locale-dependent conversion to string
elog(NOTICE, "half five is $n"); # Locale-dependent output
$$;
NOTICE: half five is 2,5
DO
postgres=# select to_char(now(), 'Day');
WARNING: could not determine encoding for locale "en_GB.UTF-8": codeset
is "ANSI_X3.4-1968"
to_char
-----------
Tuesday
(1 row)

Do you think that is because uselocale(LC_GLOBAL_LOCALE) pulls out the
rug from under perl?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

#19Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Joe Conway (#18)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 21/06/2023 01:02, Joe Conway wrote:

On 6/19/23 19:30, Heikki Linnakangas wrote:

On 18/06/2023 21:27, Joe Conway wrote:
With the patch you're proposing, do we now have a coding rule that you
must call "uselocale(LC_GLOBAL_LOCALE)" before every and any call to
setlocale()? If so, you missed a few spots: pg_perm_setlocale,
pg_bind_textdomain_codeset, and cache_locale_time.

Well I was not proposing such a rule (trying to stay narrowly focused on
the demonstrated issue) but I suppose it might make sense. Anywhere we
use setlocale() we are depending on subsequent locale operations to use
the global locale. And uselocale(LC_GLOBAL_LOCALE) itself looks like it
ought to be pretty cheap.

The current locale affects a lot of other things than localeconv()
calls. For example, LC_MESSAGES affects all strerror() calls. Do we need
to call "uselocale(LC_GLOBAL_LOCALE)" before all possible strerror()
calls too?

That seems heavy handed

Yet I think that's exactly where this is heading. See this case (for
gettext() rather than strerror()):

postgres=# set lc_messages ='sv_SE.UTF8';
SET
postgres=# this prints syntax error in Swedish;
FEL: syntaxfel vid eller nära "this"
LINE 1: this prints syntax error in Swedish;
^
postgres=# load 'plperl';
LOAD
postgres=# set lc_messages ='en_GB.utf8';
SET
postgres=# this *should* print syntax error in English;
FEL: syntaxfel vid eller nära "this"
LINE 1: this *should* print syntax error in English;
^

I think we should call "uselocale(LC_GLOBAL_LOCALE)" immediately after
returning from the perl interpreter, instead of before setlocale()
calls, if we want all Postgres code to run with the global locale. Not
sure how much performance overhead that would have.

I don't see how that is practical, or at least it does not really
address the issue. I think any loaded shared library could cause the
same problem by running newlocale() + uselocale() on init. Perhaps I
should go test that theory though.

Any shared library could do that, that's true. Any shared library could
also call 'chdir'. But most shared libraries don't. I think it's the
responsibility of the extension that loads the shared library, plperl in
this case, to make sure it doesn't mess up the environment for the
postgres backend.

Testing the patch, I bumped into this:

postgres=# create or replace function finnish_to_number() returns
numeric as $$ select to_number('1,23', '9D99'); $$ language sql set
lc_numeric to 'fi_FI.utf8';
CREATE FUNCTION
postgres=# DO LANGUAGE 'plperlu' $$
use POSIX qw(setlocale LC_NUMERIC);
use locale;

setlocale LC_NUMERIC, "fi_FI.utf8";

$n = 5/2; # Assign numeric 2.5 to $n

spi_exec_query('SELECT finnish_to_number()');

$a = " $n"; # Locale-dependent conversion to string
elog(NOTICE, "half five is $n"); # Locale-dependent output
$$;
NOTICE: half five is 2,5
DO
postgres=# select to_char(now(), 'Day');
WARNING: could not determine encoding for locale "en_GB.UTF-8": codeset
is "ANSI_X3.4-1968"
to_char
-----------
Tuesday
(1 row)

Do you think that is because uselocale(LC_GLOBAL_LOCALE) pulls out the
rug from under perl?

libperl is fine in this case. But cache_locale_time() also calls
setlocale(), and your patch didn't add the "uselocale(LC_GLOBAL_LOCALE)"
there.

It's a valid concern that "uselocale(LC_GLOBAL_LOCALE)" could pull the
rug from under perl. I tried to find issues like that, by calling
locale-dependent functions in plperl, with SQL functions that call
"uselocale(LC_GLOBAL_LOCALE)" via PGLC_localeconv() in between. But I
couldn't find any case where the perl code would misbehave. I guess
libperl calls uselocale() before any locale-dependent function, but I
didn't look very closely.

--
Heikki Linnakangas
Neon (https://neon.tech)

#20Joe Conway
mail@joeconway.com
In reply to: Heikki Linnakangas (#19)
hackersbugs
Re: BUG #17946: LC_MONETARY & DO LANGUAGE plperl - BUG

On 6/22/23 03:26, Heikki Linnakangas wrote:

On 21/06/2023 01:02, Joe Conway wrote:

On 6/19/23 19:30, Heikki Linnakangas wrote:

I think we should call "uselocale(LC_GLOBAL_LOCALE)" immediately after
returning from the perl interpreter, instead of before setlocale()
calls, if we want all Postgres code to run with the global locale. Not
sure how much performance overhead that would have.

I don't see how that is practical, or at least it does not really
address the issue. I think any loaded shared library could cause the
same problem by running newlocale() + uselocale() on init. Perhaps I
should go test that theory though.

Any shared library could do that, that's true. Any shared library could
also call 'chdir'. But most shared libraries don't. I think it's the
responsibility of the extension that loads the shared library, plperl in
this case, to make sure it doesn't mess up the environment for the
postgres backend.

Ok, fair enough.

The attached fixes all of the issues raised on this thread by
specifically patching plperl.

8<------------
create or replace function finnish_to_number()
returns numeric as
$$
select to_number('1,23', '9D99')
$$ language sql set lc_numeric to 'fi_FI.utf8';

pl_regression=# show lc_monetary;
lc_monetary
-------------
C
(1 row)

DO LANGUAGE 'plperlu'
$$
use POSIX qw(setlocale LC_NUMERIC);
use locale;
setlocale LC_NUMERIC, "fi_FI.utf8";
$n = 5/2; # Assign numeric 2.5 to $n
spi_exec_query('SELECT finnish_to_number()');
# Locale-dependent conversion to string
$a = " $n";
# Locale-dependent output
elog(NOTICE, "half five is $n");
$$;
NOTICE: half five is 2,5
DO

set lc_messages ='sv_SE.UTF8';
this prints syntax error in Swedish;
FEL: syntaxfel vid eller nära "this"
LINE 1: this prints syntax error in Swedish;
^

set lc_messages ='en_GB.utf8';
this *should* print syntax error in English;
ERROR: syntax error at or near "this"
LINE 1: this *should* print syntax error in English;
^
set lc_monetary ='sv_SE.UTF8';
SELECT 12.34::money AS price;
price
----------
12,34 kr
(1 row)

set lc_monetary ='en_GB.UTF8';
SELECT 12.34::money AS price;
price
--------
£12.34
(1 row)

set lc_monetary ='en_US.UTF8';
SELECT 12.34::money AS price;
price
--------
$12.34
(1 row)
8<------------

This works correctly from what I can see -- tested against pg16beta1 on
Linux Mint with perl v5.34.0 as well as against pg15.2 on RHEL 7 with
perl v5.16.3.

Although I have not looked yet, presumably we could have similar
problems with plpython. I would like to get agreement on this approach
against plperl before diving into that though.

Thoughts?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachments:

plperl_locale_issue-004.patchtext/x-patch; charset=UTF-8; name=plperl_locale_issue-004.patchDownload+121-0
#21Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#3)
hackersbugs
#22Joe Conway
mail@joeconway.com
In reply to: Tristan Partin (#21)
hackersbugs
#23Tristan Partin
tristan@neon.tech
In reply to: Joe Conway (#22)
hackersbugs
#24Tristan Partin
tristan@neon.tech
In reply to: Joe Conway (#22)
hackersbugs
#25Tristan Partin
tristan@neon.tech
In reply to: Joe Conway (#20)
hackersbugs
#26Joe Conway
mail@joeconway.com
In reply to: Tristan Partin (#24)
hackersbugs
#27Joe Conway
mail@joeconway.com
In reply to: Tristan Partin (#25)
hackersbugs
#28Tristan Partin
tristan@neon.tech
In reply to: Heikki Linnakangas (#3)
hackersbugs
#29Tristan Partin
tristan@neon.tech
In reply to: Tristan Partin (#23)
hackersbugs
#30Tristan Partin
tristan@neon.tech
In reply to: Tristan Partin (#29)
hackersbugs
#31Tristan Partin
tristan@neon.tech
In reply to: Tristan Partin (#30)
hackersbugs
#32Joe Conway
mail@joeconway.com
In reply to: Tristan Partin (#25)
hackersbugs
#33Tristan Partin
tristan@neon.tech
In reply to: Joe Conway (#32)
hackersbugs
#34Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Joe Conway (#32)
hackersbugs
#35Joe Conway
mail@joeconway.com
In reply to: Heikki Linnakangas (#34)
hackersbugs
#36Heikki Linnakangas
heikki.linnakangas@enterprisedb.com
In reply to: Joe Conway (#35)
hackersbugs
#37Robert Haas
robertmhaas@gmail.com
In reply to: Heikki Linnakangas (#36)
hackersbugs
#38Joe Conway
mail@joeconway.com
In reply to: Robert Haas (#37)
hackersbugs