Re: Postgresql 8.1: plperl code works with LATIN1, fail

Started by Noname, about 19 years ago, 3 messages, general

Matthias.Pitzl@izb.de

about 19 years ago

> In an 8.1.6 UTF-8 database this example returns false; in 8.2.1 it
> returns true. See the following commit message and the related bug
> report regarding PL/Perl and UTF-8:
>
> http://archives.postgresql.org/pgsql-committers/2006-10/msg00277.php
> http://archives.postgresql.org/pgsql-bugs/2006-10/msg00077.php
>
> If you can't upgrade to 8.2 then you might be able to work around
> the problem by creating the function as plperlu and adding 'use utf8;'.
>
> --
> Michael Fuhr

Hello Michael!

As far as I know, 'use utf8;' normally just tells Perl that the source code
is written in UTF-8 and nothing more.
For converting data to and from UTF-8, the Encode module is usually used.
Or is this different for plperlu?

Greetings,
Matthias

Michael Fuhr

mike@fuhr.org

about 19 years ago

In reply to: Noname (#1)

On Mon, Jan 29, 2007 at 01:34:47PM +0100, Matthias.Pitzl@izb.de wrote:

>> If you can't upgrade to 8.2 then you might be able to work around
>> the problem by creating the function as plperlu and adding 'use utf8;'.
>
> As far as I know, 'use utf8;' normally just tells Perl that the source code
> is written in UTF-8 and nothing more.

The string literals in the PL/Perl function body are UTF-8 but Perl
isn't treating them as such. Isn't "use utf8" or "use encoding 'utf8'"
the way to tell Perl to do so? The perluniintro manual page says this:

    Only one case remains where an explicit "use utf8" is needed: if
    your Perl script itself is encoded in UTF-8, you can use UTF-8
    in your identifier names, and in string and regular expression
    literals, by saying "use utf8".

Isn't that the situation here? The PL/Perl function body is a
string encoded in the database's encoding, which in this case is
UTF-8.
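
A minimal sketch of the difference in plain Perl, outside PostgreSQL. It
assumes that a function body arrives as a string of UTF-8 bytes and is
compiled with a string eval; that is an illustration only, not a
description of how PL/Perl actually compiles functions, and the variable
names are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical "function body": Perl source held as raw UTF-8 bytes.
# The literal contains one character, U+00E4, encoded as 0xC3 0xA4.
my $body = "length('\xC3\xA4')";

# Compiled as-is, the literal is read as two separate bytes.
my $without = eval $body;
die $@ if $@;

# With "use utf8" in the compiled source, it is read as one character.
my $with = eval "use utf8; $body";
die $@ if $@;

print "without use utf8: $without\n";   # 2
print "with use utf8:    $with\n";      # 1
```

The same bytes give a different length depending only on whether the
compiled source is under "use utf8", which is the effect the plperlu
workaround relies on.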

> For converting data to and from UTF-8, the Encode module is usually used.
> Or is this different for plperlu?

Isn't the Encode module used for doing explicit conversions? I think
the goal is not to have to do so, i.e., to have PL/Perl treat string
literals as UTF-8 if the database encoding is UTF-8. PostgreSQL 8.2
does so but earlier versions don't.
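
For comparison, a short sketch of the explicit conversion that the Encode
module provides, again in plain Perl rather than PL/Perl. The byte string
below stands in for data that reaches Perl as undecoded UTF-8; treating it
that way is an assumption made for the example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for U+00E4 (assumed to be what undecoded data looks like).
my $bytes = "\xC3\xA4";

# Explicit conversion from UTF-8 bytes to a Perl character string.
my $chars = decode('UTF-8', $bytes);

printf "bytes: length %d\n", length $bytes;   # 2
printf "chars: length %d\n", length $chars;   # 1

# Encoding the character string again yields the original bytes.
my $round_trip = encode('UTF-8', $chars) eq $bytes ? 'yes' : 'no';
print "round trip ok: $round_trip\n";         # yes
```

This works, but it has to be written out for every string, which is the
extra step the 8.2 behaviour avoids.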

--
Michael Fuhr

Randal L. Schwartz

merlyn@stonehenge.com

about 19 years ago

In reply to: Michael Fuhr (#2)

"Michael" == Michael Fuhr <mike@fuhr.org> writes:

Michael> Isn't that the situation here? The PL/Perl function body is a
Michael> string encoded in the database's encoding, which in this case is
Michael> UTF-8.

If that's always the case, then the embedded Perl interpreter should
be started in that mode, perhaps by adding "-Mutf8" to the arg list
of the embedded interpreter.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!