BUG #13638: Exception texts from plperl has bad encoding
The following bug has been logged on the website:
Bug reference: 13638
Logged by: Michal Leinweber
Email address: lei@aswsyst.cz
PostgreSQL version: 9.4.4
Operating system: Debian 8
Description:
I have UTF-8 database and using UTF-8 text in plpgsql and plperl functions.
Everything works ok, only exceptions from plperl functions have bad encoding
(maybe double encoded).
Compare output of these 2 functions:
create or replace function perl_test() returns text
language plperl as $$
return "Český text ěščřžýáíé";
$$;
create or replace function perl_test_err() returns text
language plperl as $$
elog(ERROR, "Česká chyba ěščřžýáíé");
$$;
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
lei@aswsyst.cz writes:
I have UTF-8 database and using UTF-8 text in plpgsql and plperl functions.
Everything works ok, only exceptions from plperl functions have bad encoding
(maybe double encoded).
Compare output of these 2 functions:
create or replace function perl_test() returns text
language plperl as $$
return "Český text ěščřžýáíé";
$$;
create or replace function perl_test_err() returns text
language plperl as $$
elog(ERROR, "Česká chyba ěščřžýáíé");
$$;
I traced through this example to the extent of finding that:
* The string passed to elog() in do_util_elog() seems correctly
encoded.
* So does the string passed to croak() after elog does its longjmp,
which is unsurprising since elog.c doesn't really do anything to it.
* Back in plperl_call_perl_func, we use sv2cstr(ERRSV) to get the
error string. sv2cstr calls "SvPVutf8(sv, len)", and that is what
is giving back the bogusly-encoded data.
I suspect the root problem is that instead of baldly doing
croak("%s", edata->message);
in do_util_elog(), we need to do something to inform Perl what encoding
the message string is in. This is beyond my Perl-fu, however.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
I wrote:
I suspect the root problem is that instead of baldly doing
croak("%s", edata->message);
in do_util_elog(), we need to do something to inform Perl what encoding
the message string is in. This is beyond my Perl-fu, however.
BTW, if the answer to that involves turning the message back into an SV,
I think we should not do that but just use the SV that was passed in
originally. Then the TRY/CATCH could go away entirely, ie the code
would become roughly like
if (level < ERROR)
{
cmsg = sv2cstr(msg);
elog(level, "%s", cmsg);
pfree(cmsg);
}
else
{
croak-with-SV(msg);
}
which knows slightly more about the meaning of "level" than before,
but otherwise seems much less entangled with anything.
It's still beyond my Perl-fu ...
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Tom Lane wrote:
It's still beyond my Perl-fu ...
Maybe Alex or Tim can shed some light? Please see:
/messages/by-id/20150925112345.26913.28407@wrigleys.postgresql.org
--
�lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
The problem is not only with elog call, but it also fires if plpgsql
code called from plperl function raises exception.
Best Regards,
Michal Leinweber
On 2015-09-25 21:33, Tom Lane wrote:
I wrote:
I suspect the root problem is that instead of baldly doing
croak("%s", edata->message);
in do_util_elog(), we need to do something to inform Perl what
encoding
the message string is in. This is beyond my Perl-fu, however.BTW, if the answer to that involves turning the message back into an
SV,
I think we should not do that but just use the SV that was passed in
originally. Then the TRY/CATCH could go away entirely, ie the code
would become roughly likeif (level < ERROR)
{
cmsg = sv2cstr(msg);
elog(level, "%s", cmsg);
pfree(cmsg);
}
else
{
croak-with-SV(msg);
}which knows slightly more about the meaning of "level" than before,
but otherwise seems much less entangled with anything.It's still beyond my Perl-fu ...
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Michal Leinweber <lei@aswsyst.cz> writes:
The problem is not only with elog call, but it also fires if plpgsql
code called from plperl function raises exception.
Hmm, yeah, there are a whole bunch of instances of that croak("%s",
edata->message) pattern, and most of them don't have an ancestral SV
to work from. So we'll need a more general solution anyway, and it
may not be worth changing do_util_elog() to work in a fundamentally
different way than the rest.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Fri, Sep 25, 2015 at 04:51:20PM -0300, Alvaro Herrera wrote:
Tom Lane wrote:
It's still beyond my Perl-fu ...
Maybe Alex or Tim can shed some light? Please see:
/messages/by-id/20150925112345.26913.28407@wrigleys.postgresql.org
I think you want to call croak_sv(SV) with an SV containing the error
message and flagged as UTF-8.
Yim.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Tim Bunce <Tim.Bunce@pobox.com> writes:
I think you want to call croak_sv(SV) with an SV containing the error
message and flagged as UTF-8.
I found croak_sv() in the headers for Perl 5.18 (present on OS X Yosemite)
but not in Perl 5.10 (present on RHEL6), much less 5.8 which is the oldest
Perl version we claim to support. This doesn't look like a workable
answer unless we want to move the compatibility goalposts a long way.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Sun, Sep 27, 2015 at 5:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Tim Bunce <Tim.Bunce@pobox.com> writes:
I think you want to call croak_sv(SV) with an SV containing the error
message and flagged as UTF-8.I found croak_sv() in the headers for Perl 5.18 (present on OS X Yosemite)
but not in Perl 5.10 (present on RHEL6), much less 5.8 which is the oldest
Perl version we claim to support. This doesn't look like a workable
answer unless we want to move the compatibility goalposts a long way.
Yeah, I was just looking at that. I couldn't find a good way to support
5.8. Doing:
const char *utf8_str = utf_e2u(str);
SV *sv = get_sv("@", TRUE);
sv_setpvf(sv, "%s", utf8_str);
SvUTF8_on(sv);
croak(NULL);
Works, but screws up the error line numbers. For example, given the below
test in plperl.out:
--
-- Test with a bad type
--
CREATE OR REPLACE FUNCTION perl_spi_prepared_bad(double precision) RETURNS
double precision AS $$
my $x = spi_prepare('SELECT 10.0 * $1 AS a', 'does_not_exist');
my $q = spi_query_prepared($x,$_[0]);
my $result;
while (defined (my $y = spi_fetchrow($q))) {
$result = $y->{a};
}
spi_freeplan($x);
return $result;
$$ LANGUAGE plperl;
Instead of:
ERROR: type "does_not_exist" does not exist at line 2.
We get:
ERROR: type "does_not_exist" does not exist at -e line 56.
I poked at this for a bit and didn't see a way around that.
Attached is a draft patch that at least fixes the issue for 5.10 and up. I
guess we don't want to commit the regression test unless we want to add 2
new plperl_elog.out files (one for 5.8 and one for the already existing alt
file).
Another thing to note is sv2cstr() can elog(ERROR) if the message has an
invalid byte sequence:
create or replace function perl_test_err() returns text
language plperl as $$
elog(ERROR, "\0");
$$;
-- or
create or replace function perl_test_err() returns text
language plperl as $$
elog(INFO, "\0");
$$;
select perl_test_err();
ERROR: invalid byte sequence for encoding "UTF8": 0x00k
Without the PG_TRY block we can't punt these errors back to perl. Maybe
that does not matter all that much. After-all we could still have an error
converting that new error to utf8 for perl (realistically this probably
falls into the impossible category...). Thoughts?
Attached is a patch that does /not/ take into account errors thrown by
pg_any_to_server() along the lines of Tom's above suggestion. I added a
croak_cstr() function (trying to be in the spirit of the other
plperl_helper functions) and converted various croak("%s", e->message);
over to it. Tested on 5.8.9 and 5.18.2 from 9.1-stable to 9.4
(9.5 doesn't seem to pass a standard make check on my machine atm, but I
don't think there have been any plperl changes).
Note it does not apply cleaning to 9.1-9.3 (trivial rejects), I'm happy to
do the leg work if we like this proposed patch.
Attachments:
plperl_elog_encoding.patchapplication/octet-stream; name=plperl_elog_encoding.patchDownload+58-23
Alex Hunsaker <badalex@gmail.com> writes:
Yeah, I was just looking at that. I couldn't find a good way to support
5.8. Doing: ...
Works, but screws up the error line numbers.
I looked into this and determined that the problem is that the location
info doesn't get appended to ERRSV until after Perl has popped the stack.
A possibly-entirely-wrong hack that seems to fix that is to use mess()
to construct the SV, *and* append the location info, before we croak.
I tried this implementation of croak_cstr:
char *utf8_str = utf_e2u(str);
SV *ssv = mess("%s", utf8_str);
SV *errsv = get_sv("@", GV_ADD);
SvUTF8_on(ssv);
pfree(utf8_str);
sv_setsv(errsv, ssv);
croak(NULL);
with Perl 5.8.9 and it passed your proposed regression test. Obvious
questions include:
* I don't see mess() in the perlapi man page, so is it okay to use?
* Is there any leakage involved in this? (I don't know what prompted
you to add sv_2mortal() to the other implementation, but maybe we need
it here too?)
* Is there some other reason why this is wrong or a bad idea? Since we'd
only use this with versions lacking croak_sv(), future-proofing doesn't
seem like a good argument against it, but maybe there is another one.
BTW, I think we can't actually use this regression test case, because it
won't work as-is in non-UTF8 encodings. But the use of UTF8 characters in
the string doesn't seem like an essential property of the test. However,
other test cases may already provide sufficient coverage if we're not
going to test encoding conversion.
Also, I'm inclined to leave do_util_elog() alone other than replacing
croak("%s", edata->message) with croak_cstr(edata->message). Since
we have to have croak_cstr() anyway, there seems little value in taking
any risk that we've missed some reason why the existing coding there
is important.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Mon, Sep 28, 2015 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alex Hunsaker <badalex@gmail.com> writes:
Yeah, I was just looking at that. I couldn't find a good way to support
5.8. Doing: ...
Works, but screws up the error line numbers.I looked into this and determined that the problem is that the location
info doesn't get appended to ERRSV until after Perl has popped the stack.
I tried this implementation of croak_cstr: ...
Yeah that seems to work great!
* I don't see mess() in the perlapi man page, so is it okay to use?
Looks like they added it to perlapi in 5.12, given that it exists and seems
to work in 5.8 and 5.10 I think we are ok to use it. I tested your above
usage on 5.8, 5.10, 5.12.
* Is there any leakage involved in this? (I don't know what prompted
you to add sv_2mortal() to the other implementation, but maybe we need
it here too?)
No, $@ is global and mess() also uses a global. (unless perl is
destructing).
In the croak_sv() case, we never use/assign what cstr2sv() returns anywhere
in perl land, so it always has a refcount of 1. We have similar usage of
sv_2mortal(cstr2sv()) albeit not on the same line in plperl.c.
* Is there some other reason why this is wrong or a bad idea? Since we'd
only use this with versions lacking croak_sv(), future-proofing doesn't
seem like a good argument against it, but maybe there is another one.
The only argument that comes to mind is AFAIK perl 5.8 and 5.10 are more or
less unmaintained so it might not be worth the effort to make them work.
And it's been this way forever.
BTW, I think we can't actually use this regression test case, [...]
However,
other test cases may already provide sufficient coverage if we're not
going to test encoding conversion.
Agreed. I'll attach it separately just for easy verification that
everything is working as intended.
Also, I'm inclined to leave do_util_elog() alone other than replacing
croak("%s", edata->message) with croak_cstr(edata->message).
Done that way.
The attached was tested on 5.8.9, 5.10.1, 5.12.5, 5.14.4, 5.18.2 and 5.22.0.
Alex Hunsaker <badalex@gmail.com> writes:
The attached was tested on 5.8.9, 5.10.1, 5.12.5, 5.14.4, 5.18.2 and 5.22.0.
Thanks! Barring objections, I'll push this in the morning.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Alex Hunsaker <badalex@gmail.com> writes:
On Mon, Sep 28, 2015 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
BTW, I think we can't actually use this regression test case, [...]
Agreed. I'll attach it separately just for easy verification that
everything is working as intended.
After awhile I remembered that we already had a similar test case in
plpython: it's testing with no-break space (U+A0) which has an equivalent
in most encodings. So I adopted that strategy, and pushed this with
a test case. We'll soon see if the buildfarm likes it.
regards, tom lane
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
On Tue, Sep 29, 2015 at 8:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alex Hunsaker <badalex@gmail.com> writes:
On Mon, Sep 28, 2015 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
BTW, I think we can't actually use this regression test case, [...]
After awhile I remembered that we already had a similar test case in
plpython: it's testing with no-break space (U+A0) which has an equivalent
in most encodings. So I adopted that strategy, and pushed this with
a test case. We'll soon see if the buildfarm likes it.regards, tom lane
I've been keeping a loose eye on the buildfarm, everything looks good so
far.
FYI I also ended up testing 5.16.3 and 5.20.3 which means it should be
tested on every stable version of perl.
Thanks!
Hello,
just reporting that the attached patch solves my problem. Thanks.
Best Regards
Michal Leinweber
On 2015-09-29 05:51, Alex Hunsaker wrote:
On Mon, Sep 28, 2015 at 10:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alex Hunsaker <badalex@gmail.com> writes:
Yeah, I was just looking at that. I couldn't find a good way to
support
5.8. Doing: ...
Works, but screws up the error line numbers.I looked into this and determined that the problem is that the
location
info doesn't get appended to ERRSV until after Perl has popped the
stack.I tried this implementation of croak_cstr: ...
Yeah that seems to work great!
* I don't see mess() in the perlapi man page, so is it okay to use?
Looks like they added it to perlapi in 5.12, given that it exists and
seems to work in 5.8 and 5.10 I think we are ok to use it. I tested
your above usage on 5.8, 5.10, 5.12.* Is there any leakage involved in this? (I don't know what prompted
you to add sv_2mortal() to the other implementation, but maybe we need
it here too?)No, $@ is global and mess() also uses a global. (unless perl is
destructing).In the croak_sv() case, we never use/assign what cstr2sv() returns
anywhere in perl land, so it always has a refcount of 1. We have
similar usage of sv_2mortal(cstr2sv()) albeit not on the same line in
plperl.c.* Is there some other reason why this is wrong or a bad idea? Since
we'd
only use this with versions lacking croak_sv(), future-proofing
doesn't
seem like a good argument against it, but maybe there is another one.The only argument that comes to mind is AFAIK perl 5.8 and 5.10 are
more or less unmaintained so it might not be worth the effort to make
them work. And it's been this way forever.BTW, I think we can't actually use this regression test case, [...]
However,other test cases may already provide sufficient coverage if we're not
going to test encoding conversion.Agreed. I'll attach it separately just for easy verification that
everything is working as intended.Also, I'm inclined to leave do_util_elog() alone other than replacing
croak("%s", edata->message) with croak_cstr(edata->message).Done that way.
The attached was tested on 5.8.9, 5.10.1, 5.12.5, 5.14.4, 5.18.2 and
5.22.0.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs