utf-8 flag always off in plperl function arguments
Hello:
Since 5.6 or so, Perl has stored an internal flag on every string to
mark whether it's UTF-8 or not. For data of unknown encoding, such as
data read from files, the default is latin1, but it can be changed with
use encoding 'utf8'. Now, I have a PostgreSQL database in charset
UNICODE. So, Postgres knows the data is UTF-8. However, when passing
arguments to plperl functions, no matter what the charset, Postgres
ALWAYS leaves the UTF-8 flag off. This means that the only way to
handle the string properly in Perl, when it matters that Perl knows
it's UTF-8, is to call utf8::decode -- on every argument, in every
function, every time. This is rather kludgy, considering there already
exists a way to fix it by calling the libperl API properly. It would be
nice if it could be fixed in 8.0 final (the behavior is exactly the same
in 8.0 beta and 7.4.6).
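
To illustrate why the flag matters, here is a minimal C sketch (not plperl code) of the two ways the same bytes can be measured: one character per byte, which is effectively how Perl treats a string with the flag off, and one character per UTF-8 code point, which is how it treats it with the flag on. The function names length_as_bytes and length_as_utf8 are made up for this example, and the second one assumes well-formed UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/* Length with the UTF-8 flag off: every byte counts as one character. */
static size_t length_as_bytes(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0')
        n++;
    return n;
}

/*
 * Length with the UTF-8 flag on: count code points by skipping
 * continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8.
 */
static size_t length_as_utf8(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
    {
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the word "café" stored as UTF-8 (the "é" is the two bytes 0xC3 0xA9), length_as_bytes returns 5 and length_as_utf8 returns 4. That is the difference a plperl function sees in length(), regexes, and so on when the flag is missing.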
Regards,
Dave
David Kamholz <davekam@pobox.com> writes:
> This is rather kludgy, considering there already
> exists a way to fix it by calling the libperl API properly.
If you know how to do it, how about offering a patch?
regards, tom lane
David Kamholz <davekam@pobox.com> writes:
> *** plperl.c.orig Sat Dec 4 02:09:24 2004
> --- plperl.c Sat Dec 4 03:41:33 2004
> ***************
> *** 57,62 ****
> --- 57,63 ----
>   #include "utils/lsyscache.h"
>   #include "utils/syscache.h"
>   #include "utils/typcache.h"
> + #include "mb/pg_wchar.h"
>
>   /* perl stuff */
>   #include "EXTERN.h"
> ***************
> *** 803,814 ****
>           else
>           {
>               char *tmp;
>
>               tmp = DatumGetCString(FunctionCall3(&(desc->arg_out_func[i]),
>                                                   fcinfo->arg[i],
>                                                   ObjectIdGetDatum(desc->arg_typioparam[i]),
>                                                   Int32GetDatum(-1)));
> !             XPUSHs(sv_2mortal(newSVpv(tmp, 0)));
>               pfree(tmp);
>           }
>       }
> --- 804,818 ----
>           else
>           {
>               char *tmp;
> +             SV *sv;
>
>               tmp = DatumGetCString(FunctionCall3(&(desc->arg_out_func[i]),
>                                                   fcinfo->arg[i],
>                                                   ObjectIdGetDatum(desc->arg_typioparam[i]),
>                                                   Int32GetDatum(-1)));
> !             sv = newSVpv(tmp, 0);
> !             if (GetDatabaseEncoding() == PG_UTF8) SvUTF8_on(sv);
> !             XPUSHs(sv_2mortal(sv));
>               pfree(tmp);
>           }
>       }
> ***************
> *** 1553,1558 ****
> --- 1557,1563 ----
>   {
>       int i;
>       HV *hv;
> +     SV *sv;
>       Datum attr;
>       bool isnull;
>       char *attname;
> ***************
> *** 1601,1608 ****
>                   attr,
>                   ObjectIdGetDatum(typioparam),
>                   Int32GetDatum(tupdesc->attrs[i]->atttypmod)));
> !
> !         hv_store(hv, attname, namelen, newSVpv(outputstr, 0), 0);
>       }
>
>       return sv_2mortal(newRV((SV *)hv));
> --- 1606,1614 ----
>                   attr,
>                   ObjectIdGetDatum(typioparam),
>                   Int32GetDatum(tupdesc->attrs[i]->atttypmod)));
> !         sv = newSVpv(outputstr, 0);
> !         if (GetDatabaseEncoding() == PG_UTF8) SvUTF8_on(sv);
> !         hv_store(hv, attname, namelen, sv, 0);
>       }
>
>       return sv_2mortal(newRV((SV *)hv));
I don't think we can accept this patch as-is, mainly because it is going
to require some configuration checks (older Perls don't seem to have
SvUTF8_on()). That means it's probably too late to consider it for 8.0.
I agree something like this should make its way into 8.1 though.
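
As a rough sketch of what such a check might look like: SvUTF8_on is a preprocessor macro in the Perl versions that provide it, so one option is to guard the call with #ifdef. The snippet below uses a stand-in MockSV struct and a stand-in SvUTF8_on definition so that it compiles without the Perl headers. Both stand-ins, and the helper name mark_utf8_if_supported, are hypothetical and are not part of plperl or libperl.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for Perl's SV (hypothetical; the real type comes from perl.h). */
typedef struct MockSV
{
    const char *bytes;
    int         utf8;       /* models the UTF-8 flag: 0 = off, 1 = on */
} MockSV;

/*
 * Stand-in for the real macro. Remove this definition to simulate an
 * older Perl whose headers do not provide SvUTF8_on.
 */
#define SvUTF8_on(sv) ((sv)->utf8 = 1)

/* Turn the flag on only when the macro exists and the database is UTF-8. */
static void mark_utf8_if_supported(MockSV *sv, int db_is_utf8)
{
    assert(sv != NULL);
#ifdef SvUTF8_on
    if (db_is_utf8)
        SvUTF8_on(sv);
#else
    (void) db_is_utf8;      /* no flag support: leave the value as bytes */
#endif
}
```

In plperl itself the db_is_utf8 argument would correspond to the GetDatabaseEncoding() == PG_UTF8 test from the patch. Whether an #ifdef or a configure-time test is the better fit is a separate question.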
regards, tom lane