utf-8 flag always off in plperl function arguments

Started by David Kamholz over 21 years ago, 3 messages, bugs
#1 David Kamholz
davekam@pobox.com

Hello:

Since 5.6 or so, perl has stored an internal flag on every string to
mark whether it's UTF-8 or not. For data of unknown encoding, such as
data read from files, the default is latin1, but it can be changed with
use encoding 'utf8'. Now, I have a postgresql database in charset
UNICODE. So, postgres knows the data is UTF-8. However, when passing
arguments to plperl functions, no matter what the charset, postgres
ALWAYS sets the UTF-8 flag to off. This means that the only way to
handle the string properly in perl, when it matters that perl knows
it's UTF-8, is to use utf8::upgrade -- on every argument, in every
function, every time. This is rather kludgy, considering there already
exists a way to fix it by calling the libperl API properly. It would be
nice if this could be fixed in 8.0 final (the behavior is exactly the same
in 8.0 beta and 7.4.6).

Regards,
Dave
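
To illustrate why the flag matters, here is a minimal standalone C sketch. It is not Perl's or PostgreSQL's actual code, and the function names count_bytes and count_utf8_chars are hypothetical. It shows how the same byte sequence yields different lengths depending on whether each byte is treated as one character (roughly what happens when the flag is off) or the bytes are read as UTF-8 (flag on). It assumes the input is valid UTF-8.

```c
#include <stddef.h>

/* Length when every byte is treated as one character
 * (the flag-off view of the data). */
size_t count_bytes(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}

/* Length in UTF-8 code points (the flag-on view).
 * Continuation bytes have the bit pattern 10xxxxxx, so every byte
 * that is NOT a continuation byte starts a new character.
 * Assumes valid UTF-8 input; no validation is done. */
size_t count_utf8_chars(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the UTF-8 encoding of "café" (the bytes 63 61 66 C3 A9), count_bytes returns 5 while count_utf8_chars returns 4. That mismatch is the kind of discrepancy a plperl function would see in length, substring, and regex operations when its arguments arrive without the flag set.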

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Kamholz (#1)
Re: utf-8 flag always off in plperl function arguments

David Kamholz <davekam@pobox.com> writes:

This is rather kludgy, considering there already
exists a way to fix it by calling the libperl API properly.

If you know how to do it, how about offering a patch?

regards, tom lane

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Kamholz (#1)
Re: [BUGS] utf-8 flag always off in plperl function arguments

David Kamholz <davekam@pobox.com> writes:

*** plperl.c.orig	Sat Dec  4 02:09:24 2004
--- plperl.c	Sat Dec  4 03:41:33 2004
***************
*** 57,62 ****
--- 57,63 ----
  #include "utils/lsyscache.h"
  #include "utils/syscache.h"
  #include "utils/typcache.h"
+ #include "mb/pg_wchar.h"

  /* perl stuff */
  #include "EXTERN.h"
***************
*** 803,814 ****
  		else
  		{
  			char *tmp;

  			tmp = DatumGetCString(FunctionCall3(&(desc->arg_out_func[i]),
  												fcinfo->arg[i],
  												ObjectIdGetDatum(desc->arg_typioparam[i]),
  												Int32GetDatum(-1)));
! 			XPUSHs(sv_2mortal(newSVpv(tmp, 0)));
  			pfree(tmp);
  		}
  	}
--- 804,818 ----
  		else
  		{
  			char	   *tmp;
+ 			SV			*sv;

  			tmp = DatumGetCString(FunctionCall3(&(desc->arg_out_func[i]),
  												fcinfo->arg[i],
  												ObjectIdGetDatum(desc->arg_typioparam[i]),
  												Int32GetDatum(-1)));
! 			sv = newSVpv(tmp, 0);
! 			if (GetDatabaseEncoding() == PG_UTF8) SvUTF8_on(sv);
! 			XPUSHs(sv_2mortal(sv));
  			pfree(tmp);
  		}
  	}
***************
*** 1553,1558 ****
--- 1557,1563 ----
  {
  	int			i;
  	HV		   *hv;
+ 	SV			*sv;
  	Datum		attr;
  	bool		isnull;
  	char	   *attname;
***************
*** 1601,1608 ****
  			attr,
  			ObjectIdGetDatum(typioparam),
  			Int32GetDatum(tupdesc->attrs[i]->atttypmod)));
! 
! 		hv_store(hv, attname, namelen, newSVpv(outputstr, 0), 0);
  	}

  	return sv_2mortal(newRV((SV *)hv));
--- 1606,1614 ----
  			attr,
  			ObjectIdGetDatum(typioparam),
  			Int32GetDatum(tupdesc->attrs[i]->atttypmod)));
! 		sv = newSVpv(outputstr, 0);
! 		if (GetDatabaseEncoding() == PG_UTF8) SvUTF8_on(sv);
! 		hv_store(hv, attname, namelen, sv, 0);
  	}

  	return sv_2mortal(newRV((SV *)hv));

I don't think we can accept this patch as-is, mainly because it is going
to require some configuration checks (older Perls don't seem to have
SvUTF8_on()). That means it's probably too late to consider it for 8.0.
I agree something like this should make its way into 8.1 though.

regards, tom lane
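
One possible shape for the check mentioned above is a compile-time guard. This is only a sketch, and it rests on two assumptions: that SvUTF8_on is provided by the Perl headers as a preprocessor macro, so #ifdef can detect it, and that silently skipping the flag on older Perls is acceptable. The names PLPERL_HAVE_UTF8_FLAG and plperl_mark_utf8 are hypothetical and do not exist in plperl.c. A configure-time test would be another way to do the same thing.

```c
/*
 * Hypothetical wrapper, not part of plperl.c.
 * When the Perl headers define SvUTF8_on, use it; otherwise fall back
 * to a no-op so the code still compiles against older Perls.
 */
#ifdef SvUTF8_on
#define PLPERL_HAVE_UTF8_FLAG 1
#define plperl_mark_utf8(sv) SvUTF8_on(sv)
#else
#define PLPERL_HAVE_UTF8_FLAG 0
#define plperl_mark_utf8(sv) ((void) (sv))   /* no UTF-8 flag available */
#endif
```

With such a wrapper, the patched call sites could call plperl_mark_utf8(sv) in place of SvUTF8_on(sv) and would not need their own #ifdef blocks. Compiled without the Perl headers, the fallback branch is taken and the macro does nothing.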