a couple questions about convert()

Started by Stuart McGrawover 11 years ago3 messagesgeneral
Jump to latest
#1Stuart McGraw
smcg2297@frii.com

Hello,

In a postgresql-9.3.1 database with UTF8 encoding I can do:

select convert_from (E'\\x68656c6c6f', 'LATIN1');
convert_from
--------------
hello

But when I explicitly give the "to" encoding:

select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8');
convert
--------------
\x68656c6c6f

Why does that second one give different results from the first?

Second question: why is that none of the convert* functions are
marked as immutable (thus preventing me from creating a functional
index using them). Surely if I convert \x68 to utf-8 the result
will *always* be "h", won't it?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Stuart McGraw (#1)
Re: a couple questions about convert()

smcg2297@frii.com writes:

In a postgresql-9.3.1 database with UTF8 encoding I can do:

select convert_from (E'\\x68656c6c6f', 'LATIN1');
convert_from
--------------
hello

But when I explicitly give the "to" encoding:

select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8');
convert
--------------
\x68656c6c6f

Why does that second one give different results from the first?

convert_from() produces a result of type text. But convert() returns
bytea, because its output is not necessarily valid in the current
database encoding. It's the same bytes, but it prints differently.
(You could ameliorate the unreadability by changing the bytea_output
setting.)

Second question: why is that none of the convert* functions are
marked as immutable (thus preventing me from creating a functional
index using them). Surely if I convert \x68 to utf-8 the result
will *always* be "h", won't it?

Well, no, it'll be whatever the conversion function says it is;
and encoding conversion functions are replaceable through DDL.

I recall some discussion to the effect that that was silly and
we should rip out CREATE CONVERSION and friends in favor of
hard-wired conversion rules. Nothing's been done about it though.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Stuart McGraw
smcg2297@frii.com
In reply to: Tom Lane (#2)
Re: a couple questions about convert()

On 09/19/2014 08:50 AM, Tom Lane wrote:

smcg2297@frii.com writes:

In a postgresql-9.3.1 database with UTF8 encoding I can do:

select convert_from (E'\\x68656c6c6f', 'LATIN1');
convert_from
--------------
hello

But when I explicitly give the "to" encoding:

select convert (E'\\x68656c6c6f', 'LATIN1', 'UTF8');
convert
--------------
\x68656c6c6f

Why does that second one give different results from the first?

convert_from() produces a result of type text. But convert() returns
bytea, because its output is not necessarily valid in the current
database encoding. It's the same bytes, but it prints differently.
(You could ameliorate the unreadability by changing the bytea_output
setting.)

I my case I routinely deal with multi-byte utf-8 characters so the
"escape" format doesn't help much.

Second question: why is that none of the convert* functions are
marked as immutable (thus preventing me from creating a functional
index using them). Surely if I convert \x68 to utf-8 the result
will *always* be "h", won't it?

Well, no, it'll be whatever the conversion function says it is;
and encoding conversion functions are replaceable through DDL.

I recall some discussion to the effect that that was silly and
we should rip out CREATE CONVERSION and friends in favor of
hard-wired conversion rules. Nothing's been done about it though.

Well, if it matters, count me as +1.

Thanks very much for the explanation, things are clearer now. :-)

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general