convert accented character to base character

Started by Neubert Joachimalmost 16 years ago5 messagesgeneral
Jump to latest
#1Neubert Joachim
J.Neubert@zbw.eu

I want to convert accented characters to the according base character, e.g. "Ü" or "Ú" to "U".

Is there a way to do this with pgsql functions?

postgres=# select convert('Ü', 'UTF8', 'SQL_ASCII');

convert

----------

\303\234

did not work as I had hoped.

Any help would be appreciated -

Cheers, Joachim

#2Thom Brown
thombrown@gmail.com
In reply to: Neubert Joachim (#1)
Re: convert accented character to base character

On 28 June 2010 16:22, Neubert Joachim <J.Neubert@zbw.eu> wrote:

I want to convert accented characters to the according base character, e.g.
"Ü" or "Ú" to "U".

Is there a way to do this with pgsql functions?

  postgres=# select convert('Ü', 'UTF8', 'SQL_ASCII');

   convert

  ----------

   \303\234

did not work as I had hoped.

Any help would be appreciated -

Cheers, Joachim

There's a function called unaccent coming in PostgreSQL 9.0:
http://www.postgresql.org/docs/9.0/static/unaccent.html

But in the meantime, you could try this:

CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$
DECLARE
input_string text := $1;
BEGIN

input_string := translate(input_string,
'âãäåāăąÁÂÃÄÅĀĂĄèééêëēĕėęěĒĔĖĘĚìíîïìĩīĭÌÍÎÏÌĨĪĬóôõöōŏőÒÓÔÕÖŌŎŐùúûüũūŭůÙÚÛÜŨŪŬŮ',
'aaaaaaaaaaaaaaaeeeeeeeeeeeeeeeiiiiiiiiiiiiiiiiooooooooooooooouuuuuuuuuuuuuuuu');

return input_string;
END;
$$ LANGUAGE plpgsql;

Then you can do:

select unaccent_string('Ü');

Someone else may have a better suggestion though.

Regards

Thom

#3John R Pierce
pierce@hogranch.com
In reply to: Neubert Joachim (#1)
Re: convert accented character to base character

On 06/28/10 8:22 AM, Neubert Joachim wrote:

I want to convert accented characters to the according base character,
e.g. "Ü" or "Ú" to "U".

Is there a way to do this with pgsql functions?

translate(somestring, 'ÙÚÛÜŨŪŬŮŰŲùúûüũūŭůűų', 'UUUUUUUUUUuuuuuuuuuu')

of course, expand those with all the other characters you want translated.

#4Thom Brown
thombrown@gmail.com
In reply to: Thom Brown (#2)
Re: convert accented character to base character

On 28 June 2010 16:33, Thom Brown <thombrown@gmail.com> wrote:

On 28 June 2010 16:22, Neubert Joachim <J.Neubert@zbw.eu> wrote:

I want to convert accented characters to the according base character, e.g.
"Ü" or "Ú" to "U".

Is there a way to do this with pgsql functions?

  postgres=# select convert('Ü', 'UTF8', 'SQL_ASCII');

   convert

  ----------

   \303\234

did not work as I had hoped.

Any help would be appreciated -

Cheers, Joachim

There's a function called unaccent coming in PostgreSQL 9.0:
http://www.postgresql.org/docs/9.0/static/unaccent.html

But in the meantime, you could try this:

CREATE OR REPLACE FUNCTION unaccent_string(text) RETURNS text AS $$
DECLARE
   input_string text := $1;
BEGIN

input_string := translate(input_string,
'âãäåāăąÁÂÃÄÅĀĂĄèééêëēĕėęěĒĔĖĘĚìíîïìĩīĭÌÍÎÏÌĨĪĬóôõöōŏőÒÓÔÕÖŌŎŐùúûüũūŭůÙÚÛÜŨŪŬŮ',
'aaaaaaaaaaaaaaaeeeeeeeeeeeeeeeiiiiiiiiiiiiiiiiooooooooooooooouuuuuuuuuuuuuuuu');

That should actually be
'aaaaaaaAAAAAAAAeeeeeeeeeeEEEEEiiiiiiiiIIIIIIIIoooooooOOOOOOOOuuuuuuuuUUUUUUUU');

Show quoted text

return input_string;
END;
$$ LANGUAGE plpgsql;

Then you can do:

select unaccent_string('Ü');

Someone else may have a better suggestion though.

Regards

Thom

#5Mathieu De Zutter
mathieu@dezutter.org
In reply to: Neubert Joachim (#1)
Re: convert accented character to base character

On 28/06/10 17:22, Neubert Joachim wrote:

I want to convert accented characters to the according base character,
e.g. "�" or "�" to "U".

I have the following:

CREATE OR REPLACE FUNCTION unaccent(TEXT) RETURNS TEXT AS
$$
use Text::Unaccent;
return unac_string('UTF8', $_[0]);
$$ LANGUAGE plperlu IMMUTABLE STRICT;

On debian you need the following package:
libtext-unaccent-perl

Kind regards,
Mathieu