Re: create custom collation from case insensitive portuguese

Started by Euler Taveira de Oliveiraabout 15 years ago4 messages

Alexandre Riveira escreveu:

I've achieved some success in changing collate operating system (linux)
to generate sort of way of Brazil Portuguese hopes by adding the
following code in LC_COLLATE

This was already discussed; search the archives [1]http://pgfoundry.org/pipermail/brasil-usuarios/20060330/001667.html [2]http://www.mail-archive.com/brasil-usuarios@pgfoundry.org/msg00895.html.

So far, I understood the mechanism of change collate and reproduce in
postgresql, and I could not generate a case-insensitive search, I
believe that would change within the LC_COLLATE variable, but could not
go any further than that.

PostgreSQL doesn't support case-insensitive searches specifying the collate
per column yet. Look at [3]http://archives.postgresql.org/pgsql-hackers/2010-07/msg00512.php. But you could use ILIKE or regular expression to
achieve that.

[1]: http://pgfoundry.org/pipermail/brasil-usuarios/20060330/001667.html
[2]: http://www.mail-archive.com/brasil-usuarios@pgfoundry.org/msg00895.html
[3]: http://archives.postgresql.org/pgsql-hackers/2010-07/msg00512.php

--
Euler Taveira de Oliveira
http://www.timbira.com/

#2Robert Haas
robertmhaas@gmail.com
In reply to: Euler Taveira de Oliveira (#1)

On Tue, Nov 2, 2010 at 8:40 AM, Euler Taveira de Oliveira
<euler@timbira.com> wrote:

Alexandre Riveira escreveu:

I've achieved some success in changing collate operating system (linux)
to generate sort of way of Brazil Portuguese hopes by adding the
following code in LC_COLLATE

This was already discussed; search the archives [1] [2].

So far, I understood the mechanism of change collate and reproduce in
postgresql, and I could not generate a case-insensitive search, I
believe that would change within the LC_COLLATE variable, but could not
go any further than that.

PostgreSQL doesn't support case-insensitive searches specifying the collate
per column yet. Look at [3]. But you could use ILIKE or regular expression to
achieve  that.

Is citext also useful for this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#3Dimitri Fontaine
dimitri@2ndQuadrant.fr
In reply to: Euler Taveira de Oliveira (#1)

Alexandre Riveira <alexandre@objectdata.com.br> writes:

When mentioned in Portuguese case-insensitive in fact we are also talking
about accent-insensitive

See unaccent dictionary, but don't use only this one in your text search
configuration, IIRC.

http://www.postgresql.org/docs/9/static/unaccent.html

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

#4Alexandre Riveira
alexandre@objectdata.com.br
In reply to: Robert Haas (#2)

Thank you all for your help !

When mentioned in Portuguese case-insensitive in fact we are also
talking about accent-insensitive

A common example is that the name Jose and Jos� also can be written,
citext or ilike only not solve the problem

My progress is ...

Edit file /usr/share/i18n/locales/i18n e alter section tolower /, an
example:

(<U00C9>,<U00E9>) e alter for (<U00C9>,<U0065>)

LOWER reproduce: LOWER("ITAPAG�") => "itapage",

Example success:

SELECT * FROM endereco WHERE LOWER(logradouro) LIKE LOWER('itapage%')

this behavior is reproduced in citext (logradouro is column citext)

SELECT * FROM endereco WHERE logradouro = 'itapage'
or
SELECT * FROM endereco WHERE logradouro LIKE 'itapage%'

All examples return the desired value "ITAPAG�"
Issue:

SELECT * FROM endereco WHERE logradouro LIKE 'itapage%' NOT USE INDEX
I tried
CREATE INDEX cep_ik_logradouro ON cep USING btree (logradouro);
CREATE INDEX like_index ON cep(logradouro varchar_pattern_ops);
CREATE INDEX cep_ci_index ON cep(lower(logradouro) varchar_pattern_ops);

I've had success with using index
SELECT * FROM endereco WHERE LOWER(logradouro) LIKE LOWER('itapage%')
and CREATE INDEX cep_ci_index ON cep(lower(logradouro) varchar_pattern_ops)
But I want to solve using only citext

Tank's

Alexandre Riveira
Brazil

Robert Haas escreveu:

Show quoted text

On Tue, Nov 2, 2010 at 8:40 AM, Euler Taveira de Oliveira
<euler@timbira.com> wrote:

Alexandre Riveira escreveu:

I've achieved some success in changing collate operating system (linux)
to generate sort of way of Brazil Portuguese hopes by adding the
following code in LC_COLLATE

This was already discussed; search the archives [1] [2].

So far, I understood the mechanism of change collate and reproduce in
postgresql, and I could not generate a case-insensitive search, I
believe that would change within the LC_COLLATE variable, but could not
go any further than that.

PostgreSQL doesn't support case-insensitive searches specifying the collate
per column yet. Look at [3]. But you could use ILIKE or regular expression to
achieve that.

Is citext also useful for this?