Using regexp_replace to remove small words

Started by Henrique de Lima Trindadeover 15 years ago9 messagesgeneral
Jump to latest
#1Henrique de Lima Trindade
henrique@vivver.com.br

Hi,

I'm trying to find a regular expression that removes all small (length < N)
words from a string. But, until now I've not been successful.

For example:

If I pass 'George W Bush' as parameter, I want regexp_replace to return
'George Bush'.

Other examples are:

select regexp_replace( 'Tommy Lee Jones', 'REGEX', ' ', 'g' ) => 'Tommy
Jones'

select regexp_replace( 'Dr Tommy Lee Jones', 'REGEX', ' ', 'g' ) => 'Tommy
Jones'

select regexp_replace( 'Tommy Lee Jones Jr', 'REGEX', ' ', 'g' ) => 'Tommy
Jones'

Thanks in advance,

HENRIQUE TRINDADE

ANALISTA DE SISTEMA II

( 55 (31) 3025-3550

+henrique@vivver.com.br

assinatura

Attachments:

image002.jpgimage/jpeg; name=image002.jpgDownload
#2Peter Eisentraut
peter_e@gmx.net
In reply to: Henrique de Lima Trindade (#1)
Re: Using regexp_replace to remove small words

On fre, 2010-12-10 at 10:47 -0200, Henrique de Lima Trindade wrote:

I'm trying to find a regular expression that removes all small (length < N)
words from a string. But, until now I've not been successful.

Here is a start:

select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' );

If you want to normalize the spaces after the removal and handle
beginning and end of the word, you will need to expand this to cover
those cases, but the example should contain the key ingredients.

#3rodrigo mendonca
rodrigo@vivver.com.br
In reply to: Peter Eisentraut (#2)
Re: Using regexp_replace to remove small words

Hi Peter,

your REGEX is exactly what we were trying to find.

Thank you very much!

Rodrigo.

--
<http://www.vivver.com.br/&gt; *Rodrigo Mendonça**
Diretor de Tecnologia
Tel: 31 3025 3550 / Cel: 31 8489 3119
rodrigo@vivver.com.br / diretoria@vivver.com.br*

#4Henrique de Lima Trindade
henrique@vivver.com.br
In reply to: Peter Eisentraut (#2)
RES: Using regexp_replace to remove small words

Hi Peter,

Your example works perfectly. But, I need Your help with on another situation.
We're trying to create a plpgsql function with the expression. But, I'm getting a syntax error:

-----------------
create or replace function sp_remove_small_words( ptext text ) returns text immutable as
$$
begin

return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' );

end;
$$ language plpgsql
;
-----------------
ERRO: erro de sintaxe em ou próximo a "\"
LINE 6: return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' );
^

********** Error **********

ERRO: erro de sintaxe em ou próximo a "\"
SQL state: 42601
Character: 138

Thanks again!

-----Mensagem original-----
De: Peter Eisentraut [mailto:peter_e@gmx.net]
Enviada em: sexta-feira, 10 de dezembro de 2010 20:59
Para: Henrique de Lima Trindade
Cc: pgsql-general@postgresql.org
Assunto: Re: [GENERAL] Using regexp_replace to remove small words

On fre, 2010-12-10 at 10:47 -0200, Henrique de Lima Trindade wrote:

I'm trying to find a regular expression that removes all small (length < N)
words from a string. But, until now I've not been successful.

Here is a start:

select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' );

If you want to normalize the spaces after the removal and handle
beginning and end of the word, you will need to expand this to cover
those cases, but the example should contain the key ingredients.

#5Vick Khera
vivek@khera.org
In reply to: Peter Eisentraut (#2)
Re: Using regexp_replace to remove small words

On Fri, Dec 10, 2010 at 5:59 PM, Peter Eisentraut <peter_e@gmx.net> wrote:

select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' );

speaking of regular expressions... the other day I was trying to find
where the 'flags' option to regexp_replace() is described, but I
cannot find it in the 9.0 manual in any obvious place. it is not
described in the string functions section, nor in the regexp section.

#6Vick Khera
vivek@khera.org
In reply to: Henrique de Lima Trindade (#4)
Re: Using regexp_replace to remove small words

On Mon, Dec 13, 2010 at 7:24 AM, Henrique de Lima Trindade
<henrique@vivver.com.br> wrote:

Your example works perfectly. But, I need Your help with on another situation.
We're trying to create a plpgsql function with the expression. But, I'm getting a syntax error:

You're using $$ quoting for the function, and $$ quoting for the
string in the regexp. Think of it like you're trying to embed a ' in
a string delimited by ' also.

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vick Khera (#5)
Re: Using regexp_replace to remove small words

Vick Khera <vivek@khera.org> writes:

speaking of regular expressions... the other day I was trying to find
where the 'flags' option to regexp_replace() is described, but I
cannot find it in the 9.0 manual in any obvious place. it is not
described in the string functions section, nor in the regexp section.

It's described in the same paragraph that describes regexp_replace, in
http://www.postgresql.org/docs/9.0/static/functions-matching.html

The regexp_replace function provides substitution of new text for
substrings that match POSIX regular expression patterns. It has the
syntax regexp_replace(source, pattern, replacement [, flags ]). The
source string is returned unchanged if there is no match to the
pattern. If there is a match, the source string is returned with the
replacement string substituted for the matching substring. The
replacement string can contain \n, where n is 1 through 9, to
indicate that the source substring matching the n'th parenthesized
subexpression of the pattern should be inserted, and it can contain
\& to indicate that the substring matching the entire pattern should
be inserted. Write \\ if you need to put a literal backslash in the
replacement text. (As always, remember to double backslashes written
in literal constant strings, assuming escape string syntax is used.)
The flags parameter is an optional text string containing zero or
more single-letter flags that change the function's behavior. Flag i
specifies case-insensitive matching, while flag g specifies
replacement of each matching substring rather than only the first
one. Other supported flags are described in Table 9-19.

regards, tom lane

#8Robert Gravsjö
robert@blogg.se
In reply to: Henrique de Lima Trindade (#4)
Re: RES: Using regexp_replace to remove small words

On 2010-12-13 13.24, Henrique de Lima Trindade wrote:

Hi Peter,

Your example works perfectly. But, I need Your help with on another situation.
We're trying to create a plpgsql function with the expression. But, I'm getting a syntax error:

-----------------
create or replace function sp_remove_small_words( ptext text ) returns text immutable as
$$
begin

return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' );

end;
$$ language plpgsql
;
-----------------
ERRO: erro de sintaxe em ou próximo a "\"
LINE 6: return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' );

You're ending the function declaration with the $$. Using $fun$ as For
example, the following works:

create or replace function sp_remove_small_words( ptext text ) returns
text immutable as
$fun$
begin
return regexp_replace( ptext,
$$\y\w{1,3}\y$$, '', 'g' ); end;
$fun$
language plpgsql;

For details on dollar quoting see:
http://www.postgresql.org/docs/current/interactive/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING

Regards,
roppert

^

********** Error **********

ERRO: erro de sintaxe em ou próximo a "\"
SQL state: 42601
Character: 138

Thanks again!

-----Mensagem original-----
De: Peter Eisentraut [mailto:peter_e@gmx.net]
Enviada em: sexta-feira, 10 de dezembro de 2010 20:59
Para: Henrique de Lima Trindade
Cc: pgsql-general@postgresql.org
Assunto: Re: [GENERAL] Using regexp_replace to remove small words

On fre, 2010-12-10 at 10:47 -0200, Henrique de Lima Trindade wrote:

I'm trying to find a regular expression that removes all small (length< N)
words from a string. But, until now I've not been successful.

Here is a start:

select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' );

If you want to normalize the spaces after the removal and handle
beginning and end of the word, you will need to expand this to cover
those cases, but the example should contain the key ingredients.

--
Regards,
Robert "roppert" Gravsjö

#9Vick Khera
vivek@khera.org
In reply to: Tom Lane (#7)
Re: Using regexp_replace to remove small words

On Mon, Dec 13, 2010 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Flag i
   specifies case-insensitive matching, while flag g specifies
   replacement of each matching substring rather than only the first
   one. Other supported flags are described in Table 9-19.

Thanks. Quite well hidden in plain sight. :(

As a user, I was looking for a table similar to 9-19, since I
discounted it because it described the embedded flags. Based on the
above text though it seems they are also valid for the flags
parameter.