Quick Regex Question

Started by Howard Cole, over 18 years ago, 15 messages, general
#1 Howard Cole
howardnews@selestial.com

Hi all,

I don't understand the last result:

select 'Ho Ho Ho' ~* '^Ho'; returns true
select 'Ho Ho Ho' ~* ' Ho'; returns true
select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a
space between ^ and ])

From my limited experience of regex, the last one is searching for either
'Ho' preceded by space or
'Ho' at the beginning of a string.

How come it returns false?

Thanks.

P.S. The Ho's are Santa type Ho's - Not the other kind.

#2 Florian Aumeier
faumeier@mediaventures.de
In reply to: Howard Cole (#1)
Re: Quick Regex Question

hi

select 'Ho Ho Ho' ~* '^Ho'; returns true
select 'Ho Ho Ho' ~* ' Ho'; returns true
select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a
space between ^ and ])

"A /bracket expression/ is a list of characters enclosed in []. It
normally matches any single character from the list (but see below). If
the list begins with ^, it matches any single character /not/ from the
rest of the list."

from:
http://www.postgresql.org/docs/8.3/static/functions-matching.html#POSIX-BRACKET-EXPRESSIONS

Regards
Florian

--
Media Ventures GmbH
Jabber-ID faumeier@mabber.de
Telephone +49 (0) 2236 480 10 22

#3 Richard Huxton
dev@archonet.com
In reply to: Howard Cole (#1)
Re: Quick Regex Question

Howard Cole wrote:

Hi all,

I don't understand the last result:

select 'Ho Ho Ho' ~* '^Ho'; returns true
select 'Ho Ho Ho' ~* ' Ho'; returns true
select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a
space between ^ and ])

From my limited experience of regex, the last one is searching for either
'Ho' preceded by space or
'Ho' at the beginning of a string.

No, it's searching for not-space, the ^ inverts the meaning of the
square brackets. You probably want something like '(^Ho)|( Ho)'
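
As a minimal sketch of the difference (the column aliases are only illustrative labels, not part of the original post), the negated class and the two alternation forms can be compared side by side:

```sql
-- '[^ ]Ho' needs one non-space character immediately before 'Ho';
-- the alternation forms accept either start-of-string or a space.
SELECT 'Ho Ho Ho' ~* '[^ ]Ho'      AS negated_class,   -- false
       'Ho Ho Ho' ~* '(^Ho)|( Ho)' AS alternation,     -- true
       'Ho Ho Ho' ~* '(^| )Ho'     AS grouped_anchor;  -- true
```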

--
Richard Huxton
Archonet Ltd

#4 Ivan Sergio Borgonovo
mail@webthatworks.it
In reply to: Howard Cole (#1)
Re: Quick Regex Question

On Thu, 20 Dec 2007 09:56:00 +0000
Howard Cole <howardnews@selestial.com> wrote:

Hi all,

I don't understand the last result:

select 'Ho Ho Ho' ~* '^Ho'; returns true

There is actually a Ho at the beginning of the string.

select 'Ho Ho Ho' ~* ' Ho'; returns true

There are actually 2 ' Ho'

select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is
a space between ^ and ])

Nowhere in the string is there a character other than a space followed by Ho.
What's missing is that the pattern requires some character before Ho.
The first Ho doesn't have a character preceding it.
The other 2 Ho have one... but it is a space, which the pattern excludes.
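
To see the same point from the other side, here is a small sketch with made-up test strings (they are assumptions for illustration, not from the thread) showing when '[^ ]Ho' does and does not match:

```sql
-- The bracket expression must consume exactly one character, and that
-- character must not be a space.
SELECT 'HoHo' ~* '[^ ]Ho' AS no_space_between,  -- true: second Ho follows 'o'
       'xHo'  ~* '[^ ]Ho' AS letter_before,     -- true
       'Ho'   ~* '[^ ]Ho' AS nothing_before,    -- false: no character to consume
       ' Ho'  ~* '[^ ]Ho' AS space_before;      -- false: the character is a space
```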

--
Ivan Sergio Borgonovo
http://www.webthatworks.it

#5 Howard Cole
howardnews@selestial.com
In reply to: Ivan Sergio Borgonovo (#4)
Re: Quick Regex Question

Florian, Richard, Ivan.

Fantastic response, thank you very much.

#6 Howard Cole
howardnews@selestial.com
In reply to: Richard Huxton (#3)
Re: Quick Regex Question

Richard Huxton wrote:

Howard Cole wrote:

Hi all,

I don't understand the last result:

select 'Ho Ho Ho' ~* '^Ho'; returns true
select 'Ho Ho Ho' ~* ' Ho'; returns true
select 'Ho Ho Ho' ~* '[^ ]Ho'; returns false (Please note there is a
space between ^ and ])

From my limited experience of regex, the last one is searching for
either
'Ho' preceded by space or
'Ho' at the beginning of a string.

No, it's searching for not-space, the ^ inverts the meaning of the
square brackets. You probably want something like '(^Ho)|( Ho)'

Your expression works fine Richard, as does '(^| )ho', but can you tell
me why '[ ^]ho' doesn't work?

#7 A. Kretschmer
andreas.kretschmer@schollglas.com
In reply to: Howard Cole (#6)
Re: Quick Regex Question

On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:

Your expression works fine Richard, as does '(^| )ho', but can you tell
me why '[ ^]ho' doesn't work?

By ^ you mean an anchor, but within the brackets it's a simple char.

Andreas
--
Andreas Kretschmer
Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> Header)
GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net

#8 Martijn van Oosterhout
kleptog@svana.org
In reply to: A. Kretschmer (#7)
Re: Quick Regex Question

On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:

On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:

Your expression works fine Richard, as does '(^| )ho', but can you tell
me why '[ ^]ho' doesn't work?

By ^ you mean an anchor, but within the brackets it's a simple char.

Err no, it inverts the test. [^ ] means any character *except* a space.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/

Those who make peaceful revolution impossible will make violent revolution inevitable.
-- John F Kennedy

#9 Richard Huxton
dev@archonet.com
In reply to: Martijn van Oosterhout (#8)
Re: Quick Regex Question

Martijn van Oosterhout wrote:

On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:

On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:

Your expression works fine Richard, as does '(^| )ho', but can you tell
me why '[ ^]ho' doesn't work?

By ^ you mean an anchor, but within the brackets it's a simple char.

Err no, it inverts the test. [^ ] means any character *except* a space.

But only if it's the first character within the brackets.

Which is the opposite of how "-" behaves inside square-brackets of course.

Aren't regexps fun :-)
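
A rough sketch of that asymmetry, using test strings and aliases invented here for illustration: ^ is special only when first inside the brackets, while - is special only when it sits between two characters.

```sql
SELECT 'x' ~ '[^ ]'  AS caret_first,       -- true: any character except a space
       '^' ~ '[ ^]'  AS caret_not_first,   -- true: ^ is a literal caret here
       'b' ~ '[a-c]' AS hyphen_in_middle,  -- true: - forms the range a..c
       '-' ~ '[a-c]' AS hyphen_as_range,   -- false: - itself is not in the range
       '-' ~ '[-ac]' AS hyphen_first,      -- true: a leading - is a literal hyphen
       'b' ~ '[-ac]' AS b_not_listed;      -- false: only -, a and c are listed
```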

--
Richard Huxton
Archonet Ltd

#10 A. Kretschmer
andreas.kretschmer@schollglas.com
In reply to: Martijn van Oosterhout (#8)
Re: Quick Regex Question

On Thu, 20.12.2007, at 12:03:57 +0100, Martijn van Oosterhout wrote:

On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:

On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:

Your expression works fine Richard, as does '(^| )ho', but can you tell
me why '[ ^]ho' doesn't work?

By ^ you mean an anchor, but within the brackets it's a simple char.

Err no, it inverts the test. [^ ] means any character *except* a space.

I know, but only if the ^ is at the beginning, right?

Andreas
--
Andreas Kretschmer
Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> Header)
GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net

#11 Howard Cole
howardnews@selestial.com
In reply to: Martijn van Oosterhout (#8)
Re: Quick Regex Question

Martijn van Oosterhout wrote:

On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:

On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:

Your expression works fine Richard, as does '(^| )ho', but can you tell
me why '[ ^]ho' doesn't work?

By ^ you mean an anchor, but within the brackets it's a simple char.

Err no, it inverts the test. [^ ] means any character *except* a space.

Have a nice day,

Hi Martijn, Andreas,

I think Andreas is right, note the ordering of characters in the above
example as [ ^] rather than [^ ].
So if the '^' is taken as literal '^', can I check for the beginning of
a string in the brackets, or am I forced to use the (^| ) syntax?

Is it just me or are regular expressions crazy?

Howard

#12 Howard Cole
howardnews@selestial.com
In reply to: Howard Cole (#11)
Re: Quick Regex Question

Howard Cole wrote:

Martijn van Oosterhout wrote:

On Thu, Dec 20, 2007 at 11:51:34AM +0100, A. Kretschmer wrote:

On Thu, 20.12.2007, at 10:36:08 +0000, Howard Cole wrote:

Your expression works fine Richard, as does '(^| )ho', but can you
tell me why '[ ^]ho' doesn't work?

By ^ you mean an anchor, but within the brackets it's a simple char.

Err no, it inverts the test. [^ ] means any character *except* a space.

Have a nice day,

Hi Martijn, Andreas,

I think Andreas is right, note the ordering of characters in the above
example as [ ^] rather than [^ ].
So if the '^' is taken as literal '^', can I check for the beginning
of a string in the brackets, or am I forced to use the (^| ) syntax?

Is it just me or are regular expressions crazy?

Howard

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Sorry - I have just read the relevant section of the manual again and it
is starting to make sense. I shall use the (^| ) syntax as suggested.
Thanks for all the help.

#13 Terry Fielder
terry@ashtonwoodshomes.com
In reply to: Howard Cole (#11)
Re: Quick Regex Question

<Snip>
Howard Cole wrote:

Hi Martijn, Andreas,

I think Andreas is right, note the ordering of characters in the above
example as [ ^] rather than [^ ].
So if the '^' is taken as literal '^', can I check for the beginning
of a string in the brackets,

Why do you need to? Check for the beginning of the string BEFORE the
set brackets. The point of set brackets is "match from a set of
chars". Since "beginning of string" can only match one place, it has no
meaning as a member of a set. Or in other words, if it has meaning, it
needs to be matched FIRST out of the set, and therefore you can just
remove from the set and put before the set brackets.

or am I forced to use the (^| ) syntax?

Is it just me or are regular expressions crazy?

Complicated, not crazy.

Terry

#14 Howard Cole
howardnews@selestial.com
In reply to: Terry Fielder (#13)
Re: Quick Regex Question

Terry Fielder wrote:

Why do you need to? Check for the beginning of the string BEFORE the
set brackets. The point of set brackets is "match from a set of
chars". Since "beginning of string" can only match one place, it has
no meaning as a member of a set. Or in other words, if it has
meaning, it needs to be matched FIRST out of the set, and therefore
you can just remove from the set and put before the set brackets.

or am I forced to use the (^| ) syntax?

Is it just me or are regular expressions crazy?

Complicated, not crazy.

Terry

Hmm. Still think they are crazy - sometimes the characters are
interpreted as literals - other times not? That's crazy in my book! It
would make more sense to me if you had to escape the characters inside
the [ ] as they seem to be everywhere else. There is possibly a good
reason for this - But perhaps they are just crazy!!!
;)

I am trying to match the beginning of a name, so to search for
'how' in 'Howard Cole' should match
'col' in 'Howard Cole' should match
'ole' in 'Howard Cole' should NOT match,

So using ~* '(^| )col' works for me! As would '(^col| col)' etc.
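
The three cases listed above can be checked in one query. This is only a sketch, and it assumes that a single space is the only separator between name parts (a tab or hyphen would not count as a word start here):

```sql
SELECT 'Howard Cole' ~* '(^| )how' AS how,  -- true: at start of string
       'Howard Cole' ~* '(^| )col' AS col,  -- true: preceded by a space
       'Howard Cole' ~* '(^| )ole' AS ole;  -- false: preceded by 'C'
```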

Just as an aside, is there a function that escapes my search string so
that any special regex characters are replaced? For example, if I was
going to search for 'howard.cole' in the search string it would convert
to 'howard[:.:]cole' or 'howard\.cole' - and then convert that into a
postgres compatible string!

#15 Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Howard Cole (#14)
Re: Quick Regex Question

Howard Cole wrote:

Hmm. Still think they are crazy - sometimes the characters are interpreted
as literals - other times not? That's crazy in my book!

Yeah. ^, like a lot of other chars, means different things depending on
where it appears: at the beginning of a [] it means "negate the character
class", at any other position inside the [] it means "a literal ^", and
outside [] it means "anchor to beginning of string".

I am trying to match the beginning of a name, so to search for
'how' in 'Howard Cole' should match
'col' in 'Howard Cole' should match
'ole' in 'Howard Cole' should NOT match,

So using ~* '(^| )col' works for me! As would '(^col| col)' etc.

I think you are looking for [[:<:]] which means "beginning of word":

alvherre=# select 'Howard Cole' ~* '[[:<:]]ole';
?column?
----------
f
(1 row)

alvherre=# select 'Howard Cole' ~* '[[:<:]]col';
?column?
----------
t
(1 row)

I used to know the symbol as \< on other regex engines. It is also
known as \m on Postgres. It is not specified by the standard, so be
careful with it. Note that a double backslash is needed:

alvherre=# select 'Howard Cole' ~* e'\\mcol';
?column?
----------
t
(1 row)

alvherre=# select 'Howard Cole' ~* e'\\mole';
?column?
----------
f
(1 row)

Just as an aside, is there a function that escapes my search string so that
any special regex characters are replaced? For example, if I was going to
search for 'howard.cole' in the search string it would convert to
'howard[:.:]cole' or 'howard\.cole' - and then convert that into a postgres
compatible string!

Hmm, I have no idea about that.
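
One possible workaround, offered as a sketch rather than a dedicated escaping function, is to backslash-escape the special characters with regexp_replace before building the pattern. The character list below is an assumption about which characters need escaping, and the literals assume standard_conforming_strings is on; with it off (as on older releases) the backslashes would need E'' strings and further doubling.

```sql
-- Put a backslash in front of every character that is special in a regex.
-- Inside the bracket expression, \\ is a literal backslash and \] a literal ].
SELECT regexp_replace('howard.cole',
                      '([!$()*+.:<=>?[\\\]^{|}-])',
                      '\\\1',   -- a literal backslash followed by the matched char
                      'g') AS escaped;
-- escaped: howard\.cole

-- The escaped text can then be appended to the word-start prefix:
SELECT 'Howard.Cole' ~* ('(^| )' || regexp_replace('howard.cole',
                         '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')) AS dot_matches,    -- true
       'HowardXCole' ~* ('(^| )' || regexp_replace('howard.cole',
                         '([!$()*+.:<=>?[\\\]^{|}-])', '\\\1', 'g')) AS other_matches;  -- false
```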

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support