why non-greedy modifier for one atom changes greediness of other atoms?

Started by hubert depesz lubaczewski over 16 years ago · 3 messages · general

Example:
# select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
x | substring
-----------------+-----------
ab.123xxx.46hfd | ab.1
a.b.c.d.123xx | a.b.c.d.1
(2 rows)

I found in the docs that this is the documented behavior, but I don't
understand the logic behind forcing a single greediness on the whole expression.

Also - how can one write a regexp that will match "ab.123" and
"a.b.c.d.123" respectively?

In PL/Perl it's of course trivial, but I can't seem to find a way to do it in substring() regexps.

Best regards,

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007

In reply to: hubert depesz lubaczewski (#1)
Re: why non-greedy modifier for one atom changes greediness of other atoms?

On Mon, Jan 04, 2010 at 11:30:51AM +0100, hubert depesz lubaczewski wrote:

> Example:
> # select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
> x | substring
> -----------------+-----------
> ab.123xxx.46hfd | ab.1
> a.b.c.d.123xx | a.b.c.d.1
> (2 rows)
>
> I found in docs, that this is what happens, but I don't understand the
> logic behind forcing unique greediness in whole expression.
>
> Also - how can one write a regexp that will match "ab.123" and
> "a.b.c.d.123" respectively?

Sorry, that may have been unclear. For the string 'ab123bc.12xx' the
return value should be 'ab123bc.12', i.e. we have to search for the first '.'
followed by digits and return everything from the beginning of the string
to the last of those digits.

Best regards,

depesz

#3 Laurenz Albe
laurenz.albe@cybertec.at
In reply to: hubert depesz lubaczewski (#2)
Re: why non-greedy modifier for one atom changes greediness of other atoms?

hubert depesz lubaczewski wrote:

> Example:
> # select x, substring( x from E'^((.*?)(\\.[0-9]+))') from ( values ('ab.123xxx.46hfd'),('a.b.c.d.123xx')) as q (x);
> x | substring
> -----------------+-----------
> ab.123xxx.46hfd | ab.1
> a.b.c.d.123xx | a.b.c.d.1
> (2 rows)
>
> I found in docs, that this is what happens, but I don't understand the
> logic behind forcing unique greediness in whole expression.

Yes, that's odd.
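
For reference, the rule can be seen in isolation with a pair of queries adapted from the "Regular Expression Matching Rules" section of the documentation. This is only a sketch of the documented behavior; the result comments are the values the documentation gives for these two patterns.

```sql
-- Y* is greedy, so the RE as a whole is greedy:
-- it matches the longest string starting at Y (Y123), and [0-9]{1,3} gets 123.
SELECT substring('XY1234Z', 'Y*([0-9]{1,3})');   -- 123

-- Y*? is non-greedy, so the RE as a whole becomes non-greedy:
-- it matches the shortest string starting at Y (Y1), and [0-9]{1,3} gets only 1.
SELECT substring('XY1234Z', 'Y*?([0-9]{1,3})');  -- 1
```

The first quantified atom that has a greediness attribute decides for the whole branch, which is why the `.*?` in the query quoted above also makes `[0-9]+` stop after a single digit.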

> Also - how can one write a regexp that will match "ab.123" and
> "a.b.c.d.123" respectively?
>
> Sorry, that may have been unclear. For the string 'ab123bc.12xx' the
> return value should be 'ab123bc.12', i.e. we have to search for the first '.'
> followed by digits and return everything from the beginning of the string
> to the last of those digits.

You could add a negative lookahead so that the match cannot end while more digits follow:

... substring(x from E'^(.*?\\.\\d+(?!\\d))') ...
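
Written out as a complete query, that could look like the sketch below. It reuses the three sample strings from this thread; the column alias `prefix` is only an illustrative name. The RE as a whole is still non-greedy, so it takes the shortest possible match, but the lookahead rejects every candidate that would end in the middle of a run of digits.

```sql
SELECT x,
       -- shortest prefix ending in '.', digits, and no further digit
       substring(x from E'^(.*?\\.\\d+(?!\\d))') AS prefix
FROM (VALUES ('ab.123xxx.46hfd'),
             ('a.b.c.d.123xx'),
             ('ab123bc.12xx')) AS q (x);
-- Expected prefix values: ab.123, a.b.c.d.123, ab123bc.12
```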

Yours,
Laurenz Albe