BUG #15046: non-greedy ignored

Started by PG Bug reporting form, about 8 years ago, 4 messages, list: bugs
#1PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 15046
Logged by: Bob Gailer
Email address: bgailer@gmail.com
PostgreSQL version: 10.1
Operating system: windows 10
Description:

I start psql; enter:

postgres=# select regexp_replace('a(d)s(e)f', '\(.*?\)', '', 'g');
regexp_replace
----------------
asf
(1 row)

Works as expected. Then I add |q to the pattern, and the .*? becomes
greedy!

postgres=# select regexp_replace('a(d)s(e)f', '\(.*?\)|q', '', 'g');
regexp_replace
----------------
af
(1 row)

#2David G. Johnston
david.g.johnston@gmail.com
In reply to: PG Bug reporting form (#1)
Re: BUG #15046: non-greedy ignored

On Friday, February 2, 2018, PG Bug reporting form <noreply@postgresql.org>
wrote:

The following bug has been logged on the website:

Bug reference: 15046
Logged by: Bob Gailer
Email address: bgailer@gmail.com
PostgreSQL version: 10.1
Operating system: windows 10
Description:

I start psql; enter:

postgres=# select regexp_replace('a(d)s(e)f', '\(.*?\)', '', 'g');
regexp_replace
----------------
asf
(1 row)

Works as expected. Then I add |q to the pattern, and the .*? becomes
greedy!

postgres=# select regexp_replace('a(d)s(e)f', '\(.*?\)|q', '', 'g');
regexp_replace
----------------
af
(1 row)

This seems to be explained by the final greediness rule:

https://www.postgresql.org/docs/10/static/functions-matching.html#POSIX-MATCHING-RULES

- An RE consisting of two or more branches connected by the | operator is
  always greedy.
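
As a rough illustration of that rule (a sketch added for clarity, not part of the original report), regexp_match can show what each pattern actually matches. The choice of regexp_match and the [1] subscript are assumptions made for the example; it relies on standard_conforming_strings being on, which is the default.

```sql
-- Single branch: the RE takes its greediness from the first quantified
-- atom, here the non-greedy .*?, so the shortest match is preferred.
SELECT (regexp_match('a(d)s(e)f', '\(.*?\)'))[1] AS single_branch;

-- Two branches joined by |: per the quoted rule the RE as a whole is
-- greedy, so the longest match from the first '(' is preferred.
SELECT (regexp_match('a(d)s(e)f', '\(.*?\)|q'))[1] AS two_branches;
```

Going by the results in the report, the first query should match only (d), while the second should match (d)s(e), which is why the second regexp_replace leaves af.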

David J.

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: David G. Johnston (#2)
Re: BUG #15046: non-greedy ignored

"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Friday, February 2, 2018, PG Bug reporting form <noreply@postgresql.org>
wrote:

Works as expected. Then I add |q to the pattern, and the .*? becomes
greedy!

This seems to be explained by the final greediness rule:
https://www.postgresql.org/docs/10/static/functions-matching.html#POSIX-MATCHING-RULES
An RE consisting of two or more branches connected by the | operator is
always greedy.

Yeah. That subsection also contains some useful advice about how to
control greediness decisions --- in this case, wrapping the whole
thing with (...){1,1}? might do what you want.
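
A minimal sketch of that suggestion, applied to the reporter's pattern. The exact form below is an assumption, not something posted in the thread: the alternation is wrapped in a non-capturing group (?:...) and given the non-greedy {1,1}? quantifier, so that the whole RE becomes a single non-greedy branch.

```sql
-- Assumes standard_conforming_strings is on (the default), so '\(' reaches
-- the regex engine as an escaped parenthesis.
-- (?: ... ) groups the two branches without capturing; {1,1}? matches the
-- group exactly once but gives the RE as a whole a non-greedy attribute.
SELECT regexp_replace('a(d)s(e)f', '(?:\(.*?\)|q){1,1}?', '', 'g');
```

Under the documented matching rules this would be expected to remove (d) and (e) separately and return asf, the same as the single-branch pattern.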

The short answer, perhaps, is that non-greedy patterns are not
standardized by POSIX and you shouldn't expect that all regex
engines do them the same way. Ours is definitely different
from Perl's, for example.

regards, tom lane

#4bob gailer
bgailer@gmail.com
In reply to: Tom Lane (#3)
Re: BUG #15046: non-greedy ignored

Thanks! Rtfp, eh?

On Feb 2, 2018 8:48 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:


"David G. Johnston" <david.g.johnston@gmail.com> writes:

On Friday, February 2, 2018, PG Bug reporting form <noreply@postgresql.org>
wrote:

Works as expected. Then I add |q to the pattern, and the .*? becomes
greedy!

This seems to be explained by the final greediness rule:
https://www.postgresql.org/docs/10/static/functions-matching.html#POSIX-MATCHING-RULES
An RE consisting of two or more branches connected by the | operator is
always greedy.

Yeah. That subsection also contains some useful advice about how to
control greediness decisions --- in this case, wrapping the whole
thing with (...){1,1}? might do what you want.

The short answer, perhaps, is that non-greedy patterns are not
standardized by POSIX and you shouldn't expect that all regex
engines do them the same way. Ours is definitely different
from Perl's, for example.

regards, tom lane