Re: substring and POSIX re's

Started by Tom Lanealmost 21 years ago2 messagesgeneral
Jump to latest
#1Tom Lane
tgl@sss.pgh.pa.us

Don Isgitt <djisgitt@soundenergy.com> writes:

gds2=# select substring('NE NE SE 2310 FSL 330 FEL' from '^([A-Z][A-Z] )+');
substring
-----------
SE
(1 row)

The pg docs say that this form of substring uses POSIX re's, and my
understanding of POSIX re's is they are always greedy. So, why do I get
only SE instead of NE NE SE? Pilot error, probably, but would someone
please enlighten me? Thank you very much.

I think you want

regression=# select substring('NE NE SE 2310 FSL 330 FEL' from '^(([A-Z][A-Z] )+)');
substring
-----------
NE NE SE
(1 row)

ie, you need the "+" to be *inside* the capturing parentheses. When
it's outside, I guess the engine chooses to consider the last match
of the parenthesized subexpression as the thing to return. (I can't
recall if this choice is specified in the docs or not.)

regards, tom lane

#2Don Isgitt
djisgitt@soundenergy.com
In reply to: Tom Lane (#1)

Tom Lane wrote:

Don Isgitt <djisgitt@soundenergy.com> writes:

gds2=# select substring('NE NE SE 2310 FSL 330 FEL' from '^([A-Z][A-Z] )+');
substring
-----------
SE
(1 row)

The pg docs say that this form of substring uses POSIX re's, and my
understanding of POSIX re's is they are always greedy. So, why do I get
only SE instead of NE NE SE? Pilot error, probably, but would someone
please enlighten me? Thank you very much.

I think you want

regression=# select substring('NE NE SE 2310 FSL 330 FEL' from '^(([A-Z][A-Z] )+)');
substring
-----------
NE NE SE
(1 row)

ie, you need the "+" to be *inside* the capturing parentheses. When
it's outside, I guess the engine chooses to consider the last match
of the parenthesized subexpression as the thing to return. (I can't
recall if this choice is specified in the docs or not.)

regards, tom lane

Thanks, Tom. Interestingly enough, neither my original query or your
corrected one returns anything with pg 7.4--another good reason to
upgrade to 8.*

Don