BUG #3645: regular expression back references seem broken
The following bug has been logged online:
Bug reference: 3645
Logged by: Eric Haszlakiewicz
Email address: erh+pgsql@swapsimple.com
PostgreSQL version: 8.2.5
Operating system: NetBSD
Description: regular expression back references seem broken
Details:
I was attempting to create a simple regular expression that uses back
references and I noticed some very odd behaviour. This regexp is supposed
to match a string where all the characters are the same:
^(.)\1*$
If I try it, it doesn't work. I would expect this to return false:
template1=# select 'xyz' ~ E'^(.)\\1*$';
?column?
----------
t
(1 row)
But adding some extra parens makes it work:
template1=# select 'xyz' ~ E'^(.)(\\1)*$';
?column?
----------
f
(1 row)
As does changing the "." to an "x":
template1=# select 'xyz' ~ E'^(x)\\1*$';
?column?
----------
f
(1 row)
As does forcing it to be an extended regular expression:
template1=# select 'xyz' ~ E'(?e)^(.)\\1*$';
?column?
----------
f
(1 row)
The docs claim: "A single non-zero digit, not followed by another digit, is
always taken as a back reference." (The note at the end of 9.7.3.3)
It's relatively easy to work around the problem, but it certainly led to a
fair bit of head scratching while trying to debug some code. :)
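
For comparison, here is a minimal sketch of what the report expects, run through Python's `re` module. This is an assumption-laden cross-check, not a statement about PostgreSQL: `re` is a different regex engine, and the helper name `matches` is made up for illustration. `re.search` is used because the SQL `~` operator is also an unanchored match. The `(?e)` variant is left out because Python's `re` has no such embedded option.

```python
import re

# Patterns from the report, as Python raw strings.
# Assumption: Python's `re` engine stands in for "expected" backreference
# semantics; it is not the Tcl-derived engine PostgreSQL uses.
PATTERNS = {
    "original": r"^(.)\1*$",
    "extra parens": r"^(.)(\1)*$",
    "literal x": r"^(x)\1*$",
}


def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: unanchored match test, like SQL `text ~ pattern`."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        for text in ("xyz", "xxx"):
            print(f"{name:12} {text!r}: {matches(pattern, text)}")
```

Under this engine all three patterns reject 'xyz' and accept 'xxx', which is the behaviour the report expects from the first query as well.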
"Eric Haszlakiewicz" <erh+pgsql@swapsimple.com> writes:
I would expect this to return false:
template1=# select 'xyz' ~ E'^(.)\\1*$';
?column?
----------
t
(1 row)
Seems to be a bug in the Tcl regexp library we use. It's already
reported upstream:
https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894
regards, tom lane
Tom Lane wrote:
"Eric Haszlakiewicz" <erh+pgsql@swapsimple.com> writes:
I would expect this to return false:
template1=# select 'xyz' ~ E'^(.)\\1*$';
?column?
----------
t
(1 row)
Seems to be a bug in the Tcl regexp library we use. It's already
reported upstream:
https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894
regards, tom lane
er.. it's been languishing there for over 2 years. That doesn't sound
very promising for getting it fixed. :(
eric
Added to TODO:
* Fix regular expression bug when using complex back-references
http://archives.postgresql.org/pgsql-bugs/2007-10/msg00000.php
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +