BUG #3645: regular expression back references seem broken

Started by Eric Haszlakiewicz over 18 years ago · 4 messages · bugs
#1Eric Haszlakiewicz
erh+pgsql@swapsimple.com

The following bug has been logged online:

Bug reference: 3645
Logged by: Eric Haszlakiewicz
Email address: erh+pgsql@swapsimple.com
PostgreSQL version: 8.2.5
Operating system: NetBSD
Description: regular expression back references seem broken
Details:

I was attempting to create a simple regular expression that uses back
references and I noticed some very odd behaviour. This regexp is supposed
to match a string where all the characters are the same:

^(.)\1*$
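For comparison, here is a minimal sketch of what the pattern is meant to do, using Python's `re` module as an independent backtracking engine. This is an illustration, not how PostgreSQL evaluates it. The helper name `all_same` is hypothetical. Inputs are assumed to contain no newlines, because `.` does not match a newline and `$` also matches just before a trailing newline in Python. A Python raw string needs a single backslash for the back reference, whereas the psql examples double it inside `E'...'` literals.

```python
import re

# The pattern from the report: capture one character, then require
# zero or more repeats of that same character up to the end.
SAME_CHAR = re.compile(r'^(.)\1*$')

def all_same(s: str) -> bool:
    """Hypothetical helper: True if s is non-empty and all characters are equal.

    Assumes s contains no newline characters.
    """
    return SAME_CHAR.search(s) is not None

# Cross-check the regex against a plain-Python definition of "all the same".
for sample in ["xyz", "xxx", "x", "", "xxy"]:
    expected = len(sample) > 0 and len(set(sample)) == 1
    assert all_same(sample) == expected, sample
    print(f"{sample!r}: {all_same(sample)}")
```

Note that the pattern requires at least one character, so the empty string does not match.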

If I try it, it doesn't work. I would expect this to return false:

template1=# select 'xyz' ~ E'^(.)\\1*$';
?column?
----------
t
(1 row)

But adding some extra parens does return false:

template1=# select 'xyz' ~ E'^(.)(\\1)*$';
?column?
----------
f
(1 row)

As does changing the "." to an "x":

template1=# select 'xyz' ~ E'^(x)\\1*$';
?column?
----------
f
(1 row)

As does forcing it to be an extended regular expression:

template1=# select 'xyz' ~ E'(?e)^(.)\\1*$';
?column?
----------
f
(1 row)
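As a sanity check, the variants above can be run through Python's `re`, an engine unrelated to the Tcl library, to confirm that they should all reject 'xyz'. This is a sketch under that assumption. The `(?e)` form is left out because it is a PostgreSQL/Tcl embedded option that Python's `re` does not accept.

```python
import re

# Variants from the report, minus "(?e)": Python rejects that inline flag.
variants = [
    r'^(.)\1*$',    # original pattern
    r'^(.)(\1)*$',  # back reference wrapped in an extra group
    r'^(x)\1*$',    # "." replaced by a literal "x"
]

for pattern in variants:
    matched = re.search(pattern, "xyz") is not None
    print(f"{pattern!r} ~ 'xyz': {matched}")
    # An engine without the bug should reject 'xyz' for every variant.
    assert matched is False
```

Only the third variant differs in meaning, since it accepts strings made entirely of "x". On 'xyz', all three should agree.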

The docs claim: "A single non-zero digit, not followed by another digit, is
always taken as a back reference." (The note at the end of 9.7.3.3)

It's relatively easy to work around the problem, but it certainly led to a
fair bit of head scratching while trying to debug some code. :)

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Eric Haszlakiewicz (#1)
Re: BUG #3645: regular expression back references seem broken

"Eric Haszlakiewicz" <erh+pgsql@swapsimple.com> writes:

> I would expect this to return false:
>
> template1=# select 'xyz' ~ E'^(.)\\1*$';
> ?column?
> ----------
> t
> (1 row)

Seems to be a bug in the Tcl regexp library we use. It's already
reported upstream:
https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894

regards, tom lane

#3Eric Haszlakiewicz
erh+pgsql@swapsimple.com
In reply to: Tom Lane (#2)
Re: BUG #3645: regular expression back references seem broken

Tom Lane wrote:

> "Eric Haszlakiewicz" <erh+pgsql@swapsimple.com> writes:
>
>> I would expect this to return false:
>>
>> template1=# select 'xyz' ~ E'^(.)\\1*$';
>> ?column?
>> ----------
>> t
>> (1 row)
>
> Seems to be a bug in the Tcl regexp library we use. It's already
> reported upstream:
> https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894
>
> regards, tom lane

er.. it's been languishing there for over 2 years. That doesn't sound
very promising for getting it fixed. :(

eric

#4Bruce Momjian
bruce@momjian.us
In reply to: Eric Haszlakiewicz (#1)
Re: BUG #3645: regular expression back references seem broken

Added to TODO:

* Fix regular expression bug when using complex back-references

http://archives.postgresql.org/pgsql-bugs/2007-10/msg00000.php


--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
