BUG #5257: wrong results of SUBSTRING with SQL regular expressions

Started by Roman Kononovover 16 years ago3 messagesbugs
Jump to latest
#1Roman Kononov
kononov@ftml.net

The following bug has been logged online:

Bug reference: 5257
Logged by: Roman Kononov
Email address: kononov@ftml.net
PostgreSQL version: 8.4.2
Operating system: GNU/Linux x86_64
Description: wrong results of SUBSTRING with SQL regular expressions
Details:

test=# select substring('34' from '(2|3)#"4#"' for '#');
substring
-----------
3
(1 row)

test=# select substring('^' from '#"^#"' for '#');
substring
-----------

(1 row)

test=# select substring('$' from '#"$#"' for '#');
substring
-----------

(1 row)

These look wrong according to the PG documentation.

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Roman Kononov (#1)
Re: BUG #5257: wrong results of SUBSTRING with SQL regular expressions

"Roman Kononov" <kononov@ftml.net> writes:

test=# select substring('34' from '(2|3)#"4#"' for '#');
substring
-----------
3
(1 row)

Hmm. I guess we need to translate ( and ) to non-capturing parens.

test=# select substring('^' from '#"^#"' for '#');
substring
-----------

(1 row)

test=# select substring('$' from '#"$#"' for '#');
substring
-----------

(1 row)

These cases are already fixed in HEAD.
http://archives.postgresql.org/pgsql-committers/2009-10/msg00048.php

regards, tom lane

#3Roman Kononov
kononov@ftml.net
In reply to: Roman Kononov (#1)
Re: BUG #5257: wrong results of SUBSTRING with SQL regular expressions

I don't know what SQL2008 says about SUBSTRING, but SQL2003 says in
ISO/IEC 9075-2:2003 (E), 6.29 <string value function>, General Rules,
5), page 261:

d) If R [the regular expression] does not contain exactly two
occurrences of the two-character sequence consisting of E [the escape
character], each immediately followed by <double quote>, then an
exception condition is raised: data exception - invalid use of escape
character.

This means that the following is wrong:

test-std=# select substring('a' from 'a' for '#');
substring
-----------
a
(1 row)