Regex bug

Started by David Fetter over 21 years ago, 4 messages, bugs
#1 David Fetter
david@fetter.org

Kind people,

Here's a symptom as reported by John Hansen aka applejack:

SELECT 'r'||'\000\125'||'hello' ~ '^.hello' AS "OMG";
OMG
-----
t
(1 row)

I have produced this behavior in 7.4.3 and CVS tip.

This should be false, shouldn't it?

Cheers,
D
--
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100 mobile: +1 415 235 3778

Remember to vote!

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#1)
Re: Regex bug

David Fetter <david@fetter.org> writes:

> Here's a symptom as reported by John Hansen aka applejack:
>
> SELECT 'r'||'\000\125'||'hello' ~ '^.hello' AS "OMG";

This is not a regex bug: it has to do with the fact that we don't
support embedded nulls in text values. This may enlighten you
a bit as to what's happening:

regression=# select length ('\000\125');
length
--------
0
(1 row)

regards, tom lane

#3 David Fetter
david@fetter.org
In reply to: Tom Lane (#2)
Re: Regex bug

On Fri, Aug 06, 2004 at 01:32:32PM -0400, Tom Lane wrote:

> David Fetter <david@fetter.org> writes:
>
>> Here's a symptom as reported by John Hansen aka applejack:
>>
>> SELECT 'r'||'\000\125'||'hello' ~ '^.hello' AS "OMG";
>
> This is not a regex bug: it has to do with the fact that we don't
> support embedded nulls in text values. This may enlighten you
> a bit as to what's happening:
>
> regression=# select length ('\000\125');
> length
> --------
> 0
> (1 row)

Ah, right. John was testing his unicode patch, so there must be some
magick underneath that distinguishes characters from bytes :)

Cheers,
D (feeling a little sheepish. again.)

#4 Tom Lane
tgl@sss.pgh.pa.us
In reply to: David Fetter (#3)
Re: Regex bug

David Fetter <david@fetter.org> writes:

> On Fri, Aug 06, 2004 at 01:32:32PM -0400, Tom Lane wrote:
>
>> regression=# select length ('\000\125');
>> length
>> --------
>> 0
>> (1 row)
>
> Ah, right. John was testing his unicode patch, so there must be some
> magick underneath that distinguishes characters from bytes :)
>
> Cheers,
> D (feeling a little sheepish. again.)

It occurs to me that a case could be made for having text_in throw an
error if it sees '\000'. I cannot really see that there's any benefit
to the current behavior of (effectively) silently truncating the string.

Comments?

regards, tom lane