Regex match not back-referencing in function

Started by Thom Brown about 14 years ago, 4 messages, general
#1 Thom Brown
thom@linux.com

Hi,

Could someone explain the following behaviour?

SELECT regexp_replace(E'Hello & goodbye ',E'([&])','&#' ||
ascii(E'\\1') || E';\\1');

This returns:

regexp_replace
------------------------
Hello &#92;& goodbye
(1 row)

So it matched:

SELECT chr(92);
chr
-----
\
(1 row)

But notice that when I append the value it's supposed to have matched
to the end of the replacement value, it shows it should be '&'.

Just to confirm:

SELECT ascii('&');
ascii
-------
38
(1 row)

So I'd expect the output of the original statement to be:

regexp_replace
------------------------
Hello &#38;& goodbye
(1 row)

What am I missing?

--
Thom

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Thom Brown (#1)
Re: Regex match not back-referencing in function

Thom Brown <thom@linux.com> writes:

> What am I missing?

I might be more confused than you, but I think you're supposing that
the result of ascii(E'\\1') has something to do with the match that
the surrounding regexp_replace function will find, later on when it
gets executed. The actual arguments seen by regexp_replace are

regression=# select E'Hello & goodbye ',E'([&])','&#' ||
ascii(E'\\1') || E';\\1';
?column? | ?column? | ?column?
------------------+----------+----------
Hello & goodbye | ([&]) | &#92;\1
(1 row)

and given that, the result looks perfectly fine to me.

If there's a bug here, it's that ascii() ignores additional bytes in its
input instead of throwing an error for a string with more than one
character. But I believe we've discussed that in the past and decided
not to change it.

regards, tom lane
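
The ascii() behaviour described above can be checked by running the function on its own. This is a minimal sketch, assuming a PostgreSQL session; the column aliases are made up for illustration. ascii() looks only at the first character of its argument, so the trailing '1' in E'\\1' is ignored:

```sql
-- ascii() returns the code of the first character only.
-- E'\\1' is the two-character string: backslash, '1'.
SELECT ascii(E'\\1') AS backslash_one,   -- 92
       ascii(E'\\')  AS backslash_only,  -- 92
       ascii('&')    AS ampersand;       -- 38
```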

#3 David G. Johnston
david.g.johnston@gmail.com
In reply to: Thom Brown (#1)
Re: Regex match not back-referencing in function

On Feb 12, 2012, at 13:26, Thom Brown <thom@linux.com> wrote:

> Hi,
>
> Could someone explain the following behaviour?
>
> SELECT regexp_replace(E'Hello & goodbye ',E'([&])','&#' ||
> ascii(E'\\1') || E';\\1');
>
> This returns:
>
> regexp_replace
> ------------------------
> Hello &#92;& goodbye
> (1 row)
>
> So it matched:
>
> SELECT chr(92);
> chr
> -----
> \
> (1 row)
>
> But notice that when I append the value it's supposed to have matched
> to the end of the replacement value, it shows it should be '&'.
>
> Just to confirm:
>
> SELECT ascii('&');
> ascii
> -------
> 38
> (1 row)
>
> So I'd expect the output of the original statement to be:
>
> regexp_replace
> ------------------------
> Hello &#38;& goodbye
> (1 row)
>
> What am I missing?
>
> --
> Thom

The "ASCII" function call is evaluated independently of, and before, the regexp_replace function call and so the E'\\1' has no special meaning. It only has special meaning inside of the regexp_replace function.

Try just evaluating ascii(E'\\1') by itself and confirm you get "92".

David J.
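
One possible way to get the output Thom expected is to compute the code per character in the query itself, since the replacement argument is fully built before regexp_replace does any matching. The following is only a sketch and not something proposed in the thread. It assumes PostgreSQL 9.3 or later (for the lateral references in FROM), and the names src, s, i, ch and replaced are hypothetical:

```sql
-- Walk the string one character at a time and build the numeric
-- entity from the character actually found at each position.
SELECT string_agg(
         CASE WHEN ch = '&'
              THEN '&#' || ascii(ch) || ';' || ch
              ELSE ch
         END,
         '' ORDER BY i) AS replaced
FROM (SELECT 'Hello & goodbye'::text AS s) AS src,
     generate_series(1, length(s)) AS i,
     substr(s, i, 1) AS ch;
-- replaced: Hello &#38;& goodbye
```

Here ascii(ch) is evaluated once per row, after ch is known, which is the ordering the original statement assumed but did not have.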

#4 Thom Brown
thom@linux.com
In reply to: Tom Lane (#2)
Re: Regex match not back-referencing in function

On 12 February 2012 18:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Thom Brown <thom@linux.com> writes:
>
>> What am I missing?
>
> I might be more confused than you, but I think you're supposing that
> the result of ascii(E'\\1') has something to do with the match that
> the surrounding regexp_replace function will find, later on when it
> gets executed.  The actual arguments seen by regexp_replace are
>
> regression=# select E'Hello & goodbye ',E'([&])','&#' ||
> ascii(E'\\1') || E';\\1';
>     ?column?     | ?column? | ?column?
> ------------------+----------+----------
>  Hello & goodbye  | ([&])    | &#92;\1
> (1 row)
>
> and given that, the result looks perfectly fine to me.
>
> If there's a bug here, it's that ascii() ignores additional bytes in its
> input instead of throwing an error for a string with more than one
> character.  But I believe we've discussed that in the past and decided
> not to change it.

Okay, in that case I made the wrong assumptions about order of resolution.

Thanks

--
Thom