Regex match not back-referencing in function
Hi,
Could someone explain the following behaviour?
SELECT regexp_replace(E'Hello & goodbye ',E'([&])','&#' ||
ascii(E'\\1') || E';\\1');
This returns:
regexp_replace
------------------------
Hello &#92;& goodbye
(1 row)
So it matched:
SELECT chr(92);
chr
-----
\
(1 row)
But notice that when I append the value it's supposed to have matched
to the end of the replacement value, it shows it should be '&'.
Just to confirm:
SELECT ascii('&');
ascii
-------
38
(1 row)
So I'd expect the output of the original statement to be:
regexp_replace
------------------------
Hello &#38;& goodbye
(1 row)
What am I missing?
--
Thom
Thom Brown <thom@linux.com> writes:
What am I missing?
I might be more confused than you, but I think you're supposing that
the result of ascii(E'\\1') has something to do with the match that
the surrounding regexp_replace function will find, later on when it
gets executed. The actual arguments seen by regexp_replace are
regression=# select E'Hello & goodbye ',E'([&])','&#' ||
ascii(E'\\1') || E';\\1';
?column? | ?column? | ?column?
------------------+----------+----------
Hello & goodbye | ([&]) | &#92;\1
(1 row)
and given that, the result looks perfectly fine to me.
If there's a bug here, it's that ascii() ignores additional bytes in its
input instead of throwing an error for a string with more than one
character. But I believe we've discussed that in the past and decided
not to change it.
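To illustrate the ascii() point, here is a minimal sketch (the column aliases are made up for the example). As far as I can tell, only the first character of the argument matters, so the trailing "1" changes nothing:

```sql
-- ascii() reports the code of the first character and ignores the rest.
SELECT ascii(E'\\')  AS backslash_only,      -- 92
       ascii(E'\\1') AS backslash_then_one,  -- 92, the "1" is ignored
       ascii('&')    AS ampersand_only,      -- 38
       ascii('&&&')  AS three_ampersands;    -- 38
```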
regards, tom lane
On Feb 12, 2012, at 13:26, Thom Brown <thom@linux.com> wrote:
Hi,
Could someone explain the following behaviour?
SELECT regexp_replace(E'Hello & goodbye ',E'([&])','&#' ||
ascii(E'\\1') || E';\\1');
This returns:
regexp_replace
------------------------
Hello &#92;& goodbye
(1 row)
So it matched:
SELECT chr(92);
chr
-----
\
(1 row)
But notice that when I append the value it's supposed to have matched
to the end of the replacement value, it shows it should be '&'.
Just to confirm:
SELECT ascii('&');
ascii
-------
38
(1 row)
So I'd expect the output of the original statement to be:
regexp_replace
------------------------
Hello &#38;& goodbye
(1 row)
What am I missing?
--
Thom
The ascii() function call is evaluated independently of, and before, the regexp_replace function call, so the E'\\1' has no special meaning there. \1 is only treated as a back-reference inside the replacement argument of regexp_replace.
Try just evaluating ascii(E'\\1') by itself and confirm you get "92".
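Something like the following shows each piece on its own. The last statement is only a sketch of one possible workaround, not a general answer: it assumes the pattern can match nothing but the single character '&', so the code can be computed up front and plain replace() can be used instead of a back-reference.

```sql
-- The argument is the two-character string \1; ascii() returns the code
-- of its first character, the backslash.
SELECT ascii(E'\\1');                                     -- 92

-- The third argument is therefore a constant, built before any matching.
SELECT '&#' || ascii(E'\\1') || E';\\1' AS replacement;   -- &#92;\1

-- Possible workaround (assumes '&' is the only character to encode).
SELECT replace(E'Hello & goodbye ', '&',
               '&#' || ascii('&') || ';&');               -- Hello &#38;& goodbye
```

If the pattern could match several different characters, the code would have to be computed per match, which a replacement string alone cannot do.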
David J.
On 12 February 2012 18:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Thom Brown <thom@linux.com> writes:
What am I missing?
I might be more confused than you, but I think you're supposing that
the result of ascii(E'\\1') has something to do with the match that
the surrounding regexp_replace function will find, later on when it
gets executed. The actual arguments seen by regexp_replace are
regression=# select E'Hello & goodbye ',E'([&])','&#' ||
ascii(E'\\1') || E';\\1';
?column? | ?column? | ?column?
------------------+----------+----------
Hello & goodbye | ([&]) | &#92;\1
(1 row)
and given that, the result looks perfectly fine to me.
If there's a bug here, it's that ascii() ignores additional bytes in its
input instead of throwing an error for a string with more than one
character. But I believe we've discussed that in the past and decided
not to change it.
Okay, in that case I made the wrong assumptions about order of resolution.
Thanks
--
Thom