Emulating flexible regex replace

Started by twoflowerover 11 years ago5 messagesgeneral
Jump to latest
#1twoflower
standa.kurik@gmail.com

Hello,

my scenario is this: I have a *SEGMENT* table with two text fields, *source*
and *target*. From the user, I get the following input:

/source pattern/
/target pattern/

Where both patterns are regexes and moreover the target pattern contains
references to the source in the following way:

Supposing *source* matches the /source pattern/, the $/n/ expressions inside
the /target pattern/ correspond to the captured groups inside *source*.

Example:

Source: 123 source text
Target: 123 target text
Source pattern: ([0-9]+) source text
Target pattern: $1 target text

This yields a successful match since $1 in the /target pattern/ is replaced
by "123" from the first captured group in *source* and the resulting string,
"123 target text", matches the /target pattern/.

I would like to execute a query which for a given /source pattern/ and
/target pattern/ returns all rows from the *SEGMENT* table where *source*
matches the /source pattern/ and *target* matches the /target pattern/ after
it has its references replaced with the actual captured groups.

I believe this is not possible since *regexp_replace* expects a string as
its /replacement/ argument which is not enough in this case. This kind of
stuff is easy in e.g. C# where for regex replace you can provide a function
which receives the (in this case) reference index as its argument and you
can build the replacement string using external knowledge.

However, as I am no pro in Postgres, I may be missing something and
therefore I ask: is it possible to somehow mimic the behavior of
hypothetical *regexp_replace* which would accept a function of the
to-be-replaced value and would return the replacement string based on that.

And as I am thinking about it, even that would not suffice since that
function would need to access not only the to-be-replaced value but also the
corresponding source pattern match.

Still, isn't there some super clever way to do that?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: twoflower (#1)
Re: Emulating flexible regex replace

twoflower <standa.kurik@gmail.com> writes:

Supposing *source* matches the /source pattern/, the $/n/ expressions inside
the /target pattern/ correspond to the captured groups inside *source*.

Example:

Source: 123 source text
Target: 123 target text
Source pattern: ([0-9]+) source text
Target pattern: $1 target text

This yields a successful match since $1 in the /target pattern/ is replaced
by "123" from the first captured group in *source* and the resulting string,
"123 target text", matches the /target pattern/.

I would like to execute a query which for a given /source pattern/ and
/target pattern/ returns all rows from the *SEGMENT* table where *source*
matches the /source pattern/ and *target* matches the /target pattern/ after
it has its references replaced with the actual captured groups.

I believe this is not possible since *regexp_replace* expects a string as
its /replacement/ argument which is not enough in this case.

Well, you could pull out the source text captures with regexp_matches,
escape them somehow (don't think there's a built-in function for that),
insert them into the target pattern with regexp_replace, and then apply
the target pattern with a simple regexp match operator. Kinda tedious,
but hardly "not possible".

A lot of people feel that this sort of text-mashing requirement is best
handled in plperl, but if you don't want to use that for some reason,
it's surely possible in plpgsql.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3David G. Johnston
david.g.johnston@gmail.com
In reply to: twoflower (#1)
Re: Emulating flexible regex replace

twoflower wrote

Source: 123 source text
Target: 123 target text
Source pattern: ([0-9]+) source text
Target pattern: $1 target text

Still, isn't there some super clever way to do that?

You use "\1" instead of "$1"

SELECT regexp_replace('123 abc','(\d+)\s(\w+)','\1 def'); --output: '123
def'

http://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP

9.7.3 - paragraph beginning "The regexp_replace function provides
substitution..."

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824046.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4David G. Johnston
david.g.johnston@gmail.com
In reply to: David G. Johnston (#3)
Re: Emulating flexible regex replace

David G Johnston wrote

twoflower wrote

Source: 123 source text
Target: 123 target text
Source pattern: ([0-9]+) source text
Target pattern: $1 target text

Still, isn't there some super clever way to do that?

You use "\1" instead of "$1"

SELECT regexp_replace('123 abc','(\d+)\s(\w+)','\1 def'); --output: '123
def'

http://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP

9.7.3 - paragraph beginning "The regexp_replace function provides
substitution..."

David J.

<reading this a few more times>

Is it possible to express the WHERE clause as:

regexp_replace(source, source_pattern, target_pattern) = target

maybe with a substring check instead of equals?

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824065.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5blackbee045
blackbee045@gmail.com
In reply to: David G. Johnston (#4)
Re: Emulating flexible regex replace

This yields a successful match since $1 in the /target pattern/ is replaced

by "123" from the first captured group in *source* and the resulting
string,
"123 target text", matches the /target pattern/.

I would like to execute a query which for a given /source pattern/ and
/target pattern/ returns all rows from the *SEGMENT* table where *source*
matches the /source pattern/ and *target* matches the /target pattern/
after
it has its references replaced with the actual captured groups.

I believe this is not possible since *regexp_replace* expects a string as
its /replacement/ argument which is not enough in this case.

-----
GuL
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824107.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general