Emulating flexible regex replace
Hello,
my scenario is this: I have a *SEGMENT* table with two text fields, *source*
and *target*. From the user, I get the following input:
/source pattern/
/target pattern/
Where both patterns are regexes and moreover the target pattern contains
references to the source in the following way:
Supposing *source* matches the /source pattern/, the $/n/ expressions inside
the /target pattern/ correspond to the captured groups inside *source*.
Example:
Source: 123 source text
Target: 123 target text
Source pattern: ([0-9]+) source text
Target pattern: $1 target text
This yields a successful match since $1 in the /target pattern/ is replaced
by "123" from the first captured group in *source* and the resulting string,
"123 target text", matches the /target pattern/.
I would like to execute a query which for a given /source pattern/ and
/target pattern/ returns all rows from the *SEGMENT* table where *source*
matches the /source pattern/ and *target* matches the /target pattern/ after
it has its references replaced with the actual captured groups.
I believe this is not possible since *regexp_replace* expects a string as
its /replacement/ argument which is not enough in this case. This kind of
stuff is easy in e.g. C# where for regex replace you can provide a function
which receives the (in this case) reference index as its argument and you
can build the replacement string using external knowledge.
However, as I am no pro in Postgres, I may be missing something and
therefore I ask: is it possible to somehow mimic the behavior of
hypothetical *regexp_replace* which would accept a function of the
to-be-replaced value and would return the replacement string based on that.
And as I am thinking about it, even that would not suffice since that
function would need to access not only the to-be-replaced value but also the
corresponding source pattern match.
Still, isn't there some super clever way to do that?
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
twoflower <standa.kurik@gmail.com> writes:
Supposing *source* matches the /source pattern/, the $/n/ expressions inside
the /target pattern/ correspond to the captured groups inside *source*.
Example:
Source: 123 source text
Target: 123 target text
Source pattern: ([0-9]+) source text
Target pattern: $1 target text
This yields a successful match since $1 in the /target pattern/ is replaced
by "123" from the first captured group in *source* and the resulting string,
"123 target text", matches the /target pattern/.
I would like to execute a query which for a given /source pattern/ and
/target pattern/ returns all rows from the *SEGMENT* table where *source*
matches the /source pattern/ and *target* matches the /target pattern/ after
it has its references replaced with the actual captured groups.
I believe this is not possible since *regexp_replace* expects a string as
its /replacement/ argument which is not enough in this case.
Well, you could pull out the source text captures with regexp_matches,
escape them somehow (don't think there's a built-in function for that),
insert them into the target pattern with regexp_replace, and then apply
the target pattern with a simple regexp match operator. Kinda tedious,
but hardly "not possible".
A lot of people feel that this sort of text-mashing requirement is best
handled in plperl, but if you don't want to use that for some reason,
it's surely possible in plpgsql.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
twoflower wrote
Source: 123 source text
Target: 123 target text
Source pattern: ([0-9]+) source text
Target pattern: $1 target textStill, isn't there some super clever way to do that?
You use "\1" instead of "$1"
SELECT regexp_replace('123 abc','(\d+)\s(\w+)','\1 def'); --output: '123
def'
http://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
9.7.3 - paragraph beginning "The regexp_replace function provides
substitution..."
David J.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824046.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
David G Johnston wrote
twoflower wrote
Source: 123 source text
Target: 123 target text
Source pattern: ([0-9]+) source text
Target pattern: $1 target textStill, isn't there some super clever way to do that?
You use "\1" instead of "$1"
SELECT regexp_replace('123 abc','(\d+)\s(\w+)','\1 def'); --output: '123
def'http://www.postgresql.org/docs/9.3/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
9.7.3 - paragraph beginning "The regexp_replace function provides
substitution..."David J.
<reading this a few more times>
Is it possible to express the WHERE clause as:
regexp_replace(source, source_pattern, target_pattern) = target
maybe with a substring check instead of equals?
David J.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824065.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
This yields a successful match since $1 in the /target pattern/ is replaced
by "123" from the first captured group in *source* and the resulting
string,
"123 target text", matches the /target pattern/.
I would like to execute a query which for a given /source pattern/ and
/target pattern/ returns all rows from the *SEGMENT* table where *source*
matches the /source pattern/ and *target* matches the /target pattern/
after
it has its references replaced with the actual captured groups.
I believe this is not possible since *regexp_replace* expects a string as
its /replacement/ argument which is not enough in this case.
-----
GuL
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824034p5824107.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general