Emulating flexible regex replace
( Forgot reply all, forwarding a copy, sorry for the noise. )
Hullo.
On Thu, Oct 23, 2014 at 4:03 PM, twoflower <standa.kurik@gmail.com> wrote:
my scenario is this: I have a *SEGMENT* table with two text fields,
*source*
and *target*. From the user, I get the following input:/source pattern/
/target pattern/Where both patterns are regexes and moreover the target pattern contains
references to the source in the following way:Supposing *source* matches the /source pattern/, the $/n/ expressions
inside
the /target pattern/ correspond to the captured groups inside *source*.
*If* you have some available char sequence which is not going to be in any
of them and your regexp are not too esoteric, you could just try to match
source || 'XyZzYX' || target against source_pattern || 'XyZzYX' ||
target_pattern, replace separator as needed.
If your patterns are ( not ) anchored you may need to insert some .* / ^ /
$ and multiline modifiers, but this normally works for me ( in perl
normally, but should be easy to do ).
Your performance maybe really bad if your tables are big and you have no
more conditions, but any query with general patterns tend to behave this
way.
Francisco Olarte.
Thank you Francisco, that's a clever idea. However, I don't think this would
reduce the complexity since the target pattern can contain
1) regular back-references (referencing to matches of its own)
2) the special source text references I mentioned
Obviously, these will have to be written in a different way and this I
believe brings me back to start (or in other words, it's not a silver bullet
obviating the need to rewrite the target pattern manually).
I will probably end up writing a function in PL/Perl which Tom Lane
suggested since I'm apparently not skilled in SQL enough to be able to do it
using a single query without using custom functions.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824058p5824109.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hi:
On Fri, Oct 24, 2014 at 8:24 AM, twoflower <standa.kurik@gmail.com> wrote:
Thank you Francisco, that's a clever idea. However, I don't think this
would
reduce the complexity since the target pattern can contain1) regular back-references (referencing to matches of its own)
2) the special source text references I mentioned
Well, if I had this problem I would consider solving part of it in the
application, not in the database ( like rewriting two patterns into one on
the app ).
Obviously, these will have to be written in a different way and this I
believe brings me back to start (or in other words, it's not a silver
bullet
obviating the need to rewrite the target pattern manually).
Yeah, I would solve it renumbering backreferences, but it will be a genuine
PITA, specially if you want to allow an arbitrary number of them.
I will probably end up writing a function in PL/Perl which Tom Lane
suggested since I'm apparently not skilled in SQL enough to be able to do
it
using a single query without using custom functions.
You can do two things with a pl/perl function, one is write and use it in
the condition of a general query ( where supermatch(s,t,s_p,t_p) ) or,
depending on your query, you may write a table return function ( select
whatever from my_fancy_func(s_p, t_p, whatever_else ). And, IMO, trying to
solve everything in SQL is not always TRTTD. There are a lot of special
apps which needs to be solved in a mix of code, and in problems as complex
as the one you are proposing ( like, when filtering CDRs by esoteric
criteria in my apps ) I've many times found that the easier ( and faster )
way is let the db do it's stuff in sql and postfilter it in the app ( in my
case, let the db do time range, duration and number prefix filtering, which
it's really good at, reducing the dataset from billions to tens of
thousands of records which it streams really fast into a perl app which
then does the later fancy conditions in a breeze ).
Regards.
Francisco Olarte.
Thank you Francisco.
In fact, I am already solving part of the problem in my application -
fetching from the DB the records matching the source pattern and then
filtering them in the application's memory by matching against the target
pattern, with the references replaced (it's a breeze in C#).
It works very vell. However, I am not completely satisfied with i as it's
unnecessarily loading larger data set than it absolutely must. Besides, I'd
also like to get some experience in DB programming. That's why the PL/Perl
way seems pretty attractive to me.
--
View this message in context: http://postgresql.1045698.n5.nabble.com/Emulating-flexible-regex-replace-tp5824058p5824174.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Hi:
On Fri, Oct 24, 2014 at 6:13 PM, twoflower <standa.kurik@gmail.com> wrote:
....
It works very vell. However, I am not completely satisfied with i as it's
unnecessarily loading larger data set than it absolutely must. Besides, I'd
also like to get some experience in DB programming. That's why the PL/Perl
way seems pretty attractive to me.
Try it them. But bear in mind you are not *unnecessarily* loading the data
set, the server will need to load it to apply the plperl filters, you will
just avoid loading it in the client and transmitting it. You are making a
trade off, a design decission. You are trading some network and client cpu
usage for some server cpu usage and coding complexity. The tricky part is
the the coding complexity, which needs a lot of resources you may never
recover. Part of the experience in DB programming is knowing what NOT to do
in the database, so coding this will be good, the worst thing that could
happen is you do it and then drop it after learning one thing.
Francisco Olarte.