Regex problem
I'm usually ok at Regex stuff, but this one is driving me a bit crazy.
Here's a string in a single field. I'm trying to grab the long db query bit.
---------------------------------------------------------------
initial time: 0.0001058578491210
After _request set time: 0.0001859664916992
Before include modules time: 0.001070976257324
Before session_start time: 0.003780841827392
SessionHandler read() start time: 0.004056930541992
SessionHandler read() query finished: SELECT * FROM sessions WHERE
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6' time:
0.005122900009155
After session start time: 0.005219936370849
After create db time: 0.005784034729003
before create new session time: 0.005914926528930
session call constructor 1 time: 0.005953073501586
session call constructor (org loaded) time: 0.008623838424682
session call constructor (finished) time: 0.01247286796569LONG DB
QUERY (db1, 4.9376289844513): UPDATE force_session SET
last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
time: 5.03999090194
Session set up time: 5.040019989013
Behavior loaded time: 5.040072917938
Start of page body time: 5.129977941513
End of page body time: 6.25822091102
---------------------------------------------------------
I'm using this substring to grab part of it:
select substring (notes from E'LONG DB QUERY.+time: [0-9]+.[0-9]+')
from table where id=1;
And that returns this:
LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET
last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
time: 5.03999090194
: Session set up time: 5.040019989013
: Behavior loaded time: 5.040072917938
: Start of page body time: 5.129977941513
: End of page body time: 6.25822091102
Which is not surprising. It's greedy. So I turn off the greediness
of the first + with a ?, and then I get this:
select substring (notes from E'LONG DB QUERY.+?time: [0-9]+.[0-9]+')
from table where id=1;
LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET
last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
time: 5.0
Now, I'm pretty sure that with the [0-9]+.[0-9]+ I should be getting
5.03999090194 at the end. I know the . is a regex match for any one
char there. There's only ever one digit before it, but changing the .
to \. doesn't help either.
Any ideas? I'm guessing some old hand at regex will look at it and
see what I'm doing wrong, but I'm not seeing it.
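
A side note on the \. attempt: inside an E'' string, PostgreSQL processes backslash escapes before the regex engine ever sees the pattern, so a single backslash in front of the dot is simply dropped. That would explain why changing . to \. made no visible difference. A minimal sketch that shows what the regex engine actually receives (the column aliases are made up for illustration):

```sql
-- Compare what the string parser hands to the regex engine.
-- In an E'' string, a backslash before an ordinary character is dropped,
-- so a literal dot in the regex has to be written as \\.
SELECT E'[0-9]+\.[0-9]+'  AS single_backslash,   -- [0-9]+.[0-9]+
       E'[0-9]+\\.[0-9]+' AS double_backslash;   -- [0-9]+\.[0-9]+
```

This does not fix the truncated match on its own; it only makes the dot literal.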
"Scott Marlowe" <scott.marlowe@gmail.com> writes:
...Which is not surprising. It's greedy. So, I turn off the greediness
of the first + with a ? and then I get this
select substring (notes from E'LONG DB QUERY.+?time: [0-9]+.[0-9]+')
from table where id=1;
LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET
last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
time: 5.0
Now, I'm pretty sure that with the [0-9]+.[0-9]+ I should be getting
5.03999090194 at the end.
You're getting bit by the fact that the initial non-greedy quantifier
makes the entire regex non-greedy --- see rules in section 9.7.3.5:
http://developer.postgresql.org/pgdocs/postgres/functions-matching.html#POSIX-MATCHING-RULES
If you know that there will always be something after the first time
value, you could do something like
E'(LONG DB QUERY.+?time: [0-9]+\\.[0-9]+)[^0-9]'
to force the issue about how much the second and third quantifiers
match.
regards, tom lane
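
The rule and the workaround above can be tried without the original table. The following is only a sketch: the sample string is a shortened stand-in for the notes field (the UPDATE text is abbreviated), and the column aliases are invented for illustration.

```sql
-- Shortened stand-in for the notes field; \n inside E'' is a newline.
SELECT
    -- The leading .+? makes the whole regex prefer the shortest match,
    -- so each [0-9]+ takes a single digit: the result ends in "time: 5.0".
    substring(notes from E'LONG DB QUERY.+?time: [0-9]+\\.[0-9]+')
        AS all_non_greedy,
    -- The trailing [^0-9] must match a non-digit, which forces the digits
    -- to run to the end of the number: the result ends in
    -- "time: 5.03999090194". The parentheses make substring() return
    -- only the part before the terminating character.
    substring(notes from E'(LONG DB QUERY.+?time: [0-9]+\\.[0-9]+)[^0-9]')
        AS with_terminator
FROM (SELECT E'LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET ...New session created\ntime: 5.03999090194\nSession set up time: 5.040019989013\n'::text AS notes) AS sample;
```

The second pattern relies on some non-digit character (here a newline) following the time value. If the value could be the very last thing in the field, the pattern would not match at all.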
On Thu, Jul 10, 2008 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"Scott Marlowe" <scott.marlowe@gmail.com> writes:
...Which is not surprising. It's greedy. So, I turn off the greediness
of the first + with a ? and then I get this
select substring (notes from E'LONG DB QUERY.+?time: [0-9]+.[0-9]+')
from table where id=1;
LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET
last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
time: 5.0
Now, I'm pretty sure that with the [0-9]+.[0-9]+ I should be getting
5.03999090194 at the end.
You're getting bit by the fact that the initial non-greedy quantifier
makes the entire regex non-greedy --- see rules in section 9.7.3.5:
http://developer.postgresql.org/pgdocs/postgres/functions-matching.html#POSIX-MATCHING-RULES
If you know that there will always be something after the first time
value, you could do something like
E'(LONG DB QUERY.+?time: [0-9]+\\.[0-9]+)[^0-9]'
to force the issue about how much the second and third quantifiers
match.
Thanks Tom, that's the exact answer I needed. Now, it's back to the
bit mines...