Regex problem

Started by Scott Marlowe · 3 messages · general
#1 Scott Marlowe
scott.marlowe@gmail.com

I'm usually OK at regex stuff, but this one is driving me a bit crazy.

Here's the string stored in a single field. I'm trying to grab the LONG DB QUERY bit.

---------------------------------------------------------------

initial time: 0.0001058578491210
After _request set time: 0.0001859664916992
Before include modules time: 0.001070976257324
Before session_start time: 0.003780841827392
SessionHandler read() start time: 0.004056930541992
SessionHandler read() query finished: SELECT * FROM sessions WHERE
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6' time:
0.005122900009155
After session start time: 0.005219936370849
After create db time: 0.005784034729003
before create new session time: 0.005914926528930
session call constructor 1 time: 0.005953073501586
session call constructor (org loaded) time: 0.008623838424682
session call constructor (finished) time: 0.01247286796569LONG DB
QUERY (db1, 4.9376289844513): UPDATE force_session SET
last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
time: 5.03999090194
Session set up time: 5.040019989013
Behavior loaded time: 5.040072917938
Start of page body time: 5.129977941513
End of page body time: 6.25822091102

---------------------------------------------------------

I'm using this substring to grab part of it:

select substring (notes from E'LONG DB QUERY.+time: [0-9]+.[0-9]+')
from table where id=1;

And that returns this:

LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET
last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
time: 5.03999090194
: Session set up time: 5.040019989013
: Behavior loaded time: 5.040072917938
: Start of page body time: 5.129977941513
: End of page body time: 6.25822091102

Which is not surprising, since it's greedy. So I turn off the greediness
of the first + with a ?, and then I get this:

select substring (notes from E'LONG DB QUERY.+?time: [0-9]+.[0-9]+')
from table where id=1;

LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET
last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
time: 5.0

Now, I'm pretty sure that with the [0-9]+.[0-9]+ I should be getting
5.03999090194 at the end. I know the unescaped . matches any single
character there, and there's only ever one digit before it, but changing
the . to \. doesn't help either.

Any ideas? I'm guessing some old hand at regex will look at it and
see what I'm doing wrong, but I'm not seeing it.
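Both results can be reproduced without the original table. This is a minimal sketch, not from the thread: the `notes` value is a shortened, hypothetical stand-in for the logged text (built from E'' strings so that `\n` is a real newline), and the decimal point is escaped as `\\.` in both patterns to show that the escaping is not the issue:

```sql
-- Hypothetical, shortened stand-in for the notes column (not the real data).
WITH t(notes) AS (
  VALUES (E'LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET '
       || E'last_used_timestamp = now() WHERE orgid = 15723New session created '
       || E'time: 5.03999090194\nSession set up time: 5.040019989013\n'
       || E'End of page body time: 6.25822091102')
)
SELECT
  -- Every quantifier greedy: the match runs on to the last "time: N.N".
  substring(notes from E'LONG DB QUERY.+time: [0-9]+\\.[0-9]+')  AS greedy,
  -- First quantifier non-greedy: the whole RE is treated as non-greedy,
  -- so even with the escaped dot the match ends in "time: 5.0".
  substring(notes from E'LONG DB QUERY.+?time: [0-9]+\\.[0-9]+') AS non_greedy
FROM t;
```

By default `.` also matches newlines in PostgreSQL's regex flavor, which is why the greedy version spans several log lines.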

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Scott Marlowe (#1)
Re: Regex problem

"Scott Marlowe" <scott.marlowe@gmail.com> writes:

> ...Which is not surprising. It's greedy. So, I turn off the greediness
> of the first + with a ? and then I get this
>
> select substring (notes from E'LONG DB QUERY.+?time: [0-9]+.[0-9]+')
> from table where id=1;
>
> LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET
> last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
> session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
> time: 5.0
>
> Now, I'm pretty sure that with the [0-9]+.[0-9]+ I should be getting
> 5.03999090194 at the end.

You're getting bitten by the fact that the initial non-greedy quantifier
makes the entire regex non-greedy, so the overall match is made as short
as possible and the final [0-9]+ stops after one digit. See the rules in
section 9.7.3.5:
http://developer.postgresql.org/pgdocs/postgres/functions-matching.html#POSIX-MATCHING-RULES
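The rule can be seen in isolation with the small example the PostgreSQL documentation uses for it. A minimal sketch: substring() returns only the part matched by the first parenthesized subexpression, and whether the leading quantifier is `Y*` or `Y*?` decides the greediness of the whole RE, including the `{1,3}` inside the parentheses:

```sql
SELECT
  -- Y* is greedy, so the whole RE is greedy: the match is "Y123".
  substring('XY1234Z' from 'Y*([0-9]{1,3})')  AS greedy_re,      -- '123'
  -- Y*? is non-greedy, so the whole RE is non-greedy: the match is "Y1".
  substring('XY1234Z' from 'Y*?([0-9]{1,3})') AS non_greedy_re;  -- '1'
```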

If you know that there will always be something after the first time
value, you could do something like

E'(LONG DB QUERY.+?time: [0-9]+\\.[0-9]+)[^0-9]'

to force the issue about how much the second and third quantifiers
match.

regards, tom lane
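Tom's pattern can be checked the same way. A minimal sketch, again using a hypothetical, shortened `notes` value: the whole RE is still non-greedy, but the trailing `[^0-9]` cannot match until the last `[0-9]+` has consumed every digit of the number, and because substring() returns only the first parenthesized subexpression, that trailing character is left out of the result:

```sql
-- Hypothetical, shortened stand-in for the notes column (not the real data).
WITH t(notes) AS (
  VALUES (E'LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET '
       || E'last_used_timestamp = now() WHERE orgid = 15723New session created '
       || E'time: 5.03999090194\nSession set up time: 5.040019989013\n')
)
-- The overall match ends at the newline after 5.03999090194;
-- the capture group, and therefore the result, stops just before it.
SELECT substring(notes from E'(LONG DB QUERY.+?time: [0-9]+\\.[0-9]+)[^0-9]') AS long_query
FROM t;
```

If the time value can be the very last thing in the field, `[^0-9]` has nothing to match, the pattern fails, and substring() returns NULL; that is the caveat behind "if you know that there will always be something after the first time value".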

#3 Scott Marlowe
scott.marlowe@gmail.com
In reply to: Tom Lane (#2)
Re: Regex problem

On Thu, Jul 10, 2008 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> If you know that there will always be something after the first time
> value, you could do something like
>
> E'(LONG DB QUERY.+?time: [0-9]+\\.[0-9]+)[^0-9]'
>
> to force the issue about how much the second and third quantifiers
> match.

Thanks Tom, that's the exact answer I needed. Now, it's back to the
bit mines...