back references using regex
Hi.
I'm trying to do a slice directly from a table so I
can get a brief preview of the articles content by
counting \s (spaces), not new paragraphs.
Anyone know how it could be done using regular
expressions natively? I read the doc but it didn't
help me much.
Many thanks.
MP
______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/
On Tue, Sep 06, 2005 at 11:40:18PM -0700, Matthew Peter wrote:
I'm trying to do a slice directly from a table so I
can get a brief preview of the articles content by
counting \s (spaces), not new paragraphs.
Are you trying to extract the first N words from a string? If
that's not what you mean then please clarify.
Anyone know how it could be done using regular
expressions natively? I read the doc but it didn't
help me much.
Regular expressions aren't the only way to solve the problem, but
maybe the following example will give you some ideas:
CREATE TABLE article (id integer, content text);
INSERT INTO article VALUES (1, 'one');
INSERT INTO article VALUES (2, 'one two');
INSERT INTO article VALUES (3, 'one two three');
INSERT INTO article VALUES (4, 'one two three four');
INSERT INTO article VALUES (5, 'one two three four five');
INSERT INTO article VALUES (6, 'one two three four five six');
SELECT id, substring(content FROM '(([^[:space:]]+[[:space:]]*){1,3})')
FROM article;
id | substring
----+----------------
1 | one
2 | one two
3 | one two three
4 | one two three
5 | one two three
6 | one two three
(6 rows)
In PostgreSQL 7.4 and later you could shorten the regular expression:
SELECT id, substring(content FROM '((\\S+\\s*){1,3})')
FROM article;
If this example isn't what you're looking for then please explain
what you're trying to do.
--
Michael Fuhr
Thanks. I'll check it out asap. I didn't realize the
regex expressions needed to be escaped for it to be a
valid expression. I thought it was the other way
around. Would it be possible to choose what paragraph
to use in a summary? I'll try to clarify... Let's say
I have 14 paragraphs of lorem lipsum text. Let's say I
want to show the 3rd paragraph... Could I use a regex
to search the content and return that 3rd paragraph to
use as a summary of that article?
--- Michael Fuhr <mike@fuhr.org> wrote:
On Tue, Sep 06, 2005 at 11:40:18PM -0700, Matthew
Peter wrote:I'm trying to do a slice directly from a table so
I
can get a brief preview of the articles content by
counting \s (spaces), not new paragraphs.Are you trying to extract the first N words from a
string? If
that's not what you mean then please clarify.Anyone know how it could be done using regular
expressions natively? I read the doc but it didn't
help me much.Regular expressions aren't the only way to solve the
problem, but
maybe the following example will give you some
ideas:CREATE TABLE article (id integer, content text);
INSERT INTO article VALUES (1, 'one');
INSERT INTO article VALUES (2, 'one two');
INSERT INTO article VALUES (3, 'one two three');
INSERT INTO article VALUES (4, 'one two three
four');
INSERT INTO article VALUES (5, 'one two three four
five');
INSERT INTO article VALUES (6, 'one two three four
five six');SELECT id, substring(content FROM
'(([^[:space:]]+[[:space:]]*){1,3})')
FROM article;
id | substring
----+----------------
1 | one
2 | one two
3 | one two three
4 | one two three
5 | one two three
6 | one two three
(6 rows)In PostgreSQL 7.4 and later you could shorten the
regular expression:SELECT id, substring(content FROM
'((\\S+\\s*){1,3})')
FROM article;If this example isn't what you're looking for then
please explain
what you're trying to do.--
Michael Fuhr
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
On Wed, Sep 07, 2005 at 05:26:07PM -0700, Matthew Peter wrote:
Thanks. I'll check it out asap. I didn't realize the
regex expressions needed to be escaped for it to be a
valid expression.
If you use ordinary quotes (') around the regular expression then
you have to escape the backslashes because there's an extra level
of string parsing that you're probably unaccustomed to. If you use
dollar quotes (available since 8.0) then you don't need the extra
escapes:
SELECT id, substring(content FROM $$((\S+\s*){1,3})$$) FROM article;
Would it be possible to choose what paragraph to use in a summary?
You might be able to use split_part(). For example, if paragraphs
are separated by pairs of newline (\n) characters, then the following
should return each article's third paragraph:
SELECT id, split_part(content, '\n\n', 3) FROM article;
See "String Functions and Operators" and "Pattern Matching" in the
documentation for more information. If you need to get fancy then
consider writing a function in a language like PL/Perl.
--
Michael Fuhr
[Please copy the mailing list on replies so others can participate
in and learn from the discussion.]
On Wed, Sep 07, 2005 at 10:40:22PM -0700, Matthew Peter wrote:
I did read the docs ;) I always do. The question I
really wanted answered is how to reference the back
references in my regular expressions parentheses. Like
the 2nd position or 4th from a group. Like \2 or $2.
Can I do this in postgres in the query?
Are you looking for something like this?
SELECT substring('abc.foo.foo.xyz' FROM '(([[:alpha:]]+)\\.\\2)');
substring
-----------
foo.foo
(1 row)
That is, one or more alphabetic characters followed by a dot followed
by the same set of characters (this is a simplistic example: it would
also match 'foo.oog' and return 'oo.oo').
Note that the back reference is \2 because it refers to the inner
set of parentheses (i.e., the subexpression with the second opening
parenthesis); the outer set is used here for capturing. And again,
note the escaped backslashes because we're using ordinary quotes.
With dollar quotes the above query would be:
SELECT substring('abc.foo.foo.xyz' FROM $$(([[:alpha:]]+)\.\2)$$);
--
Michael Fuhr
Import Notes
Reply to msg id not found: 20050908054022.46369.qmail@web35204.mail.mud.yahoo.com
That doesn't seem to work with digits
SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
or
SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);
but works with strings
SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);
What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list.
Where I could start it at a certain point and end it
at another. Like slicing out paragraph 3-6 (delimiter
\n\n) or the 2nd-6th sentence in a article (delimiter
\.). That is what I am trying to do. I know if I can
figure how to get that working I can figure out how to
deal with extending it/handling misc. characters/etc
in pgsql. I only need a working vanilla example.
Thanks for all you help.
--- Michael Fuhr <mike@fuhr.org> wrote:
SELECT substring('abc.foo.foo.xyz' FROM
'(([[:alpha:]]+)\\.\\2)');
substring
-----------
foo.foo
(1 row)That is, one or more alphabetic characters followed
by a dot followed
by the same set of characters (this is a simplistic
example: it would
also match 'foo.oog' and return 'oo.oo').Note that the back reference is \2 because it refers
to the inner
set of parentheses (i.e., the subexpression with the
second opening
parenthesis); the outer set is used here for
capturing. And again,
note the escaped backslashes because we're using
ordinary quotes.
With dollar quotes the above query would be:SELECT substring('abc.foo.foo.xyz' FROM
$$(([[:alpha:]]+)\.\2)$$);--
Michael Fuhr
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
Matthew Peter wrote:
That doesn't seem to work with digits
SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
or
SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);but works with strings
SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list.
You could always just write it in pl/python...
--
Peter Fein pfein@pobox.com 773-575-0694
Basically, if you're not a utopianist, you're a schmuck. -J. Feldman
Ya, but I'd have to recompile to get python in. Plus,
I don't want to use Python. I want to use and learn
more pgsql. Keep things clean and lean if possible...
I just got a postgres book yesterday for additional
reading which it only had 2 pages on regex's in the
index :(
--- Peter Fein <pfein@pobox.com> wrote:
Matthew Peter wrote:
That doesn't seem to work with digits
SELECT substring('12.00.00.34' FROM
$$((\d+)\.\2)$$);
or
SELECT substring('12.00.00.34' FROM$$((\w+)\.\2)$$);
but works with strings
SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);What I basically want to do is have a slice
function
like Python, where I can slice out items from a
\s, \.
or \n\n separated list.
You could always just write it in pl/python...
--
Peter Fein pfein@pobox.com
773-575-0694Basically, if you're not a utopianist, you're a
schmuck. -J. Feldman
______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/
On Thu, Sep 08, 2005 at 12:45:40PM -0700, Matthew Peter wrote:
Ya, but I'd have to recompile to get python in.
Recompiling to add support for another procedural language is a
one-time operation and it's easy to do, so that's not a good argument.
Plus, I don't want to use Python. I want to use and learn
more pgsql. Keep things clean and lean if possible...
"Clean and lean" suggests using the right tool for the job. Languages
like Perl and Python are better at string manipulation than PL/pgSQL,
and one of PostgreSQL's strengths is that it allows you to write
server-side functions in those languages. Exploit such strengths
when it makes sense.
I just got a postgres book yesterday for additional
reading which it only had 2 pages on regex's in the
index :(
Regular expressions aren't specific to PostgreSQL; there's ample
material covering them elsewhere. See for example _Mastering
Regular Expressions_ by Jeffrey Friedl. I'm sure a search engine
would yield many free tutorials on the subject.
--
Michael Fuhr
I knew I should never have said Python. I know regular
expressions, just not how postgresql handles them. The
fact of the matter is I don't want to use Python, it
was an example of the functionality I'm interested in
accomplishing with pgsql. Plus, I would like to use
other regex's once I figure out how they are used.
I only need a regular expression in the substring of a
where cluase. Not entire language support for a single
function.
It's not a complex regex as I have wrote one that does
what I want, yet not at the database level. The docs
didn't help clarify anything. I'm still not clear on
how it determines where the back reference comes from
in the previous example you gave. And why digits
wouldn't work.
I would like a basic example that accomplishes what
I'm trying to do if at all possible?
My original message/problem...
What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list. Where I'll just change the
delimiter for the query that it applies.
Where I could start it at a certain point and end it
at another. Like slicing out paragraph 3-6 (delimiter
\n\n) or the 2nd-6th sentence in a article (delimiter
\.). That is what I am trying to do. I know if I can
figure how to get that working I can figure out how to
deal with extending it/handling misc. characters/etc
in pgsql. I only need a working vanilla example.
Thanks for all your help.
--- Michael Fuhr <mike@fuhr.org> wrote:
On Thu, Sep 08, 2005 at 12:45:40PM -0700, Matthew
Peter wrote:Ya, but I'd have to recompile to get python in.
Recompiling to add support for another procedural
language is a
one-time operation and it's easy to do, so that's
not a good argument.Plus, I don't want to use Python. I want to use
and learn
more pgsql. Keep things clean and lean if
possible...
"Clean and lean" suggests using the right tool for
the job. Languages
like Perl and Python are better at string
manipulation than PL/pgSQL,
and one of PostgreSQL's strengths is that it allows
you to write
server-side functions in those languages. Exploit
such strengths
when it makes sense.I just got a postgres book yesterday for
additional
reading which it only had 2 pages on regex's in
the
index :(
Regular expressions aren't specific to PostgreSQL;
there's ample
material covering them elsewhere. See for example
_Mastering
Regular Expressions_ by Jeffrey Friedl. I'm sure a
search engine
would yield many free tutorials on the subject.--
Michael Fuhr
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote:
What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list. Where I'll just change the
delimiter for the query that it applies.
There is a function for some sort of text slicing, though I'm not sure
if it does what you want. It's called split_part(). Have a look at the
documentation.
--
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
"Aprender sin pensar es in�til; pensar sin aprender, peligroso" (Confucio)
On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote:
It's not a complex regex as I have wrote one that does
what I want, yet not at the database level. The docs
didn't help clarify anything. I'm still not clear on
how it determines where the back reference comes from
in the previous example you gave. And why digits
wouldn't work.
Back references work as they usually do in regular expressions:
they refer to the matched value of a previous parenthesized
subexpression. If you have multiple open parentheses then you
need to refer to the correct subexpression to get what you want.
Example:
CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('abc.foo.foo.xyz');
INSERT INTO foo VALUES ('12.00.00.34');
INSERT INTO foo VALUES ('abc.def.ghi');
SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
t
-----------------
abc.foo.foo.xyz
12.00.00.34
(2 rows)
In the above query the regular expression has only one set of
parentheses, so the back reference refers to \1. The result set
contains the two rows that have one or more word characters followed
by a dot followed by the same set of characters.
In the following query, note the difference between the regular
expression in the select list and the one in the where clause:
SELECT substring(t FROM $$((\w+)\.\2)$$)
FROM foo
WHERE t ~ $$(\w+)\.\1$$;
substring
-----------
foo.foo
00.00
(2 rows)
In the regular expression in the select list, we use the outermost
set of parentheses for grouping, so the back reference needs to
refer to the subexpression that begins with the second open
parenthesis (i.e., we must use \2). In the regular expression
in the where clause, we have only one set of parentheses so the
back reference is \1.
Regarding digits, you didn't post any output in your example, so
we don't know if it really doesn't work or if it just doesn't do
what you were expecting. Here's what I get from your examples:
SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
substring
-----------
00.00
(1 row)
SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);
substring
-----------
00.00
(1 row)
SELECT substring('abc.foo.foo.xyz' FROM $$((\w+)\.\2)$$);
substring
-----------
foo.foo
(1 row)
Do you get different results, or do these results surprise you?
They all appear to be correct.
What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list. Where I'll just change the
delimiter for the query that it applies.Where I could start it at a certain point and end it
at another. Like slicing out paragraph 3-6 (delimiter
\n\n) or the 2nd-6th sentence in a article (delimiter
\.). That is what I am trying to do.
You can use split_part() to get a single item or string_to_array()
to build an array from which you can extract multiple items.
CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('one.two.three.four.five.six.');
SELECT (string_to_array(t, '.'))[3:5] FROM foo;
string_to_array
-------------------
{three,four,five}
(1 row)
SELECT array_to_string((string_to_array(t, '.'))[3:5], '.') FROM foo;
array_to_string
-----------------
three.four.five
(1 row)
Is that what you're looking for?
--
Michael Fuhr
Thank you for your patience and such a complete
answer. I'm not on the pgbox right now but those
examples did help clarify how to reference the back
references, which was my problem.
I wasn't aware the 1st parenthesis must be counted as
part of the regex, I assumed it was a wrapper. Thanks
for helping me out and putting up with me. :)
matt
--- Michael Fuhr <mike@fuhr.org> wrote:
On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew
Peter wrote:It's not a complex regex as I have wrote one that
does
what I want, yet not at the database level. The
docs
didn't help clarify anything. I'm still not clear
on
how it determines where the back reference comes
from
in the previous example you gave. And why digits
wouldn't work.Back references work as they usually do in regular
expressions:
they refer to the matched value of a previous
parenthesized
subexpression. If you have multiple open
parentheses then you
need to refer to the correct subexpression to get
what you want.
Example:CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('abc.foo.foo.xyz');
INSERT INTO foo VALUES ('12.00.00.34');
INSERT INTO foo VALUES ('abc.def.ghi');SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
t
-----------------
abc.foo.foo.xyz
12.00.00.34
(2 rows)In the above query the regular expression has only
one set of
parentheses, so the back reference refers to \1.
The result set
contains the two rows that have one or more word
characters followed
by a dot followed by the same set of characters.In the following query, note the difference between
the regular
expression in the select list and the one in the
where clause:SELECT substring(t FROM $$((\w+)\.\2)$$)
FROM foo
WHERE t ~ $$(\w+)\.\1$$;
substring
-----------
foo.foo
00.00
(2 rows)In the regular expression in the select list, we use
the outermost
set of parentheses for grouping, so the back
reference needs to
refer to the subexpression that begins with the
second open
parenthesis (i.e., we must use \2). In the regular
expression
in the where clause, we have only one set of
parentheses so the
back reference is \1.Regarding digits, you didn't post any output in your
example, so
we don't know if it really doesn't work or if it
just doesn't do
what you were expecting. Here's what I get from
your examples:SELECT substring('12.00.00.34' FROM
$$((\d+)\.\2)$$);
substring
-----------
00.00
(1 row)SELECT substring('12.00.00.34' FROM
$$((\w+)\.\2)$$);
substring
-----------
00.00
(1 row)SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);
substring
-----------
foo.foo
(1 row)Do you get different results, or do these results
surprise you?
They all appear to be correct.What I basically want to do is have a slice
function
like Python, where I can slice out items from a
\s, \.
or \n\n separated list. Where I'll just change the
delimiter for the query that it applies.Where I could start it at a certain point and end
it
at another. Like slicing out paragraph 3-6
(delimiter
\n\n) or the 2nd-6th sentence in a article
(delimiter
\.). That is what I am trying to do.
You can use split_part() to get a single item or
string_to_array()
to build an array from which you can extract
multiple items.CREATE TABLE foo (t text);
INSERT INTO foo VALUES
('one.two.three.four.five.six.');SELECT (string_to_array(t, '.'))[3:5] FROM foo;
string_to_array
-------------------
{three,four,five}
(1 row)SELECT array_to_string((string_to_array(t,
'.'))[3:5], '.') FROM foo;
array_to_string
-----------------
three.four.five
(1 row)Is that what you're looking for?
--
Michael Fuhr---------------------------(end of
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
One other thing, when I wrote back I actually used
34.31.29.20 (random), not 12.00.00.34 like i showed in
the example, which is why i said it didn't work on
digits.
SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
substring
-----------
(1 row)
little did i know writing it with 12.00.00.34 would
return 00.00... so yes, that did suprise me.
Apparently only using the identical values returns a
value. so it's saying x+ one more of the same value
separated by a period... where shouldn't it be any
"letter, number or underscore" followed by any
"letter, number or underscore"?
--- Michael Fuhr <mike@fuhr.org> wrote:
On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew
Peter wrote:It's not a complex regex as I have wrote one that
does
what I want, yet not at the database level. The
docs
didn't help clarify anything. I'm still not clear
on
how it determines where the back reference comes
from
in the previous example you gave. And why digits
wouldn't work.Back references work as they usually do in regular
expressions:
they refer to the matched value of a previous
parenthesized
subexpression. If you have multiple open
parentheses then you
need to refer to the correct subexpression to get
what you want.
Example:CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('abc.foo.foo.xyz');
INSERT INTO foo VALUES ('12.00.00.34');
INSERT INTO foo VALUES ('abc.def.ghi');SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
t
-----------------
abc.foo.foo.xyz
12.00.00.34
(2 rows)In the above query the regular expression has only
one set of
parentheses, so the back reference refers to \1.
The result set
contains the two rows that have one or more word
characters followed
by a dot followed by the same set of characters.In the following query, note the difference between
the regular
expression in the select list and the one in the
where clause:SELECT substring(t FROM $$((\w+)\.\2)$$)
FROM foo
WHERE t ~ $$(\w+)\.\1$$;
substring
-----------
foo.foo
00.00
(2 rows)In the regular expression in the select list, we use
the outermost
set of parentheses for grouping, so the back
reference needs to
refer to the subexpression that begins with the
second open
parenthesis (i.e., we must use \2). In the regular
expression
in the where clause, we have only one set of
parentheses so the
back reference is \1.Regarding digits, you didn't post any output in your
example, so
we don't know if it really doesn't work or if it
just doesn't do
what you were expecting. Here's what I get from
your examples:SELECT substring('12.00.00.34' FROM
$$((\d+)\.\2)$$);
substring
-----------
00.00
(1 row)SELECT substring('12.00.00.34' FROM
$$((\w+)\.\2)$$);
substring
-----------
00.00
(1 row)SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);
substring
-----------
foo.foo
(1 row)Do you get different results, or do these results
surprise you?
They all appear to be correct.What I basically want to do is have a slice
function
like Python, where I can slice out items from a
\s, \.
or \n\n separated list. Where I'll just change the
delimiter for the query that it applies.Where I could start it at a certain point and end
it
at another. Like slicing out paragraph 3-6
(delimiter
\n\n) or the 2nd-6th sentence in a article
(delimiter
\.). That is what I am trying to do.
You can use split_part() to get a single item or
string_to_array()
to build an array from which you can extract
multiple items.CREATE TABLE foo (t text);
INSERT INTO foo VALUES
('one.two.three.four.five.six.');SELECT (string_to_array(t, '.'))[3:5] FROM foo;
string_to_array
-------------------
{three,four,five}
(1 row)SELECT array_to_string((string_to_array(t,
'.'))[3:5], '.') FROM foo;
array_to_string
-----------------
three.four.five
(1 row)Is that what you're looking for?
--
Michael Fuhr---------------------------(end of
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/
Matthew Peter <survivedsushi@yahoo.com> writes:
One other thing, when I wrote back I actually used
34.31.29.20 (random), not 12.00.00.34 like i showed in
the example, which is why i said it didn't work on
digits.SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
substring
-----------(1 row)
little did i know writing it with 12.00.00.34 would
return 00.00... so yes, that did suprise me.
Apparently only using the identical values returns a
value. so it's saying x+ one more of the same value
separated by a period... where shouldn't it be any
"letter, number or underscore" followed by any
"letter, number or underscore"?
Backreferences match the exact string matched by the corresponding set
of parentheses. It's not the equivalent of substituting in the
parenthesized part of the regex and testing that for a match. The
behavior above is as expected. If you want it as "any followed by
any" you shold write the regex as '((\w+)\.(\w+))' -- then the two
parts can differ.
-Doug
On Fri, Sep 09, 2005 at 10:11:46PM -0700, Matthew Peter wrote:
One other thing, when I wrote back I actually used
34.31.29.20 (random), not 12.00.00.34 like i showed in
the example, which is why i said it didn't work on
digits.
When posting examples, please post something you actually tried so
other people can see exactly what you're doing and attempt to
duplicate your results. Little differences sometimes matter, as
this case illustrates.
SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
substring
-----------(1 row)
little did i know writing it with 12.00.00.34 would
return 00.00... so yes, that did suprise me.
Apparently only using the identical values returns a
value.
That's what a back reference is: it means "match the same value you
matched before," not just anything that matches the same regular
expression.
Usenet junkies might find it amusing to use back references to
search their new server's list of newsgroups:
egrep '([^.]+)\.\1\.\1' newsgroups
--
Michael Fuhr
ahhhhhhhhh I swear I never came across any of these
gems of information in the docs. It was these subtle
differences that were throwing me.
I didn't originally catch that regex's were based on
grep/sed/awk syntax which I haven't studied throughly
yet. I've only used some basic operations in bash
scripts. I'll read up more on those. Thanks.
MP
--- Douglas McNaught <doug@mcnaught.org> wrote:
Matthew Peter <survivedsushi@yahoo.com> writes:
One other thing, when I wrote back I actually used
34.31.29.20 (random), not 12.00.00.34 like ishowed in
the example, which is why i said it didn't work on
digits.SELECT substring('34.31.29.20' FROM
$$((\w+)\.\2)$$);
substring
-----------(1 row)
little did i know writing it with 12.00.00.34
would
return 00.00... so yes, that did suprise me.
Apparently only using the identical values returnsa
value. so it's saying x+ one more of the same
value
separated by a period... where shouldn't it be any
"letter, number or underscore" followed by any
"letter, number or underscore"?Backreferences match the exact string matched by the
corresponding set
of parentheses. It's not the equivalent of
substituting in the
parenthesized part of the regex and testing that for
a match. The
behavior above is as expected. If you want it as
"any followed by
any" you shold write the regex as '((\w+)\.(\w+))'
-- then the two
parts can differ.-Doug
______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/
On Sat, Sep 10, 2005 at 10:27:13AM -0700, Matthew Peter wrote:
ahhhhhhhhh I swear I never came across any of these
gems of information in the docs. It was these subtle
differences that were throwing me.
From "Regular Expression Escapes" in the "Pattern Matching" section
of the manual:
A back reference (\n) matches the same string matched by the
previous parenthesized subexpression specified by the number n
(see Table 9-18). For example, ([bc])\1 matches bb or cc but not
bc or cb. The subexpression must entirely precede the back
reference in the RE. Subexpressions are numbered in the order
of their leading parentheses. Non-capturing parentheses do not
define subexpressions.
http://www.postgresql.org/docs/8.0/static/functions-matching.html#POSIX-ESCAPE-SEQUENCES
--
Michael Fuhr
How about this then, I didn't retain that information
from the doc. ;) I sometimes glaze over important gems
every now and then. It happens. I'm not a robot, yet.
At least I know the answer to my question is now
retained. You were a big help too. Thank you very
much. I appreciate it.
Speaking of data manipulation in a table... I was
thinking about storing and manipulating a list in a
column...
Is it possible to append and delete (unknown location)
items in a list? Or is another way more efficient? I'm
relatively new so sorry if I'm asking too many
questions. If possible, I will read about it if
there's docs for it so I don't trouble anyone with my
questions. It would be sweet to do that at the
database level.
Thanks again,
MP
Posgresql convert
--- Michael Fuhr <mike@fuhr.org> wrote:
On Sat, Sep 10, 2005 at 10:27:13AM -0700, Matthew
Peter wrote:ahhhhhhhhh I swear I never came across any of
these
gems of information in the docs. It was these
subtle
differences that were throwing me.
From "Regular Expression Escapes" in the "Pattern
Matching" section
of the manual:A back reference (\n) matches the same string
matched by the
previous parenthesized subexpression specified by
the number n
(see Table 9-18). For example, ([bc])\1 matches
bb or cc but not
bc or cb. The subexpression must entirely precede
the back
reference in the RE. Subexpressions are numbered
in the order
of their leading parentheses. Non-capturing
parentheses do not
define subexpressions.
http://www.postgresql.org/docs/8.0/static/functions-matching.html#POSIX-ESCAPE-SEQUENCES
--
Michael Fuhr---------------------------(end of
broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please
send an appropriate
subscribe-nomail command to
majordomo@postgresql.org so that your
message can get through to the mailing list
cleanly
______________________________________________________
Yahoo! for Good
Watch the Hurricane Katrina Shelter From The Storm concert
http://advision.webevents.yahoo.com/shelter
On Sun, Sep 11, 2005 at 12:30:59AM -0700, Matthew Peter wrote:
Speaking of data manipulation in a table... I was
thinking about storing and manipulating a list in a
column...
Please ask new questions in a new thread with a Subject header
related to the new topic. People who might be interested in following
the discussion, either to learn from it or to provide answers, might
miss it if the new topic hijacks an old thread.
--
Michael Fuhr