back references using regex

Started by Matthew Peteralmost 21 years ago22 messagesgeneral

survivedsushi@yahoo.com

almost 21 years ago

Hi.

I'm trying to do a slice directly from a table so I
can get a brief preview of the articles content by
counting \s (spaces), not new paragraphs.

Anyone know how it could be done using regular
expressions natively? I read the doc but it didn't
help me much.

Many thanks.
MP

______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

Michael Fuhr

mike@fuhr.org

almost 21 years ago

In reply to: Matthew Peter (#1)

Re: back references using regex

On Tue, Sep 06, 2005 at 11:40:18PM -0700, Matthew Peter wrote:

I'm trying to do a slice directly from a table so I
can get a brief preview of the articles content by
counting \s (spaces), not new paragraphs.

Are you trying to extract the first N words from a string? If
that's not what you mean then please clarify.

Anyone know how it could be done using regular
expressions natively? I read the doc but it didn't
help me much.

Regular expressions aren't the only way to solve the problem, but
maybe the following example will give you some ideas:

CREATE TABLE article (id integer, content text);

INSERT INTO article VALUES (1, 'one');
INSERT INTO article VALUES (2, 'one two');
INSERT INTO article VALUES (3, 'one two three');
INSERT INTO article VALUES (4, 'one two three four');
INSERT INTO article VALUES (5, 'one two three four five');
INSERT INTO article VALUES (6, 'one two three four five six');

SELECT id, substring(content FROM '(([^[:space:]]+[[:space:]]*){1,3})')
FROM article;
id | substring
----+----------------
1 | one
2 | one two
3 | one two three
4 | one two three
5 | one two three
6 | one two three
(6 rows)

In PostgreSQL 7.4 and later you could shorten the regular expression:

SELECT id, substring(content FROM '((\\S+\\s*){1,3})')
FROM article;

If this example isn't what you're looking for then please explain
what you're trying to do.

--
Michael Fuhr

Matthew Peter

survivedsushi@yahoo.com

almost 21 years ago

In reply to: Michael Fuhr (#2)

Re: back references using regex

Thanks. I'll check it out asap. I didn't realize the
regex expressions needed to be escaped for it to be a
valid expression. I thought it was the other way
around. Would it be possible to choose what paragraph
to use in a summary? I'll try to clarify... Let's say
I have 14 paragraphs of lorem lipsum text. Let's say I
want to show the 3rd paragraph... Could I use a regex
to search the content and return that 3rd paragraph to
use as a summary of that article?

--- Michael Fuhr <mike@fuhr.org> wrote:

On Tue, Sep 06, 2005 at 11:40:18PM -0700, Matthew
Peter wrote:

I'm trying to do a slice directly from a table so

I

can get a brief preview of the articles content by
counting \s (spaces), not new paragraphs.

Are you trying to extract the first N words from a
string? If
that's not what you mean then please clarify.

Anyone know how it could be done using regular
expressions natively? I read the doc but it didn't
help me much.

Regular expressions aren't the only way to solve the
problem, but
maybe the following example will give you some
ideas:

CREATE TABLE article (id integer, content text);

INSERT INTO article VALUES (1, 'one');
INSERT INTO article VALUES (2, 'one two');
INSERT INTO article VALUES (3, 'one two three');
INSERT INTO article VALUES (4, 'one two three
four');
INSERT INTO article VALUES (5, 'one two three four
five');
INSERT INTO article VALUES (6, 'one two three four
five six');

SELECT id, substring(content FROM
'(([^[:space:]]+[[:space:]]*){1,3})')
FROM article;
id | substring
----+----------------
1 | one
2 | one two
3 | one two three
4 | one two three
5 | one two three
6 | one two three
(6 rows)

In PostgreSQL 7.4 and later you could shorten the
regular expression:

SELECT id, substring(content FROM
'((\\S+\\s*){1,3})')
FROM article;

If this example isn't what you're looking for then
please explain
what you're trying to do.

--
Michael Fuhr

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Michael Fuhr

mike@fuhr.org

almost 21 years ago

In reply to: Matthew Peter (#3)

Re: back references using regex

On Wed, Sep 07, 2005 at 05:26:07PM -0700, Matthew Peter wrote:

Thanks. I'll check it out asap. I didn't realize the
regex expressions needed to be escaped for it to be a
valid expression.

If you use ordinary quotes (') around the regular expression then
you have to escape the backslashes because there's an extra level
of string parsing that you're probably unaccustomed to. If you use
dollar quotes (available since 8.0) then you don't need the extra
escapes:

SELECT id, substring(content FROM $$((\S+\s*){1,3})$$) FROM article;

Would it be possible to choose what paragraph to use in a summary?

You might be able to use split_part(). For example, if paragraphs
are separated by pairs of newline (\n) characters, then the following
should return each article's third paragraph:

SELECT id, split_part(content, '\n\n', 3) FROM article;

See "String Functions and Operators" and "Pattern Matching" in the
documentation for more information. If you need to get fancy then
consider writing a function in a language like PL/Perl.

--
Michael Fuhr

Michael Fuhr

mike@fuhr.org

almost 21 years ago

In reply to: Michael Fuhr (#4)

Re: back references using regex

[Please copy the mailing list on replies so others can participate
in and learn from the discussion.]

On Wed, Sep 07, 2005 at 10:40:22PM -0700, Matthew Peter wrote:

I did read the docs ;) I always do. The question I
really wanted answered is how to reference the back
references in my regular expressions parentheses. Like
the 2nd position or 4th from a group. Like \2 or $2.
Can I do this in postgres in the query?

Are you looking for something like this?

SELECT substring('abc.foo.foo.xyz' FROM '(([[:alpha:]]+)\\.\\2)');
substring
-----------
foo.foo
(1 row)

That is, one or more alphabetic characters followed by a dot followed
by the same set of characters (this is a simplistic example: it would
also match 'foo.oog' and return 'oo.oo').

Note that the back reference is \2 because it refers to the inner
set of parentheses (i.e., the subexpression with the second opening
parenthesis); the outer set is used here for capturing. And again,
note the escaped backslashes because we're using ordinary quotes.
With dollar quotes the above query would be:

SELECT substring('abc.foo.foo.xyz' FROM $$(([[:alpha:]]+)\.\2)$$);

--
Michael Fuhr

Import Notes

Reply to msg id not found: 20050908054022.46369.qmail@web35204.mail.mud.yahoo.com

Matthew Peter

survivedsushi@yahoo.com

almost 21 years ago

In reply to: Michael Fuhr (#5)

Re: back references using regex

That doesn't seem to work with digits

SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
or
SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);

but works with strings

SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);

What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list.

Where I could start it at a certain point and end it
at another. Like slicing out paragraph 3-6 (delimiter
\n\n) or the 2nd-6th sentence in a article (delimiter
\.). That is what I am trying to do. I know if I can
figure how to get that working I can figure out how to
deal with extending it/handling misc. characters/etc
in pgsql. I only need a working vanilla example.
Thanks for all you help.

--- Michael Fuhr <mike@fuhr.org> wrote:

SELECT substring('abc.foo.foo.xyz' FROM
'(([[:alpha:]]+)\\.\\2)');
substring
-----------
foo.foo
(1 row)

That is, one or more alphabetic characters followed
by a dot followed
by the same set of characters (this is a simplistic
example: it would
also match 'foo.oog' and return 'oo.oo').

Note that the back reference is \2 because it refers
to the inner
set of parentheses (i.e., the subexpression with the
second opening
parenthesis); the outer set is used here for
capturing. And again,
note the escaped backslashes because we're using
ordinary quotes.
With dollar quotes the above query would be:

SELECT substring('abc.foo.foo.xyz' FROM
$$(([[:alpha:]]+)\.\2)$$);

--
Michael Fuhr

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Peter Fein

pfein@pobox.com

almost 21 years ago

In reply to: Matthew Peter (#6)

Re: back references using regex

Matthew Peter wrote:

That doesn't seem to work with digits

SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
or
SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);

but works with strings

SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);

What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list.

You could always just write it in pl/python...

--
Peter Fein pfein@pobox.com 773-575-0694

Basically, if you're not a utopianist, you're a schmuck. -J. Feldman

Matthew Peter

survivedsushi@yahoo.com

almost 21 years ago

In reply to: Peter Fein (#7)

Re: back references using regex

Ya, but I'd have to recompile to get python in. Plus,
I don't want to use Python. I want to use and learn
more pgsql. Keep things clean and lean if possible...
I just got a postgres book yesterday for additional
reading which it only had 2 pages on regex's in the
index :(

--- Peter Fein <pfein@pobox.com> wrote:

Matthew Peter wrote:

That doesn't seem to work with digits

SELECT substring('12.00.00.34' FROM

$$((\d+)\.\2)$$);

or
SELECT substring('12.00.00.34' FROM

$$((\w+)\.\2)$$);

but works with strings

SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);

What I basically want to do is have a slice

function

like Python, where I can slice out items from a

\s, \.

or \n\n separated list.

You could always just write it in pl/python...

--
Peter Fein pfein@pobox.com
773-575-0694

Basically, if you're not a utopianist, you're a
schmuck. -J. Feldman

______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

Michael Fuhr

mike@fuhr.org

almost 21 years ago

In reply to: Matthew Peter (#8)

Re: back references using regex

On Thu, Sep 08, 2005 at 12:45:40PM -0700, Matthew Peter wrote:

Ya, but I'd have to recompile to get python in.

Recompiling to add support for another procedural language is a
one-time operation and it's easy to do, so that's not a good argument.

Plus, I don't want to use Python. I want to use and learn
more pgsql. Keep things clean and lean if possible...

"Clean and lean" suggests using the right tool for the job. Languages
like Perl and Python are better at string manipulation than PL/pgSQL,
and one of PostgreSQL's strengths is that it allows you to write
server-side functions in those languages. Exploit such strengths
when it makes sense.

I just got a postgres book yesterday for additional
reading which it only had 2 pages on regex's in the
index :(

Regular expressions aren't specific to PostgreSQL; there's ample
material covering them elsewhere. See for example _Mastering
Regular Expressions_ by Jeffrey Friedl. I'm sure a search engine
would yield many free tutorials on the subject.

--
Michael Fuhr

#10

Matthew Peter

survivedsushi@yahoo.com

almost 21 years ago

In reply to: Michael Fuhr (#9)

Re: back references using regex

I knew I should never have said Python. I know regular
expressions, just not how postgresql handles them. The
fact of the matter is I don't want to use Python, it
was an example of the functionality I'm interested in
accomplishing with pgsql. Plus, I would like to use
other regex's once I figure out how they are used.

I only need a regular expression in the substring of a
where cluase. Not entire language support for a single
function.

It's not a complex regex as I have wrote one that does
what I want, yet not at the database level. The docs
didn't help clarify anything. I'm still not clear on
how it determines where the back reference comes from
in the previous example you gave. And why digits
wouldn't work.

I would like a basic example that accomplishes what
I'm trying to do if at all possible?

My original message/problem...

What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list. Where I'll just change the
delimiter for the query that it applies.

--- Michael Fuhr <mike@fuhr.org> wrote:

On Thu, Sep 08, 2005 at 12:45:40PM -0700, Matthew
Peter wrote:

Ya, but I'd have to recompile to get python in.

Recompiling to add support for another procedural
language is a
one-time operation and it's easy to do, so that's
not a good argument.

Plus, I don't want to use Python. I want to use

and learn

more pgsql. Keep things clean and lean if

possible...

"Clean and lean" suggests using the right tool for
the job. Languages
like Perl and Python are better at string
manipulation than PL/pgSQL,
and one of PostgreSQL's strengths is that it allows
you to write
server-side functions in those languages. Exploit
such strengths
when it makes sense.

I just got a postgres book yesterday for

additional

reading which it only had 2 pages on regex's in

the

index :(

Regular expressions aren't specific to PostgreSQL;
there's ample
material covering them elsewhere. See for example
_Mastering
Regular Expressions_ by Jeffrey Friedl. I'm sure a
search engine
would yield many free tutorials on the subject.

--
Michael Fuhr

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

#11

Alvaro Herrera

alvherre@2ndquadrant.com

almost 21 years ago

In reply to: Matthew Peter (#10)

Re: back references using regex

On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote:

What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list. Where I'll just change the
delimiter for the query that it applies.

There is a function for some sort of text slicing, though I'm not sure
if it does what you want. It's called split_part(). Have a look at the
documentation.

--
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
"Aprender sin pensar es inï¿½til; pensar sin aprender, peligroso" (Confucio)

#12

Michael Fuhr

mike@fuhr.org

almost 21 years ago

In reply to: Matthew Peter (#10)

Re: back references using regex

On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote:

It's not a complex regex as I have wrote one that does
what I want, yet not at the database level. The docs
didn't help clarify anything. I'm still not clear on
how it determines where the back reference comes from
in the previous example you gave. And why digits
wouldn't work.

Back references work as they usually do in regular expressions:
they refer to the matched value of a previous parenthesized
subexpression. If you have multiple open parentheses then you
need to refer to the correct subexpression to get what you want.
Example:

CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('abc.foo.foo.xyz');
INSERT INTO foo VALUES ('12.00.00.34');
INSERT INTO foo VALUES ('abc.def.ghi');

SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
t
-----------------
abc.foo.foo.xyz
12.00.00.34
(2 rows)

In the above query the regular expression has only one set of
parentheses, so the back reference refers to \1. The result set
contains the two rows that have one or more word characters followed
by a dot followed by the same set of characters.

In the following query, note the difference between the regular
expression in the select list and the one in the where clause:

SELECT substring(t FROM $$((\w+)\.\2)$$)
FROM foo
WHERE t ~ $$(\w+)\.\1$$;
substring
-----------
foo.foo
00.00
(2 rows)

In the regular expression in the select list, we use the outermost
set of parentheses for grouping, so the back reference needs to
refer to the subexpression that begins with the second open
parenthesis (i.e., we must use \2). In the regular expression
in the where clause, we have only one set of parentheses so the
back reference is \1.

Regarding digits, you didn't post any output in your example, so
we don't know if it really doesn't work or if it just doesn't do
what you were expecting. Here's what I get from your examples:

SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
substring
-----------
00.00
(1 row)

SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);
substring
-----------
00.00
(1 row)

SELECT substring('abc.foo.foo.xyz' FROM $$((\w+)\.\2)$$);
substring
-----------
foo.foo
(1 row)

Do you get different results, or do these results surprise you?
They all appear to be correct.

What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list. Where I'll just change the
delimiter for the query that it applies.

Where I could start it at a certain point and end it
at another. Like slicing out paragraph 3-6 (delimiter
\n\n) or the 2nd-6th sentence in a article (delimiter
\.). That is what I am trying to do.

You can use split_part() to get a single item or string_to_array()
to build an array from which you can extract multiple items.

CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('one.two.three.four.five.six.');

SELECT (string_to_array(t, '.'))[3:5] FROM foo;
string_to_array
-------------------
{three,four,five}
(1 row)

SELECT array_to_string((string_to_array(t, '.'))[3:5], '.') FROM foo;
array_to_string
-----------------
three.four.five
(1 row)

Is that what you're looking for?

--
Michael Fuhr

#13

Matthew Peter

survivedsushi@yahoo.com

almost 21 years ago

In reply to: Michael Fuhr (#12)

Re: back references using regex

Thank you for your patience and such a complete
answer. I'm not on the pgbox right now but those
examples did help clarify how to reference the back
references, which was my problem.

I wasn't aware the 1st parenthesis must be counted as
part of the regex, I assumed it was a wrapper. Thanks
for helping me out and putting up with me. :)

matt

--- Michael Fuhr <mike@fuhr.org> wrote:

On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew
Peter wrote:

It's not a complex regex as I have wrote one that

does

what I want, yet not at the database level. The

docs

didn't help clarify anything. I'm still not clear

on

how it determines where the back reference comes

from

in the previous example you gave. And why digits
wouldn't work.

Back references work as they usually do in regular
expressions:
they refer to the matched value of a previous
parenthesized
subexpression. If you have multiple open
parentheses then you
need to refer to the correct subexpression to get
what you want.
Example:

CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('abc.foo.foo.xyz');
INSERT INTO foo VALUES ('12.00.00.34');
INSERT INTO foo VALUES ('abc.def.ghi');

SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
t
-----------------
abc.foo.foo.xyz
12.00.00.34
(2 rows)

In the above query the regular expression has only
one set of
parentheses, so the back reference refers to \1.
The result set
contains the two rows that have one or more word
characters followed
by a dot followed by the same set of characters.

In the following query, note the difference between
the regular
expression in the select list and the one in the
where clause:

SELECT substring(t FROM $$((\w+)\.\2)$$)
FROM foo
WHERE t ~ $$(\w+)\.\1$$;
substring
-----------
foo.foo
00.00
(2 rows)

In the regular expression in the select list, we use
the outermost
set of parentheses for grouping, so the back
reference needs to
refer to the subexpression that begins with the
second open
parenthesis (i.e., we must use \2). In the regular
expression
in the where clause, we have only one set of
parentheses so the
back reference is \1.

Regarding digits, you didn't post any output in your
example, so
we don't know if it really doesn't work or if it
just doesn't do
what you were expecting. Here's what I get from
your examples:

SELECT substring('12.00.00.34' FROM
$$((\d+)\.\2)$$);
substring
-----------
00.00
(1 row)

SELECT substring('12.00.00.34' FROM
$$((\w+)\.\2)$$);
substring
-----------
00.00
(1 row)

SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);
substring
-----------
foo.foo
(1 row)

Do you get different results, or do these results
surprise you?
They all appear to be correct.

What I basically want to do is have a slice

function

like Python, where I can slice out items from a

\s, \.

or \n\n separated list. Where I'll just change the
delimiter for the query that it applies.

Where I could start it at a certain point and end

it

at another. Like slicing out paragraph 3-6

(delimiter

\n\n) or the 2nd-6th sentence in a article

(delimiter

\.). That is what I am trying to do.

You can use split_part() to get a single item or
string_to_array()
to build an array from which you can extract
multiple items.

CREATE TABLE foo (t text);
INSERT INTO foo VALUES
('one.two.three.four.five.six.');

SELECT (string_to_array(t, '.'))[3:5] FROM foo;
string_to_array
-------------------
{three,four,five}
(1 row)

SELECT array_to_string((string_to_array(t,
'.'))[3:5], '.') FROM foo;
array_to_string
-----------------
three.four.five
(1 row)

Is that what you're looking for?

--
Michael Fuhr

---------------------------(end of
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

#14

Matthew Peter

survivedsushi@yahoo.com

almost 21 years ago

In reply to: Michael Fuhr (#12)

Re: back references using regex

One other thing, when I wrote back I actually used
34.31.29.20 (random), not 12.00.00.34 like i showed in
the example, which is why i said it didn't work on
digits.

SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
substring
-----------

(1 row)

little did i know writing it with 12.00.00.34 would
return 00.00... so yes, that did suprise me.
Apparently only using the identical values returns a
value. so it's saying x+ one more of the same value
separated by a period... where shouldn't it be any
"letter, number or underscore" followed by any
"letter, number or underscore"?

--- Michael Fuhr <mike@fuhr.org> wrote:

On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew
Peter wrote:

It's not a complex regex as I have wrote one that

does

what I want, yet not at the database level. The

docs

didn't help clarify anything. I'm still not clear

on

how it determines where the back reference comes

from

in the previous example you gave. And why digits
wouldn't work.

Back references work as they usually do in regular
expressions:
they refer to the matched value of a previous
parenthesized
subexpression. If you have multiple open
parentheses then you
need to refer to the correct subexpression to get
what you want.
Example:

CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('abc.foo.foo.xyz');
INSERT INTO foo VALUES ('12.00.00.34');
INSERT INTO foo VALUES ('abc.def.ghi');

SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
t
-----------------
abc.foo.foo.xyz
12.00.00.34
(2 rows)

In the above query the regular expression has only
one set of
parentheses, so the back reference refers to \1.
The result set
contains the two rows that have one or more word
characters followed
by a dot followed by the same set of characters.

In the following query, note the difference between
the regular
expression in the select list and the one in the
where clause:

SELECT substring(t FROM $$((\w+)\.\2)$$)
FROM foo
WHERE t ~ $$(\w+)\.\1$$;
substring
-----------
foo.foo
00.00
(2 rows)

In the regular expression in the select list, we use
the outermost
set of parentheses for grouping, so the back
reference needs to
refer to the subexpression that begins with the
second open
parenthesis (i.e., we must use \2). In the regular
expression
in the where clause, we have only one set of
parentheses so the
back reference is \1.

Regarding digits, you didn't post any output in your
example, so
we don't know if it really doesn't work or if it
just doesn't do
what you were expecting. Here's what I get from
your examples:

SELECT substring('12.00.00.34' FROM
$$((\d+)\.\2)$$);
substring
-----------
00.00
(1 row)

SELECT substring('12.00.00.34' FROM
$$((\w+)\.\2)$$);
substring
-----------
00.00
(1 row)

SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);
substring
-----------
foo.foo
(1 row)

Do you get different results, or do these results
surprise you?
They all appear to be correct.

What I basically want to do is have a slice

function

like Python, where I can slice out items from a

\s, \.

or \n\n separated list. Where I'll just change the
delimiter for the query that it applies.

Where I could start it at a certain point and end

it

at another. Like slicing out paragraph 3-6

(delimiter

\n\n) or the 2nd-6th sentence in a article

(delimiter

\.). That is what I am trying to do.

You can use split_part() to get a single item or
string_to_array()
to build an array from which you can extract
multiple items.

CREATE TABLE foo (t text);
INSERT INTO foo VALUES
('one.two.three.four.five.six.');

SELECT (string_to_array(t, '.'))[3:5] FROM foo;
string_to_array
-------------------
{three,four,five}
(1 row)

SELECT array_to_string((string_to_array(t,
'.'))[3:5], '.') FROM foo;
array_to_string
-----------------
three.four.five
(1 row)

Is that what you're looking for?

--
Michael Fuhr

---------------------------(end of
broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

#15

Doug McNaught

doug@mcnaught.org

almost 21 years ago

In reply to: Matthew Peter (#14)

Re: back references using regex

Matthew Peter <survivedsushi@yahoo.com> writes:

One other thing, when I wrote back I actually used
34.31.29.20 (random), not 12.00.00.34 like i showed in
the example, which is why i said it didn't work on
digits.

SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
substring
-----------

(1 row)

little did i know writing it with 12.00.00.34 would
return 00.00... so yes, that did suprise me.
Apparently only using the identical values returns a
value. so it's saying x+ one more of the same value
separated by a period... where shouldn't it be any
"letter, number or underscore" followed by any
"letter, number or underscore"?

Backreferences match the exact string matched by the corresponding set
of parentheses. It's not the equivalent of substituting in the
parenthesized part of the regex and testing that for a match. The
behavior above is as expected. If you want it as "any followed by
any" you shold write the regex as '((\w+)\.(\w+))' -- then the two
parts can differ.

-Doug

#16

Michael Fuhr

mike@fuhr.org

almost 21 years ago

In reply to: Matthew Peter (#14)

Re: back references using regex

On Fri, Sep 09, 2005 at 10:11:46PM -0700, Matthew Peter wrote:

One other thing, when I wrote back I actually used
34.31.29.20 (random), not 12.00.00.34 like i showed in
the example, which is why i said it didn't work on
digits.

When posting examples, please post something you actually tried so
other people can see exactly what you're doing and attempt to
duplicate your results. Little differences sometimes matter, as
this case illustrates.

SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
substring
-----------

(1 row)

little did i know writing it with 12.00.00.34 would
return 00.00... so yes, that did suprise me.
Apparently only using the identical values returns a
value.

That's what a back reference is: it means "match the same value you
matched before," not just anything that matches the same regular
expression.

Usenet junkies might find it amusing to use back references to
search their new server's list of newsgroups:

egrep '([^.]+)\.\1\.\1' newsgroups

--
Michael Fuhr

#17

Matthew Peter

survivedsushi@yahoo.com

almost 21 years ago

In reply to: Doug McNaught (#15)

Re: back references using regex

ahhhhhhhhh I swear I never came across any of these
gems of information in the docs. It was these subtle
differences that were throwing me.

I didn't originally catch that regex's were based on
grep/sed/awk syntax which I haven't studied throughly
yet. I've only used some basic operations in bash
scripts. I'll read up more on those. Thanks.

--- Douglas McNaught <doug@mcnaught.org> wrote:

Matthew Peter <survivedsushi@yahoo.com> writes:

One other thing, when I wrote back I actually used
34.31.29.20 (random), not 12.00.00.34 like i

showed in

the example, which is why i said it didn't work on
digits.

SELECT substring('34.31.29.20' FROM

$$((\w+)\.\2)$$);

substring
-----------

(1 row)

little did i know writing it with 12.00.00.34

would

return 00.00... so yes, that did suprise me.
Apparently only using the identical values returns

a

value. so it's saying x+ one more of the same

value

separated by a period... where shouldn't it be any
"letter, number or underscore" followed by any
"letter, number or underscore"?

Backreferences match the exact string matched by the
corresponding set
of parentheses. It's not the equivalent of
substituting in the
parenthesized part of the regex and testing that for
a match. The
behavior above is as expected. If you want it as
"any followed by
any" you shold write the regex as '((\w+)\.(\w+))'
-- then the two
parts can differ.

-Doug

______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

#18

Michael Fuhr

mike@fuhr.org

almost 21 years ago

In reply to: Matthew Peter (#17)

Re: back references using regex

On Sat, Sep 10, 2005 at 10:27:13AM -0700, Matthew Peter wrote:

ahhhhhhhhh I swear I never came across any of these
gems of information in the docs. It was these subtle
differences that were throwing me.

From "Regular Expression Escapes" in the "Pattern Matching" section
of the manual:

A back reference (\n) matches the same string matched by the
previous parenthesized subexpression specified by the number n
(see Table 9-18). For example, ([bc])\1 matches bb or cc but not
bc or cb. The subexpression must entirely precede the back
reference in the RE. Subexpressions are numbered in the order
of their leading parentheses. Non-capturing parentheses do not
define subexpressions.

http://www.postgresql.org/docs/8.0/static/functions-matching.html#POSIX-ESCAPE-SEQUENCES

--
Michael Fuhr

#19

Matthew Peter

survivedsushi@yahoo.com

almost 21 years ago

In reply to: Michael Fuhr (#18)

Re: back references using regex

How about this then, I didn't retain that information
from the doc. ;) I sometimes glaze over important gems
every now and then. It happens. I'm not a robot, yet.
At least I know the answer to my question is now
retained. You were a big help too. Thank you very
much. I appreciate it.

Speaking of data manipulation in a table... I was
thinking about storing and manipulating a list in a
column...

Is it possible to append and delete (unknown location)
items in a list? Or is another way more efficient? I'm
relatively new so sorry if I'm asking too many
questions. If possible, I will read about it if
there's docs for it so I don't trouble anyone with my
questions. It would be sweet to do that at the
database level.

Thanks again,
MP
Posgresql convert

--- Michael Fuhr <mike@fuhr.org> wrote:

On Sat, Sep 10, 2005 at 10:27:13AM -0700, Matthew
Peter wrote:

ahhhhhhhhh I swear I never came across any of

these

gems of information in the docs. It was these

subtle

differences that were throwing me.

From "Regular Expression Escapes" in the "Pattern
Matching" section
of the manual:

A back reference (\n) matches the same string
matched by the
previous parenthesized subexpression specified by
the number n
(see Table 9-18). For example, ([bc])\1 matches
bb or cc but not
bc or cb. The subexpression must entirely precede
the back
reference in the RE. Subexpressions are numbered
in the order
of their leading parentheses. Non-capturing
parentheses do not
define subexpressions.

http://www.postgresql.org/docs/8.0/static/functions-matching.html#POSIX-ESCAPE-SEQUENCES

--
Michael Fuhr

---------------------------(end of
broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please
send an appropriate
subscribe-nomail command to
majordomo@postgresql.org so that your
message can get through to the mailing list
cleanly

______________________________________________________
Yahoo! for Good
Watch the Hurricane Katrina Shelter From The Storm concert
http://advision.webevents.yahoo.com/shelter

#20

Michael Fuhr

mike@fuhr.org

almost 21 years ago

In reply to: Matthew Peter (#19)

Re: back references using regex

On Sun, Sep 11, 2005 at 12:30:59AM -0700, Matthew Peter wrote:

Speaking of data manipulation in a table... I was
thinking about storing and manipulating a list in a
column...

Please ask new questions in a new thread with a Subject header
related to the new topic. People who might be interested in following
the discussion, either to learn from it or to provide answers, might
miss it if the new topic hijacks an old thread.

--
Michael Fuhr

#21

Christian Goetze

cg@sensage.com

almost 21 years ago

In reply to: Matthew Peter (#3)

#22

Thomas Pundt

mlists@rp-online.de

almost 21 years ago

In reply to: Christian Goetze (#21)