SUBSTRING for a regular expression

Started by Berend Toberalmost 22 years ago8 messagesgeneral
Jump to latest
#1Berend Tober
btober@computer.org

I can't seem to get right the regular expression for parsing data like
these four sample rows (names and addresses changed to ficticious values)
from a text-type column:

Yolanda Harris, 38, of 40 South Main St., Newtown City, was charged
Sunday with breach of peace and interfering with a police officer.

Allen K. George, 30, of 88 Beverly Court was charged Saturday with
possession of marijuana, third-degree criminal mischief, breach of peace,
evading responsibility, interfering with a police officer, driving with a
suspended license, driving under the influence of drugs or alcohol,
failure to drive right and failure to have proper insurance.

Brain T. Grafton, 18, of 97 Bristol Ave. was charged Sunday with
possession of marijuana, possession of alcohol by a minor and failure to
wear a seat belt.

Brian D. Sptizer Jr., 18, of 20 Walling Ave., Northford, was charged
Sunday with driving under the influence of drugs or alcohol, evading
responsibility and following too closely.

Into separate columns for: name, age, address, charge. For example the
first record would have

name='Yolanda Harris'
age=38
address='40 South Main St., Newtown City'
charge='was charged Sunday with breach of peace and interfering with a
police officer.'

To get the name, for instance, I tried

SELECT SUBSTRING(description FROM '^([:alnum:]*), \d{2}, .*$') FROM
police_log;

or the age value

SELECT SUBSTRING(description FROM '^[:alnum:]*, (\d{2}), .*$') FROM
police_log;

But return values are all NULL. Can anyone give me some RE help, please?

--Berend Tober

#2mike g
mike@thegodshalls.com
In reply to: Berend Tober (#1)
Re: SUBSTRING for a regular expression

What language are you using for this plperl?

That command listed in the documentation certainly works in psql or
plpgsql but I don't know about plperl.

Are you assigning the result to a variable and then printing the result
of that variable before the function ends or printing based on what the
function returns?

I think you will probably need to show us the rest of the code in your
function.

Show quoted text

On Mon, 2004-07-05 at 19:40, btober@computer.org wrote:

I can't seem to get right the regular expression for parsing data like
these four sample rows (names and addresses changed to ficticious values)
from a text-type column:

Yolanda Harris, 38, of 40 South Main St., Newtown City, was charged
Sunday with breach of peace and interfering with a police officer.

Allen K. George, 30, of 88 Beverly Court was charged Saturday with
possession of marijuana, third-degree criminal mischief, breach of peace,
evading responsibility, interfering with a police officer, driving with a
suspended license, driving under the influence of drugs or alcohol,
failure to drive right and failure to have proper insurance.

Brain T. Grafton, 18, of 97 Bristol Ave. was charged Sunday with
possession of marijuana, possession of alcohol by a minor and failure to
wear a seat belt.

Brian D. Sptizer Jr., 18, of 20 Walling Ave., Northford, was charged
Sunday with driving under the influence of drugs or alcohol, evading
responsibility and following too closely.

Into separate columns for: name, age, address, charge. For example the
first record would have

name='Yolanda Harris'
age=38
address='40 South Main St., Newtown City'
charge='was charged Sunday with breach of peace and interfering with a
police officer.'

To get the name, for instance, I tried

SELECT SUBSTRING(description FROM '^([:alnum:]*), \d{2}, .*$') FROM
police_log;

or the age value

SELECT SUBSTRING(description FROM '^[:alnum:]*, (\d{2}), .*$') FROM
police_log;

But return values are all NULL. Can anyone give me some RE help, please?

--Berend Tober

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
joining column's datatypes do not match

#3Harald Fuchs
hf517@protecting.net
In reply to: Berend Tober (#1)
Re: SUBSTRING for a regular expression

In article <64617.206.53.65.243.1089074434.squirrel@$HOSTNAME>,
<btober@computer.org> writes:

I can't seem to get right the regular expression for parsing data like
these four sample rows (names and addresses changed to ficticious values)
from a text-type column:

Yolanda Harris, 38, of 40 South Main St., Newtown City, was charged
Sunday with breach of peace and interfering with a police officer.

Allen K. George, 30, of 88 Beverly Court was charged Saturday with
possession of marijuana, third-degree criminal mischief, breach of peace,
evading responsibility, interfering with a police officer, driving with a
suspended license, driving under the influence of drugs or alcohol,
failure to drive right and failure to have proper insurance.

Brain T. Grafton, 18, of 97 Bristol Ave. was charged Sunday with
possession of marijuana, possession of alcohol by a minor and failure to
wear a seat belt.

Brian D. Sptizer Jr., 18, of 20 Walling Ave., Northford, was charged
Sunday with driving under the influence of drugs or alcohol, evading
responsibility and following too closely.

Into separate columns for: name, age, address, charge. For example the
first record would have

name='Yolanda Harris'
age=38
address='40 South Main St., Newtown City'
charge='was charged Sunday with breach of peace and interfering with a
police officer.'

To get the name, for instance, I tried

SELECT SUBSTRING(description FROM '^([:alnum:]*), \d{2}, .*$') FROM
police_log;

or the age value

SELECT SUBSTRING(description FROM '^[:alnum:]*, (\d{2}), .*$') FROM
police_log;

But return values are all NULL. Can anyone give me some RE help, please?

Could you use Perl? A Perl regexp for that would be

/^(.+), (\d+), of (.+?),? (was charged.+)$/

#4Berend Tober
btober@computer.org
In reply to: Berend Tober (#1)
Re: SUBSTRING for a regular expression

SELECT
SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
SUBSTRING(description FROM '\\d{1,3}, of (.*?),? was charged') AS
address, SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
FROM police_log;

Aha! The old double-slash escape. Thank you very much.

--Berend Tober

#5Greg Sabino Mullane
greg@turnstep.com
In reply to: Berend Tober (#1)
Re: SUBSTRING for a regular expression

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

SELECT
SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
SUBSTRING(description FROM '\\d{1,3}, of (.*?),? was charged')
AS address,
SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
FROM police_log;

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200407062103
-----BEGIN PGP SIGNATURE-----

iD8DBQFA60wAvJuQZxSWSsgRAuKPAJ0QAeG0hdoJ/Ofqq/lXVtwMjyzQjACgoer3
kwPy0xvRiZxwr3cgPq6Rjwc=
=mF/C
-----END PGP SIGNATURE-----

#6Berend Tober
btober@computer.org
In reply to: Greg Sabino Mullane (#5)
Re: SUBSTRING for a regular expression

SELECT
SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
SUBSTRING(description FROM '\\d{1,3}, of (.*?),? was charged')
AS address,
SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
FROM police_log;

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200407062103

Thanks Greg,

I was hopeful that this would work, since I had missed the need to double
the back-slash escape character in my original work, but something still
isn't right.

First I got an error message that psql didn't like the "?" characters in
the RE, so I eliminated them and wrote

SELECT
SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
SUBSTRING(description FROM '\\d{1,3}, of (.*), was charged') AS address,
SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
FROM police_log;

This modified version of your suggestion gets the dow, and charge columns
right, but I'm still not seeing the vname, age, and address columns --
they return null. You've gotten me part way there, and I appreciate that.
Any further ideas?

--Berend Tober

#7Berend Tober
btober@computer.org
In reply to: Berend Tober (#6)
Re: SUBSTRING for a regular expression

I was hopeful that this would work, since I had missed the need to
double the back-slash escape character in my original work, but
something still isn't right.

First I got an error message that psql didn't like the "?" characters
in the RE, so I eliminated them and wrote

Sounds like you may be using an older version of Pg than me. What
version are you running? Only that can tell us how limiting your regex
capabilitites are....

Not that old (I think)!

crime=# select version();
version
--------------------------------------------------------------------------------
-----------------------------------------------
PostgreSQL 7.3.4 on i386-redhat-linux-gnu, compiled by GCC
i386-redhat-linux-gc c (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
(1 row)

--Berend Tober

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Berend Tober (#7)
Re: SUBSTRING for a regular expression

<btober@computer.org> writes:

Sounds like you may be using an older version of Pg than me. What
version are you running? Only that can tell us how limiting your regex
capabilitites are....

Not that old (I think)!

PostgreSQL 7.3.4 on i386-redhat-linux-gnu, compiled by GCC
i386-redhat-linux-gc c (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)

Nonetheless too old --- 7.4 contains an entirely new regex package.
The one in 7.3 is quite brain-dead compared to any modern regexes...

regards, tom lane