regular expressions in query

Started by fionaabout 21 years ago10 messagesgeneral
Jump to latest
#1fiona
fbissett@blueyonder.co.uk

My database table holds phone numbers that may contain characters other
than digits (that's not a problem in itself).

I want to be able to apply a regular expression (to ignore all
characters except digits) to the attribute 'phone' first and then for
the ILIKE to compare
the result to $telephone. I can't find any way of applying the RE to phone.
My current query without the RE is as follows:

SELECT telephone FROM addresses WHERE id = user_id AND phone ILIKE
'%".addslashes($telephone)."%'"
I want to do something like: AND phone([^[:digit:]]) ILIKE $telephone
But this doesn't work.
Any ideas?
--
Get Thunderbird <http://www.mozilla.org/products/thunderbird/&gt;
<http://www.mozilla.org/products/thunderbird/&gt;

#2Jeff Davis
pgsql@j-davis.com
In reply to: fiona (#1)
Re: regular expressions in query

Try using the "~" regex matching operator instead of ILIKE.

Regards,
Jeff Davis

Show quoted text

On Fri, 2005-02-11 at 22:21 +0000, fiona wrote:

My database table holds phone numbers that may contain characters other
than digits (that's not a problem in itself).

I want to be able to apply a regular expression (to ignore all
characters except digits) to the attribute 'phone' first and then for
the ILIKE to compare
the result to $telephone. I can't find any way of applying the RE to phone.
My current query without the RE is as follows:

SELECT telephone FROM addresses WHERE id = user_id AND phone ILIKE
'%".addslashes($telephone)."%'"
I want to do something like: AND phone([^[:digit:]]) ILIKE $telephone
But this doesn't work.
Any ideas?

#3fiona
fbissett@blueyonder.co.uk
In reply to: Jeff Davis (#2)
Re: regular expressions in query

No virus found in this outgoing message.
Checked by AVG Anti-Virus.
Version: 7.0.300 / Virus Database: 265.8.7 - Release Date: 10/02/2005

#4Scott Marlowe
smarlowe@g2switchworks.com
In reply to: fiona (#3)
Re: regular expressions in query

On Sat, 2005-02-12 at 10:31, F.Bissett wrote:

On Fri, 11 Feb 2005 19:56:33 -0800, Jeff Davis wrote:

Try using the "~" regex matching operator instead of ILIKE.

Regards,
Jeff Davis

I still need the ILIKE to compare the returned value with $telephone.

I have the following PHP to check an input string for non numeric
characters:

$tel = ereg_replace('[^[:digit:]]', "", $test); -- $tel then equals
only the numbers in test.

This is what I want to be able to do inside the query, but without
altering the values in the database - to look at the column 'phone',
see if there are any non-numeric characters and ignore them then
compare the numbers that are left with $telephone.

I tried "AND phone ~ '[^[:digit:]]' ILIKE $telephone"

But get the following error:

Try anding them:

where phone ILIKE $telephone AND phone ~ '[.... etc...

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: fiona (#3)
Re: regular expressions in query

"F.Bissett" <fbissett@blueyonder.co.uk> writes:

</head><BODY BGCOLOR=3D"#F0F0F0" ><p><SPAN style=3D"font-size:10pt;">On Fri=
, 11 Feb 2005 19:56:33 -0800, Jeff Davis wrote:<br /></SPAN><SPAN style=3D"=
font-size:10pt;color:navy;">&gt;=A0Try using the "~" regex matching operato=
r instead of ILIKE.</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPA=
N style=3D"font-size:10pt;color:navy;">&gt;</SPAN><SPAN style=3D"font-size:=
10pt;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">&gt;=A0Regar=
ds,</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPAN style=3D"font-=
size:10pt;color:navy;">&gt;=A0Jeff Davis</SPAN><SPAN style=3D"font-size:10p=
t;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">&gt;</SPAN></p>

Please don't post HTML email; it's a pain in the neck to quote.

I have the following PHP to check an input string for non numeric characters:

$tel = ereg_replace('[^[:digit:]]', "", $test); -- tel then equals only the numbers in test.

The closest equivalent we have to that is the regex-extraction version
of the substring() function --- see
http://www.postgresql.org/docs/8.0/static/functions-matching.html
It would go something like

substring($test from '[0-9]+')

However, what that actually gets you is the first all-numeric substring;
if there are multiple occurrences of digits separated by non-digits this
will not do what you want.

My advice is to write the function you want in one of the PLs that have
good string-mashing facilities --- either plperl or pltcl would
certainly do. (Probably plpython too, but I'm not very familiar with
Python.) Plain SQL is not very strong on string manipulation, but
that's why we have extension languages.

regards, tom lane

#6elein
elein@varlena.com
In reply to: Tom Lane (#5)
Re: regular expressions in query

No doubt someone more adept at perl can write
this function as a one-liner.

create or replace function just_digits(text)
returns text as
$$
my $innum = $_[0];
$innum =~ s/\D//g;
return $innum;
$$ language 'plperl'

SELECT telephone FROM addresses
WHERE user_id = 'bob'
AND just_digits(telephone) = '1115551212';

--elein

Show quoted text

On Sat, Feb 12, 2005 at 12:27:20PM -0500, Tom Lane wrote:

"F.Bissett" <fbissett@blueyonder.co.uk> writes:

</head><BODY BGCOLOR=3D"#F0F0F0" ><p><SPAN style=3D"font-size:10pt;">On Fri=
, 11 Feb 2005 19:56:33 -0800, Jeff Davis wrote:<br /></SPAN><SPAN style=3D"=
font-size:10pt;color:navy;">&gt;=A0Try using the "~" regex matching operato=
r instead of ILIKE.</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPA=
N style=3D"font-size:10pt;color:navy;">&gt;</SPAN><SPAN style=3D"font-size:=
10pt;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">&gt;=A0Regar=
ds,</SPAN><SPAN style=3D"font-size:10pt;"><br /></SPAN><SPAN style=3D"font-=
size:10pt;color:navy;">&gt;=A0Jeff Davis</SPAN><SPAN style=3D"font-size:10p=
t;"><br /></SPAN><SPAN style=3D"font-size:10pt;color:navy;">&gt;</SPAN></p>

Please don't post HTML email; it's a pain in the neck to quote.

I have the following PHP to check an input string for non numeric characters:

$tel = ereg_replace('[^[:digit:]]', "", $test); -- tel then equals only the numbers in test.

The closest equivalent we have to that is the regex-extraction version
of the substring() function --- see
http://www.postgresql.org/docs/8.0/static/functions-matching.html
It would go something like

substring($test from '[0-9]+')

However, what that actually gets you is the first all-numeric substring;
if there are multiple occurrences of digits separated by non-digits this
will not do what you want.

My advice is to write the function you want in one of the PLs that have
good string-mashing facilities --- either plperl or pltcl would
certainly do. (Probably plpython too, but I'm not very familiar with
Python.) Plain SQL is not very strong on string manipulation, but
that's why we have extension languages.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

#7Russ Brown
pickscrape@gmail.com
In reply to: elein (#6)
Re: regular expressions in query

elein wrote:

No doubt someone more adept at perl can write
this function as a one-liner.

create or replace function just_digits(text)
returns text as
$$
my $innum = $_[0];
$innum =~ s/\D//g;
return $innum;
$$ language 'plperl'

SELECT telephone FROM addresses
WHERE user_id = 'bob'
AND just_digits(telephone) = '1115551212';

--elein

I've thought about things like this in the past, and a thought that
occurred to me was to add a functional index on just_digits(telephone)
to the table. Would this not allow the above query to use an index while
searching?

--

Russ.

#8Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: Russ Brown (#7)
Re: regular expressions in query

At 09:57 AM 2/13/2005 +0000, Russ Brown wrote:

I've thought about things like this in the past, and a thought that
occurred to me was to add a functional index on just_digits(telephone) to
the table. Would this not allow the above query to use an index while
searching?

I think it should. But for phone numbers it may be better to reverse the
digits before indexing - usually whilst the area code changes, the last 4
or 5 digits don't change.

This way you can do a LIKE search on *5678. Where the number ends with 5678.

I'm not sure how to get Postgresql to index from the ending to the start of
a string vs the normal from the start to the end, so in my webapp I
reversed it at the application layer. If you are going to do this sort of
thing at the application layer you might as well do the nondigit removal
there too.

e.g.
$phone=~tr/0-9%_//cd; # I allowed the wildcards % and _
$phone=reverse $phone;

You may still wish to store the phone numbers "as is" for display purposes.

Link.

#9J. Greenlees
jaqui@telus.net
In reply to: elein (#6)
Re: regular expressions in query

Lincoln Yeoh wrote:

At 09:57 AM 2/13/2005 +0000, Russ Brown wrote:

I've thought about things like this in the past, and a thought that
occurred to me was to add a functional index on just_digits(telephone)
to the table. Would this not allow the above query to use an index
while searching?

I think it should. But for phone numbers it may be better to reverse the
digits before indexing - usually whilst the area code changes, the last
4 or 5 digits don't change.

This way you can do a LIKE search on *5678. Where the number ends with
5678.

I'm not sure how to get Postgresql to index from the ending to the start
of a string vs the normal from the start to the end, so in my webapp I
reversed it at the application layer. If you are going to do this sort
of thing at the application layer you might as well do the nondigit
removal there too.

e.g.
$phone=~tr/0-9%_//cd; # I allowed the wildcards % and _
$phone=reverse $phone;

You may still wish to store the phone numbers "as is" for display purposes.

Link.

make sure the table stores as text rather than as numeric data.
then you can use the excellent perl string tools to pull the last 4
characters of the number.

$base=((strlen-4,strlen)

$base being the last 4 digits.
then convert to numeric to test against search requirements.

Jaqui

#10Lincoln Yeoh
lyeoh@pop.jaring.my
In reply to: J. Greenlees (#9)
Re: regular expressions in query

But that method would be specific for searches for the last 4 digits. It
won't work as well for the general case of the last X digits.

To clarify the method I suggested:

Say a phone number is: 818 9567 1234

You reverse the number and store it as text and index it as

43217659818

Then if someone searches for 5671234 you reverse the query string and do a
search for

select * from phonebook where number like '4321765%' and ....

If they enter just the last 5 digits: 71234

select * from phonebook where number like '43217%' and ....

These sort of searches are indexable on postgresql.

Link.

At 04:33 AM 2/13/2005 -0800, J. Greenlees wrote:

Show quoted text

Lincoln Yeoh wrote:
I think it should. But for phone numbers it may be better to reverse the
digits before indexing - usually whilst the area code changes, the last 4
or 5 digits don't change.

This way you can do a LIKE search on *5678. Where the number ends with 5678.

make sure the table stores as text rather than as numeric data.
then you can use the excellent perl string tools to pull the last 4
characters of the number.

$base=((strlen-4,strlen)

$base being the last 4 digits.
then convert to numeric to test against search requirements.

Jaqui