Method to detect certain characters in column?

Started by Ian Meyeralmost 18 years ago10 messagesgeneral
Jump to latest
#1Ian Meyer
ianmmeyer@gmail.com

So I have a column that contains usernames that have characters such
as Ã(c)(R), for example: fuchÃ(c)r.. is there any way to find names
with non A-Za-z0-9?

Thanks in advance!
- Ian

#2Tommy Gildseth
tommy.gildseth@usit.uio.no
In reply to: Ian Meyer (#1)
Re: Method to detect certain characters in column?

Ian Meyer wrote:

So I have a column that contains usernames that have characters such
as �(c)(R), for example: fuch�(c)r.. is there any way to find names
with non A-Za-z0-9?

...WHERE col ~ '[^a-zA-Z0-9]';

Someone with a bit more regex fu can probably condense down the regex.

--
Tommy Gildseth

#3Antonio Perez
renjin25@yahoo.com
In reply to: Ian Meyer (#1)
Re: Method to detect certain characters in column?

use this

regular expression                             
~
regular expression - case insensitive     
~*

example

SELECT name FROM  table1 where name
~* '*Ã*'

check here

http://www.postgresql.org/docs/current/static/functions-matching.html

--- El lun 23-jun-08, Ian Meyer <ianmmeyer@gmail.com> escribió:
De: Ian Meyer <ianmmeyer@gmail.com>
Asunto: [GENERAL] Method to detect certain characters in column?
A: pgsql-general@postgresql.org
Fecha: lunes, 23 junio, 2008, 5:58 pm

So I have a column that contains usernames that have characters such
as Ã(c)(R), for example: fuchÃ(c)r.. is there any way to find names
with non A-Za-z0-9?

Thanks in advance!
- Ian

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#4Alvaro Herrera
alvherre@2ndquadrant.com
In reply to: Antonio Perez (#3)
Re: Method to detect certain characters in column?

Antonio Perez wrote:

example

SELECT name FROM� table1 where name
~* '*�*'

Actually this regex is flawed. It looks like a common shell "glob"
pattern (I don't know the real name of these things), which is a very
different and simpler animal from a regex.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ian Meyer (#1)
Re: Method to detect certain characters in column?

"Ian Meyer" <ianmmeyer@gmail.com> writes:

So I have a column that contains usernames that have characters such
as �(c)(R), for example: fuch�(c)r.. is there any way to find names
with non A-Za-z0-9?

Hmm, none of the responses so far look right to me. How about

WHERE NOT (col ~ '^[A-Za-z0-9]*$')

regards, tom lane

#6Steve Atkins
steve@blighty.com
In reply to: Ian Meyer (#1)
Re: Method to detect certain characters in column?

On Jun 23, 2008, at 1:58 PM, Ian Meyer wrote:

So I have a column that contains usernames that have characters such
as Ã(c)(R), for example: fuchÃ(c)r.. is there any way to find names
with non A-Za-z0-9?

... WHERE column ~* '[^a-z0-9]'

Cheers,
Steve

#7Ian Meyer
ianmmeyer@gmail.com
In reply to: Steve Atkins (#6)
Re: Method to detect certain characters in column?

Ah, so I forgot to mention the one caveat to this (sorry!) was there
was a ton of punctuation/spaces and other ilk.. so this is what I came
up with:

bco=# select name from member where not (name ~ '^[A-Za-z0-9[:punct:] ]*$');
name
----------------------
Señorita Lolita
Long Pig
täkäurgh
blåbärsöl
fuchér MkII
fuchér ver2.0
Gûm-ishi Ashi Gurum
kängnäve
Fuchér-version 2.1
fuchÃ(c)r

Thank you everyone for your help.. that looks to be the correct amount
I was looking for.

Ian

Show quoted text

On Mon, Jun 23, 2008 at 7:28 PM, Steve Atkins <steve@blighty.com> wrote:

On Jun 23, 2008, at 1:58 PM, Ian Meyer wrote:

So I have a column that contains usernames that have characters such
as Ã(c)(R), for example: fuchÃ(c)r.. is there any way to find names
with non A-Za-z0-9?

... WHERE column ~* '[^a-z0-9]'

Cheers,
Steve

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#8Tino Wildenhain
tino@wildenhain.de
In reply to: Ian Meyer (#7)
Re: Method to detect certain characters in column?

Hi Ian,

Ian Meyer wrote:

Ah, so I forgot to mention the one caveat to this (sorry!) was there
was a ton of punctuation/spaces and other ilk.. so this is what I came
up with:

bco=# select name from member where not (name ~ '^[A-Za-z0-9[:punct:] ]*$');
name
----------------------
Señorita Lolita
Long Pig
täkäurgh
blåbärsöl
fuchér MkII
fuchér ver2.0
Gûm-ishi Ashi Gurum
kängnäve
Fuchér-version 2.1
fuchÃ(c)r

Uh, is that really the name as it should be? To me it
looks much more like UTF-8 stored in SQL-Ascii. Maybe
converting it correctly would help?

Cheers
Tino

#9Ian Meyer
ianmmeyer@gmail.com
In reply to: Tino Wildenhain (#8)
Re: Method to detect certain characters in column?

That's entirely possible.. which is the reason for cleanup.. we're
moving to a model where members can be queried by name, and UTF-8
isn't allowed in URLs, so we need to rename/remove users with those
types of names. A lot of these members are from years ago where we
were on mysql with not enough experience to sanity check everything,
or do things as we're doing now.

And yeah, that's how the names are. People got crafty.

*shrugs*

thanks again for all the help everyone!

Show quoted text

On Mon, Jun 23, 2008 at 11:54 PM, Tino Wildenhain <tino@wildenhain.de> wrote:

Hi Ian,

Ian Meyer wrote:

Ah, so I forgot to mention the one caveat to this (sorry!) was there
was a ton of punctuation/spaces and other ilk.. so this is what I came
up with:

bco=# select name from member where not (name ~ '^[A-Za-z0-9[:punct:]
]*$');
name
----------------------
Señorita Lolita
Long Pig
täkäurgh
blåbärsöl
fuchér MkII
fuchér ver2.0
Gûm-ishi Ashi Gurum
kängnäve
Fuchér-version 2.1
fuchÃ(c)r

Uh, is that really the name as it should be? To me it
looks much more like UTF-8 stored in SQL-Ascii. Maybe
converting it correctly would help?

Cheers
Tino

#10Craig Ringer
craig@2ndquadrant.com
In reply to: Ian Meyer (#9)
Re: Method to detect certain characters in column?

Ian Meyer wrote:

That's entirely possible.. which is the reason for cleanup.. we're
moving to a model where members can be queried by name, and UTF-8
isn't allowed in URLs, so we need to rename/remove users with those
types of names.

Depending on your webserver, Unicode characters should be possible in
URLs. Certainly some non-7-bit characters can be represented in URLs
with % encoding.

On my Apache system, for example, the string:

áüÜć

in latin-1 encoding is represented in a URL as:

%c3%a1%c3%bc%c3%9c%c4%87

If that same byte sequence is decoded as UTF-8 instead, it is:

áüÜć

(a-acute u-umlaut U-umlaut c-acute)

In other words, you can send an arbitrary byte sequence in a URL with %
escapes. How that byte sequence is decoded into a sequence of characters
depends on your web server and its configuration - or, if the web server
sends the % encoded URL directly to your client, on how your client
chooses to interpret it.

The main trouble will be ensuring that clients and the server agree on
the character encoding used in the URL.

--
Craig Ringer