repeated characters in SQL

Started by Govind Chettiarabout 10 years ago5 messagesgeneral
Jump to latest
#1Govind Chettiar
rashapoo@gmail.com

I have a simple table consisting of a bunch of English words. I am trying
to find words that have repeated characters in them, for example
apple
tattoo

but not

orange
lemon

I know that only a maximum of one repetition can occur

I tried various options like
SELECT word FROM public."SpellItWords"
WHERE word ~ E'(.)\1{2,}'

SELECT word FROM public."SpellItWords"
WHERE word ~ E'([a-z])\1{2}'

What finally worked was this
SELECT word FROM public."SpellItWords"
WHERE word ~ E'(.)\\1'

But I don't really understand what this does...Can you explain?

Thanks!

#2David Rowley
dgrowleyml@gmail.com
In reply to: Govind Chettiar (#1)
Re: repeated characters in SQL

On 24 January 2016 at 12:44, Govind Chettiar <rashapoo@gmail.com> wrote:

I have a simple table consisting of a bunch of English words. I am trying
to find words that have repeated characters in them, for example
apple
tattoo

but not

orange
lemon

I know that only a maximum of one repetition can occur

I tried various options like
SELECT word FROM public."SpellItWords"
WHERE word ~ E'(.)\1{2,}'

SELECT word FROM public."SpellItWords"
WHERE word ~ E'([a-z])\1{2}'

What finally worked was this
SELECT word FROM public."SpellItWords"
WHERE word ~ E'(.)\\1'

But I don't really understand what this does...Can you explain?

The ~ operator is a regular expression matching operator, and the
(.)\1 is a regular expression. More details here
http://www.postgresql.org/docs/current/static/functions-matching.html

The regular expression . matches a single character, since that . is
wrapped in () the regex engine captures the match and stores it in a
variable, this is called a capture group. Since this is the first such
capture group in the regular expression, then the value matching the .
gets stored in the variable \1, so your regex basically says; "match a
single character which has the same single character to its immediate
right hand side". The extra \ is just an escape character.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#3Govind Chettiar
rashapoo@gmail.com
In reply to: David Rowley (#2)
Re: repeated characters in SQL

Thanks David...so it's looking at each character, storing it in /1, then
comparing the "next" character with what is in /1.
I guess the escape character (which is not needed in, say, Notepad++) threw
me a bit.

On Sun, Jan 24, 2016 at 2:32 AM, David Rowley <david.rowley@2ndquadrant.com>
wrote:

Show quoted text

On 24 January 2016 at 12:44, Govind Chettiar <rashapoo@gmail.com> wrote:

I have a simple table consisting of a bunch of English words. I am

trying

to find words that have repeated characters in them, for example
apple
tattoo

but not

orange
lemon

I know that only a maximum of one repetition can occur

I tried various options like
SELECT word FROM public."SpellItWords"
WHERE word ~ E'(.)\1{2,}'

SELECT word FROM public."SpellItWords"
WHERE word ~ E'([a-z])\1{2}'

What finally worked was this
SELECT word FROM public."SpellItWords"
WHERE word ~ E'(.)\\1'

But I don't really understand what this does...Can you explain?

The ~ operator is a regular expression matching operator, and the
(.)\1 is a regular expression. More details here
http://www.postgresql.org/docs/current/static/functions-matching.html

The regular expression . matches a single character, since that . is
wrapped in () the regex engine captures the match and stores it in a
variable, this is called a capture group. Since this is the first such
capture group in the regular expression, then the value matching the .
gets stored in the variable \1, so your regex basically says; "match a
single character which has the same single character to its immediate
right hand side". The extra \ is just an escape character.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#4Francisco Olarte
folarte@peoplecall.com
In reply to: Govind Chettiar (#3)
Re: repeated characters in SQL

On Sun, Jan 24, 2016 at 7:05 PM, <rashapoo@gmail.com> wrote:

I guess the escape character (which is not needed in, say, Notepad++) threw
me a bit.

Notepad ++ is, AFAIK, an editor, it SHOULD (within reason) let you
write any text.

The double quote is needed due to the quoting rules of the language.
You want the regexp engine to see the characters leftp, dot, rightp,
backslash, one. But backslah is the scape character in strings ( in
many languages ), so you need to escape it too.

The same happens in, for example, C and Java. To put those five chars
in a string you need "(.)\\1". If you send "(.)\1" to a C compiler it
will build the string leftp, dot, rightp, SOH=(char)(1). It will
arrive to the backslash when parsing, see it is followed by a digit
less than 8, interpret it as an octal escape, and emit the SOH.

Francisco Olarte.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Francisco Olarte (#4)
Re: repeated characters in SQL

Francisco Olarte <folarte@peoplecall.com> writes:

On Sun, Jan 24, 2016 at 7:05 PM, <rashapoo@gmail.com> wrote:

I guess the escape character (which is not needed in, say, Notepad++) threw
me a bit.

Notepad ++ is, AFAIK, an editor, it SHOULD (within reason) let you
write any text.

The double quote is needed due to the quoting rules of the language.
You want the regexp engine to see the characters leftp, dot, rightp,
backslash, one. But backslah is the scape character in strings ( in
many languages ), so you need to escape it too.

In this particular case, the extra backslash was needed only because
the OP used E'...' syntax for his string literal. In a plain SQL
string literal, backslash isn't special.

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general