order of (escaped) characters in regex range

Started by InterRob over 14 years ago, 10 messages, general
#1 InterRob
rob.marjot@gmail.com

Dear List,

I found this interesting:

SELECT regexp_matches('123-A' , E'(3[A-Z\- ])');
ERROR: invalid regular expression: invalid character range

whereas:
SELECT regexp_matches('123-A' , E'(3[\- A-Z])');
regexp_matches
----------------
{3-}
(1 row)

Notice the order of (escaped) characters and ranges in the last bit of the
expression.

Am I missing some key concept of the regular expression?

Regards,
Rob

#2 Szymon Guz
mabewlun@gmail.com
In reply to: InterRob (#1)
Re: order of (escaped) characters in regex range

On 13 December 2011 14:04, InterRob <rob.marjot@gmail.com> wrote:

> Dear List,
>
> I found this interesting:
>
> SELECT regexp_matches('123-A' , E'(3[A-Z\- ])');
> ERROR: invalid regular expression: invalid character range
>
> whereas:
> SELECT regexp_matches('123-A' , E'(3[\- A-Z])');
> regexp_matches
> ----------------
> {3-}
> (1 row)
>
> Notice the order of (escaped) characters and ranges in the last bit of the
> expression.
>
> Am I missing some key concept of the regular expression?
>
> Regards,
> Rob

Hi Rob,
try '\\-' instead of '\-'
and it works :)

regards
Szymon

#3 Rob Marjot
rob@marjot-multisoft.com
In reply to: Szymon Guz (#2)
Re: order of (escaped) characters in regex range

True, but still weird...

And are you sure it does the same thing?

#4 InterRob
rob.marjot@gmail.com
In reply to: Szymon Guz (#2)
Re: order of (escaped) characters in regex range

True, but still weird...

And are you sure it does the same thing?

#5 David G. Johnston
david.g.johnston@gmail.com
In reply to: Szymon Guz (#2)
Re: order of (escaped) characters in regex range

On Dec 13, 2011, at 8:09, Szymon Guz <mabewlun@gmail.com> wrote:

> Hi Rob,
> try '\\-' instead of '\-'
> and it works :)
>
> regards

If you don't intend to use PostgreSQL escapes in your string then omit the leading 'E'.

In a character class the - symbol has special meaning if it appears anywhere but as the first or last character of the group. To avoid that special meaning you have to escape it. If it appears first (or last) it always means a literal -. The PostgreSQL documentation does not fully describe regular expressions, but a reference book on them would note this particular behavior.

David J.
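
A note on why '\-' and '\\-' behave differently, as a minimal sketch: the E'' string literal is processed first, and the regex engine only sees what is left afterwards. Selecting the pattern as plain text shows which characters actually reach the regex engine. The column aliases are made up for illustration.

```sql
-- Inspect the pattern text after E'' string-literal processing.
-- A backslash before an ordinary character such as '-' is dropped,
-- while a doubled backslash yields a single backslash.
SELECT E'(3[A-Z\- ])'  AS single_backslash,   -- (3[A-Z- ])  : bare '-' right after the range A-Z
       E'(3[\- A-Z])'  AS hyphen_first,       -- (3[- A-Z])  : '-' comes first, so it is literal
       E'(3[A-Z\\- ])' AS double_backslash;   -- (3[A-Z\- ]) : the regex engine sees an escaped '-'
```

In the first case the regex engine never sees a backslash, so the hyphen that follows A-Z is read as the start of another range, which is what the "invalid character range" error complains about.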

#6 InterRob
rob.marjot@gmail.com
In reply to: David G. Johnston (#5)
Re: order of (escaped) characters in regex range

Thanks guys, I see what you mean.

I do intend to use the PG escaping, in order to avoid that annoying
warning... Hence, my expression should indeed be:
SELECT regexp_matches('123-A' , E'(3[A-Z\\-\\(\\) ])');

In the above expression I added the parentheses as I wish to match these
as well :))

Thanks!

#7 depesz
In reply to: InterRob (#6)
Re: order of (escaped) characters in regex range

On Tue, Dec 13, 2011 at 02:51:15PM +0100, InterRob wrote:

> Thanks guys, I see what you mean.
>
> I do intend to use the PG escaping, in order to avoid that annoying
> warning... Hence, my expression should indeed be:
> SELECT regexp_matches('123-A' , E'(3[A-Z\\-\\(\\) ])');
>
> In the above expression I added the parentheses as I wish to match these
> as well :))

Instead of putting in that much quoting, just do:
SELECT regexp_matches('123-A' , '(3[A-Z() -])');
( and ) don't need to be quoted inside a bracket expression, and if you
move the - to the beginning or end (I prefer the end) of the bracket
expression, it doesn't need to be quoted either.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
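
A small sketch of the backslash-free form suggested above. The first query is the one from the message; the second uses a hypothetical input string, added only to exercise the unescaped parenthesis inside the brackets.

```sql
-- Hyphen placed last in the bracket expression, parentheses left unescaped.
-- No backslashes are involved, so a plain string literal (no E prefix) is enough.
SELECT regexp_matches('123-A', '(3[A-Z() -])');  -- pattern as suggested in the thread

-- Hypothetical input: a '3' followed by an opening parenthesis.
SELECT regexp_matches('123(A', '(3[A-Z() -])');
```

Because the pattern contains no backslash at all, it reads the same with or without the E prefix, which sidesteps the string-escaping question entirely.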

#8 Merlin Moncure
mmoncure@gmail.com
In reply to: InterRob (#6)
Re: order of (escaped) characters in regex range

On Tue, Dec 13, 2011 at 7:51 AM, InterRob <rob.marjot@gmail.com> wrote:

> Thanks guys, I see what you mean.
>
> I do intend to use the PG escaping, in order to avoid that annoying
> warning... Hence, my expression should indeed be:
> SELECT regexp_matches('123-A' , E'(3[A-Z\\-\\(\\) ])');
>
> In the above expression I added the parentheses as I wish to match these as
> well :))

I advise dollar quoting when writing complicated regular expressions:

E'(3[A-Z\\-\\(\\) ])'
becomes
$$(3[A-Z\-\(\) ])$$

merlin
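
To check that the two spellings above really denote the same pattern, one can compare them as plain text. This is only a sketch; the column alias is made up for illustration.

```sql
-- E'' halves each doubled backslash; dollar quoting takes its content verbatim.
-- Both literals therefore produce the text (3[A-Z\-\(\) ]).
SELECT E'(3[A-Z\\-\\(\\) ])' = $$(3[A-Z\-\(\) ])$$ AS same_pattern;  -- true
```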

#9 David G. Johnston
david.g.johnston@gmail.com
In reply to: Merlin Moncure (#8)
Re: order of (escaped) characters in regex range

-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Merlin Moncure
Sent: Tuesday, December 13, 2011 11:39 AM
To: rob@marjot-multisoft.com
Cc: David Johnston; Szymon Guz; pgsql-general
Subject: Re: [GENERAL] order of (escaped) characters in regex range

On Tue, Dec 13, 2011 at 7:51 AM, InterRob <rob.marjot@gmail.com> wrote:

Thanks guys, I see what you mean.

I do intend to use the PG escaping, in order to avoid that annoying
warning... Hence, my expression should indeed be:
SELECT regexp_matches('123-A' , E'(3[A-Z\\-\\(\\) ])');

In the above expression I added the parentheses as I wish to match
these as well :))

I advise dollar quoting when writing complicated regular expressions:

E'(3[A-Z\\-\\(\\) ])'
becomes
$$(3[A-Z\-\(\) ])$$

merlin

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make
changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

---------------------------------------------------------------

Aside from backward compatibility, and the various warnings, is there any
reason to prefer dollar-quoting over a non-SQL-escaped string literal (i.e.,
'3[A-Z\-\(\) ]' ) ?

David J.

#10 Merlin Moncure
mmoncure@gmail.com
In reply to: David G. Johnston (#9)
Re: order of (escaped) characters in regex range

On Tue, Dec 13, 2011 at 10:53 AM, David Johnston <polobo@yahoo.com> wrote:

> Aside from backward compatibility, and the various warnings, is there any
> reason to prefer dollar-quoting over a non-SQL-escaped string literal (i.e.,
> '3[A-Z\-\(\) ]'   ) ?

yeah -- because sooner or later you have to stick a single quote in
there (of course, you can double the ', but I personally think that's
awful).

merlin
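
A brief sketch of the single-quote case mentioned above. The pattern and the input string are hypothetical, chosen only to put a single quote inside the regular expression.

```sql
-- In an ordinary literal the single quote must be doubled;
-- inside $$ ... $$ it is written as-is.
-- Both literals denote the pattern text ([a-z]'[a-z]).
SELECT '([a-z]''[a-z])' = $$([a-z]'[a-z])$$ AS same_pattern;  -- true

-- Using the dollar-quoted form in an actual match (the input literal
-- 'it''s' still needs the doubled quote because it is a plain string).
SELECT regexp_matches('it''s', $$([a-z]'[a-z])$$);
```

If the pattern itself ever needs to contain $$, a tagged delimiter such as $re$ ... $re$ can be used instead.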