someone please explain this regex behaviour

Started by Martin Leja | about 25 years ago | 3 messages | general
#1 Martin Leja
Martin.Leja@unix-ag.uni-siegen.de

Hi,

Consider the following simple example, where I want to select a path
using a WHERE clause with the case-insensitive regex operator "~*". The
queries that return one row are fine, but I don't understand the ones that
return 0 rows:

backup=> select version();
version
-------------------------------------------------------------
PostgreSQL 6.5.3 on i686-pc-linux-gnu, compiled by gcc 2.95.2
(1 row)

backup=> create table foo (path varchar(300));
CREATE
backup=> insert into foo (path) values ('/My');
INSERT 29400 1
backup=> select path from foo where path ~* '/My';
path
----
/My
(1 row)

backup=> select path from foo where path ~* '^/My';
path
----
/My
(1 row)

backup=> select path from foo where path ~* '^/my';
path
----
(0 rows)

backup=> select path from foo where path ~* '/my';
path
----
/My
(1 row)

backup=> select path from foo where path ~* '/mY';
path
----
/My
(1 row)

backup=> select path from foo where path ~* '^/mY';
path
----
(0 rows)

Why e.g. does the statement "select path from foo where path ~* '^/my';"
not return the only entry "/My"? Can someone explain this?
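
As a point of comparison outside the database, the same six patterns can be
run through grep in case-insensitive mode. This is only a sketch of the
expected regex semantics: grep's matcher is not PostgreSQL's, and the
function name "matches" is made up for this example. Each pattern, anchored
or not, reports one matching line for the value /My.

```shell
# Count how many lines of "value" match "pattern", ignoring case.
# "matches" is a hypothetical helper, not a PostgreSQL or psql feature.
matches() {
    pattern=$1
    value=$2
    printf '%s\n' "$value" | grep -c -i -e "$pattern"
}

# The same patterns as in the psql session above, tested against '/My'.
for p in '/My' '^/My' '^/my' '/my' '/mY' '^/mY'; do
    printf '%-6s -> %s\n' "$p" "$(matches "$p" '/My')"
done
```

So by plain regex rules all six queries should return the row, which is why
the two anchored lower-case cases look wrong.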
--
Regards, martin@unix-ag.org

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Martin Leja (#1)
Re: someone please explain this regex behaviour

Martin Leja <Martin.Leja@unix-ag.uni-siegen.de> writes:

> Why e.g. does the statement "select path from foo where path ~* '^/my';"
> not return the only entry "/My"? Can someone explain this?

I think you're getting bitten by the LIKE-index-optimization-in-non-
ASCII-locale problem. Are you running the server in a locale other
than "C"? See the many past threads about this type of issue ...

regards, tom lane

#3 Martin Leja
Martin.Leja@unix-ag.uni-siegen.de
In reply to: Tom Lane (#2)
Re: someone please explain this regex behaviour

At 18:47 27.01.2001 -0500, Tom Lane wrote:

> Martin Leja <Martin.Leja@unix-ag.uni-siegen.de> writes:
>
> > Why e.g. does the statement "select path from foo where path ~* '^/my';"
> > not return the only entry "/My"? Can someone explain this?
>
> I think you're getting bitten by the LIKE-index-optimization-in-non-
> ASCII-locale problem. Are you running the server in a locale other
> than "C"? See the many past threads about this type of issue ...

I'm not very familiar with locale settings, so I searched the docs and
found the following in doc/postgresql-doc/postgres/install12893.htm:
...
If you configure and compile Postgres with --enable-locale then you should
set the locale environment to "C" (or unset all "LC_*" variables) by
putting these additional lines to your login environment before starting
postmaster:
LC_COLLATE=C
LC_CTYPE=C
export LC_COLLATE LC_CTYPE
...

I then changed /etc/init.d/postgresql (the postmaster start script on Debian)
accordingly and restarted postmaster with the script, but unfortunately I get
the same results.
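
One thing that can be checked from the shell, independent of psql: under
POSIX rules a non-empty LC_ALL overrides LC_COLLATE and LC_CTYPE, and LANG is
only a fallback. Setting the two LC_* variables would therefore have no
effect if the start script's environment also carries LC_ALL. The helper
below is a rough sketch of that precedence; "effective_locale" is a
hypothetical name, not part of PostgreSQL or Debian, and whether this applies
to the setup described here is an assumption.

```shell
# Print the locale name a category would pick up from the environment,
# following POSIX precedence: LC_ALL, then the category variable, then LANG.
# "effective_locale" is a hypothetical helper written for this example.
effective_locale() {
    category=$1                          # e.g. LC_COLLATE or LC_CTYPE
    eval "specific=\${$category:-}"      # value of the named variable, if any
    if [ -n "${LC_ALL:-}" ]; then
        echo "$LC_ALL"
    elif [ -n "$specific" ]; then
        echo "$specific"
    elif [ -n "${LANG:-}" ]; then
        echo "$LANG"
    else
        # Assumption: with nothing set, the default is taken to be "C".
        echo "C"
    fi
}
```

Calling "effective_locale LC_COLLATE" and "effective_locale LC_CTYPE" in the
same environment that launches postmaster shows which locale names those
categories would inherit.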

I wonder whether the above change avoided the
LIKE-index-optimization-in-non-ASCII-locale problem at all, and whether that
problem is really the cause of my select results. Is there a
"psql -c 'show ???'" command that can report the locale setting to me?

--
Regards, martin@unix-ag.org