BUG #18870: weird behavior with regexp_replace

Started by PG Bug reporting form | about 1 year ago | 2 messages | bugs
#1 PG Bug reporting form
noreply@postgresql.org

The following bug has been logged on the website:

Bug reference: 18870
Logged by: reinko
Email address: devops@key2asset.com
PostgreSQL version: 17.4
Operating system: Ubuntu 11.4.0-1ubuntu1~22.04
Description:

select regexp_replace(LOWER('Örebro'), '\W', '_', 'g') in postgres 15 the
result is örebro which is correct since ö should fit in the \w for a
regex.
select regexp_replace(LOWER('Örebro'), '\W', '_', 'g') --> since postgres 17
the result is _rebro which is incorrect since \w should also contain
characters like ö, ä, ë.

The LOWER call is not really relevant to this issue; the same happens when
it's just a plain string literal as well.
This issue happens with a lot of accented letters; é and è have the same
issue, for example.

Kind regards,
Reinko Brink

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: PG Bug reporting form (#1)
Re: BUG #18870: weird behavior with regexp_replace

PG Bug reporting form <noreply@postgresql.org> writes:

> select regexp_replace(LOWER('Örebro'), '\W', '_', 'g') in postgres 15 the
> result is örebro which is correct since ö should fit in the \w for a
> regex.
> select regexp_replace(LOWER('Örebro'), '\W', '_', 'g') --> since postgres 17
> the result is _rebro which is incorrect since \w should also contain
> characters like ö, ä, ë.

This most likely indicates that you've got a different database
collation selected in the v17 installation. Postgres defers to
the LC_CTYPE setting (or, in some configurations, the ICU collation)
to decide what is a letter. See

https://www.postgresql.org/docs/current/charset.html
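
One way to check whether the collation is what changes the result is to
force a collation on the input with a COLLATE clause and compare the two
outputs. This is only a minimal sketch, assuming a UTF8-encoded database.
"C" is always available; "en-x-icu" is an assumed name that exists only in
ICU-enabled builds, so substitute any collation listed in pg_collation.

```sql
-- Under "C", only ASCII characters are classified as letters,
-- so a non-ASCII letter such as ö is expected to match \W here.
SELECT regexp_replace('örebro' COLLATE "C", '\W', '_', 'g') AS with_c;

-- Assumed collation name; requires a server built with ICU support.
SELECT regexp_replace('örebro' COLLATE "en-x-icu", '\W', '_', 'g') AS with_icu;
```

If the two queries disagree in the same way as the v15 and v17 results, the
database's collation settings are the likely cause rather than the regex
engine itself.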

psql's "\l" command will give a quick overview of what collations
you have selected.
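
The same settings can also be read from the pg_database catalog, which may
be handy when psql is not at hand. A minimal sketch, assuming PostgreSQL 15
or later (the datlocprovider column does not exist in older releases):

```sql
-- List each database with its locale settings.
SELECT datname,
       datcollate,      -- LC_COLLATE setting
       datctype,        -- LC_CTYPE setting, used to classify characters
       datlocprovider   -- 'c' = libc, 'i' = ICU, 'b' = builtin (v17 and later)
FROM pg_database
ORDER BY datname;
```

Running this on both the v15 and the v17 installation and comparing the rows
for the affected database should show whether the locale settings differ.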

regards, tom lane