BUG #5766: regexp \y doesn't work properly when a word starts or ends with a UTF-8 char

Started by Grzegorz Daniluk over 15 years ago | 2 messages | bugs
#1 Grzegorz Daniluk
gdaniluk@gmail.com

The following bug has been logged online:

Bug reference: 5766
Logged by: Grzegorz Daniluk
Email address: gdaniluk@gmail.com
PostgreSQL version: 9.0.1
Operating system: Windows 7 64-bit
Description: regexp \y doesn't work properly when a word starts or
ends with a UTF-8 char
Details:

select regexp_replace('Foo Pasaż Bar', E'\\yPasaż\\y', '');

Above query doesn't replace the word 'Pasaż'. It returns full 'Foo Pasaż
Bar' string, when the correct behavior is to return 'Foo Bar'.

When 'ż' is replaced with a plain ASCII character such as 'z',
regexp_replace works as expected.

My db details:
ENCODING = 'UTF8'
LC_COLLATE = 'Polish_Poland.1250'
LC_CTYPE = 'Polish_Poland.1250'

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Grzegorz Daniluk (#1)
Re: BUG #5766: regexp \y doesn't work properly when a word starts or ends with a UTF-8 char

"Grzegorz Daniluk" <gdaniluk@gmail.com> writes:

> select regexp_replace('Foo Pasaż Bar', E'\\yPasaż\\y', '');
>
> Above query doesn't replace the word 'Pasaż'. It returns full 'Foo Pasaż
> Bar' string, when the correct behavior is to return 'Foo Bar'.

Is this problem limited to \y, or do other regex operations that depend
on locale-specific character classification also not work for you?

regards, tom lane
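
One way to answer that question might be to run a few probe queries against
the same database. The set below is only a sketch and is not taken from the
thread: the choice of probes and the column aliases are assumptions, and no
results are shown because they depend on the server's encoding and locale.
Each probe uses a different regex feature that relies on character
classification, so comparing them should show whether the problem is
specific to \y.

```sql
-- Control case from the report: an ASCII-only word, which the reporter
-- says is replaced as expected.
SELECT regexp_replace('Foo Pasaz Bar', E'\\yPasaz\\y', '') AS ascii_control;

-- The failing case from the report.
SELECT regexp_replace('Foo Pasaż Bar', E'\\yPasaż\\y', '') AS y_boundary;

-- Same word, but using the start-of-word (\m) and end-of-word (\M)
-- constraints instead of \y.
SELECT regexp_replace('Foo Pasaż Bar', E'\\mPasaż\\M', '') AS m_boundaries;

-- Is 'ż' classified as a word character by \w?
SELECT 'ż' ~ E'^\\w$' AS matches_w;

-- Is 'ż' classified as alphabetic by the POSIX class [[:alpha:]]?
SELECT 'ż' ~ '^[[:alpha:]]$' AS matches_alpha;

-- Does case-insensitive matching fold 'Ż' to 'ż'?
SELECT 'Ż' ~* '^ż$' AS matches_case_insensitive;
```

If only the \y and \m/\M probes misbehave while \w and [[:alpha:]] match,
that would point at the word-boundary logic; if all of the non-ASCII probes
fail, the character classification for this encoding and locale combination
would be the more likely suspect.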