BUG #5743: Regexp engine fails to case-insensitively match multi-byte codepoints

Started by Vlad Romascanu over 15 years ago | 2 messages | bugs
#1 Vlad Romascanu
vromascanu@accurev.com

The following bug has been logged online:

Bug reference: 5743
Logged by: Vlad Romascanu
Email address: vromascanu@accurev.com
PostgreSQL version: 8.4.3
Operating system: Windows, Linux
Description: Regexp engine fails to case-insensitively match
multi-byte codepoints
Details:

Already reported in 2006 but seems to have fallen through the cracks (I can
find no followup.) Problem still exists in v8.4.3.

Problem still appears to be pg_wc_tolower downcasting to char before calling
tolower() (instead of calling towlower().)
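
To illustrate the kind of failure described above, here is a minimal sketch.
It is not the PostgreSQL source: narrow_only_tolower, wide_tolower, and demo
are hypothetical names, and the code only assumes that a lowering routine
limited to the char range cannot fold a code point such as U+0416. The result
of towlower() for non-ASCII input depends on the current LC_CTYPE locale and
on the width of wchar_t, so the sketch checks only the ASCII case for it.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in, NOT the PostgreSQL source: lowers a code point
 * through the narrow-character API only. */
static unsigned int
narrow_only_tolower(unsigned int c)
{
    if (c <= UCHAR_MAX)
        return (unsigned int) tolower((unsigned char) c);
    return c;               /* outside the char range: never case-folded */
}

/* Hypothetical wide variant: delegates to towlower(). What it returns for
 * non-ASCII input depends on the LC_CTYPE locale in effect. */
static unsigned int
wide_tolower(unsigned int c)
{
    return (unsigned int) towlower((wint_t) c);
}

/* Usage sketch: ASCII folds either way, but U+0416 (CYRILLIC CAPITAL
 * LETTER ZHE) comes back unchanged from the narrow-only routine. */
static void
demo(void)
{
    assert(narrow_only_tolower('A') == 'a');
    assert(wide_tolower('A') == 'a');
    assert(narrow_only_tolower(0x0416u) == 0x0416u);
}
```

With a routine like the narrow-only one, a case-insensitive comparison never
sees upper- and lower-case forms of a multi-byte codepoint as equal, which
would be consistent with the behaviour reported here.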

This is one of several inconsistencies unfortunately still present between
case-insensitive regexp matching and LOWER(str) [str_lower] (including char
to wchar conversion using MultiByteToWideChar/mbstowcs vs. char2wchar, and
towlower vs. pg_wc_tolower.)

Current workaround is to use LOWER(str) ~ LOWER('regexp').

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Vlad Romascanu (#1)
Re: BUG #5743: Regexp engine fails to case-insensitively match multi-byte codepoints

"Vlad Romascanu" <vromascanu@accurev.com> writes:

> Description: Regexp engine fails to case-insensitively match
> multi-byte codepoints
>
> Already reported in 2006 but seems to have fallen through the cracks (I can
> find no followup.) Problem still exists in v8.4.3.

It's fixed in 9.0, at least for cases using UTF8 encoding.

regards, tom lane