BUG #5743: Regexp engine fails to case-insensitively match multi-byte codepoints
The following bug has been logged online:
Bug reference: 5743
Logged by: Vlad Romascanu
Email address: vromascanu@accurev.com
PostgreSQL version: 8.4.3
Operating system: Windows, Linux
Description: Regexp engine fails to case-insensitively match
multi-byte codepoints
Details:
Already reported in 2006 but seems to have fallen through the cracks (I can
find no followup.) Problem still exists in v8.4.3.
Problem still appears to be pg_wc_tolower downcasting to char before calling
tolower() (instead of calling towlower().)
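For illustration, here is a minimal sketch of the difference described above. It is a simplified stand-in, not the actual PostgreSQL source: the names narrow_tolower and wide_tolower are hypothetical, and the narrow version only assumes that the regexp code lowercases a character by going through the single-byte tolower().

```c
#include <ctype.h>
#include <limits.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical stand-in for the lossy path: only values that fit in an
 * unsigned char are handed to tolower(); anything wider is returned
 * unchanged, so its upper-case and lower-case forms never compare equal.
 */
static wint_t
narrow_tolower(wint_t c)
{
    if (c <= (wint_t) UCHAR_MAX)
        return (wint_t) tolower((unsigned char) c);
    return c;
}

/*
 * Wide-character path: towlower() sees the whole code point. Whether a
 * given non-ASCII character is actually folded depends on the current
 * locale (LC_CTYPE) and on the platform's wchar_t encoding.
 */
static wint_t
wide_tolower(wint_t c)
{
    return towlower(c);
}
```

With the narrow version, an ASCII 'A' is folded to 'a', but a code point such as U+0416 (CYRILLIC CAPITAL LETTER ZHE) comes back untouched, which would be consistent with case-insensitive matching failing for multi-byte characters.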
This is one of several inconsistencies unfortunately still present in
case-insensitive regexp vs. LOWER(str) [str_lower] treatment (including char
to wchar conversion using MultiByteToWideChar/mbstowcs vs. char2wchar, or
towlower vs. pg_wc_tolower.)
Current workaround is to use LOWER(str) ~ LOWER('regexp').
"Vlad Romascanu" <vromascanu@accurev.com> writes:
> Description: Regexp engine fails to case-insensitively match
> multi-byte codepoints
> Already reported in 2006 but seems to have fallen through the cracks (I can
> find no followup.) Problem still exists in v8.4.3.
It's fixed in 9.0, at least for cases using UTF8 encoding.
regards, tom lane