BUG: ILIKE with single-byte encoding

Started by Rolf Jentsch about 18 years ago · 2 messages · bugs
#1Rolf Jentsch
RJentsch@electronicpartner.de

Hello,

With PostgreSQL 8.3.0 the following bug has been introduced with the ILIKE or
~~* operator:

In a database with a single-byte encoding such as LATIN1, the expression

SELECT 'aü' ILIKE '%ü';
returns false.

This error occurs for every pattern in which a % is followed by a character
with a decimal value between 128 and 255.

I was able to track down the error to the file
src/backend/utils/adt/like_match.c

For the single-byte case there are some places where a (signed) char value is
compared to the return value of tolower(), which is an int. In Latin1 the 'ü'
is -4 as a signed char but 252 as the int returned by tolower(), so the two
values do not compare equal.
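
The mismatch can be reproduced outside the backend with a few lines of C. This is only a sketch, not code from like_match.c: the helper names compare_without_cast and compare_with_cast are made up for illustration, signed char is used explicitly so the result does not depend on the platform's default char signedness, and it assumes the usual behaviour where converting 252 to signed char yields -4.

```c
#include <ctype.h>

/*
 * Hypothetical helper: store the folded byte in a signed char, as a char
 * variable would on most platforms, then compare it with the int that
 * tolower() returns. For bytes >= 128 the stored value is negative after
 * integer promotion, while tolower() returns a value in 128..255.
 */
int compare_without_cast(unsigned char byte)
{
    signed char stored = (signed char) tolower(byte);  /* 252 -> -4 (assumed) */

    return tolower(byte) == stored;                    /* 252 == -4 is false */
}

/*
 * Same comparison, but the tolower() result is narrowed to signed char
 * before comparing, which is the idea behind the patch.
 */
int compare_with_cast(unsigned char byte)
{
    signed char stored = (signed char) tolower(byte);

    return (signed char) tolower(byte) == stored;      /* -4 == -4 is true */
}
```

Called with 0xFC (the 'ü' in Latin1), compare_without_cast() reports a mismatch and compare_with_cast() reports a match. For ASCII bytes such as 'A' both variants agree, which is why the problem only shows up for values between 128 and 255.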

It can be fixed with the appended patch.

cu
Rolf Jentsch
Entwicklung Mitglieder-Systeme Dezentral

ElectronicPartner GmbH
Mündelheimer Weg 40
40472 Düsseldorf
phone: +49-(0)211-4156-0
fax: +49-(0)211-4156-6865
eMail: rjentsch@electronicpartner.de

Sitz der Gesellschaft Düsseldorf
Amtsgericht - Registergericht Düsseldorf - HRB 4078
Geschäftsführer: Oliver Haubrich,
Dr. Sven-Olaf Krauß, Karl Trautman

--- src/backend/utils/adt/like_match.c  2008-02-28 18:19:30.000000000 +0100
+++ src/backend/utils/adt/like_match.c  2008-02-28 18:19:43.000000000 +0100
@@ -71,7 +71,7 @@
   */
 
 #ifdef MATCH_LOWER
-#define TCHAR(t) tolower((t))
+#define TCHAR(t) ((char)tolower((t)))
 #else
 #define TCHAR(t) (t)
 #endif

#2Tom Lane
tgl@sss.pgh.pa.us
In reply to: Rolf Jentsch (#1)
Re: BUG: ILIKE with single-byte encoding

Rolf Jentsch <RJentsch@electronicpartner.de> writes:

With PostgreSQL 8.3.0 the following bug has been introduced with the ILIKE or
~~* operator:
In a database with a single-byte encoding such as LATIN1, the expression
SELECT 'aü' ILIKE '%ü';
returns false.

For the single-byte case there are some places where a (signed) char
value is compared to the return value of tolower(), which is an int.

Patch applied, thanks! It turns out there was a second bug on the very
same line: some machines have problems if the argument of tolower()
isn't explicitly cast to unsigned char ...

regards, tom lane
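
For readers following the thread, here is a minimal sketch of what a macro with both casts could look like. It is an illustration only and not necessarily the code that was committed: TCHAR_SKETCH and first_byte_matches are hypothetical names. The inner cast matters because the C standard only defines tolower() for arguments representable as unsigned char (or EOF), so passing a negative char is undefined behaviour. The outer cast is the one from the patch above.

```c
#include <ctype.h>

/*
 * Sketch only (hypothetical name): fold one byte to lower case.
 * - (unsigned char) keeps the argument of tolower() in 0..255.
 * - (char) narrows the int result so it compares equal to other chars.
 */
#define TCHAR_SKETCH(t) ((char) tolower((unsigned char) (t)))

/*
 * Hypothetical helper: compare a text byte with a pattern byte that was
 * folded once and stored in a char variable.
 */
int first_byte_matches(char text, char pattern)
{
    char firstpat = TCHAR_SKETCH(pattern);

    return TCHAR_SKETCH(text) == firstpat;
}
```

With this form, first_byte_matches((char) 0xFC, (char) 0xFC) returns nonzero whether plain char is signed or unsigned on the platform, and ASCII letters still match case-insensitively.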