tsearch is non-multibyte-aware in a few places

Started by Tom Lane, almost 18 years ago, 2 messages, hackers
#1 Tom Lane
tgl@sss.pgh.pa.us

I've identified the cause of bug #4253:

	/* Trim trailing space */
	while (*pbuf && !t_isspace(pbuf))
		pbuf++;
	*pbuf = '\0';

At least on Macs, t_isspace is capable of returning "true" when pointed
at the second byte of a 2-byte UTF8 character. This explains the report
that the letter "à" has a problem when some other ones don't.  Of
course pbuf needs to be incremented using pg_mblen not just ++.
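
As a rough illustration of the failure and the fix, here is a minimal
self-contained sketch. It does not use the real tsearch code: mb_len()
and is_space_at() are hypothetical stand-ins for pg_mblen() and
t_isspace(), the input is assumed to be well-formed UTF-8, and
is_space_at() deliberately imitates a platform where a lone 0xA0 byte
is classified as whitespace. A two-byte character whose second byte is
0xA0, such as "à" (0xC3 0xA0), is then cut in half by the bytewise loop
but survives the character-wise one.

```c
/* Hypothetical stand-in for pg_mblen(): byte length of the UTF-8
 * character starting at s, derived from the lead byte only.
 * Assumes well-formed UTF-8 input. */
static int mb_len(const char *s)
{
    unsigned char c = (unsigned char) *s;

    if (c < 0x80) return 1;             /* ASCII */
    if ((c & 0xE0) == 0xC0) return 2;   /* 2-byte lead */
    if ((c & 0xF0) == 0xE0) return 3;   /* 3-byte lead */
    if ((c & 0xF8) == 0xF0) return 4;   /* 4-byte lead */
    return 1;                           /* stray continuation byte */
}

/* Hypothetical stand-in for t_isspace(). It looks at a single byte and
 * (by assumption, to imitate the behaviour reported above) treats 0xA0
 * as whitespace in addition to the usual ASCII blanks. */
static int is_space_at(const char *s)
{
    unsigned char c = (unsigned char) *s;

    return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == 0xA0;
}

/* Buggy variant: steps one byte at a time, so the space test can be
 * applied to the second byte of a multibyte character. */
static void trim_bytewise(char *pbuf)
{
    while (*pbuf && !is_space_at(pbuf))
        pbuf++;
    *pbuf = '\0';
}

/* Fixed variant: steps one character at a time, so the space test is
 * only ever applied at the start of a character. */
static void trim_charwise(char *pbuf)
{
    while (*pbuf && !is_space_at(pbuf))
        pbuf += mb_len(pbuf);
    *pbuf = '\0';
}
```

Given the bytes "voil\xC3\xA0 x", trim_bytewise() stops at the 0xA0
byte and leaves a dangling 0xC3, while trim_charwise() stops at the
real space that follows the character.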

I looked around for other occurrences of the same problem and found
a couple. I also found occurrences of the same pattern for skipping
whitespace:

while (*s && t_isspace(s))
s++;

This is safe if and only if t_isspace is never true for multibyte
characters ... can anyone think of a counterexample?

regards, tom lane

#2 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#1)
Re: tsearch is non-multibyte-aware in a few places

I wrote:

> This is safe if and only if t_isspace is never true for multibyte
> characters ... can anyone think of a counterexample?

Non-breaking space is a counterexample, so I pg_mblen-ified those
loops too. Fortunately this code only executes during dictionary
cache load, so a few extra cycles aren't too critical.
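
To make the counterexample concrete, the following is a small sketch
rather than the committed change. mb_len() and is_space_char() are
hypothetical stand-ins for pg_mblen() and t_isspace(); the sketch
assumes well-formed UTF-8 and assumes the space test reports true for
U+00A0 NO-BREAK SPACE (bytes 0xC2 0xA0), which on a real system depends
on platform and locale. With a multibyte space in the input, the
bytewise skip can stop in the middle of a character, while the
character-wise skip lands on the first non-space character.

```c
/* Hypothetical stand-in for pg_mblen(): byte length of the UTF-8
 * character starting at s. Assumes well-formed UTF-8 input. */
static int mb_len(const char *s)
{
    unsigned char c = (unsigned char) *s;

    if (c < 0x80) return 1;             /* ASCII */
    if ((c & 0xE0) == 0xC0) return 2;   /* 2-byte lead */
    if ((c & 0xF0) == 0xE0) return 3;   /* 3-byte lead */
    if ((c & 0xF8) == 0xF0) return 4;   /* 4-byte lead */
    return 1;                           /* stray continuation byte */
}

/* Hypothetical character-aware space test: true for ASCII blanks and,
 * by assumption, for U+00A0 encoded as 0xC2 0xA0. Reading p[1] is safe
 * because a 0xC2 byte is always followed by at least the terminator. */
static int is_space_char(const char *s)
{
    const unsigned char *p = (const unsigned char *) s;

    if (p[0] == ' ' || p[0] == '\t' || p[0] == '\n' || p[0] == '\r')
        return 1;
    return p[0] == 0xC2 && p[1] == 0xA0;
}

/* Bytewise skip: after a 2-byte space it advances only one byte and
 * ends up pointing at the continuation byte. */
static const char *skip_space_bytewise(const char *s)
{
    while (*s && is_space_char(s))
        s++;
    return s;
}

/* Character-wise skip: advances by the full length of each space. */
static const char *skip_space_charwise(const char *s)
{
    while (*s && is_space_char(s))
        s += mb_len(s);
    return s;
}
```

For the input " \xC2\xA0word", skip_space_bytewise() returns a pointer
to the 0xA0 byte, whereas skip_space_charwise() returns a pointer to
"word".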

regards, tom lane