BUG #5478: ILIKE operator returns wrong result

Started by Markusalmost 16 years ago10 messagesbugs
Jump to latest
#1Markus
markus.herven@outpost24.com

The following bug has been logged online:

Bug reference: 5478
Logged by: Markus
Email address: markus.herven@outpost24.com
PostgreSQL version: PostgreSQL 8.4.
Operating system: Ubuntu 10.04
Description: ILIKE operator returns wrong result
Details:

The following query

select 'ba' ilike '%__%';

return true as expected in 8.2 but false in 8.4.

#2Bruce Momjian
bruce@momjian.us
In reply to: Markus (#1)
Re: BUG #5478: ILIKE operator returns wrong result

Markus wrote:

The following bug has been logged online:

Bug reference: 5478
Logged by: Markus
Email address: markus.herven@outpost24.com
PostgreSQL version: PostgreSQL 8.4.
Operating system: Ubuntu 10.04
Description: ILIKE operator returns wrong result
Details:

The following query

select 'ba' ilike '%__%';

return true as expected in 8.2 but false in 8.4.

I can confirm the odd behavior in current CVS:

test=> select 'ba' ilike '%__%';
?column?
----------
f
(1 row)

test=> select 'ba' like '__';
?column?
----------
t
(1 row)

test=> select 'ba' like '__%';
?column?
----------
t
(1 row)

test=> select 'ba' like '%_%';
?column?
----------
t
(1 row)

It seems to be the leading '%' it does not like. Our docs clearly state
your syntax is recommended:

LIKE pattern matching always covers the entire string. Therefore, to
match a sequence anywhere within a string, the pattern must start and
end with a percent sign.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

#3Tom Lane
tgl@sss.pgh.pa.us
In reply to: Markus (#1)
Re: BUG #5478: ILIKE operator returns wrong result

"Markus" <markus.herven@outpost24.com> writes:

select 'ba' ilike '%__%';
return true as expected in 8.2 but false in 8.4.

I have a feeling that this represents still another bug in the
special-case path for % followed by _ (cf bug #4821). If so,
maybe we ought to just toss out that optimization?

regards, tom lane

#4Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#3)
Re: BUG #5478: ILIKE operator returns wrong result

Tom Lane wrote:

"Markus" <markus.herven@outpost24.com> writes:

select 'ba' ilike '%__%';
return true as expected in 8.2 but false in 8.4.

I have a feeling that this represents still another bug in the
special-case path for % followed by _ (cf bug #4821). If so,
maybe we ought to just toss out that optimization?

Yea, looks like it is this code in like_match.c:

/* %_ is the same as _% - avoid matching _ repeatedly */

do
{
NextChar(t, tlen);
NextByte(p, plen);
} while (tlen > 0 && plen > 0 && *p == '_');

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

#5Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#4)
Re: BUG #5478: ILIKE operator returns wrong result

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

I have a feeling that this represents still another bug in the
special-case path for % followed by _ (cf bug #4821). If so,
maybe we ought to just toss out that optimization?

Yea, looks like it is this code in like_match.c:

No, actually it's the bit right after that:

/* Look for a place that matches the rest of the pattern */
while (tlen > 0)
{
int matched = MatchText(t, tlen, p, plen);

if (matched != LIKE_FALSE)
return matched; /* TRUE or ABORT */

NextChar(t, tlen);
}

If tlen == 0 when we reach this loop, we'll fall through and fail.
But that is wrong since we need to consider the possibility that
the remaining pattern can match a zero-length substring. So the
loop needs to be changed to attempt a recursive MatchText for
tlen equal to zero as well as greater than zero.

regards, tom lane

#6Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#5)
Re: BUG #5478: ILIKE operator returns wrong result

Tom Lane wrote:

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

I have a feeling that this represents still another bug in the
special-case path for % followed by _ (cf bug #4821). If so,
maybe we ought to just toss out that optimization?

Yea, looks like it is this code in like_match.c:

No, actually it's the bit right after that:

/* Look for a place that matches the rest of the pattern */
while (tlen > 0)
{
int matched = MatchText(t, tlen, p, plen);

if (matched != LIKE_FALSE)
return matched; /* TRUE or ABORT */

NextChar(t, tlen);
}

If tlen == 0 when we reach this loop, we'll fall through and fail.
But that is wrong since we need to consider the possibility that
the remaining pattern can match a zero-length substring. So the
loop needs to be changed to attempt a recursive MatchText for
tlen equal to zero as well as greater than zero.

I took a different approach. I think the problem is that we check for
end of pattern without consuming '%' patterns. I copied that consume
loop from code above that where we also test for end of pattern.

With the attached patch (which includes a regression test addition), it
works fine:

test=> select 'ba' like '%__%';
?column?
----------
t
(1 row)

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

Attachments:

/pgpatches/liketext/plainDownload+9-1
#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bruce Momjian (#6)
Re: BUG #5478: ILIKE operator returns wrong result

Bruce Momjian <bruce@momjian.us> writes:

Tom Lane wrote:

If tlen == 0 when we reach this loop, we'll fall through and fail.
But that is wrong since we need to consider the possibility that
the remaining pattern can match a zero-length substring. So the
loop needs to be changed to attempt a recursive MatchText for
tlen equal to zero as well as greater than zero.

I took a different approach. I think the problem is that we check for
end of pattern without consuming '%' patterns. I copied that consume
loop from code above that where we also test for end of pattern.

With the attached patch (which includes a regression test addition), it
works fine:

No, that patch is just plain wrong. It eats %'s that would affect
the later recursive MatchText calls.

regards, tom lane

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#7)
Re: BUG #5478: ILIKE operator returns wrong result

I wrote:

No, that patch is just plain wrong. It eats %'s that would affect
the later recursive MatchText calls.

Hmm ... actually, it's not wrong, but it's not good either. What we
really ought to do here is not just eat _'s following %, but eat *any
mixture of* % and _, advancing over a text character per _. The
subsequent search loop is reached only when we find a literal pattern
character to match. This generalizes the original intention of not
using recursion to deal with simple advancing.

regards, tom lane

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#8)
Re: BUG #5478: ILIKE operator returns wrong result

BTW, while I'm looking at this, I notice that there was an oversight in
the change that made us throw an error for \ at the end of the LIKE
pattern. We throw error in the first code chunk that deals with \
but we don't do so here:

if (plen < 2)
return LIKE_FALSE;
firstpat = CHAR(p[1]);

In some cases the problem is masked because we'll eventually apply the
normal \ processing, but I think there are other cases where we'll reach
a LIKE_ABORT condition and return false without ever throwing the error.
Seems like this should be fixed. But should we back-patch that fix into
8.4? We didn't backpatch the original change for fear of breaking
existing apps, and the same argument could probably be made this time.
Should I change it in 8.4, or only 9.0?

regards, tom lane

#10Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#9)
Re: BUG #5478: ILIKE operator returns wrong result

Tom Lane wrote:

BTW, while I'm looking at this, I notice that there was an oversight in
the change that made us throw an error for \ at the end of the LIKE
pattern. We throw error in the first code chunk that deals with \
but we don't do so here:

if (plen < 2)
return LIKE_FALSE;
firstpat = CHAR(p[1]);

In some cases the problem is masked because we'll eventually apply the
normal \ processing, but I think there are other cases where we'll reach
a LIKE_ABORT condition and return false without ever throwing the error.
Seems like this should be fixed. But should we back-patch that fix into
8.4? We didn't backpatch the original change for fear of breaking
existing apps, and the same argument could probably be made this time.
Should I change it in 8.4, or only 9.0?

Tom has patch this and the fix will appear in the next minor release of
Postgres 8.3.X and 8.4.X.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com