BUG #5478: ILIKE operator returns wrong result
The following bug has been logged online:
Bug reference: 5478
Logged by: Markus
Email address: markus.herven@outpost24.com
PostgreSQL version: PostgreSQL 8.4.
Operating system: Ubuntu 10.04
Description: ILIKE operator returns wrong result
Details:
The following query
select 'ba' ilike '%__%';
return true as expected in 8.2 but false in 8.4.
Markus wrote:
The following bug has been logged online:
Bug reference: 5478
Logged by: Markus
Email address: markus.herven@outpost24.com
PostgreSQL version: PostgreSQL 8.4.
Operating system: Ubuntu 10.04
Description: ILIKE operator returns wrong result
Details:The following query
select 'ba' ilike '%__%';
return true as expected in 8.2 but false in 8.4.
I can confirm the odd behavior in current CVS:
test=> select 'ba' ilike '%__%';
?column?
----------
f
(1 row)
test=> select 'ba' like '__';
?column?
----------
t
(1 row)
test=> select 'ba' like '__%';
?column?
----------
t
(1 row)
test=> select 'ba' like '%_%';
?column?
----------
t
(1 row)
It seems to be the leading '%' it does not like. Our docs clearly state
your syntax is recommended:
LIKE pattern matching always covers the entire string. Therefore, to
match a sequence anywhere within a string, the pattern must start and
end with a percent sign.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
"Markus" <markus.herven@outpost24.com> writes:
select 'ba' ilike '%__%';
return true as expected in 8.2 but false in 8.4.
I have a feeling that this represents still another bug in the
special-case path for % followed by _ (cf bug #4821). If so,
maybe we ought to just toss out that optimization?
regards, tom lane
Tom Lane wrote:
"Markus" <markus.herven@outpost24.com> writes:
select 'ba' ilike '%__%';
return true as expected in 8.2 but false in 8.4.I have a feeling that this represents still another bug in the
special-case path for % followed by _ (cf bug #4821). If so,
maybe we ought to just toss out that optimization?
Yea, looks like it is this code in like_match.c:
/* %_ is the same as _% - avoid matching _ repeatedly */
do
{
NextChar(t, tlen);
NextByte(p, plen);
} while (tlen > 0 && plen > 0 && *p == '_');
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
Bruce Momjian <bruce@momjian.us> writes:
Tom Lane wrote:
I have a feeling that this represents still another bug in the
special-case path for % followed by _ (cf bug #4821). If so,
maybe we ought to just toss out that optimization?
Yea, looks like it is this code in like_match.c:
No, actually it's the bit right after that:
/* Look for a place that matches the rest of the pattern */
while (tlen > 0)
{
int matched = MatchText(t, tlen, p, plen);
if (matched != LIKE_FALSE)
return matched; /* TRUE or ABORT */
NextChar(t, tlen);
}
If tlen == 0 when we reach this loop, we'll fall through and fail.
But that is wrong since we need to consider the possibility that
the remaining pattern can match a zero-length substring. So the
loop needs to be changed to attempt a recursive MatchText for
tlen equal to zero as well as greater than zero.
regards, tom lane
Tom Lane wrote:
Bruce Momjian <bruce@momjian.us> writes:
Tom Lane wrote:
I have a feeling that this represents still another bug in the
special-case path for % followed by _ (cf bug #4821). If so,
maybe we ought to just toss out that optimization?Yea, looks like it is this code in like_match.c:
No, actually it's the bit right after that:
/* Look for a place that matches the rest of the pattern */
while (tlen > 0)
{
int matched = MatchText(t, tlen, p, plen);if (matched != LIKE_FALSE)
return matched; /* TRUE or ABORT */NextChar(t, tlen);
}If tlen == 0 when we reach this loop, we'll fall through and fail.
But that is wrong since we need to consider the possibility that
the remaining pattern can match a zero-length substring. So the
loop needs to be changed to attempt a recursive MatchText for
tlen equal to zero as well as greater than zero.
I took a different approach. I think the problem is that we check for
end of pattern without consuming '%' patterns. I copied that consume
loop from code above that where we also test for end of pattern.
With the attached patch (which includes a regression test addition), it
works fine:
test=> select 'ba' like '%__%';
?column?
----------
t
(1 row)
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
Attachments:
/pgpatches/liketext/plainDownload+9-1
Bruce Momjian <bruce@momjian.us> writes:
Tom Lane wrote:
If tlen == 0 when we reach this loop, we'll fall through and fail.
But that is wrong since we need to consider the possibility that
the remaining pattern can match a zero-length substring. So the
loop needs to be changed to attempt a recursive MatchText for
tlen equal to zero as well as greater than zero.
I took a different approach. I think the problem is that we check for
end of pattern without consuming '%' patterns. I copied that consume
loop from code above that where we also test for end of pattern.
With the attached patch (which includes a regression test addition), it
works fine:
No, that patch is just plain wrong. It eats %'s that would affect
the later recursive MatchText calls.
regards, tom lane
I wrote:
No, that patch is just plain wrong. It eats %'s that would affect
the later recursive MatchText calls.
Hmm ... actually, it's not wrong, but it's not good either. What we
really ought to do here is not just eat _'s following %, but eat *any
mixture of* % and _, advancing over a text character per _. The
subsequent search loop is reached only when we find a literal pattern
character to match. This generalizes the original intention of not
using recursion to deal with simple advancing.
regards, tom lane
BTW, while I'm looking at this, I notice that there was an oversight in
the change that made us throw an error for \ at the end of the LIKE
pattern. We throw error in the first code chunk that deals with \
but we don't do so here:
if (plen < 2)
return LIKE_FALSE;
firstpat = CHAR(p[1]);
In some cases the problem is masked because we'll eventually apply the
normal \ processing, but I think there are other cases where we'll reach
a LIKE_ABORT condition and return false without ever throwing the error.
Seems like this should be fixed. But should we back-patch that fix into
8.4? We didn't backpatch the original change for fear of breaking
existing apps, and the same argument could probably be made this time.
Should I change it in 8.4, or only 9.0?
regards, tom lane
Tom Lane wrote:
BTW, while I'm looking at this, I notice that there was an oversight in
the change that made us throw an error for \ at the end of the LIKE
pattern. We throw error in the first code chunk that deals with \
but we don't do so here:if (plen < 2)
return LIKE_FALSE;
firstpat = CHAR(p[1]);In some cases the problem is masked because we'll eventually apply the
normal \ processing, but I think there are other cases where we'll reach
a LIKE_ABORT condition and return false without ever throwing the error.
Seems like this should be fixed. But should we back-patch that fix into
8.4? We didn't backpatch the original change for fear of breaking
existing apps, and the same argument could probably be made this time.
Should I change it in 8.4, or only 9.0?
Tom has patch this and the fix will appear in the next minor release of
Postgres 8.3.X and 8.4.X.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com