[PATCH] Fix for bug #19474: LIKE fails to match literal backslashes with nondeterministic collations

Started by Nitin Motiani11 days ago3 messageshackers
Jump to latest
#1Nitin Motiani
nitinmotiani@google.com

It was reported in [1]/messages/by-id/19474-5b86a95f3d9a7ecb@postgresql.org that LIKE on a non-deterministic collation
returns an incorrect result when the pattern contains a literal
backslash.

This was caused by escaping all backslashes in like_match.c (even when
the internal pattern contained '\\'). This behaviour has been present
since 85b7efa1cd which originally added support for LIKE with
non-deterministic collations.

This patch fixes the issue by always including the character after a
literal '\' in the final buffer. I didn't check for the end of the
string because that check is already handled in the block above when
checking for escape characters.

I also added a regression test for this issue and confirmed that it
passes with the fix.

Please take a look and let me know what you folks think.

[1]: /messages/by-id/19474-5b86a95f3d9a7ecb@postgresql.org

Thanks & Regards,
Nitin Motiani
Google

Attachments:

v1-0001-Fix-LIKE-matching-with-nondeterministic-collation.patchapplication/x-patch; name=v1-0001-Fix-LIKE-matching-with-nondeterministic-collation.patchDownload+12-4
#2Zsolt Parragi
zsolt.parragi@percona.com
In reply to: Nitin Motiani (#1)
Re: [PATCH] Fix for bug #19474: LIKE fails to match literal backslashes with nondeterministic collations

Hello!

I verified that the patch works, but I have one concern:

I didn't check for the end of the
string because that check is already handled in the block above when
checking for escape characters.

The code isn't easy to reason about like this, it relies on specific details of the outer loop, which was only mentioned in the email itself.

This should be also explained in a comment and the commit message, or maybe instead of the current way, the loop could work similarly like how another loop uses an afterescape flag in do_like_escape (in the same file), that form seems less fragile.

#3Nitin Motiani
nitinmotiani@google.com
In reply to: Zsolt Parragi (#2)
Re: [PATCH] Fix for bug #19474: LIKE fails to match literal backslashes with nondeterministic collations

Thanks Zsolt for the review.

The code isn't easy to reason about like this, it relies on specific
details of the outer loop, which was only mentioned in the email
itself.

This should be also explained in a comment and the commit message, or
maybe instead of the current way, the loop could work similarly like
how another loop uses an afterescape flag in do_like_escape (in the
same file), that form seems less fragile.

I have changed the code to use an 'afterescape' flag like in
'do_like_escape'. I also realized that 'do_like_escape' uses NextChar
to handle multibyte encodings. So I changed the byte by byte copy to
use NextChar and then copy the whole character. I think byte-by-byte
copying should be enough for most cases, but if an encoding has '\' as
second or third byte, that might not work.

This copying can also be done with CopyAdvChar, as 'do_like_escape'
does, but that macro is not defined for all cases. So for the time
being, I just used NextChar and copied the character myself. We can
also define CopyAdvChar and ust it here for the code to be consistent
across functions.

Let me know your thoughts on the above approaches.

Regards,
Nitin Motiani
Google

Attachments:

v2-0001-Fix-LIKE-matching-with-nondeterministic-collation.patchapplication/x-patch; name=v2-0001-Fix-LIKE-matching-with-nondeterministic-collation.patchDownload+62-7