contrib/levenshtein() has a bug?
The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
length of 255. OK, that's cool. But check this out:
mbrainz_db=> select max(length(name)) from public.track;
max
-----
255
(1 row)
mbrainz_db=> select levenshtein(name,'foo') from public.track;
ERROR: argument exceeds max length: 255
That seems odd. What's odder is:
mbrainz_db=> select levenshtein(substring(name for 100),'foo') from public.track;
ERROR: argument exceeds max length: 255
Any suggestions? I'm using the Fedora 5 rpms, so it looks like that puts
me at 8.1.4.
On Thu, Sep 28, 2006 at 12:02:34PM -0700, Ben wrote:
The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
length of 255. OK, that's cool. But check this out:
<snip>
mbrainz_db=> select levenshtein(name,'foo') from public.track;
ERROR: argument exceeds max length: 255
The message is slightly wrong: the max length is actually one more. You
can adjust the maximum length by changing the params in
fuzzystrmatch.h and recompiling.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Ben <bench@silentmedia.com> writes:
The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
length of 255. OK, that's cool. But check this out:
mbrainz_db=> select max(length(name)) from public.track;
max
-----
255
(1 row)
mbrainz_db=> select levenshtein(name,'foo') from public.track;
ERROR: argument exceeds max length: 255
That seems odd.
length() measures in characters whereas the limit in question is being
enforced in bytes. You got any multibyte characters in there?
(It looks to me like levenshtein() is utterly non-multibyte-aware,
which is probably a bug in itself.)
regards, tom lane
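One way to check the characters-versus-bytes explanation above is to compare length() with octet_length(), which reports the size in bytes. The following is only a sketch: the table and column names are taken from the queries earlier in the thread, the string literal and the LIMIT are arbitrary, and the byte counts depend on the database encoding.

```sql
-- Character count vs. byte count for a literal containing one non-ASCII
-- character; in a multibyte encoding such as UTF8 the two values differ.
select length('naïve') as n_chars, octet_length('naïve') as n_bytes;

-- Rows whose name is longer than 255 bytes, even though none is longer
-- than 255 characters. These are the rows levenshtein() would reject.
select name,
       length(name)       as n_chars,
       octet_length(name) as n_bytes
from public.track
where octet_length(name) > 255
order by octet_length(name) desc
limit 10;
```

If the second query returns any rows, that would also account for the substring(name for 100) case: substring() counts characters, so a 100-character prefix of multibyte text can still be more than 255 bytes long.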
Ah, yes, you are correct.
Hm, it's too bad levenshtein() is ASCII-only.
On Thu, 28 Sep 2006, Tom Lane wrote:
Ben <bench@silentmedia.com> writes:
The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
length of 255. OK, that's cool. But check this out:
mbrainz_db=> select max(length(name)) from public.track;
max
-----
255
(1 row)
mbrainz_db=> select levenshtein(name,'foo') from public.track;
ERROR: argument exceeds max length: 255
That seems odd.
length() measures in characters whereas the limit in question is being
enforced in bytes. You got any multibyte characters in there?
(It looks to me like levenshtein() is utterly non-multibyte-aware,
which is probably a bug in itself.)
regards, tom lane
Tom Lane wrote:
Ben <bench@silentmedia.com> writes:
The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
length of 255. OK, that's cool. But check this out:
mbrainz_db=> select max(length(name)) from public.track;
max
-----
255
(1 row)
mbrainz_db=> select levenshtein(name,'foo') from public.track;
ERROR: argument exceeds max length: 255
That seems odd.
length() measures in characters whereas the limit in question is being
enforced in bytes. You got any multibyte characters in there?
I have updated the error message to mention bytes, attached.
(It looks to me like levenshtein() is utterly non-multibyte-aware,
which is probably a bug in itself.)
Is this a TODO?
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +