contrib/levenshtein() has a bug?

Started by Ben over 19 years ago, 5 messages, general
#1 Ben
bench@silentmedia.com

The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
length of 255. OK, that's cool. But check this out:

mbrainz_db=> select max(length(name)) from public.track;
max
-----
255
(1 row)

mbrainz_db=> select levenshtein(name,'foo') from public.track;
ERROR: argument exceeds max length: 255

That seems odd. What's odder is:

mbrainz_db=> select levenshtein(substring(name for 100),'foo') from public.track;
ERROR: argument exceeds max length: 255

Any suggestions? I'm using the Fedora 5 RPMs, so it looks like that puts
me at 8.1.4.

#2 Martijn van Oosterhout
kleptog@svana.org
In reply to: Ben (#1)
Re: contrib/levenshtein() has a bug?

On Thu, Sep 28, 2006 at 12:02:34PM -0700, Ben wrote:

The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
length of 255. OK, that's cool. But check this out:

<snip>

mbrainz_db=> select levenshtein(name,'foo') from public.track;
ERROR: argument exceeds max length: 255

The message is slightly wrong: the max length is actually one more. You
can adjust the maximum length by changing the parameters in
fuzzystrmatch.h and recompiling.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/


From each according to his ability. To each according to his ability to litigate.

#3 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Ben (#1)
Re: contrib/levenshtein() has a bug?

Ben <bench@silentmedia.com> writes:

The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
length of 255. OK, that's cool. But check this out:

mbrainz_db=> select max(length(name)) from public.track;
max
-----
255
(1 row)

mbrainz_db=> select levenshtein(name,'foo') from public.track;
ERROR: argument exceeds max length: 255

That seems odd.

length() measures in characters whereas the limit in question is being
enforced in bytes. You got any multibyte characters in there?
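Tom's explanation can be checked outside the database. The Python sketch below assumes the track names are stored as UTF-8 (the thread does not state the server encoding) and uses a hypothetical `LIMIT_BYTES` constant taken from the error message. It shows a string that is 255 characters long but 256 bytes long, so a character count says it fits while a byte-based check rejects it.

```python
# Sketch: why a character count and a byte-based limit disagree.
# Assumption: text is UTF-8 encoded; LIMIT_BYTES is a hypothetical name
# for the 255 quoted in the error message.

LIMIT_BYTES = 255


def char_length(s: str) -> int:
    """Count characters, as PostgreSQL's length() does for text."""
    return len(s)


def byte_length(s: str) -> int:
    """Count UTF-8 bytes, which is what a byte-oriented check sees."""
    return len(s.encode("utf-8"))


# A 255-character title whose last character is 'é' (2 bytes in UTF-8).
name = "a" * 254 + "\u00e9"

print(char_length(name))                 # 255
print(byte_length(name))                 # 256
print(byte_length(name) > LIMIT_BYTES)   # True
```

Inside PostgreSQL, `octet_length(name)` reports the byte count that `length(name)` hides, so filtering on `octet_length(name) > 255` should find the rows that trip the check.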

(It looks to me like levenshtein() is utterly non-multibyte-aware,
which is probably a bug in itself.)

regards, tom lane
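If levenshtein() works on bytes, as Tom suggests, multibyte text affects the results too, not only the length check. The Python sketch below runs the same unit-cost dynamic-programming edit distance over UTF-8 bytes and over characters. The `levenshtein` function here is a hypothetical stand-in, not the fuzzystrmatch C code, and UTF-8 is again assumed.

```python
# Sketch: Levenshtein distance over bytes vs. over characters.
# Hypothetical illustration of byte-oriented vs. multibyte-aware behaviour;
# not the contrib/fuzzystrmatch implementation.
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Classic two-row dynamic-programming edit distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,              # delete x
                cur[j - 1] + 1,           # insert y
                prev[j - 1] + (x != y),   # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


s, t = "caf\u00e9", "cafe"
print(levenshtein(s.encode("utf-8"), t.encode("utf-8")))  # 2 (bytes)
print(levenshtein(s, t))                                  # 1 (characters)
```

Over bytes, 'é' is two units (0xC3 0xA9), so turning it into 'e' costs a substitution plus a deletion; over characters it is a single substitution.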

#4 Ben
bench@silentmedia.com
In reply to: Tom Lane (#3)
Re: contrib/levenshtein() has a bug?

Ah, yes, you are correct.

Hm, it's too bad levenshtein() is ASCII-only.

On Thu, 28 Sep 2006, Tom Lane wrote:


Ben <bench@silentmedia.com> writes:

The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
length of 255. OK, that's cool. But check this out:

mbrainz_db=> select max(length(name)) from public.track;
max
-----
255
(1 row)

mbrainz_db=> select levenshtein(name,'foo') from public.track;
ERROR: argument exceeds max length: 255

That seems odd.

length() measures in characters whereas the limit in question is being
enforced in bytes. You got any multibyte characters in there?

(It looks to me like levenshtein() is utterly non-multibyte-aware,
which is probably a bug in itself.)

regards, tom lane

#5 Bruce Momjian
bruce@momjian.us
In reply to: Tom Lane (#3)
Re: [GENERAL] contrib/levenshtein() has a bug?

Tom Lane wrote:

Ben <bench@silentmedia.com> writes:

The levenshtein function from contrib/fuzzystrmatch.sql has a max arg
length of 255. OK, that's cool. But check this out:

mbrainz_db=> select max(length(name)) from public.track;
max
-----
255
(1 row)

mbrainz_db=> select levenshtein(name,'foo') from public.track;
ERROR: argument exceeds max length: 255

That seems odd.

length() measures in characters whereas the limit in question is being
enforced in bytes. You got any multibyte characters in there?

I have updated the error message to mention bytes; patch attached.

(It looks to me like levenshtein() is utterly non-multibyte-aware,
which is probably a bug in itself.)

Is this a TODO?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachments:

/rtmp/diff (text/x-diff, +6 -6)