similarity() result for two trigram-less strings

Started by Tom Lane, almost 13 years ago, 2 messages
#1 Tom Lane
tgl@sss.pgh.pa.us

Some further thought about bug #7867 suggested that what's probably
happening is the submitter's installation doesn't think that any of the
Cyrillic letters are letters, so that no trigrams are identified in
either string. Whereupon you get a 0/0 result from cnt_sml:

regression=# select similarity('', '');
 similarity
------------
        NaN
(1 row)
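
The 0/0 can be reproduced outside the server. The following is a minimal sketch, not the pg_trgm source: it assumes the score is the number of shared trigrams divided by the size of the union of the two trigram sets, and ratio_unguarded is a hypothetical name chosen for illustration.

```c
#include <math.h>

/*
 * Hypothetical stand-in for the final division in cnt_sml.
 * Assumption: score = shared trigrams / size of the union of both sets,
 * where len1 and len2 are the trigram counts of the two strings.
 */
static float
ratio_unguarded(int shared, int len1, int len2)
{
    /* With no trigrams on either side this is 0.0f / 0.0f: NaN under IEEE 754. */
    return ((float) shared) / ((float) (len1 + len2 - shared));
}
```

Calling ratio_unguarded(0, 0, 0) gives a value for which isnan() is true, which matches the NaN shown by psql above.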

What should we have it return instead? In this case "1" might seem like
the natural answer, but we could easily have very different strings that
don't contain any trigrams:

regression=# select similarity('---', '#######');
 similarity
------------
        NaN
(1 row)

Although I can see a case for returning 1, I'm inclined to think that
returning 0 is a better idea. Thoughts?
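
For illustration only, returning 0 could look like the sketch below. It is not a patch against pg_trgm: ratio_guarded is a hypothetical name, and the union-size formula is an assumption about how the score is computed.

```c
/*
 * Hypothetical guarded variant of the similarity ratio.
 * Assumption: score = shared trigrams / size of the union of both sets,
 * where len1 and len2 are the trigram counts of the two strings.
 */
static float
ratio_guarded(int shared, int len1, int len2)
{
    int union_size = len1 + len2 - shared;

    /* Neither string produced a trigram: report "not similar" instead of NaN. */
    if (union_size <= 0)
        return 0.0f;

    return ((float) shared) / ((float) union_size);
}
```

With this guard, two trigram-less inputs such as '---' and '#######' would score 0, and so would two empty strings, which is the trade-off raised above.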

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#2 Josh Berkus
josh@agliodbs.com
In reply to: Tom Lane (#1)
Re: similarity() result for two trigram-less strings

> Although I can see a case for returning 1, I'm inclined to think that
> returning 0 is a better idea. Thoughts?

Intuitively, I'd expect 0.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
