Lower or Upper case for F.33. pg_trgm
The following documentation comment has been logged on the website:
Page: https://www.postgresql.org/docs/14/pgtrgm.html
Description:
Hey guys,
I have a question regarding the trigram algorithm, and I cannot find any
information about it in your documentation:
Do you distinguish between lowercase and uppercase? Or do you treat all words
as lowercase?
I would appreciate a short reply.
Greetings, Marc
On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:
> I have a question regarding the trigram algorithm, and I cannot find any
> information about it in your documentation:
Maybe we should add something about this?
> Do you distinguish between lowercase and uppercase? Or do you treat all words
> as lowercase?
pg_trgm can be compiled to be case-sensitive, but by default it is
case-insensitive:
# SELECT word_similarity('word', 'WORD');
word_similarity
-----------------
1
(1 row)
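To illustrate why 'word' and 'WORD' score identically, here is a minimal Python model of trigram extraction. It is a sketch, not pg_trgm's C code: it assumes the documented rule that each word is padded with two spaces in front and one behind, and it assumes a "word" is a run of letters and digits. `ignore_case` is a hypothetical parameter standing in for the compile-time option; pg_trgm does not expose it at run time. The `similarity` helper models `similarity()` (shared trigrams divided by all distinct trigrams), not `word_similarity()`.

```python
import re


def trigrams(text: str, ignore_case: bool = True) -> set[str]:
    """Toy model of pg_trgm trigram extraction (not the real implementation)."""
    if ignore_case:
        # Default build: fold everything to lowercase before extracting.
        text = text.lower()
    result: set[str] = set()
    # Assumption: a "word" is a run of letters/digits; anything else separates words.
    for word in re.findall(r"[^\W_]+", text):
        padded = "  " + word + " "  # two spaces before, one after
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str, ignore_case: bool = True) -> float:
    """Shared trigrams divided by the number of distinct trigrams in either string."""
    ta = trigrams(a, ignore_case)
    tb = trigrams(b, ignore_case)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


if __name__ == "__main__":
    print(sorted(trigrams("cat")))                        # ['  c', ' ca', 'at ', 'cat']
    print(similarity("word", "WORD"))                     # 1.0 (case folded)
    print(similarity("word", "WORD", ignore_case=False))  # 0.0 (case kept)
```

With case folding, 'word' and 'WORD' yield the same five trigrams and score 1.0; with case kept, they share none and score 0.0.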
> I would appreciate a short reply.
I would recommend the pgsql-general mailing list, as that is a more reliable
way to get general questions answered.
--
Daniel Gustafsson https://vmware.com/
On 16-08-2022 at 12:36, Daniel Gustafsson wrote:
> On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:
>> I have a question regarding the trigram algorithm, and I cannot find any
>> information about it in your documentation:
> Maybe we should add something about this?
Yeah, it's a bit strange that none of the following strings yield any
info on that page: 'case', 'sensitiv', 'upper', 'lower', and that there
is no mention of the ~ versus ~* difference.
It may be worth giving this simple hint directly in pgtrgm.html:
~ is case-sensitive
~* is case-insensitive
In any case, a link to functions-matching.html seems warranted.
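One way to see how case-folded trigrams can coexist with a case-sensitive ~ is to treat the trigram index as a lossy pre-filter followed by a recheck of the real operator. The Python sketch below is a simplified model under that assumption, restricted to a plain literal pattern; `trgm_search` and `folded_trigrams` are hypothetical names, and Python's `re` stands in for PostgreSQL's regex engine.

```python
import re


def folded_trigrams(s: str) -> set[str]:
    """Case-folded 3-character substrings (no padding), used as a toy index key."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trgm_search(rows: list[str], literal: str, case_insensitive: bool) -> list[str]:
    """Model of `col ~ literal` (case_insensitive=False) or `col ~* literal`."""
    needed = folded_trigrams(literal)
    flags = re.IGNORECASE if case_insensitive else 0
    pattern = re.compile(re.escape(literal), flags)
    hits = []
    for row in rows:
        # "Index" step: lossy and case-insensitive, so it can only over-match.
        if not needed <= folded_trigrams(row):
            continue
        # Recheck step: the operator itself decides whether case matters.
        if pattern.search(row):
            hits.append(row)
    return hits


if __name__ == "__main__":
    rows = ["Word", "word", "sword", "wrd"]
    print(trgm_search(rows, "word", case_insensitive=False))  # ['word', 'sword']
    print(trgm_search(rows, "word", case_insensitive=True))   # ['Word', 'word', 'sword']
```

In this model "Word" passes the case-folded filter for both operators, and only the recheck removes it for the case-sensitive one.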
Erik Rijkers
On 16 Aug 2022, at 12:54, Erik Rijkers <er@xs4all.nl> wrote:
> On 16-08-2022 at 12:36, Daniel Gustafsson wrote:
>> On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:
>>> I have a question regarding the trigram algorithm, and I cannot find any
>>> information about it in your documentation:
>> Maybe we should add something about this?
> Yeah, it's a bit strange that none of the following strings yield any info on that page: 'case', 'sensitiv', 'upper', 'lower', and that there is no mention of the ~ versus ~* difference.
> It may be worth giving this simple hint directly in pgtrgm.html:
> ~ is case-sensitive
> ~* is case-insensitive
> In any case, a link to functions-matching.html seems warranted.
Yeah, I think there is room for improvement here. Are you up for drafting a
patch for this?
--
Daniel Gustafsson https://vmware.com/
Thanks for your fast response.
Is this a question for me? I am fine with a short hint regarding the
default.
A link to another documentation page is also fine.
On Tue, 16 Aug 2022 at 13:46, Daniel Gustafsson <daniel@yesql.se> wrote:
On 16-08-2022 at 13:46, Daniel Gustafsson wrote:
> On 16 Aug 2022, at 12:54, Erik Rijkers <er@xs4all.nl> wrote:
>> On 16-08-2022 at 12:36, Daniel Gustafsson wrote:
>>> On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:
>>>> I have a question regarding the trigram algorithm, and I cannot find any
>>>> information about it in your documentation:
>>> Maybe we should add something about this?
>> Yeah, it's a bit strange that none of the following strings yield any info on that page: 'case', 'sensitiv', 'upper', 'lower', and that there is no mention of the ~ versus ~* difference.
>> It may be worth giving this simple hint directly in pgtrgm.html:
>> ~ is case-sensitive
>> ~* is case-insensitive
>> In any case, a link to functions-matching.html seems warranted.
> Yeah, I think there is room for improvement here. Are you up for drafting a
> patch for this?
How is this?
(bluntly stating 'similarity comparisons are case-insensitive' -
although I'm not really sure..)
Erik
Attachments:
pgtrgm.sgml.20220816.diff (text/x-patch; charset=UTF-8) +4 -1
Erik Rijkers <er@xs4all.nl> writes:
> (bluntly stating 'similarity comparisons are case-insensitive' -
> although I'm not really sure..)
Perhaps like "similarity comparisons are case-insensitive in a
standard build of pg_trgm", if you want to nod to the existence
of a compile option without going into detail.
regards, tom lane
Sounds good to me.
On Tue, 16 Aug 2022 at 15:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
On 16 Aug 2022, at 15:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Erik Rijkers <er@xs4all.nl> writes:
>> (bluntly stating 'similarity comparisons are case-insensitive' -
>> although I'm not really sure..)
> Perhaps like "similarity comparisons are case-insensitive in a
> standard build of pg_trgm", if you want to nod to the existence
> of a compile option without going into detail.
Looking at this, I'm leaning towards paring down the diff posted upthread to
pretty much this; I think that will provide value while avoiding
confusion.
As a related side note, there are four instances of "case insensitive{ly}" in
the docs, while all other instances use "case-insensitive{ly}". I'm inclined
to fix those four to use a hyphen while at it, for consistency across all pages.
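For the hyphenation cleanup, a small Python helper could list the remaining unhyphenated spellings. This is only a sketch: it assumes it is run from the top of a PostgreSQL source tree with the documentation sources as `.sgml` files under `doc/src/sgml`, and it also catches the phrase when it is split across a line break. `find_unhyphenated` and `scan_docs` are hypothetical helper names.

```python
import re
from pathlib import Path

# "case insensitive" or "case insensitively", also when split across a line break;
# the hyphenated "case-insensitive" does not match because "-" is not whitespace.
UNHYPHENATED = re.compile(r"\bcase\s+insensitive(?:ly)?\b", re.IGNORECASE)


def find_unhyphenated(text: str) -> list[str]:
    """Return every unhyphenated spelling found in a chunk of text."""
    return [m.group(0) for m in UNHYPHENATED.finditer(text)]


def scan_docs(root: str = "doc/src/sgml") -> dict[str, int]:
    """Count unhyphenated spellings per .sgml file under root (assumed layout)."""
    counts: dict[str, int] = {}
    for path in sorted(Path(root).rglob("*.sgml")):
        hits = find_unhyphenated(path.read_text(encoding="utf-8", errors="replace"))
        if hits:
            counts[str(path)] = len(hits)
    return counts


if __name__ == "__main__":
    for name, count in scan_docs().items():
        print(f"{name}: {count}")
```

Matching on `\s+` rather than a single space is a deliberate choice, since SGML sources are hard-wrapped and the two words may sit on different lines.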
--
Daniel Gustafsson https://vmware.com/
Attachments:
pg_trgm_case.diff (application/octet-stream) +2 -0
Daniel Gustafsson <daniel@yesql.se> writes:
> Looking at this, I'm leaning towards paring down the diff posted upthread to
> pretty much this; I think that will provide value while avoiding
> confusion.
WFM.
> As a related side note, there are four instances of "case insensitive{ly}" in
> the docs, while all other instances use "case-insensitive{ly}". I'm inclined
> to fix those four to use a hyphen while at it, for consistency across all pages.
+1
regards, tom lane