Lower or Upper case for F.33. pg_trgm

Started by PG Bug reporting form (10 messages, docs list)
#1 PG Bug reporting form
noreply@postgresql.org

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/14/pgtrgm.html
Description:

Hey guys,

I have a question regarding the trigram algorithm and I cannot find any
information about it in your documentation:

Do you distinguish between lower and uppercase? Or do you consider all words
in lowercase?

Happy to get some brief feedback from you,

Greetings, Marc

#2 Daniel Gustafsson
daniel@yesql.se
In reply to: PG Bug reporting form (#1)
Re: Lower or Upper case for F.33. pg_trgm

On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:

> I have a question regarding the trigram algorithm and I cannot find any
> information about it in your documentation:

Maybe we should add something about this?

> Do you distinguish between lower and uppercase? Or do you consider all words
> in lowercase?

pg_trgm can be compiled to be case-sensitive, but by default it is
case-insensitive.

# SELECT word_similarity('word', 'WORD');
word_similarity
-----------------
1
(1 row)

> Happy to get some brief feedback from you,

I would recommend the pgsql-general mailing list, as that is a safer way to get
general questions answered.

--
Daniel Gustafsson https://vmware.com/
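The default case folding Daniel describes can be illustrated with a simplified Python model of trigram matching. This is a sketch under stated assumptions, not pg_trgm's actual C implementation: the helper names `trigrams` and `similarity` and the `ignore_case` flag are hypothetical, with `ignore_case` standing in for the compile-time choice. The model assumes lowercasing, splitting words on non-alphanumeric characters, padding each word with two spaces in front and one behind, and scoring shared trigrams divided by the union of both trigram sets. It models `similarity()`, not the `word_similarity()` used in the psql example, which scores the best-matching extent of the second string.

```python
import re


def trigrams(text: str, ignore_case: bool = True) -> set:
    """Return the set of trigrams for text, loosely modelled on pg_trgm.

    Assumed (simplified) rules: optionally fold to lowercase, treat any
    non-alphanumeric character as a word separator, pad each word with two
    spaces in front and one behind, then take every 3-character window.
    """
    if ignore_case:
        # Hypothetical flag modelling the default, case-insensitive build.
        text = text.lower()
    result = set()
    # [^\W_] matches letters and digits only, so underscores and
    # punctuation split words.
    for word in re.findall(r"[^\W_]+", text):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a: str, b: str, ignore_case: bool = True) -> float:
    """Shared trigrams divided by the size of the union of both sets."""
    ta = trigrams(a, ignore_case)
    tb = trigrams(b, ignore_case)
    union = ta | tb
    if not union:
        return 0.0
    return len(ta & tb) / len(union)


if __name__ == "__main__":
    print(similarity("word", "WORD"))                     # prints 1.0
    print(similarity("word", "WORD", ignore_case=False))  # prints 0.0
```

With case folding, "word" and "WORD" yield identical trigram sets; without it, they share none. Other inputs may score differently from the real functions under this simplified model.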

#3 Erik Rijkers
er@xs4all.nl
In reply to: Daniel Gustafsson (#2)
Re: Lower or Upper case for F.33. pg_trgm

On 16-08-2022 at 12:36, Daniel Gustafsson wrote:

> On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:

>> I have a question regarding the trigram algorithm and I cannot find any
>> information about it in your documentation:

> Maybe we should add something about this?

Yeah, it's a bit strange that none of the following strings yield any
info on that page: 'case', 'sensitiv', 'upper', 'lower', and that there
is no mention of the ~ versus ~* difference.

It may be worth giving the simple hint directly in pgtrgm.html:
~ is case-sensitive
~* is case-insensitive

In any case a link to functions-matching.html seems indicated.

Erik Rijkers


#4 Daniel Gustafsson
daniel@yesql.se
In reply to: Erik Rijkers (#3)
Re: Lower or Upper case for F.33. pg_trgm

On 16 Aug 2022, at 12:54, Erik Rijkers <er@xs4all.nl> wrote:

> On 16-08-2022 at 12:36, Daniel Gustafsson wrote:

>> On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:
>>> I have a question regarding the trigram algorithm and I cannot find any
>>> information about it in your documentation:

>> Maybe we should add something about this?

> Yeah, it's a bit strange that none of the following strings yield any info on that page: 'case', 'sensitiv', 'upper', 'lower', and that there is no mention of the ~ versus ~* difference.

> It may be worth giving the simple hint directly in pgtrgm.html:
> ~ is case-sensitive
> ~* is case-insensitive

> In any case a link to functions-matching.html seems indicated.

Yeah, I think there is room for improvement here. Are you up for drafting a
patch for this?

--
Daniel Gustafsson https://vmware.com/

#5 Marc M.
marcmaiwald@googlemail.com
In reply to: Daniel Gustafsson (#4)
Re: Lower or Upper case for F.33. pg_trgm

Thanks for your fast response.

Is this a question for me? I am fine with a short hint regarding the
default. A link to other documentation is also fine.

On Tue, 16 Aug 2022 at 13:46, Daniel Gustafsson <daniel@yesql.se> wrote:


#6 Erik Rijkers
er@xs4all.nl
In reply to: Daniel Gustafsson (#4)
Re: Lower or Upper case for F.33. pg_trgm

On 16-08-2022 at 13:46, Daniel Gustafsson wrote:

> On 16 Aug 2022, at 12:54, Erik Rijkers <er@xs4all.nl> wrote:

>> On 16-08-2022 at 12:36, Daniel Gustafsson wrote:

>>> On 16 Aug 2022, at 12:17, PG Doc comments form <noreply@postgresql.org> wrote:
>>>> I have a question regarding the trigram algorithm and I cannot find any
>>>> information about it in your documentation:

>>> Maybe we should add something about this?

>> Yeah, it's a bit strange that none of the following strings yield any info on that page: 'case', 'sensitiv', 'upper', 'lower', and that there is no mention of the ~ versus ~* difference.

>> It may be worth giving the simple hint directly in pgtrgm.html:
>> ~ is case-sensitive
>> ~* is case-insensitive

>> In any case a link to functions-matching.html seems indicated.

> Yeah, I think there is room for improvement here. Are you up for drafting a
> patch for this?

How is this?

(bluntly stating 'similarity comparisons are case-insensitive' -
although I'm not really sure..)

Erik


Attachments:

pgtrgm.sgml.20220816.diff (text/x-patch; charset=UTF-8) +4 -1
#7 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Erik Rijkers (#6)
Re: Lower or Upper case for F.33. pg_trgm

Erik Rijkers <er@xs4all.nl> writes:

> (bluntly stating 'similarity comparisons are case-insensitive' -
> although I'm not really sure..)

Perhaps like "similarity comparisons are case-insensitive in a
standard build of pg_trgm", if you want to nod to the existence
of a compile option without going into detail.

regards, tom lane

#8 Marc M.
marcmaiwald@googlemail.com
In reply to: Tom Lane (#7)
Re: Lower or Upper case for F.33. pg_trgm

Sounds good to me.

On Tue, 16 Aug 2022 at 15:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:


#9 Daniel Gustafsson
daniel@yesql.se
In reply to: Tom Lane (#7)
Re: Lower or Upper case for F.33. pg_trgm

On 16 Aug 2022, at 15:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Erik Rijkers <er@xs4all.nl> writes:

>> (bluntly stating 'similarity comparisons are case-insensitive' -
>> although I'm not really sure..)

> Perhaps like "similarity comparisons are case-insensitive in a
> standard build of pg_trgm", if you want to nod to the existence
> of a compile option without going into detail.

Looking at this, I'm leaning towards paring down the diff posted upthread to
pretty much this wording. I think that will provide value while avoiding
confusion.

As a related side note, there are four instances of "case insensitive{ly}" in
the docs, while all other instances use "case-insensitive{ly}". I'm inclined
to fix those four to use a hyphen while at it, to be consistent across all pages.

--
Daniel Gustafsson https://vmware.com/

Attachments:

pg_trgm_case.diff (application/octet-stream) +2 -0
#10 Tom Lane
tgl@sss.pgh.pa.us
In reply to: Daniel Gustafsson (#9)
Re: Lower or Upper case for F.33. pg_trgm

Daniel Gustafsson <daniel@yesql.se> writes:

> Looking at this, I'm leaning towards paring down the diff posted upthread to
> pretty much this wording. I think that will provide value while avoiding
> confusion.

WFM.

> As a related side note, there are four instances of "case insensitive{ly}" in
> the docs, while all other instances use "case-insensitive{ly}". I'm inclined
> to fix those four to use a hyphen while at it, to be consistent across all pages.

+1

regards, tom lane