Supporting = operator in gin/gist_trgm_ops
Hello,
A french user recently complained that with an index created using
gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
like
col LIKE 'something'
but not
col = 'something'
even though both clauses are technically identical. That's clearly
not a high priority thing to support, but looking at the code it seems
to me that this could be achieved quite simply: just adding a new
operator = in the opclass, with an operator strategy number that falls
back doing exactly what LikeStrategyNumber is doing and that's it.
There shouldn't be any wrong results, even using wildcards as the
recheck will remove any incorrect one.
Did I miss something? And if not would such a patch be welcome?
Julien Rouhaud <rjuju123@gmail.com> writes:
A french user recently complained that with an index created using
gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
like
col LIKE 'something'
but not
col = 'something'
Huh, I'd supposed we did that already.
even though both clauses are technically identical. That's clearly
not a high priority thing to support, but looking at the code it seems
to me that this could be achieved quite simply: just adding a new
operator = in the opclass, with an operator strategy number that falls
back doing exactly what LikeStrategyNumber is doing and that's it.
There shouldn't be any wrong results, even using wildcards as the
recheck will remove any incorrect one.
I think you may be overoptimistic about being able to use the identical
code path without regard for LIKE wildcards; but certainly it should be
possible to do this with not a lot of new code. +1.
regards, tom lane
On Mon, Oct 26, 2020 at 5:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
A french user recently complained that with an index created using
gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
like
col LIKE 'something'
but not
col = 'something'Huh, I'd supposed we did that already.
even though both clauses are technically identical. That's clearly
not a high priority thing to support, but looking at the code it seems
to me that this could be achieved quite simply: just adding a new
operator = in the opclass, with an operator strategy number that falls
back doing exactly what LikeStrategyNumber is doing and that's it.
There shouldn't be any wrong results, even using wildcards as the
recheck will remove any incorrect one.I think you may be overoptimistic about being able to use the identical
code path without regard for LIKE wildcards; but certainly it should be
possible to do this with not a lot of new code. +1.
Well, that's what I was thinking too, but I tried all the possible
wildcard combinations I could think of and I couldn't find any case
yielding wrong results. As far as I can see the index scans return at
least all the required rows, and all extraneous rows are correctly
removed either by heap recheck or index recheck.
I'm attaching a patch POC pach with regression tests covering those
combinations. I also found a typo in the 1.4--1.5 pg_trgm upgrade
script, so I'm also attaching a patch for that.
Attachments:
v1-0001-Fix-typo-in-1.4-1.5-pg_trm-upgrade-script.patchtext/x-patch; charset=US-ASCII; name=v1-0001-Fix-typo-in-1.4-1.5-pg_trm-upgrade-script.patchDownload
From 3539eb386b5f15dc3c454cef4fee210d014bd91e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 26 Oct 2020 11:29:45 +0800
Subject: [PATCH v1 1/2] Fix typo in 1.4--1.5 pg_trm upgrade script
---
contrib/pg_trgm/pg_trgm--1.4--1.5.sql | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/contrib/pg_trgm/pg_trgm--1.4--1.5.sql b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
index 284f88d325..db122fce0f 100644
--- a/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
+++ b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
@@ -1,4 +1,4 @@
-/* contrib/pg_trgm/pg_trgm--1.5--1.5.sql */
+/* contrib/pg_trgm/pg_trgm--1.4--1.5.sql */
-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.5'" to load this file. \quit
--
2.28.0
v1-0002-Handle-operator-in-pg_trgm.patchtext/x-patch; charset=US-ASCII; name=v1-0002-Handle-operator-in-pg_trgm.patchDownload
From cbcd983a350208dc7b9b583d2d5b1476d7acd1c2 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 26 Oct 2020 11:28:36 +0800
Subject: [PATCH v1 2/2] Handle = operator in pg_trgm.
---
contrib/pg_trgm/Makefile | 2 +-
contrib/pg_trgm/expected/pg_trgm.out | 204 ++++++++++++++++++++++++++-
contrib/pg_trgm/pg_trgm.control | 2 +-
contrib/pg_trgm/sql/pg_trgm.sql | 40 ++++++
contrib/pg_trgm/trgm.h | 1 +
contrib/pg_trgm/trgm_gin.c | 3 +
contrib/pg_trgm/trgm_gist.c | 2 +
doc/src/sgml/pgtrgm.sgml | 7 +-
8 files changed, 252 insertions(+), 9 deletions(-)
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index d75e9ada2e..1fbdc9ec1e 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -9,7 +9,7 @@ OBJS = \
trgm_regexp.o
EXTENSION = pg_trgm
-DATA = pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
+DATA = pg_trgm--1.5--1.6.sql pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
pg_trgm--1.0--1.1.sql
PGFILEDESC = "pg_trgm - trigram matching"
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index 923c326c7b..20141ce7f3 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -4761,6 +4761,12 @@ insert into test2 values ('abcdef');
insert into test2 values ('quark');
insert into test2 values (' z foo bar');
insert into test2 values ('/123/-45/');
+insert into test2 values ('line 1');
+insert into test2 values ('%line 2');
+insert into test2 values ('line 3%');
+insert into test2 values ('%line 4%');
+insert into test2 values ('%li%ne 5%');
+insert into test2 values ('li_e 6');
create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
set enable_seqscan=off;
explain (costs off)
@@ -4863,7 +4869,13 @@ select * from test2 where t ~ '(abc)*$';
quark
z foo bar
/123/-45/
-(4 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+ %li%ne 5%
+ li_e 6
+(10 rows)
select * from test2 where t ~* 'DEF';
t
@@ -4918,7 +4930,11 @@ select * from test2 where t ~ '[a-z]{3}';
abcdef
quark
z foo bar
-(3 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+(7 rows)
select * from test2 where t ~* '(a{10}|b{10}|c{10}){10}';
t
@@ -4961,6 +4977,93 @@ select * from test2 where t ~ '/\d+/-\d';
/123/-45/
(1 row)
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+ QUERY PLAN
+------------------------------------------
+ Bitmap Heap Scan on test2
+ Recheck Cond: (t = 'abcdef'::text)
+ -> Bitmap Index Scan on test2_idx_gin
+ Index Cond: (t = 'abcdef'::text)
+(4 rows)
+
+select * from test2 where t = 'abcdef';
+ t
+--------
+ abcdef
+(1 row)
+
+explain (costs off)
+ select * from test2 where t = '%line%';
+ QUERY PLAN
+------------------------------------------
+ Bitmap Heap Scan on test2
+ Recheck Cond: (t = '%line%'::text)
+ -> Bitmap Index Scan on test2_idx_gin
+ Index Cond: (t = '%line%'::text)
+(4 rows)
+
+select * from test2 where t = '%line%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 1';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 2';
+ t
+---------
+ %line 2
+(1 row)
+
+select * from test2 where t = 'line 3%';
+ t
+---------
+ line 3%
+(1 row)
+
+select * from test2 where t = '%line 3%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 4%';
+ t
+----------
+ %line 4%
+(1 row)
+
+select * from test2 where t = '%line 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li_ne 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li%ne 5%';
+ t
+-----------
+ %li%ne 5%
+(1 row)
+
+select * from test2 where t = 'line 6';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 6';
+ t
+--------
+ li_e 6
+(1 row)
+
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
set enable_seqscan=off;
@@ -5056,7 +5159,13 @@ select * from test2 where t ~ '(abc)*$';
quark
z foo bar
/123/-45/
-(4 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+ %li%ne 5%
+ li_e 6
+(10 rows)
select * from test2 where t ~* 'DEF';
t
@@ -5111,7 +5220,11 @@ select * from test2 where t ~ '[a-z]{3}';
abcdef
quark
z foo bar
-(3 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+(7 rows)
select * from test2 where t ~* '(a{10}|b{10}|c{10}){10}';
t
@@ -5154,6 +5267,89 @@ select * from test2 where t ~ '/\d+/-\d';
/123/-45/
(1 row)
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+ QUERY PLAN
+------------------------------------------
+ Index Scan using test2_idx_gist on test2
+ Index Cond: (t = 'abcdef'::text)
+(2 rows)
+
+select * from test2 where t = 'abcdef';
+ t
+--------
+ abcdef
+(1 row)
+
+explain (costs off)
+ select * from test2 where t = '%line%';
+ QUERY PLAN
+------------------------------------------
+ Index Scan using test2_idx_gist on test2
+ Index Cond: (t = '%line%'::text)
+(2 rows)
+
+select * from test2 where t = '%line%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 1';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 2';
+ t
+---------
+ %line 2
+(1 row)
+
+select * from test2 where t = 'line 3%';
+ t
+---------
+ line 3%
+(1 row)
+
+select * from test2 where t = '%line 3%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 4%';
+ t
+----------
+ %line 4%
+(1 row)
+
+select * from test2 where t = '%line 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li_ne 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li%ne 5%';
+ t
+-----------
+ %li%ne 5%
+(1 row)
+
+select * from test2 where t = 'line 6';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 6';
+ t
+--------
+ li_e 6
+(1 row)
+
-- Check similarity threshold (bug #14202)
CREATE TEMP TABLE restaurants (city text);
INSERT INTO restaurants SELECT 'Warsaw' FROM generate_series(1, 10000);
diff --git a/contrib/pg_trgm/pg_trgm.control b/contrib/pg_trgm/pg_trgm.control
index ed4487e96b..1d6a9ddf25 100644
--- a/contrib/pg_trgm/pg_trgm.control
+++ b/contrib/pg_trgm/pg_trgm.control
@@ -1,6 +1,6 @@
# pg_trgm extension
comment = 'text similarity measurement and index searching based on trigrams'
-default_version = '1.5'
+default_version = '1.6'
module_pathname = '$libdir/pg_trgm'
relocatable = true
trusted = true
diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql
index bc2a6d525c..6a9da24d5a 100644
--- a/contrib/pg_trgm/sql/pg_trgm.sql
+++ b/contrib/pg_trgm/sql/pg_trgm.sql
@@ -101,6 +101,12 @@ insert into test2 values ('abcdef');
insert into test2 values ('quark');
insert into test2 values (' z foo bar');
insert into test2 values ('/123/-45/');
+insert into test2 values ('line 1');
+insert into test2 values ('%line 2');
+insert into test2 values ('line 3%');
+insert into test2 values ('%line 4%');
+insert into test2 values ('%li%ne 5%');
+insert into test2 values ('li_e 6');
create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
set enable_seqscan=off;
explain (costs off)
@@ -137,6 +143,23 @@ select * from test2 where t ~ ' z foo bar';
select * from test2 where t ~ ' z foo';
select * from test2 where t ~ 'qua(?!foo)';
select * from test2 where t ~ '/\d+/-\d';
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+select * from test2 where t = 'abcdef';
+explain (costs off)
+ select * from test2 where t = '%line%';
+select * from test2 where t = '%line%';
+select * from test2 where t = 'li_e 1';
+select * from test2 where t = '%line 2';
+select * from test2 where t = 'line 3%';
+select * from test2 where t = '%line 3%';
+select * from test2 where t = '%line 4%';
+select * from test2 where t = '%line 5%';
+select * from test2 where t = '%li_ne 5%';
+select * from test2 where t = '%li%ne 5%';
+select * from test2 where t = 'line 6';
+select * from test2 where t = 'li_e 6';
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
@@ -175,6 +198,23 @@ select * from test2 where t ~ ' z foo bar';
select * from test2 where t ~ ' z foo';
select * from test2 where t ~ 'qua(?!foo)';
select * from test2 where t ~ '/\d+/-\d';
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+select * from test2 where t = 'abcdef';
+explain (costs off)
+ select * from test2 where t = '%line%';
+select * from test2 where t = '%line%';
+select * from test2 where t = 'li_e 1';
+select * from test2 where t = '%line 2';
+select * from test2 where t = 'line 3%';
+select * from test2 where t = '%line 3%';
+select * from test2 where t = '%line 4%';
+select * from test2 where t = '%line 5%';
+select * from test2 where t = '%li_ne 5%';
+select * from test2 where t = '%li%ne 5%';
+select * from test2 where t = 'line 6';
+select * from test2 where t = 'li_e 6';
-- Check similarity threshold (bug #14202)
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index b616953462..405a1d9552 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -37,6 +37,7 @@
#define WordDistanceStrategyNumber 8
#define StrictWordSimilarityStrategyNumber 9
#define StrictWordDistanceStrategyNumber 10
+#define EqualStrategyNumber 11
typedef char trgm[3];
diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c
index 4dbf0ffb68..fc0179b5ca 100644
--- a/contrib/pg_trgm/trgm_gin.c
+++ b/contrib/pg_trgm/trgm_gin.c
@@ -97,6 +97,7 @@ gin_extract_query_trgm(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/*
* For wildcard search we extract all the trigrams that every
@@ -221,6 +222,7 @@ gin_trgm_consistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/* Check if all extracted trigrams are presented. */
res = true;
for (i = 0; i < nkeys; i++)
@@ -306,6 +308,7 @@ gin_trgm_triconsistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/* Check if all extracted trigrams are presented. */
res = GIN_MAYBE;
for (i = 0; i < nkeys; i++)
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index 9937ef9253..70e8a8ebcf 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -241,6 +241,7 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
qtrg = generate_wildcard_trgm(VARDATA(query),
querysize - VARHDRSZ);
break;
@@ -338,6 +339,7 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/* Wildcard search is inexact */
*recheck = true;
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 5365b0681e..3ce865b076 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -419,9 +419,10 @@
the purpose of very fast similarity searches. These index types support
the above-described similarity operators, and additionally support
trigram-based index searches for <literal>LIKE</literal>, <literal>ILIKE</literal>,
- <literal>~</literal> and <literal>~*</literal> queries. (These indexes do not
- support equality nor simple comparison operators, so you may need a
- regular B-tree index too.)
+ <literal>~</literal> and <literal>~*</literal> queries. Beginning in
+ <productname>PostgreSQL</productname> 14, these indexes also support
+ equality and simple comparison operators, so extraneous regular B-tree
+ indexes are not needed anymore.
</para>
<para>
--
2.28.0
On Mon, Oct 26, 2020 at 12:02 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Oct 26, 2020 at 5:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
A french user recently complained that with an index created using
gin_trgm_ops (or gist_trgm_ops), you can use the index with a clause
like
col LIKE 'something'
but not
col = 'something'Huh, I'd supposed we did that already.
even though both clauses are technically identical. That's clearly
not a high priority thing to support, but looking at the code it seems
to me that this could be achieved quite simply: just adding a new
operator = in the opclass, with an operator strategy number that falls
back doing exactly what LikeStrategyNumber is doing and that's it.
There shouldn't be any wrong results, even using wildcards as the
recheck will remove any incorrect one.I think you may be overoptimistic about being able to use the identical
code path without regard for LIKE wildcards; but certainly it should be
possible to do this with not a lot of new code. +1.Well, that's what I was thinking too, but I tried all the possible
wildcard combinations I could think of and I couldn't find any case
yielding wrong results. As far as I can see the index scans return at
least all the required rows, and all extraneous rows are correctly
removed either by heap recheck or index recheck.I'm attaching a patch POC pach with regression tests covering those
combinations. I also found a typo in the 1.4--1.5 pg_trgm upgrade
script, so I'm also attaching a patch for that.
Oops, I forgot to git-add the 1.5--1.6.sql upgrade script in the previous patch.
Attachments:
v2-0001-Fix-typo-in-1.4-1.5-pg_trm-upgrade-script.patchtext/x-patch; charset=US-ASCII; name=v2-0001-Fix-typo-in-1.4-1.5-pg_trm-upgrade-script.patchDownload
From 3539eb386b5f15dc3c454cef4fee210d014bd91e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 26 Oct 2020 11:29:45 +0800
Subject: [PATCH v2 1/2] Fix typo in 1.4--1.5 pg_trm upgrade script
---
contrib/pg_trgm/pg_trgm--1.4--1.5.sql | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/contrib/pg_trgm/pg_trgm--1.4--1.5.sql b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
index 284f88d325..db122fce0f 100644
--- a/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
+++ b/contrib/pg_trgm/pg_trgm--1.4--1.5.sql
@@ -1,4 +1,4 @@
-/* contrib/pg_trgm/pg_trgm--1.5--1.5.sql */
+/* contrib/pg_trgm/pg_trgm--1.4--1.5.sql */
-- complain if script is sourced in psql, rather than via ALTER EXTENSION
\echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.5'" to load this file. \quit
--
2.28.0
v2-0002-Handle-operator-in-pg_trgm.patchtext/x-patch; charset=US-ASCII; name=v2-0002-Handle-operator-in-pg_trgm.patchDownload
From 38c6727c16814640fe9e3be0bc019a67642f7c56 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 26 Oct 2020 11:28:36 +0800
Subject: [PATCH v2 2/2] Handle = operator in pg_trgm.
---
contrib/pg_trgm/Makefile | 2 +-
contrib/pg_trgm/expected/pg_trgm.out | 204 +++++++++++++++++++++++++-
contrib/pg_trgm/pg_trgm--1.5--1.6.sql | 11 ++
contrib/pg_trgm/pg_trgm.control | 2 +-
contrib/pg_trgm/sql/pg_trgm.sql | 40 +++++
contrib/pg_trgm/trgm.h | 1 +
contrib/pg_trgm/trgm_gin.c | 3 +
contrib/pg_trgm/trgm_gist.c | 2 +
doc/src/sgml/pgtrgm.sgml | 7 +-
9 files changed, 263 insertions(+), 9 deletions(-)
create mode 100644 contrib/pg_trgm/pg_trgm--1.5--1.6.sql
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index d75e9ada2e..1fbdc9ec1e 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -9,7 +9,7 @@ OBJS = \
trgm_regexp.o
EXTENSION = pg_trgm
-DATA = pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
+DATA = pg_trgm--1.5--1.6.sql pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
pg_trgm--1.0--1.1.sql
PGFILEDESC = "pg_trgm - trigram matching"
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index 923c326c7b..20141ce7f3 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -4761,6 +4761,12 @@ insert into test2 values ('abcdef');
insert into test2 values ('quark');
insert into test2 values (' z foo bar');
insert into test2 values ('/123/-45/');
+insert into test2 values ('line 1');
+insert into test2 values ('%line 2');
+insert into test2 values ('line 3%');
+insert into test2 values ('%line 4%');
+insert into test2 values ('%li%ne 5%');
+insert into test2 values ('li_e 6');
create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
set enable_seqscan=off;
explain (costs off)
@@ -4863,7 +4869,13 @@ select * from test2 where t ~ '(abc)*$';
quark
z foo bar
/123/-45/
-(4 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+ %li%ne 5%
+ li_e 6
+(10 rows)
select * from test2 where t ~* 'DEF';
t
@@ -4918,7 +4930,11 @@ select * from test2 where t ~ '[a-z]{3}';
abcdef
quark
z foo bar
-(3 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+(7 rows)
select * from test2 where t ~* '(a{10}|b{10}|c{10}){10}';
t
@@ -4961,6 +4977,93 @@ select * from test2 where t ~ '/\d+/-\d';
/123/-45/
(1 row)
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+ QUERY PLAN
+------------------------------------------
+ Bitmap Heap Scan on test2
+ Recheck Cond: (t = 'abcdef'::text)
+ -> Bitmap Index Scan on test2_idx_gin
+ Index Cond: (t = 'abcdef'::text)
+(4 rows)
+
+select * from test2 where t = 'abcdef';
+ t
+--------
+ abcdef
+(1 row)
+
+explain (costs off)
+ select * from test2 where t = '%line%';
+ QUERY PLAN
+------------------------------------------
+ Bitmap Heap Scan on test2
+ Recheck Cond: (t = '%line%'::text)
+ -> Bitmap Index Scan on test2_idx_gin
+ Index Cond: (t = '%line%'::text)
+(4 rows)
+
+select * from test2 where t = '%line%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 1';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 2';
+ t
+---------
+ %line 2
+(1 row)
+
+select * from test2 where t = 'line 3%';
+ t
+---------
+ line 3%
+(1 row)
+
+select * from test2 where t = '%line 3%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 4%';
+ t
+----------
+ %line 4%
+(1 row)
+
+select * from test2 where t = '%line 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li_ne 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li%ne 5%';
+ t
+-----------
+ %li%ne 5%
+(1 row)
+
+select * from test2 where t = 'line 6';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 6';
+ t
+--------
+ li_e 6
+(1 row)
+
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
set enable_seqscan=off;
@@ -5056,7 +5159,13 @@ select * from test2 where t ~ '(abc)*$';
quark
z foo bar
/123/-45/
-(4 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+ %li%ne 5%
+ li_e 6
+(10 rows)
select * from test2 where t ~* 'DEF';
t
@@ -5111,7 +5220,11 @@ select * from test2 where t ~ '[a-z]{3}';
abcdef
quark
z foo bar
-(3 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+(7 rows)
select * from test2 where t ~* '(a{10}|b{10}|c{10}){10}';
t
@@ -5154,6 +5267,89 @@ select * from test2 where t ~ '/\d+/-\d';
/123/-45/
(1 row)
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+ QUERY PLAN
+------------------------------------------
+ Index Scan using test2_idx_gist on test2
+ Index Cond: (t = 'abcdef'::text)
+(2 rows)
+
+select * from test2 where t = 'abcdef';
+ t
+--------
+ abcdef
+(1 row)
+
+explain (costs off)
+ select * from test2 where t = '%line%';
+ QUERY PLAN
+------------------------------------------
+ Index Scan using test2_idx_gist on test2
+ Index Cond: (t = '%line%'::text)
+(2 rows)
+
+select * from test2 where t = '%line%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 1';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 2';
+ t
+---------
+ %line 2
+(1 row)
+
+select * from test2 where t = 'line 3%';
+ t
+---------
+ line 3%
+(1 row)
+
+select * from test2 where t = '%line 3%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 4%';
+ t
+----------
+ %line 4%
+(1 row)
+
+select * from test2 where t = '%line 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li_ne 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li%ne 5%';
+ t
+-----------
+ %li%ne 5%
+(1 row)
+
+select * from test2 where t = 'line 6';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 6';
+ t
+--------
+ li_e 6
+(1 row)
+
-- Check similarity threshold (bug #14202)
CREATE TEMP TABLE restaurants (city text);
INSERT INTO restaurants SELECT 'Warsaw' FROM generate_series(1, 10000);
diff --git a/contrib/pg_trgm/pg_trgm--1.5--1.6.sql b/contrib/pg_trgm/pg_trgm--1.5--1.6.sql
new file mode 100644
index 0000000000..1e6b571a4a
--- /dev/null
+++ b/contrib/pg_trgm/pg_trgm--1.5--1.6.sql
@@ -0,0 +1,11 @@
+/* contrib/pg_trgm/pg_trgm--1.5--1.6.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.6'" to load this file. \quit
+
+
+ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
+ OPERATOR 11 pg_catalog.= (text, text);
+
+ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
+ OPERATOR 11 pg_catalog.= (text, text);
diff --git a/contrib/pg_trgm/pg_trgm.control b/contrib/pg_trgm/pg_trgm.control
index ed4487e96b..1d6a9ddf25 100644
--- a/contrib/pg_trgm/pg_trgm.control
+++ b/contrib/pg_trgm/pg_trgm.control
@@ -1,6 +1,6 @@
# pg_trgm extension
comment = 'text similarity measurement and index searching based on trigrams'
-default_version = '1.5'
+default_version = '1.6'
module_pathname = '$libdir/pg_trgm'
relocatable = true
trusted = true
diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql
index bc2a6d525c..6a9da24d5a 100644
--- a/contrib/pg_trgm/sql/pg_trgm.sql
+++ b/contrib/pg_trgm/sql/pg_trgm.sql
@@ -101,6 +101,12 @@ insert into test2 values ('abcdef');
insert into test2 values ('quark');
insert into test2 values (' z foo bar');
insert into test2 values ('/123/-45/');
+insert into test2 values ('line 1');
+insert into test2 values ('%line 2');
+insert into test2 values ('line 3%');
+insert into test2 values ('%line 4%');
+insert into test2 values ('%li%ne 5%');
+insert into test2 values ('li_e 6');
create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
set enable_seqscan=off;
explain (costs off)
@@ -137,6 +143,23 @@ select * from test2 where t ~ ' z foo bar';
select * from test2 where t ~ ' z foo';
select * from test2 where t ~ 'qua(?!foo)';
select * from test2 where t ~ '/\d+/-\d';
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+select * from test2 where t = 'abcdef';
+explain (costs off)
+ select * from test2 where t = '%line%';
+select * from test2 where t = '%line%';
+select * from test2 where t = 'li_e 1';
+select * from test2 where t = '%line 2';
+select * from test2 where t = 'line 3%';
+select * from test2 where t = '%line 3%';
+select * from test2 where t = '%line 4%';
+select * from test2 where t = '%line 5%';
+select * from test2 where t = '%li_ne 5%';
+select * from test2 where t = '%li%ne 5%';
+select * from test2 where t = 'line 6';
+select * from test2 where t = 'li_e 6';
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
@@ -175,6 +198,23 @@ select * from test2 where t ~ ' z foo bar';
select * from test2 where t ~ ' z foo';
select * from test2 where t ~ 'qua(?!foo)';
select * from test2 where t ~ '/\d+/-\d';
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+select * from test2 where t = 'abcdef';
+explain (costs off)
+ select * from test2 where t = '%line%';
+select * from test2 where t = '%line%';
+select * from test2 where t = 'li_e 1';
+select * from test2 where t = '%line 2';
+select * from test2 where t = 'line 3%';
+select * from test2 where t = '%line 3%';
+select * from test2 where t = '%line 4%';
+select * from test2 where t = '%line 5%';
+select * from test2 where t = '%li_ne 5%';
+select * from test2 where t = '%li%ne 5%';
+select * from test2 where t = 'line 6';
+select * from test2 where t = 'li_e 6';
-- Check similarity threshold (bug #14202)
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index b616953462..405a1d9552 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -37,6 +37,7 @@
#define WordDistanceStrategyNumber 8
#define StrictWordSimilarityStrategyNumber 9
#define StrictWordDistanceStrategyNumber 10
+#define EqualStrategyNumber 11
typedef char trgm[3];
diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c
index 4dbf0ffb68..fc0179b5ca 100644
--- a/contrib/pg_trgm/trgm_gin.c
+++ b/contrib/pg_trgm/trgm_gin.c
@@ -97,6 +97,7 @@ gin_extract_query_trgm(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/*
* For wildcard search we extract all the trigrams that every
@@ -221,6 +222,7 @@ gin_trgm_consistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/* Check if all extracted trigrams are presented. */
res = true;
for (i = 0; i < nkeys; i++)
@@ -306,6 +308,7 @@ gin_trgm_triconsistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/* Check if all extracted trigrams are presented. */
res = GIN_MAYBE;
for (i = 0; i < nkeys; i++)
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index 9937ef9253..70e8a8ebcf 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -241,6 +241,7 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
qtrg = generate_wildcard_trgm(VARDATA(query),
querysize - VARHDRSZ);
break;
@@ -338,6 +339,7 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/* Wildcard search is inexact */
*recheck = true;
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 5365b0681e..3ce865b076 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -419,9 +419,10 @@
the purpose of very fast similarity searches. These index types support
the above-described similarity operators, and additionally support
trigram-based index searches for <literal>LIKE</literal>, <literal>ILIKE</literal>,
- <literal>~</literal> and <literal>~*</literal> queries. (These indexes do not
- support equality nor simple comparison operators, so you may need a
- regular B-tree index too.)
+ <literal>~</literal> and <literal>~*</literal> queries. Beginning in
+ <productname>PostgreSQL</productname> 14, these indexes also support
+ equality and simple comparison operators, so extraneous regular B-tree
+ indexes are not needed anymore.
</para>
<para>
--
2.28.0
Julien Rouhaud <rjuju123@gmail.com> writes:
On Mon, Oct 26, 2020 at 5:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think you may be overoptimistic about being able to use the identical
code path without regard for LIKE wildcards; but certainly it should be
possible to do this with not a lot of new code. +1.
Well, that's what I was thinking too, but I tried all the possible
wildcard combinations I could think of and I couldn't find any case
yielding wrong results. As far as I can see the index scans return at
least all the required rows, and all extraneous rows are correctly
removed either by heap recheck or index recheck.
But "does it get the right answers" isn't the only figure of merit.
If the index scan visits far more rows than necessary, that's bad.
Maybe it's OK given that we only make trigrams from alphanumerics,
but I'm not quite sure.
regards, tom lane
On Mon, Oct 26, 2020 at 12:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Julien Rouhaud <rjuju123@gmail.com> writes:
On Mon, Oct 26, 2020 at 5:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I think you may be overoptimistic about being able to use the identical
code path without regard for LIKE wildcards; but certainly it should be
possible to do this with not a lot of new code. +1.Well, that's what I was thinking too, but I tried all the possible
wildcard combinations I could think of and I couldn't find any case
yielding wrong results. As far as I can see the index scans return at
least all the required rows, and all extraneous rows are correctly
removed either by heap recheck or index recheck.But "does it get the right answers" isn't the only figure of merit.
If the index scan visits far more rows than necessary, that's bad.
Maybe it's OK given that we only make trigrams from alphanumerics,
but I'm not quite sure.
Ah, yes this might lead to bad performance if the "fake wildcard"
matches too many rows, but this shouldn't be a very common use case,
and the only alternative for that might be to create trigrams for non
alphanumerics characters. I didn't try to do that because it would
mean meaningful overhead for mainstream usage of pg_trgm, and would
also mean on-disk format break. In my opinion supporting = should be
a best effort, especially for such corner cases.
On Mon, Oct 26, 2020 at 7:38 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
Ah, yes this might lead to bad performance if the "fake wildcard"
matches too many rows, but this shouldn't be a very common use case,
and the only alternative for that might be to create trigrams for non
alphanumerics characters. I didn't try to do that because it would
mean meaningful overhead for mainstream usage of pg_trgm, and would
also mean on-disk format break. In my opinion supporting = should be
a best effort, especially for such corner cases.
It would be more efficient to generate trigrams for equal operator
using generate_trgm() instead of generate_wildcard_trgm(). It some
cases it would generate more trigrams. For instance generate_trgm()
would generate '__a', '_ab', 'ab_' for '%ab%' while
generate_wildcard_trgm() would generate nothing.
Also I wonder how our costing would work if there are multiple indices
of the same column. We should clearly prefer btree than pg_trgm
gist/gin, and I believe our costing provides this. But we also should
prefer btree_gist/btree_gin than pg_trgm gist/gin, and I'm not sure
our costing provides this especially for gist.
------
Regards,
Alexander Korotkov
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested
Hi,
this patch implements a useful and missing feature. Thank you.
It includes documentation, which to a non-native speaker as myself seems appropriate.
It includes comprehensive tests that cover the implemented cases.
In the thread Alexander has pointed out, quote:
"It would be more efficient to generate trigrams for equal operator
using generate_trgm() instead of generate_wildcard_trgm()"
I will echo the sentiment, though from a slightly different and possibly not
as important point of view. The method used to extract trigrams from the query
should match the method used to extract trigrams from the values when they
get added to the index. This is gin_extract_value_trgm() and is indeed using
generate_trgm().
I have no opinion over Alexander's second comment regarding costing.
I change the status to 'Waiting on Author', but please feel free to override
my opinion if you feel I am wrong and reset it to 'Needs review'.
Cheers,
//Georgios
The new status of this patch is: Waiting on Author
On Wed, Nov 11, 2020 at 8:34 PM Georgios Kokolatos
<gkokolatos@protonmail.com> wrote:
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not testedHi,
this patch implements a useful and missing feature. Thank you.
It includes documentation, which to a non-native speaker as myself seems appropriate.
It includes comprehensive tests that cover the implemented cases.In the thread Alexander has pointed out, quote:
"It would be more efficient to generate trigrams for equal operator
using generate_trgm() instead of generate_wildcard_trgm()"I will echo the sentiment, though from a slightly different and possibly not
as important point of view. The method used to extract trigrams from the query
should match the method used to extract trigrams from the values when they
get added to the index. This is gin_extract_value_trgm() and is indeed using
generate_trgm().I have no opinion over Alexander's second comment regarding costing.
I change the status to 'Waiting on Author', but please feel free to override
my opinion if you feel I am wrong and reset it to 'Needs review'.
Thanks for the reminder Georgios! Thanks a lot Alexander for the review!
Indeed, I should have used generate_trgm() rather than
generate_wildcard_trgm(). IIUC, the rest of the code should still be
doing the same as [I]LikeStrategyNumber. I attach a v3 with that
modification.
For the costing, I tried this naive dataset:
CREATE TABLE t1 AS select md5(random()::text) AS val from
generate_series(1, 100000);
CREATE INDEX t1_btree ON t1 (val);
CREATE INDEX t1_gist ON t1 USING gist (val gist_trgm_ops);
Cost are like this (all default configuration, using any random existing entry):
# EXPLAIN ANALYZE SELECT * FROM t1 where val =
'8dcf324ce38428e4d27a363953ac1c51';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Index Only Scan using t1_btree on t1 (cost=0.42..4.44 rows=1
width=33) (actual time=0.192..0.194 rows=1 loops=1)
Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Heap Fetches: 0
Planning Time: 0.133 ms
Execution Time: 0.222 ms
(5 rows)
# EXPLAIN ANALYZE SELECT * FROM t1 where val =
'8dcf324ce38428e4d27a363953ac1c51';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Index Scan using t1_gist on t1 (cost=0.28..8.30 rows=1 width=33)
(actual time=0.542..2.359 rows=1 loops=1)
Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Planning Time: 0.189 ms
Execution Time: 2.382 ms
(4 rows)
# EXPLAIN ANALYZE SELECT * FROM t1 where val =
'8dcf324ce38428e4d27a363953ac1c51';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on t1 (cost=400.01..404.02 rows=1 width=33) (actual
time=2.486..2.488 rows=1 loops=1)
Recheck Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on t1_gin (cost=0.00..400.01 rows=1 width=0)
(actual time=2.474..2.474 rows=1 loops=1)
Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Planning Time: 0.206 ms
Execution Time: 2.611 ms
So assuming that this dataset is representative enough, costing indeed
prefers a btree index over a gist/gin index, which should avoid
regression with this change.
Gin is however quite off, likely because it's a bitmap index scan
rather than an index scan, so gist is preferred in this scenario.
That's not ideal, but I'm not sure that there are many people having
both gin_trgm_ops and gist_trgm_ops.
Attachments:
v3-0001-Handle-operator-in-pg_trgm.patchapplication/octet-stream; name=v3-0001-Handle-operator-in-pg_trgm.patchDownload
From 7f1657a33ae2da12126617c44699f6eab2c2b90e Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Mon, 26 Oct 2020 11:28:36 +0800
Subject: [PATCH v3] Handle = operator in pg_trgm.
---
contrib/pg_trgm/Makefile | 2 +-
contrib/pg_trgm/expected/pg_trgm.out | 204 +++++++++++++++++++++++++-
contrib/pg_trgm/pg_trgm--1.5--1.6.sql | 11 ++
contrib/pg_trgm/pg_trgm.control | 2 +-
contrib/pg_trgm/sql/pg_trgm.sql | 40 +++++
contrib/pg_trgm/trgm.h | 1 +
contrib/pg_trgm/trgm_gin.c | 3 +
contrib/pg_trgm/trgm_gist.c | 2 +
doc/src/sgml/pgtrgm.sgml | 7 +-
9 files changed, 263 insertions(+), 9 deletions(-)
create mode 100644 contrib/pg_trgm/pg_trgm--1.5--1.6.sql
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index d75e9ada2e..1fbdc9ec1e 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -9,7 +9,7 @@ OBJS = \
trgm_regexp.o
EXTENSION = pg_trgm
-DATA = pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
+DATA = pg_trgm--1.5--1.6.sql pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
pg_trgm--1.0--1.1.sql
PGFILEDESC = "pg_trgm - trigram matching"
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index 923c326c7b..20141ce7f3 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -4761,6 +4761,12 @@ insert into test2 values ('abcdef');
insert into test2 values ('quark');
insert into test2 values (' z foo bar');
insert into test2 values ('/123/-45/');
+insert into test2 values ('line 1');
+insert into test2 values ('%line 2');
+insert into test2 values ('line 3%');
+insert into test2 values ('%line 4%');
+insert into test2 values ('%li%ne 5%');
+insert into test2 values ('li_e 6');
create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
set enable_seqscan=off;
explain (costs off)
@@ -4863,7 +4869,13 @@ select * from test2 where t ~ '(abc)*$';
quark
z foo bar
/123/-45/
-(4 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+ %li%ne 5%
+ li_e 6
+(10 rows)
select * from test2 where t ~* 'DEF';
t
@@ -4918,7 +4930,11 @@ select * from test2 where t ~ '[a-z]{3}';
abcdef
quark
z foo bar
-(3 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+(7 rows)
select * from test2 where t ~* '(a{10}|b{10}|c{10}){10}';
t
@@ -4961,6 +4977,93 @@ select * from test2 where t ~ '/\d+/-\d';
/123/-45/
(1 row)
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+ QUERY PLAN
+------------------------------------------
+ Bitmap Heap Scan on test2
+ Recheck Cond: (t = 'abcdef'::text)
+ -> Bitmap Index Scan on test2_idx_gin
+ Index Cond: (t = 'abcdef'::text)
+(4 rows)
+
+select * from test2 where t = 'abcdef';
+ t
+--------
+ abcdef
+(1 row)
+
+explain (costs off)
+ select * from test2 where t = '%line%';
+ QUERY PLAN
+------------------------------------------
+ Bitmap Heap Scan on test2
+ Recheck Cond: (t = '%line%'::text)
+ -> Bitmap Index Scan on test2_idx_gin
+ Index Cond: (t = '%line%'::text)
+(4 rows)
+
+select * from test2 where t = '%line%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 1';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 2';
+ t
+---------
+ %line 2
+(1 row)
+
+select * from test2 where t = 'line 3%';
+ t
+---------
+ line 3%
+(1 row)
+
+select * from test2 where t = '%line 3%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 4%';
+ t
+----------
+ %line 4%
+(1 row)
+
+select * from test2 where t = '%line 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li_ne 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li%ne 5%';
+ t
+-----------
+ %li%ne 5%
+(1 row)
+
+select * from test2 where t = 'line 6';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 6';
+ t
+--------
+ li_e 6
+(1 row)
+
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
set enable_seqscan=off;
@@ -5056,7 +5159,13 @@ select * from test2 where t ~ '(abc)*$';
quark
z foo bar
/123/-45/
-(4 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+ %li%ne 5%
+ li_e 6
+(10 rows)
select * from test2 where t ~* 'DEF';
t
@@ -5111,7 +5220,11 @@ select * from test2 where t ~ '[a-z]{3}';
abcdef
quark
z foo bar
-(3 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+(7 rows)
select * from test2 where t ~* '(a{10}|b{10}|c{10}){10}';
t
@@ -5154,6 +5267,89 @@ select * from test2 where t ~ '/\d+/-\d';
/123/-45/
(1 row)
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+ QUERY PLAN
+------------------------------------------
+ Index Scan using test2_idx_gist on test2
+ Index Cond: (t = 'abcdef'::text)
+(2 rows)
+
+select * from test2 where t = 'abcdef';
+ t
+--------
+ abcdef
+(1 row)
+
+explain (costs off)
+ select * from test2 where t = '%line%';
+ QUERY PLAN
+------------------------------------------
+ Index Scan using test2_idx_gist on test2
+ Index Cond: (t = '%line%'::text)
+(2 rows)
+
+select * from test2 where t = '%line%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 1';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 2';
+ t
+---------
+ %line 2
+(1 row)
+
+select * from test2 where t = 'line 3%';
+ t
+---------
+ line 3%
+(1 row)
+
+select * from test2 where t = '%line 3%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 4%';
+ t
+----------
+ %line 4%
+(1 row)
+
+select * from test2 where t = '%line 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li_ne 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li%ne 5%';
+ t
+-----------
+ %li%ne 5%
+(1 row)
+
+select * from test2 where t = 'line 6';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 6';
+ t
+--------
+ li_e 6
+(1 row)
+
-- Check similarity threshold (bug #14202)
CREATE TEMP TABLE restaurants (city text);
INSERT INTO restaurants SELECT 'Warsaw' FROM generate_series(1, 10000);
diff --git a/contrib/pg_trgm/pg_trgm--1.5--1.6.sql b/contrib/pg_trgm/pg_trgm--1.5--1.6.sql
new file mode 100644
index 0000000000..1e6b571a4a
--- /dev/null
+++ b/contrib/pg_trgm/pg_trgm--1.5--1.6.sql
@@ -0,0 +1,11 @@
+/* contrib/pg_trgm/pg_trgm--1.5--1.6.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.6'" to load this file. \quit
+
+
+ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
+ OPERATOR 11 pg_catalog.= (text, text);
+
+ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
+ OPERATOR 11 pg_catalog.= (text, text);
diff --git a/contrib/pg_trgm/pg_trgm.control b/contrib/pg_trgm/pg_trgm.control
index ed4487e96b..1d6a9ddf25 100644
--- a/contrib/pg_trgm/pg_trgm.control
+++ b/contrib/pg_trgm/pg_trgm.control
@@ -1,6 +1,6 @@
# pg_trgm extension
comment = 'text similarity measurement and index searching based on trigrams'
-default_version = '1.5'
+default_version = '1.6'
module_pathname = '$libdir/pg_trgm'
relocatable = true
trusted = true
diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql
index bc2a6d525c..6a9da24d5a 100644
--- a/contrib/pg_trgm/sql/pg_trgm.sql
+++ b/contrib/pg_trgm/sql/pg_trgm.sql
@@ -101,6 +101,12 @@ insert into test2 values ('abcdef');
insert into test2 values ('quark');
insert into test2 values (' z foo bar');
insert into test2 values ('/123/-45/');
+insert into test2 values ('line 1');
+insert into test2 values ('%line 2');
+insert into test2 values ('line 3%');
+insert into test2 values ('%line 4%');
+insert into test2 values ('%li%ne 5%');
+insert into test2 values ('li_e 6');
create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
set enable_seqscan=off;
explain (costs off)
@@ -137,6 +143,23 @@ select * from test2 where t ~ ' z foo bar';
select * from test2 where t ~ ' z foo';
select * from test2 where t ~ 'qua(?!foo)';
select * from test2 where t ~ '/\d+/-\d';
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+select * from test2 where t = 'abcdef';
+explain (costs off)
+ select * from test2 where t = '%line%';
+select * from test2 where t = '%line%';
+select * from test2 where t = 'li_e 1';
+select * from test2 where t = '%line 2';
+select * from test2 where t = 'line 3%';
+select * from test2 where t = '%line 3%';
+select * from test2 where t = '%line 4%';
+select * from test2 where t = '%line 5%';
+select * from test2 where t = '%li_ne 5%';
+select * from test2 where t = '%li%ne 5%';
+select * from test2 where t = 'line 6';
+select * from test2 where t = 'li_e 6';
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
@@ -175,6 +198,23 @@ select * from test2 where t ~ ' z foo bar';
select * from test2 where t ~ ' z foo';
select * from test2 where t ~ 'qua(?!foo)';
select * from test2 where t ~ '/\d+/-\d';
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+select * from test2 where t = 'abcdef';
+explain (costs off)
+ select * from test2 where t = '%line%';
+select * from test2 where t = '%line%';
+select * from test2 where t = 'li_e 1';
+select * from test2 where t = '%line 2';
+select * from test2 where t = 'line 3%';
+select * from test2 where t = '%line 3%';
+select * from test2 where t = '%line 4%';
+select * from test2 where t = '%line 5%';
+select * from test2 where t = '%li_ne 5%';
+select * from test2 where t = '%li%ne 5%';
+select * from test2 where t = 'line 6';
+select * from test2 where t = 'li_e 6';
-- Check similarity threshold (bug #14202)
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index b616953462..405a1d9552 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -37,6 +37,7 @@
#define WordDistanceStrategyNumber 8
#define StrictWordSimilarityStrategyNumber 9
#define StrictWordDistanceStrategyNumber 10
+#define EqualStrategyNumber 11
typedef char trgm[3];
diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c
index 4dbf0ffb68..32fafef203 100644
--- a/contrib/pg_trgm/trgm_gin.c
+++ b/contrib/pg_trgm/trgm_gin.c
@@ -89,6 +89,7 @@ gin_extract_query_trgm(PG_FUNCTION_ARGS)
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
case StrictWordSimilarityStrategyNumber:
+ case EqualStrategyNumber:
trg = generate_trgm(VARDATA_ANY(val), VARSIZE_ANY_EXHDR(val));
break;
case ILikeStrategyNumber:
@@ -221,6 +222,7 @@ gin_trgm_consistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/* Check if all extracted trigrams are presented. */
res = true;
for (i = 0; i < nkeys; i++)
@@ -306,6 +308,7 @@ gin_trgm_triconsistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/* Check if all extracted trigrams are presented. */
res = GIN_MAYBE;
for (i = 0; i < nkeys; i++)
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index 9c0ed6ed73..c4450d8f83 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -232,6 +232,7 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
case StrictWordSimilarityStrategyNumber:
+ case EqualStrategyNumber:
qtrg = generate_trgm(VARDATA(query),
querysize - VARHDRSZ);
break;
@@ -338,6 +339,7 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/* Wildcard search is inexact */
*recheck = true;
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 5365b0681e..3ce865b076 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -419,9 +419,10 @@
the purpose of very fast similarity searches. These index types support
the above-described similarity operators, and additionally support
trigram-based index searches for <literal>LIKE</literal>, <literal>ILIKE</literal>,
- <literal>~</literal> and <literal>~*</literal> queries. (These indexes do not
- support equality nor simple comparison operators, so you may need a
- regular B-tree index too.)
+ <literal>~</literal> and <literal>~*</literal> queries. Beginning in
+ <productname>PostgreSQL</productname> 14, these indexes also support
+ equality and simple comparison operators, so extraneous regular B-tree
+ indexes are not needed anymore.
</para>
<para>
--
2.20.1
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Friday, November 13, 2020 10:50 AM, Julien Rouhaud <julien.rouhaud@free.fr> wrote:
On Wed, Nov 11, 2020 at 8:34 PM Georgios Kokolatos
gkokolatos@protonmail.com wrote:The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested
Hi,
this patch implements a useful and missing feature. Thank you.
It includes documentation, which to a non-native speaker as myself seems appropriate.
It includes comprehensive tests that cover the implemented cases.
In the thread Alexander has pointed out, quote:
"It would be more efficient to generate trigrams for equal operator
using generate_trgm() instead of generate_wildcard_trgm()"
I will echo the sentiment, though from a slightly different and possibly not
as important point of view. The method used to extract trigrams from the query
should match the method used to extract trigrams from the values when they
get added to the index. This is gin_extract_value_trgm() and is indeed using
generate_trgm().
I have no opinion over Alexander's second comment regarding costing.
I change the status to 'Waiting on Author', but please feel free to override
my opinion if you feel I am wrong and reset it to 'Needs review'.Thanks for the reminder Georgios! Thanks a lot Alexander for the review!
Indeed, I should have used generate_trgm() rather than
generate_wildcard_trgm(). IIUC, the rest of the code should still be
doing the same as [I]LikeStrategyNumber. I attach a v3 with that
modification.
Great, thanks!
v3 looks good.
For the costing, I tried this naive dataset:
CREATE TABLE t1 AS select md5(random()::text) AS val from
generate_series(1, 100000);
CREATE INDEX t1_btree ON t1 (val);
CREATE INDEX t1_gist ON t1 USING gist (val gist_trgm_ops);Cost are like this (all default configuration, using any random existing entry):
EXPLAIN ANALYZE SELECT * FROM t1 where val =
=============================================
'8dcf324ce38428e4d27a363953ac1c51';
QUERY PLAN-----------------------------------------------
Index Only Scan using t1_btree on t1 (cost=0.42..4.44 rows=1
width=33) (actual time=0.192..0.194 rows=1 loops=1)
Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Heap Fetches: 0
Planning Time: 0.133 ms
Execution Time: 0.222 ms
(5 rows)EXPLAIN ANALYZE SELECT * FROM t1 where val =
=============================================
'8dcf324ce38428e4d27a363953ac1c51';
QUERY PLAN-----------------------------------------------
Index Scan using t1_gist on t1 (cost=0.28..8.30 rows=1 width=33)
(actual time=0.542..2.359 rows=1 loops=1)
Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Planning Time: 0.189 ms
Execution Time: 2.382 ms
(4 rows)EXPLAIN ANALYZE SELECT * FROM t1 where val =
=============================================
'8dcf324ce38428e4d27a363953ac1c51';
QUERY PLAN-----------------------------------------------
Bitmap Heap Scan on t1 (cost=400.01..404.02 rows=1 width=33) (actual
time=2.486..2.488 rows=1 loops=1)
Recheck Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on t1_gin (cost=0.00..400.01 rows=1 width=0)
(actual time=2.474..2.474 rows=1 loops=1)
Index Cond: (val = '8dcf324ce38428e4d27a363953ac1c51'::text)
Planning Time: 0.206 ms
Execution Time: 2.611 msSo assuming that this dataset is representative enough, costing indeed
prefers a btree index over a gist/gin index, which should avoid
regression with this change.
It sounds reasonable, although I would leave it to a bit more cost savvy
people to argue the point.
Gin is however quite off, likely because it's a bitmap index scan
rather than an index scan, so gist is preferred in this scenario.
That's not ideal, but I'm not sure that there are many people having
both gin_trgm_ops and gist_trgm_ops.
Same as above.
In short, I think v3 of the patch looks good to change to 'RFC' status.
Given the possible costing concerns, I will refrain from changing the
status just yet, to give the opportunity to more reviewers to chime in.
If in the next few days there are no more reviews, I will update the
status to 'RFC' to move the patch forward.
Thoughts?
Cheers,
//Georgios
Hi!
On Fri, Nov 13, 2020 at 1:47 PM Georgios <gkokolatos@protonmail.com> wrote:
In short, I think v3 of the patch looks good to change to 'RFC' status.
Given the possible costing concerns, I will refrain from changing the
status just yet, to give the opportunity to more reviewers to chime in.
If in the next few days there are no more reviews, I will update the
status to 'RFC' to move the patch forward.Thoughts?
I went through and revised this patch. I made the documentation
statement less categorical. pg_trgm gist/gin indexes might have lower
performance of equality operator search than B-tree. So, we can't
claim the B-tree index is always not needed. Also, simple comparison
operators are <, <=, >, >=, and they are not supported.
I also have checked that btree_gist is preferred over pg_trgm gist
index for equality search. Despite our gist cost estimate is quite
dumb, it selects btree_gist index due to its lower size. So, this
part also looks good to me.
I'm going to push this if no objections.
------
Regards,
Alexander Korotkov
Attachments:
v4-0001-Handle-equality-operator-in-contrib-pg_trgm.patchapplication/octet-stream; name=v4-0001-Handle-equality-operator-in-contrib-pg_trgm.patchDownload
From ec89eb0c03273ba6124fff3cf17c7e05874b705d Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Sat, 14 Nov 2020 08:04:19 +0300
Subject: [PATCH] Handle equality operator in contrib/pg_trgm
Obviously, in order to equality operator be satisfiable, target string must
contain all the trigrams of the search string. On this base, we implement
equality operator in GiST/GIN indexes with recheck.
Discussion: https://postgr.es/m/CAOBaU_YWwtT7tdggtROacjdOdeYHCz-tmSwuC-j-TOG-g97J0w%40mail.gmail.com
Author: Julien Rouhaud
Reviewed-by: Tom Lane, Alexander Korotkov, Georgios Kokolatos
---
contrib/pg_trgm/Makefile | 2 +-
contrib/pg_trgm/expected/pg_trgm.out | 204 +++++++++++++++++++++++++++++++++-
contrib/pg_trgm/pg_trgm--1.5--1.6.sql | 10 ++
contrib/pg_trgm/pg_trgm.control | 2 +-
contrib/pg_trgm/sql/pg_trgm.sql | 40 +++++++
contrib/pg_trgm/trgm.h | 1 +
contrib/pg_trgm/trgm_gin.c | 3 +
contrib/pg_trgm/trgm_gist.c | 4 +-
doc/src/sgml/pgtrgm.sgml | 8 +-
9 files changed, 264 insertions(+), 10 deletions(-)
create mode 100644 contrib/pg_trgm/pg_trgm--1.5--1.6.sql
diff --git a/contrib/pg_trgm/Makefile b/contrib/pg_trgm/Makefile
index d75e9ada2e4..1fbdc9ec1ef 100644
--- a/contrib/pg_trgm/Makefile
+++ b/contrib/pg_trgm/Makefile
@@ -9,7 +9,7 @@ OBJS = \
trgm_regexp.o
EXTENSION = pg_trgm
-DATA = pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
+DATA = pg_trgm--1.5--1.6.sql pg_trgm--1.4--1.5.sql pg_trgm--1.3--1.4.sql \
pg_trgm--1.3.sql pg_trgm--1.2--1.3.sql pg_trgm--1.1--1.2.sql \
pg_trgm--1.0--1.1.sql
PGFILEDESC = "pg_trgm - trigram matching"
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index 923c326c7bd..20141ce7f3d 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -4761,6 +4761,12 @@ insert into test2 values ('abcdef');
insert into test2 values ('quark');
insert into test2 values (' z foo bar');
insert into test2 values ('/123/-45/');
+insert into test2 values ('line 1');
+insert into test2 values ('%line 2');
+insert into test2 values ('line 3%');
+insert into test2 values ('%line 4%');
+insert into test2 values ('%li%ne 5%');
+insert into test2 values ('li_e 6');
create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
set enable_seqscan=off;
explain (costs off)
@@ -4863,7 +4869,13 @@ select * from test2 where t ~ '(abc)*$';
quark
z foo bar
/123/-45/
-(4 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+ %li%ne 5%
+ li_e 6
+(10 rows)
select * from test2 where t ~* 'DEF';
t
@@ -4918,7 +4930,11 @@ select * from test2 where t ~ '[a-z]{3}';
abcdef
quark
z foo bar
-(3 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+(7 rows)
select * from test2 where t ~* '(a{10}|b{10}|c{10}){10}';
t
@@ -4961,6 +4977,93 @@ select * from test2 where t ~ '/\d+/-\d';
/123/-45/
(1 row)
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+ QUERY PLAN
+------------------------------------------
+ Bitmap Heap Scan on test2
+ Recheck Cond: (t = 'abcdef'::text)
+ -> Bitmap Index Scan on test2_idx_gin
+ Index Cond: (t = 'abcdef'::text)
+(4 rows)
+
+select * from test2 where t = 'abcdef';
+ t
+--------
+ abcdef
+(1 row)
+
+explain (costs off)
+ select * from test2 where t = '%line%';
+ QUERY PLAN
+------------------------------------------
+ Bitmap Heap Scan on test2
+ Recheck Cond: (t = '%line%'::text)
+ -> Bitmap Index Scan on test2_idx_gin
+ Index Cond: (t = '%line%'::text)
+(4 rows)
+
+select * from test2 where t = '%line%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 1';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 2';
+ t
+---------
+ %line 2
+(1 row)
+
+select * from test2 where t = 'line 3%';
+ t
+---------
+ line 3%
+(1 row)
+
+select * from test2 where t = '%line 3%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 4%';
+ t
+----------
+ %line 4%
+(1 row)
+
+select * from test2 where t = '%line 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li_ne 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li%ne 5%';
+ t
+-----------
+ %li%ne 5%
+(1 row)
+
+select * from test2 where t = 'line 6';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 6';
+ t
+--------
+ li_e 6
+(1 row)
+
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
set enable_seqscan=off;
@@ -5056,7 +5159,13 @@ select * from test2 where t ~ '(abc)*$';
quark
z foo bar
/123/-45/
-(4 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+ %li%ne 5%
+ li_e 6
+(10 rows)
select * from test2 where t ~* 'DEF';
t
@@ -5111,7 +5220,11 @@ select * from test2 where t ~ '[a-z]{3}';
abcdef
quark
z foo bar
-(3 rows)
+ line 1
+ %line 2
+ line 3%
+ %line 4%
+(7 rows)
select * from test2 where t ~* '(a{10}|b{10}|c{10}){10}';
t
@@ -5154,6 +5267,89 @@ select * from test2 where t ~ '/\d+/-\d';
/123/-45/
(1 row)
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+ QUERY PLAN
+------------------------------------------
+ Index Scan using test2_idx_gist on test2
+ Index Cond: (t = 'abcdef'::text)
+(2 rows)
+
+select * from test2 where t = 'abcdef';
+ t
+--------
+ abcdef
+(1 row)
+
+explain (costs off)
+ select * from test2 where t = '%line%';
+ QUERY PLAN
+------------------------------------------
+ Index Scan using test2_idx_gist on test2
+ Index Cond: (t = '%line%'::text)
+(2 rows)
+
+select * from test2 where t = '%line%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 1';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 2';
+ t
+---------
+ %line 2
+(1 row)
+
+select * from test2 where t = 'line 3%';
+ t
+---------
+ line 3%
+(1 row)
+
+select * from test2 where t = '%line 3%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%line 4%';
+ t
+----------
+ %line 4%
+(1 row)
+
+select * from test2 where t = '%line 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li_ne 5%';
+ t
+---
+(0 rows)
+
+select * from test2 where t = '%li%ne 5%';
+ t
+-----------
+ %li%ne 5%
+(1 row)
+
+select * from test2 where t = 'line 6';
+ t
+---
+(0 rows)
+
+select * from test2 where t = 'li_e 6';
+ t
+--------
+ li_e 6
+(1 row)
+
-- Check similarity threshold (bug #14202)
CREATE TEMP TABLE restaurants (city text);
INSERT INTO restaurants SELECT 'Warsaw' FROM generate_series(1, 10000);
diff --git a/contrib/pg_trgm/pg_trgm--1.5--1.6.sql b/contrib/pg_trgm/pg_trgm--1.5--1.6.sql
new file mode 100644
index 00000000000..9e74684eadd
--- /dev/null
+++ b/contrib/pg_trgm/pg_trgm--1.5--1.6.sql
@@ -0,0 +1,10 @@
+/* contrib/pg_trgm/pg_trgm--1.5--1.6.sql */
+
+-- complain if script is sourced in psql, rather than via ALTER EXTENSION
+\echo Use "ALTER EXTENSION pg_trgm UPDATE TO '1.6'" to load this file. \quit
+
+ALTER OPERATOR FAMILY gin_trgm_ops USING gin ADD
+ OPERATOR 11 pg_catalog.= (text, text);
+
+ALTER OPERATOR FAMILY gist_trgm_ops USING gist ADD
+ OPERATOR 11 pg_catalog.= (text, text);
diff --git a/contrib/pg_trgm/pg_trgm.control b/contrib/pg_trgm/pg_trgm.control
index ed4487e96b2..1d6a9ddf259 100644
--- a/contrib/pg_trgm/pg_trgm.control
+++ b/contrib/pg_trgm/pg_trgm.control
@@ -1,6 +1,6 @@
# pg_trgm extension
comment = 'text similarity measurement and index searching based on trigrams'
-default_version = '1.5'
+default_version = '1.6'
module_pathname = '$libdir/pg_trgm'
relocatable = true
trusted = true
diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql
index bc2a6d525cc..6a9da24d5a7 100644
--- a/contrib/pg_trgm/sql/pg_trgm.sql
+++ b/contrib/pg_trgm/sql/pg_trgm.sql
@@ -101,6 +101,12 @@ insert into test2 values ('abcdef');
insert into test2 values ('quark');
insert into test2 values (' z foo bar');
insert into test2 values ('/123/-45/');
+insert into test2 values ('line 1');
+insert into test2 values ('%line 2');
+insert into test2 values ('line 3%');
+insert into test2 values ('%line 4%');
+insert into test2 values ('%li%ne 5%');
+insert into test2 values ('li_e 6');
create index test2_idx_gin on test2 using gin (t gin_trgm_ops);
set enable_seqscan=off;
explain (costs off)
@@ -137,6 +143,23 @@ select * from test2 where t ~ ' z foo bar';
select * from test2 where t ~ ' z foo';
select * from test2 where t ~ 'qua(?!foo)';
select * from test2 where t ~ '/\d+/-\d';
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+select * from test2 where t = 'abcdef';
+explain (costs off)
+ select * from test2 where t = '%line%';
+select * from test2 where t = '%line%';
+select * from test2 where t = 'li_e 1';
+select * from test2 where t = '%line 2';
+select * from test2 where t = 'line 3%';
+select * from test2 where t = '%line 3%';
+select * from test2 where t = '%line 4%';
+select * from test2 where t = '%line 5%';
+select * from test2 where t = '%li_ne 5%';
+select * from test2 where t = '%li%ne 5%';
+select * from test2 where t = 'line 6';
+select * from test2 where t = 'li_e 6';
drop index test2_idx_gin;
create index test2_idx_gist on test2 using gist (t gist_trgm_ops);
@@ -175,6 +198,23 @@ select * from test2 where t ~ ' z foo bar';
select * from test2 where t ~ ' z foo';
select * from test2 where t ~ 'qua(?!foo)';
select * from test2 where t ~ '/\d+/-\d';
+-- test = operator
+explain (costs off)
+ select * from test2 where t = 'abcdef';
+select * from test2 where t = 'abcdef';
+explain (costs off)
+ select * from test2 where t = '%line%';
+select * from test2 where t = '%line%';
+select * from test2 where t = 'li_e 1';
+select * from test2 where t = '%line 2';
+select * from test2 where t = 'line 3%';
+select * from test2 where t = '%line 3%';
+select * from test2 where t = '%line 4%';
+select * from test2 where t = '%line 5%';
+select * from test2 where t = '%li_ne 5%';
+select * from test2 where t = '%li%ne 5%';
+select * from test2 where t = 'line 6';
+select * from test2 where t = 'li_e 6';
-- Check similarity threshold (bug #14202)
diff --git a/contrib/pg_trgm/trgm.h b/contrib/pg_trgm/trgm.h
index b616953462e..405a1d95528 100644
--- a/contrib/pg_trgm/trgm.h
+++ b/contrib/pg_trgm/trgm.h
@@ -37,6 +37,7 @@
#define WordDistanceStrategyNumber 8
#define StrictWordSimilarityStrategyNumber 9
#define StrictWordDistanceStrategyNumber 10
+#define EqualStrategyNumber 11
typedef char trgm[3];
diff --git a/contrib/pg_trgm/trgm_gin.c b/contrib/pg_trgm/trgm_gin.c
index 4dbf0ffb68a..32fafef203f 100644
--- a/contrib/pg_trgm/trgm_gin.c
+++ b/contrib/pg_trgm/trgm_gin.c
@@ -89,6 +89,7 @@ gin_extract_query_trgm(PG_FUNCTION_ARGS)
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
case StrictWordSimilarityStrategyNumber:
+ case EqualStrategyNumber:
trg = generate_trgm(VARDATA_ANY(val), VARSIZE_ANY_EXHDR(val));
break;
case ILikeStrategyNumber:
@@ -221,6 +222,7 @@ gin_trgm_consistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/* Check if all extracted trigrams are presented. */
res = true;
for (i = 0; i < nkeys; i++)
@@ -306,6 +308,7 @@ gin_trgm_triconsistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
+ case EqualStrategyNumber:
/* Check if all extracted trigrams are presented. */
res = GIN_MAYBE;
for (i = 0; i < nkeys; i++)
diff --git a/contrib/pg_trgm/trgm_gist.c b/contrib/pg_trgm/trgm_gist.c
index 2a067306354..2d4ec02f263 100644
--- a/contrib/pg_trgm/trgm_gist.c
+++ b/contrib/pg_trgm/trgm_gist.c
@@ -232,6 +232,7 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
case SimilarityStrategyNumber:
case WordSimilarityStrategyNumber:
case StrictWordSimilarityStrategyNumber:
+ case EqualStrategyNumber:
qtrg = generate_trgm(VARDATA(query),
querysize - VARHDRSZ);
break;
@@ -338,7 +339,8 @@ gtrgm_consistent(PG_FUNCTION_ARGS)
#endif
/* FALL THRU */
case LikeStrategyNumber:
- /* Wildcard search is inexact */
+ case EqualStrategyNumber:
+ /* Wildcard and equal search is inexact */
*recheck = true;
/*
diff --git a/doc/src/sgml/pgtrgm.sgml b/doc/src/sgml/pgtrgm.sgml
index 5365b0681e5..f8ca8f44f3d 100644
--- a/doc/src/sgml/pgtrgm.sgml
+++ b/doc/src/sgml/pgtrgm.sgml
@@ -419,9 +419,11 @@
the purpose of very fast similarity searches. These index types support
the above-described similarity operators, and additionally support
trigram-based index searches for <literal>LIKE</literal>, <literal>ILIKE</literal>,
- <literal>~</literal> and <literal>~*</literal> queries. (These indexes do not
- support equality nor simple comparison operators, so you may need a
- regular B-tree index too.)
+ <literal>~</literal> and <literal>~*</literal> queries. Beginning in
+ <productname>PostgreSQL</productname> 14, these indexes also support
+ equality operator (simple comparison operators are not supported).
+ Although these indexes might have lower the performance of equality operator
+ search than regular B-tree indexes.
</para>
<para>
--
2.14.3
On 2020-11-14 06:30, Alexander Korotkov wrote:
[v4-0001-Handle-equality...in-contrib-pg_trgm.patch (~]
I'm going to push this if no objections.
About the sgml, in doc/src/sgml/pgtrgm.sgml :
Beginning in <productname>PostgreSQL</productname> 14, these indexes
also support equality operator (simple comparison operators are not
supported).
should be:
Beginning in <productname>PostgreSQL</productname> 14, these indexes
also support the equality operator (simple comparison operators are not
supported).
(added 'the')
And:
Although these indexes might have lower the performance of equality
operator
search than regular B-tree indexes.
should be (I think - please check the meaning)
Although these indexes might have a lower performance with equality
operator
search than with regular B-tree indexes.
I am not sure I understood this last sentence correctly. Does this mean
the slower trgm index might be chosen over the faster btree?
Thanks,
Erik Rijkers
Hi, Erik!
On Sat, Nov 14, 2020 at 11:37 AM Erik Rijkers <er@xs4all.nl> wrote:
On 2020-11-14 06:30, Alexander Korotkov wrote:
[v4-0001-Handle-equality...in-contrib-pg_trgm.patch (~]
I'm going to push this if no objections.
About the sgml, in doc/src/sgml/pgtrgm.sgml :
Beginning in <productname>PostgreSQL</productname> 14, these indexes
also support equality operator (simple comparison operators are not
supported).should be:
Beginning in <productname>PostgreSQL</productname> 14, these indexes
also support the equality operator (simple comparison operators are not
supported).(added 'the')
And:
Although these indexes might have lower the performance of equality
operator
search than regular B-tree indexes.should be (I think - please check the meaning)
Although these indexes might have a lower performance with equality
operator
search than with regular B-tree indexes.I am not sure I understood this last sentence correctly. Does this mean
the slower trgm index might be chosen over the faster btree?
Thank you for your review.
I mean searching for an equal string with pg_trgm GiST/GIN might be
slower than the same search with B-tree. If you need both pg_trgm
similarity/pattern search and equal search on your column, you have
choice. You can run with a single pg_trgm index, but your search for
an equal string wouldn't be as fast as with B-tree. Alternatively you
can have two indexes: pg_trgm index for similarity/pattern search and
B-tree index for equality search. Second option gives you a fast
equality search, but the second B-tree index would take extra space
and maintenance overhead. For equality search, the B-tree index
should be chosen by the planner (and that was tested).
------
Regards,
Alexander Korotkov
On Sat, Nov 14, 2020 at 6:07 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
Hi, Erik!
On Sat, Nov 14, 2020 at 11:37 AM Erik Rijkers <er@xs4all.nl> wrote:
On 2020-11-14 06:30, Alexander Korotkov wrote:
[v4-0001-Handle-equality...in-contrib-pg_trgm.patch (~]
I'm going to push this if no objections.
About the sgml, in doc/src/sgml/pgtrgm.sgml :
Beginning in <productname>PostgreSQL</productname> 14, these indexes
also support equality operator (simple comparison operators are not
supported).should be:
Beginning in <productname>PostgreSQL</productname> 14, these indexes
also support the equality operator (simple comparison operators are not
supported).(added 'the')
And:
Although these indexes might have lower the performance of equality
operator
search than regular B-tree indexes.should be (I think - please check the meaning)
Although these indexes might have a lower performance with equality
operator
search than with regular B-tree indexes.I am not sure I understood this last sentence correctly. Does this mean
the slower trgm index might be chosen over the faster btree?Thank you for your review.
I mean searching for an equal string with pg_trgm GiST/GIN might be
slower than the same search with B-tree. If you need both pg_trgm
similarity/pattern search and equal search on your column, you have
choice. You can run with a single pg_trgm index, but your search for
an equal string wouldn't be as fast as with B-tree. Alternatively you
can have two indexes: pg_trgm index for similarity/pattern search and
B-tree index for equality search. Second option gives you a fast
equality search, but the second B-tree index would take extra space
and maintenance overhead. For equality search, the B-tree index
should be chosen by the planner (and that was tested).
Thanks everyone for the review, and thanks Alexander for the modifications!
I agree that it's important to document that those indexes may be less
performant than btree indexes. I also agree that there's an
extraneous "the" in the documentation. Maybe this rewrite could be
better?
Note that those indexes may not be as afficient as regulat B-tree indexes
for equality operator.
While at it, there's a small grammar error:
case EqualStrategyNumber:
- /* Wildcard search is inexact */
+ /* Wildcard and equal search is inexact */
It should be /* Wildcard and equal search are inexact */
On 2020-11-14 12:53, Julien Rouhaud wrote:
On Sat, Nov 14, 2020 at 6:07 PM Alexander Korotkov
<aekorotkov@gmail.com> >
Note that those indexes may not be as afficient as regulat B-tree
indexes
for equality operator.
'afficient as regulat' should be
'efficient as regular'
Sorry to be nitpicking - it's the one thing I'm really good at :P
Erik
On Sat, Nov 14, 2020 at 7:58 PM Erik Rijkers <er@xs4all.nl> wrote:
On 2020-11-14 12:53, Julien Rouhaud wrote:
On Sat, Nov 14, 2020 at 6:07 PM Alexander Korotkov
<aekorotkov@gmail.com> >Note that those indexes may not be as afficient as regulat B-tree
indexes
for equality operator.'afficient as regulat' should be
'efficient as regular'Sorry to be nitpicking - it's the one thing I'm really good at :P
Oops indeed :)
On Sat, Nov 14, 2020 at 8:26 PM Julien Rouhaud <julien.rouhaud@free.fr> wrote:
On Sat, Nov 14, 2020 at 7:58 PM Erik Rijkers <er@xs4all.nl> wrote:
On 2020-11-14 12:53, Julien Rouhaud wrote:
On Sat, Nov 14, 2020 at 6:07 PM Alexander Korotkov
<aekorotkov@gmail.com> >Note that those indexes may not be as afficient as regulat B-tree
indexes
for equality operator.'afficient as regulat' should be
'efficient as regular'Sorry to be nitpicking - it's the one thing I'm really good at :P
Oops indeed :)
Pushed with all the corrections above. Thanks!
------
Regards,
Alexander Korotkov
On Sun, Nov 15, 2020 at 1:55 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sat, Nov 14, 2020 at 8:26 PM Julien Rouhaud <julien.rouhaud@free.fr> wrote:
On Sat, Nov 14, 2020 at 7:58 PM Erik Rijkers <er@xs4all.nl> wrote:
On 2020-11-14 12:53, Julien Rouhaud wrote:
On Sat, Nov 14, 2020 at 6:07 PM Alexander Korotkov
<aekorotkov@gmail.com> >Note that those indexes may not be as afficient as regulat B-tree
indexes
for equality operator.'afficient as regulat' should be
'efficient as regular'Sorry to be nitpicking - it's the one thing I'm really good at :P
Oops indeed :)
Pushed with all the corrections above. Thanks!
Thanks a lot!
On 2020-11-15 06:55, Alexander Korotkov wrote:
Sorry to be nitpicking - it's the one thing I'm really good at :P
Hi Alexander,
The last touch... (you forgot the missing 'the')
thanks!
Erik Rijkers
Attachments:
pgtrgm-20201115.sgml.difftext/x-diff; name=pgtrgm-20201115.sgml.diffDownload
--- doc/src/sgml/pgtrgm.sgml.orig 2020-11-15 08:00:54.607816533 +0100
+++ doc/src/sgml/pgtrgm.sgml 2020-11-15 08:17:23.243316332 +0100
@@ -421,7 +421,7 @@
trigram-based index searches for <literal>LIKE</literal>, <literal>ILIKE</literal>,
<literal>~</literal> and <literal>~*</literal> queries. Beginning in
<productname>PostgreSQL</productname> 14, these indexes also support
- equality operator (simple comparison operators are not supported).
+ the equality operator (simple comparison operators are not supported).
Note that those indexes may not be as efficient as regular B-tree indexes
for equality operator.
</para>
On Sat, Nov 14, 2020 at 12:31 AM Alexander Korotkov <aekorotkov@gmail.com>
wrote:
I went through and revised this patch. I made the documentation
statement less categorical. pg_trgm gist/gin indexes might have lower
performance of equality operator search than B-tree. So, we can't
claim the B-tree index is always not needed. Also, simple comparison
operators are <, <=, >, >=, and they are not supported.
Is "simple comparison" here a well-known term of art? If I read the doc as
committed (which doesn't include the sentence above), and if I didn't
already know what it was saying, I would be left wondering which
comparisons those are. Could we just say "inequality operators"?
Cheers,
Jeff
On Mon, Nov 16, 2020 at 2:13 AM Jeff Janes <jeff.janes@gmail.com> wrote:
On Sat, Nov 14, 2020 at 12:31 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
I went through and revised this patch. I made the documentation
statement less categorical. pg_trgm gist/gin indexes might have lower
performance of equality operator search than B-tree. So, we can't
claim the B-tree index is always not needed. Also, simple comparison
operators are <, <=, >, >=, and they are not supported.Is "simple comparison" here a well-known term of art? If I read the doc as committed (which doesn't include the sentence above), and if I didn't already know what it was saying, I would be left wondering which comparisons those are. Could we just say "inequality operators"?
You're right. "Simple comparison" is vague, let's replace it with
"inequality". Pushed, thanks!
------
Regards,
Alexander Korotkov
On Sat, 14 Nov 2020 at 18:31, Alexander Korotkov <aekorotkov@gmail.com> wrote:
I also have checked that btree_gist is preferred over pg_trgm gist
index for equality search. Despite our gist cost estimate is quite
dumb, it selects btree_gist index due to its lower size. So, this
part also looks good to me.I'm going to push this if no objections.
(Reviving old thread [1]/messages/by-id/CAPpHfducQ0U8noyb2L3VChsyBMsc5V2Ej2whmEuxmAgHa2jVXg@mail.gmail.com due to a complaint from a customer about a
performance regression after upgrading PG13 to PG15)
I think the comparisons you made may have been too simplistic. Do you
recall what your test case was?
I tried comparing btree to gist with gist_trgm_ops and found that the
cost estimates for GIST come out cheaper than btree. Btree only wins
in the most simplistic tests due to Index Only Scans. The test done in
[2]: /messages/by-id/CAOBaU_YkkhakwTG4oA886T4CQsHG5hfY+xGA3dTBdZM+DTYJWA@mail.gmail.com
create extension if not exists pg_trgm;
create table a (a varchar(250), b varchar(250), c varchar(250));
insert into a select md5(a::text),md5(a::text),md5(a::text) from
generate_Series(1,1000000)a;
create index a_a_btree on a (a);
create index a_a_gist on a using gist (a gist_trgm_ops);
vacuum freeze analyze a;
-- Gist index wins
explain (analyze, buffers) select * from a where a = '1234';
Index Scan using a_a_gist on a (cost=0.41..8.43 rows=1 width=99)
Index Cond: ((a)::text = '1234'::text)
Rows Removed by Index Recheck: 1
Buffers: shared hit=14477
Planning Time: 0.055 ms
Execution Time: 23.861 ms
(6 rows)
-- hack to disable gist index.
update pg_index set indisvalid = false where indexrelid='a_a_gist'::regclass;
explain (analyze, buffers) select * from a where a = '1234';
Index Scan using a_a_btree on a (cost=0.42..8.44 rows=1 width=99)
Index Cond: ((a)::text = '1234'::text)
Buffers: shared read=3
Planning:
Buffers: shared hit=8
Planning Time: 0.090 ms
Execution Time: 0.048 ms (497.1 times faster)
(7 rows)
-- re-enable gist.
update pg_index set indisvalid = true where indexrelid='a_a_gist'::regclass;
-- try a query that supports btree with index only scan. Btree wins.
explain (analyze, buffers) select a from a where a = '1234';
Index Only Scan using a_a_btree on a (cost=0.42..4.44 rows=1 width=33)
Index Cond: (a = '1234'::text)
Heap Fetches: 0
Buffers: shared read=3
Planning Time: 0.185 ms
Execution Time: 0.235 ms
(6 rows)
-- Disable IOS and Gist index wins again.
set enable_indexonlyscan=0;
explain (analyze, buffers) select a from a where a = '1234';
Index Scan using a_a_gist on a (cost=0.41..8.43 rows=1 width=33)
Index Cond: ((a)::text = '1234'::text)
Rows Removed by Index Recheck: 1
Buffers: shared hit=11564 read=3811
Planning Time: 0.118 ms
Execution Time: 71.270 ms (303.2 times faster)
(6 rows)
This does not seem ideal given that the select * with the btree is
~500 times faster than with the gist plan.
For now, I've recommended the GIST indexes are moved to another
tablespace with an increased random_page_cost to try to coax the
planner to use the btree index.
I wonder if we can do something to fix this so the different
tablespace idea isn't the permanent solution. I had a look to see why
the GIST costs come out cheaper. It looks like it's the startup cost
calculation that's slightly different from the btree costing. The
attached patch highlights the difference. When applied both indexes
come out at the same cost and which one is picked is down to which
index has the lower Oid. I've not studied if there's a reason why this
code is different in gist.
David
[1]: /messages/by-id/CAPpHfducQ0U8noyb2L3VChsyBMsc5V2Ej2whmEuxmAgHa2jVXg@mail.gmail.com
[2]: /messages/by-id/CAOBaU_YkkhakwTG4oA886T4CQsHG5hfY+xGA3dTBdZM+DTYJWA@mail.gmail.com
Attachments:
gist_cost.diffapplication/octet-stream; name=gist_cost.diffDownload
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 03d7fb5f48..e18aa5d02b 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -7213,7 +7213,7 @@ gistcostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
*/
if (index->tuples > 1) /* avoid computing log(0) */
{
- descentCost = ceil(log(index->tuples)) * cpu_operator_cost;
+ descentCost = ceil(log(index->tuples) / log(2.0)) * cpu_operator_cost;
costs.indexStartupCost += descentCost;
costs.indexTotalCost += costs.num_sa_scans * descentCost;
}