Allow ILIKE forward matching to use btree index
Hi,
Currently, btree indexes cannot used for ILIKE (~~*) operator if the pattern
has case-varying characters although LIKE (~~) expression can be converted
to indexable clauses by the planner support function (if the collation
is "C" or operator class 'text_pattern_ops' is used).
For example, "t ~~ '123foo%'" is converted to "(t >= '123foo' AND t < '123fop')"
and index scan can be used for this condition. On the other hand, "t ~~* '123foo'"
cannot be converted and sequential scan is used.
Even in this case, we can use a bitmap index scan for the condition
"(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')" followed by
recheck by the original condition "t ~~* '123foo'", and this could be faster
than seqscan.
However, even if the support function would return OR clauses, the current
planner implementation cannot not build bitmap scan paths using them.
The attached patches allow to ILIKE (~~*) forward matching to use btree index.
The patch 0001 enables get_index_clause_from_support to receive OR clauses
from support functions and use them to build bitmap index scan later. OR clauses
returned by support functions are collected in the new argument 'orclauses" of
match_restriction_clauses_to_index(), and they are passed to
generate_bitmap_or_paths() later to build bitmap scan paths.
The patch 0002 modifies the support function to return OR clauses as an example
above when ILIKE's pattern has case-varying characters in forward matching. The
OR clauses contains two index conditions for the upper and the lower letter of
the first case-varying character in the pattern respectively. Although the
subsequent characters are not considered in the index scans, it could enough be
faster then sequent scan.
Here is an example.
1. Create a table with random text records
=# CREATE TABLE tbl (t text);
=# INSERT INTO tbl SELECT CASE WHEN i%2=1 THEN upper(x) ELSE x END
FROM (SELECT i, md5(i::text) x FROM generate_series(1,5000000) i);
2. Create an index
=# CREATE INDEX ON tbl (t text_pattern_ops);
3. Before applying patch: Seq Scan was selected. It takes about 4 sec.
=# EXPLIN ANALYZE SELECT * FROM tbl WHERE t ILIKE '12ab%';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..58712.97 rows=500 width=33) (actual time=101.096..4110.979 rows=64 loops=1)
Workers Planned: 4
Workers Launched: 4
Buffers: shared hit=10778 read=31260
-> Parallel Seq Scan on tbl (cost=0.00..57662.97 rows=125 width=33) (actual time=440.232..4101.023 rows=13 loops=5)
Filter: (t ~~* '12ab%'::text)
Rows Removed by Filter: 999987
Buffers: shared hit=10778 read=31260
Planning Time: 0.415 ms
Execution Time: 4111.051 ms
(10 rows)
4. After applying patch: Bitmap Index Scan was selected. It takes only 10 ms.
=# EXPLIN ANALYZE SELECT * FROM tbl WHERE t ILIKE '12ab%';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
--
Bitmap Heap Scan on tbl (cost=9.38..13.40 rows=500 width=33) (actual time=3.720..9.660 rows=64 loops=1)
Recheck Cond: (((t ~>=~ '12A'::text) AND (t ~<~ '12B'::text) AND (t ~~* '12ab%'::text)) OR ((t ~>=~ '12a'::text) AND (t ~<~ '12b'::text) AND (t ~~* '12ab%'::text))
)
Filter: (t ~~* '12ab%'::text)
Rows Removed by Filter: 1205
Heap Blocks: exact=1254
Buffers: shared hit=1271
-> BitmapOr (cost=9.38..9.38 rows=1 width=0) (actual time=2.370..2.372 rows=0 loops=1)
Buffers: shared hit=17
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..4.57 rows=1 width=0) (actual time=1.192..1.193 rows=604 loops=1)
Index Cond: ((t ~>=~ '12A'::text) AND (t ~<~ '12B'::text))
Buffers: shared hit=8
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..4.57 rows=1 width=0) (actual time=1.174..1.174 rows=675 loops=1)
Index Cond: ((t ~>=~ '12a'::text) AND (t ~<~ '12b'::text))
Buffers: shared hit=9
Planning Time: 1.144 ms
Execution Time: 9.785 ms
(16 rows)
What do you think about it?
I think another application of OR-clause returning support function might be
allowing to use an index for NOT LIKE (!~~) expression because, for example,
"t !~~ '123foo%'" could be converted to "(t < '123foo' OR t >= '123fooz')".
(The second condition is a bit loose but this would be safe and not problem
since tuples are filtered by the original condition after the index scan.)
However, it would not very useful unless the distribution is much skew so that
NOT LIKE expression's selectivity is enough small.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
Attachments:
0002-Allow-ILIKE-forward-matching-to-use-btree-index.patchtext/x-diff; name=0002-Allow-ILIKE-forward-matching-to-use-btree-index.patchDownload
From 89912ca337d16ab44823f98dcabb3466557ac8ac Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 19 Dec 2024 23:41:07 +0900
Subject: [PATCH 2/2] Allow ILIKE forward matching to use btree index
Previously, ILIKE could not use btree index if the pattern
has case-varying characters. To enable it, the planner
support function converts an ILINK operator expression to OR
clause that contains two index clauses for the upper letter and
the lower letter respectively.
For example, "t ILIKE '123foo%'" can be converted to
"(t <= '123F'AND t > '123G') OR (t <= '123f' AND t < '123g')",
and bitmap index scan plan could be used for this.
---
src/backend/utils/adt/like_support.c | 141 ++++++++++++++----
src/test/regress/expected/btree_index.out | 15 +-
.../regress/expected/collate.icu.utf8.out | 22 ++-
src/test/regress/sql/collate.icu.utf8.sql | 4 +-
4 files changed, 137 insertions(+), 45 deletions(-)
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index ee71ca89ff..06cf4eb8d2 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -66,7 +66,10 @@ typedef enum
typedef enum
{
- Pattern_Prefix_None, Pattern_Prefix_Partial, Pattern_Prefix_Exact,
+ Pattern_Prefix_None,
+ Pattern_Prefix_Partial,
+ Pattern_Prefix_Partial_IC,
+ Pattern_Prefix_Exact,
} Pattern_Prefix_Status;
static Node *like_regex_support(Node *rawreq, Pattern_Type ptype);
@@ -245,7 +248,7 @@ match_pattern_prefix(Node *leftop,
Oid opfamily,
Oid indexcollation)
{
- List *result;
+ List *result = NIL;
Const *patt;
Const *prefix;
Pattern_Prefix_Status pstatus;
@@ -259,6 +262,8 @@ match_pattern_prefix(Node *leftop,
Expr *expr;
FmgrInfo ltproc;
Const *greaterstr;
+ List *prefixes;
+ ListCell *lc;
/*
* Can't do anything with a non-constant or NULL pattern argument.
@@ -434,35 +439,112 @@ match_pattern_prefix(Node *leftop,
return NIL;
/*
- * We can always say "x >= prefix".
+ * If the pattern for case-insensiive matching has a case-varying
+ * character, make two prefixes including the upper letter and the
+ * lower letter respectively. For example, if the pattern is
+ * '123foo%', we get '123F' and '123f' as prefixes.
*/
- if (!op_in_opfamily(geopr, opfamily))
- return NIL;
- expr = make_opclause(geopr, BOOLOID, false,
- (Expr *) leftop, (Expr *) prefix,
- InvalidOid, indexcollation);
- result = list_make1(expr);
-
- /*-------
- * If we can create a string larger than the prefix, we can say
- * "x < greaterstr". NB: we rely on make_greater_string() to generate
- * a guaranteed-greater string, not just a probably-greater string.
- * In general this is only guaranteed in C locale, so we'd better be
- * using a C-locale index collation.
- *-------
- */
- if (!op_in_opfamily(ltopr, opfamily))
- return result;
- fmgr_info(get_opcode(ltopr), <proc);
- greaterstr = make_greater_string(prefix, <proc, indexcollation);
- if (greaterstr)
+ if (pstatus == Pattern_Prefix_Partial_IC)
+ {
+ int prefix_len;
+ Datum first_letter;
+ Datum upper_letter;
+ Datum lower_letter;
+ const char *prefix_l;
+ const char *prefix_u;
+
+ Assert(ptype == Pattern_Type_Like_IC);
+
+ /* get the first case-varying characketer in the pattern */
+ prefix_len = DatumGetInt32(DirectFunctionCall1Coll(textlen,
+ indexcollation,
+ prefix->constvalue));
+ first_letter = DirectFunctionCall3Coll(text_substr, indexcollation,
+ patt->constvalue,
+ Int32GetDatum(prefix_len + 1),
+ Int32GetDatum(1));
+
+
+ /* prefix followed by the upper letter */
+ upper_letter = DirectFunctionCall1Coll(upper, indexcollation,
+ first_letter);
+ prefix_u = TextDatumGetCString(DirectFunctionCall2Coll(textcat,
+ indexcollation,
+ prefix->constvalue,
+ upper_letter));
+ prefixes = list_make1(string_to_const(prefix_u, prefix->consttype));
+
+ /* prefix followed by the lower letter */
+ lower_letter = DirectFunctionCall1Coll(lower, indexcollation,
+ first_letter);
+ prefix_l = TextDatumGetCString(DirectFunctionCall2Coll(textcat,
+ indexcollation,
+ prefix->constvalue,
+ lower_letter));
+ if (!DatumGetBool(DirectFunctionCall2Coll(texteq,
+ indexcollation,
+ CStringGetTextDatum(prefix_u),
+ CStringGetTextDatum(prefix_l))))
+ prefixes = lappend(prefixes, string_to_const(prefix_l, prefix->consttype));
+ }
+ else
+ prefixes = list_make1(prefix);
+
+ foreach (lc, prefixes)
{
- expr = make_opclause(ltopr, BOOLOID, false,
- (Expr *) leftop, (Expr *) greaterstr,
+ List *clauses;
+
+ prefix = (Const *) lfirst(lc);
+
+ /*
+ * We can always say "x >= prefix".
+ */
+ if (!op_in_opfamily(geopr, opfamily))
+ return NIL;
+ expr = make_opclause(geopr, BOOLOID, false,
+ (Expr *) leftop, (Expr *) prefix,
InvalidOid, indexcollation);
- result = lappend(result, expr);
+ clauses = list_make1(expr);
+
+ /*-------
+ * If we can create a string larger than the prefix, we can say
+ * "x < greaterstr". NB: we rely on make_greater_string() to generate
+ * a guaranteed-greater string, not just a probably-greater string.
+ * In general this is only guaranteed in C locale, so we'd better be
+ * using a C-locale index collation.
+ *-------
+ */
+ if (op_in_opfamily(ltopr, opfamily))
+ {
+ fmgr_info(get_opcode(ltopr), <proc);
+ greaterstr = make_greater_string(prefix, <proc, indexcollation);
+ if (greaterstr)
+ {
+ expr = make_opclause(ltopr, BOOLOID, false,
+ (Expr *) leftop, (Expr *) greaterstr,
+ InvalidOid, indexcollation);
+ clauses = lappend(clauses, expr);
+ }
+ }
+
+ /*
+ * If case-insensitive pattern matching generated two clauses for
+ * uppder and lower letters, convert them to explicit AND because
+ * they will be under OR later.
+ */
+ if (pstatus == Pattern_Prefix_Partial_IC && list_length(prefixes) > 1)
+ result = lappend(result, make_ands_explicit(clauses));
+ else
+ result = clauses;
}
+ /*
+ * If case-insensitive pattern matching generated two clauses for upper and
+ * lower letters, OR them.
+ */
+ if (pstatus == Pattern_Prefix_Partial_IC && list_length(prefixes) > 1)
+ result = list_make1(make_orclause(result));
+
return result;
}
@@ -997,6 +1079,7 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
match_pos;
bool is_multibyte = (pg_database_encoding_max_length() > 1);
pg_locale_t locale = 0;
+ bool has_case_varying = false;
/* the right-hand const is type text or bytea */
Assert(typeid == BYTEAOID || typeid == TEXTOID);
@@ -1058,7 +1141,10 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
/* Stop if case-varying character (it's sort of a wildcard) */
if (case_insensitive &&
pattern_char_isalpha(patt[pos], is_multibyte, locale))
+ {
+ has_case_varying = true;
break;
+ }
match[match_pos++] = patt[pos];
}
@@ -1077,6 +1163,9 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
pfree(patt);
pfree(match);
+ if (case_insensitive && has_case_varying)
+ return Pattern_Prefix_Partial_IC;
+
/* in LIKE, an empty pattern is an exact match! */
if (pos == pattlen)
return Pattern_Prefix_Exact; /* reached end of pattern, so exact */
diff --git a/src/test/regress/expected/btree_index.out b/src/test/regress/expected/btree_index.out
index def78ef858..244d6d999e 100644
--- a/src/test/regress/expected/btree_index.out
+++ b/src/test/regress/expected/btree_index.out
@@ -332,14 +332,19 @@ select proname from pg_proc where proname ilike '00%foo' order by 1;
explain (costs off)
select proname from pg_proc where proname ilike 'ri%foo' order by 1;
- QUERY PLAN
-----------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
Sort Key: proname
- -> Seq Scan on pg_proc
- Disabled: true
+ -> Bitmap Heap Scan on pg_proc
+ Recheck Cond: (((proname >= 'R'::text) AND (proname < 'S'::text) AND (proname ~~* 'ri%foo'::text)) OR ((proname >= 'r'::text) AND (proname < 's'::text) AND (proname ~~* 'ri%foo'::text)))
Filter: (proname ~~* 'ri%foo'::text)
-(5 rows)
+ -> BitmapOr
+ -> Bitmap Index Scan on pg_proc_proname_args_nsp_index
+ Index Cond: ((proname >= 'R'::text) AND (proname < 'S'::text))
+ -> Bitmap Index Scan on pg_proc_proname_args_nsp_index
+ Index Cond: ((proname >= 'r'::text) AND (proname < 's'::text))
+(10 rows)
reset enable_seqscan;
reset enable_indexscan;
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index d4f327636f..f13f9e1245 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -988,14 +988,13 @@ SELECT relname, pg_get_indexdef(oid) FROM pg_class WHERE relname LIKE 'collate_t
set enable_seqscan = off;
explain (costs off)
select * from collate_test1 where b ilike 'abc';
- QUERY PLAN
--------------------------------
- Seq Scan on collate_test1
- Disabled: true
+ QUERY PLAN
+------------------------------------------------------
+ Index Scan using collate_test1_idx3 on collate_test1
Filter: (b ~~* 'abc'::text)
-(3 rows)
+(2 rows)
-select * from collate_test1 where b ilike 'abc';
+select * from collate_test1 where b ilike 'abc' order by 1;
a | b
---+-----
1 | abc
@@ -1004,14 +1003,13 @@ select * from collate_test1 where b ilike 'abc';
explain (costs off)
select * from collate_test1 where b ilike 'ABC';
- QUERY PLAN
--------------------------------
- Seq Scan on collate_test1
- Disabled: true
+ QUERY PLAN
+------------------------------------------------------
+ Index Scan using collate_test1_idx3 on collate_test1
Filter: (b ~~* 'ABC'::text)
-(3 rows)
+(2 rows)
-select * from collate_test1 where b ilike 'ABC';
+select * from collate_test1 where b ilike 'ABC' order by 1;
a | b
---+-----
1 | abc
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 5ee2da4e0e..948201c3e1 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -344,10 +344,10 @@ SELECT relname, pg_get_indexdef(oid) FROM pg_class WHERE relname LIKE 'collate_t
set enable_seqscan = off;
explain (costs off)
select * from collate_test1 where b ilike 'abc';
-select * from collate_test1 where b ilike 'abc';
+select * from collate_test1 where b ilike 'abc' order by 1;
explain (costs off)
select * from collate_test1 where b ilike 'ABC';
-select * from collate_test1 where b ilike 'ABC';
+select * from collate_test1 where b ilike 'ABC' order by 1;
reset enable_seqscan;
--
2.34.1
0001-Allow-planner-support-function-to-return-index-condi.patchtext/x-diff; name=0001-Allow-planner-support-function-to-return-index-condi.patchDownload
From 1732b4fce53cb72445dd19bf61cb97c0c12b417c Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 18 Dec 2024 12:04:55 +0900
Subject: [PATCH 1/2] Allow planner support function to return index conditions
in OR clause
Previously, as a response to SupportRequestIndexCondition, planner
support functions could return only directly indexable condition but
not multiple index conditions formed in OR clause. To increase the
opportunity to use index in support function uses, this commit enables
get_index_clause_from_support receive OR clauses from support functions,
and use them to build bitmap index scan later
---
src/backend/optimizer/path/indxpath.c | 118 ++++++++++++++++++--------
1 file changed, 83 insertions(+), 35 deletions(-)
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 5d102a0d37..cc27f678ab 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -134,7 +134,8 @@ static double adjust_rowcount_for_semijoins(PlannerInfo *root,
static double approximate_joinrel_size(PlannerInfo *root, Relids relids);
static void match_restriction_clauses_to_index(PlannerInfo *root,
IndexOptInfo *index,
- IndexClauseSet *clauseset);
+ IndexClauseSet *clauseset,
+ List **orclauses);
static void match_join_clauses_to_index(PlannerInfo *root,
RelOptInfo *rel, IndexOptInfo *index,
IndexClauseSet *clauseset,
@@ -145,15 +146,18 @@ static void match_eclass_clauses_to_index(PlannerInfo *root,
static void match_clauses_to_index(PlannerInfo *root,
List *clauses,
IndexOptInfo *index,
- IndexClauseSet *clauseset);
+ IndexClauseSet *clauseset,
+ List **orclauses);
static void match_clause_to_index(PlannerInfo *root,
RestrictInfo *rinfo,
IndexOptInfo *index,
- IndexClauseSet *clauseset);
+ IndexClauseSet *clauseset,
+ List **orclauses);
static IndexClause *match_clause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index);
+ IndexOptInfo *index,
+ List **orclauses);
static bool IsBooleanOpfamily(Oid opfamily);
static IndexClause *match_boolean_index_clause(PlannerInfo *root,
RestrictInfo *rinfo,
@@ -161,17 +165,20 @@ static IndexClause *match_boolean_index_clause(PlannerInfo *root,
static IndexClause *match_opclause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index);
+ IndexOptInfo *index,
+ List **orclauses);
static IndexClause *match_funcclause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index);
+ IndexOptInfo *index,
+ List **orclauses);
static IndexClause *get_index_clause_from_support(PlannerInfo *root,
RestrictInfo *rinfo,
Oid funcid,
int indexarg,
int indexcol,
- IndexOptInfo *index);
+ IndexOptInfo *index,
+ List* *orclauses);
static IndexClause *match_saopclause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
@@ -244,6 +251,7 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
List *bitindexpaths;
List *bitjoinpaths;
List *joinorclauses;
+ List *orclauses;
IndexClauseSet rclauseset;
IndexClauseSet jclauseset;
IndexClauseSet eclauseset;
@@ -254,7 +262,7 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
return;
/* Bitmap paths are collected and then dealt with at the end */
- bitindexpaths = bitjoinpaths = joinorclauses = NIL;
+ bitindexpaths = bitjoinpaths = joinorclauses = orclauses = NIL;
/* Examine each index in turn */
foreach(lc, rel->indexlist)
@@ -274,9 +282,11 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
/*
* Identify the restriction clauses that can match the index.
+ * Also, collect OR clauses returnd by support functions for later.
*/
MemSet(&rclauseset, 0, sizeof(rclauseset));
- match_restriction_clauses_to_index(root, index, &rclauseset);
+ match_restriction_clauses_to_index(root, index,
+ &rclauseset, &orclauses);
/*
* Build index paths from the restriction clauses. These will be
@@ -321,7 +331,7 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
* restriction list. Add these to bitindexpaths.
*/
indexpaths = generate_bitmap_or_paths(root, rel,
- rel->baserestrictinfo, NIL);
+ rel->baserestrictinfo, NULL);
bitindexpaths = list_concat(bitindexpaths, indexpaths);
/*
@@ -332,6 +342,14 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
joinorclauses, rel->baserestrictinfo);
bitjoinpaths = list_concat(bitjoinpaths, indexpaths);
+ /*
+ * Likewise, generate BitmapOrPaths for any suitable OR-clauses present in
+ * the orclauses returned by a support function. Add these to bitjoinpaths.
+ */
+ indexpaths = generate_bitmap_or_paths(root, rel,
+ orclauses, rel->baserestrictinfo);
+ bitjoinpaths = list_concat(bitjoinpaths, indexpaths);
+
/*
* If we found anything usable, generate a BitmapHeapPath for the most
* promising combination of restriction bitmap index paths. Note there
@@ -1145,7 +1163,7 @@ build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
* Identify the restriction clauses that can match the index.
*/
MemSet(&clauseset, 0, sizeof(clauseset));
- match_clauses_to_index(root, clauses, index, &clauseset);
+ match_clauses_to_index(root, clauses, index, &clauseset, NULL);
/*
* If no matches so far, and the index predicate isn't useful, we
@@ -1157,7 +1175,7 @@ build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
/*
* Add "other" restriction clauses to the clauseset.
*/
- match_clauses_to_index(root, other_clauses, index, &clauseset);
+ match_clauses_to_index(root, other_clauses, index, &clauseset, NULL);
/*
* Construct paths if possible.
@@ -2398,15 +2416,18 @@ approximate_joinrel_size(PlannerInfo *root, Relids relids)
/*
* match_restriction_clauses_to_index
* Identify restriction clauses for the rel that match the index.
- * Matching clauses are added to *clauseset.
+ * Matching clauses are added to *clauseset. Also, OR clauses
+ * returned by support functions are collected in orclauses.
*/
static void
match_restriction_clauses_to_index(PlannerInfo *root,
IndexOptInfo *index,
- IndexClauseSet *clauseset)
+ IndexClauseSet *clauseset,
+ List **orclauses)
{
/* We can ignore clauses that are implied by the index predicate */
- match_clauses_to_index(root, index->indrestrictinfo, index, clauseset);
+ match_clauses_to_index(root, index->indrestrictinfo, index,
+ clauseset, orclauses);
}
/*
@@ -2436,7 +2457,7 @@ match_join_clauses_to_index(PlannerInfo *root,
if (restriction_is_or_clause(rinfo))
*joinorclauses = lappend(*joinorclauses, rinfo);
else
- match_clause_to_index(root, rinfo, index, clauseset);
+ match_clause_to_index(root, rinfo, index, clauseset, NULL);
}
}
@@ -2474,20 +2495,21 @@ match_eclass_clauses_to_index(PlannerInfo *root, IndexOptInfo *index,
* since for non-btree indexes the EC's equality operators might not
* be in the index opclass (cf ec_member_matches_indexcol).
*/
- match_clauses_to_index(root, clauses, index, clauseset);
+ match_clauses_to_index(root, clauses, index, clauseset, NULL);
}
}
/*
* match_clauses_to_index
* Perform match_clause_to_index() for each clause in a list.
- * Matching clauses are added to *clauseset.
+ * Matching clauses are added to *clauseset. Also, OR clauses
+ * returned by support functions are collected in orclauses.
*/
static void
match_clauses_to_index(PlannerInfo *root,
List *clauses,
IndexOptInfo *index,
- IndexClauseSet *clauseset)
+ IndexClauseSet *clauseset, List **orclauses)
{
ListCell *lc;
@@ -2495,7 +2517,7 @@ match_clauses_to_index(PlannerInfo *root,
{
RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
- match_clause_to_index(root, rinfo, index, clauseset);
+ match_clause_to_index(root, rinfo, index, clauseset, orclauses);
}
}
@@ -2505,7 +2527,8 @@ match_clauses_to_index(PlannerInfo *root,
*
* If the clause is usable, add an IndexClause entry for it to the appropriate
* list in *clauseset. (*clauseset must be initialized to zeroes before first
- * call.)
+ * call.) Also, OR clauses returned by support functions are collected in
+ * orclauses.
*
* Note: in some circumstances we may find the same RestrictInfos coming from
* multiple places. Defend against redundant outputs by refusing to add a
@@ -2520,7 +2543,8 @@ static void
match_clause_to_index(PlannerInfo *root,
RestrictInfo *rinfo,
IndexOptInfo *index,
- IndexClauseSet *clauseset)
+ IndexClauseSet *clauseset,
+ List **orclauses)
{
int indexcol;
@@ -2559,7 +2583,8 @@ match_clause_to_index(PlannerInfo *root,
iclause = match_clause_to_indexcol(root,
rinfo,
indexcol,
- index);
+ index,
+ orclauses);
if (iclause)
{
/* Success, so record it */
@@ -2635,6 +2660,7 @@ match_clause_to_index(PlannerInfo *root,
* 'rinfo' is the clause to be tested (as a RestrictInfo node).
* 'indexcol' is a column number of 'index' (counting from 0).
* 'index' is the index of interest.
+ * 'orclauses" stores OR clauses returned by planner support functions.
*
* Returns an IndexClause if the clause can be used with this index key,
* or NULL if not.
@@ -2646,7 +2672,8 @@ static IndexClause *
match_clause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index)
+ IndexOptInfo *index,
+ List **orclauses)
{
IndexClause *iclause;
Expr *clause = rinfo->clause;
@@ -2677,11 +2704,13 @@ match_clause_to_indexcol(PlannerInfo *root,
*/
if (IsA(clause, OpExpr))
{
- return match_opclause_to_indexcol(root, rinfo, indexcol, index);
+ return match_opclause_to_indexcol(root, rinfo, indexcol, index,
+ orclauses);
}
else if (IsA(clause, FuncExpr))
{
- return match_funcclause_to_indexcol(root, rinfo, indexcol, index);
+ return match_funcclause_to_indexcol(root, rinfo, indexcol, index,
+ orclauses);
}
else if (IsA(clause, ScalarArrayOpExpr))
{
@@ -2839,7 +2868,8 @@ static IndexClause *
match_opclause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index)
+ IndexOptInfo *index,
+ List **orclauses)
{
IndexClause *iclause;
OpExpr *clause = (OpExpr *) rinfo->clause;
@@ -2895,7 +2925,8 @@ match_opclause_to_indexcol(PlannerInfo *root,
/*
* If we didn't find a member of the index's opfamily, try the support
- * function for the operator's underlying function.
+ * function for the operator's underlying function. OR clauses returned
+ * by the support function are collected in orclauses.
*/
set_opfuncid(clause); /* make sure we have opfuncid */
return get_index_clause_from_support(root,
@@ -2903,7 +2934,8 @@ match_opclause_to_indexcol(PlannerInfo *root,
clause->opfuncid,
0, /* indexarg on left */
indexcol,
- index);
+ index,
+ orclauses);
}
if (match_index_to_operand(rightop, indexcol, index) &&
@@ -2935,7 +2967,8 @@ match_opclause_to_indexcol(PlannerInfo *root,
/*
* If we didn't find a member of the index's opfamily, try the support
- * function for the operator's underlying function.
+ * function for the operator's underlying function. OR clauses returned
+ * by the support function are collected in orclauses.
*/
set_opfuncid(clause); /* make sure we have opfuncid */
return get_index_clause_from_support(root,
@@ -2943,7 +2976,8 @@ match_opclause_to_indexcol(PlannerInfo *root,
clause->opfuncid,
1, /* indexarg on right */
indexcol,
- index);
+ index,
+ orclauses);
}
return NULL;
@@ -2958,7 +2992,8 @@ static IndexClause *
match_funcclause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index)
+ IndexOptInfo *index,
+ List **orclauses)
{
FuncExpr *clause = (FuncExpr *) rinfo->clause;
int indexarg;
@@ -2968,7 +3003,8 @@ match_funcclause_to_indexcol(PlannerInfo *root,
* We have no built-in intelligence about function clauses, but if there's
* a planner support function, it might be able to do something. But, to
* cut down on wasted planning cycles, only call the support function if
- * at least one argument matches the target index column.
+ * at least one argument matches the target index column. Also, OR clauses
+ * returned by the support function are collected in orclauses.
*
* Note that we don't insist on the other arguments being pseudoconstants;
* the support function has to check that. This is to allow cases where
@@ -2986,7 +3022,8 @@ match_funcclause_to_indexcol(PlannerInfo *root,
clause->funcid,
indexarg,
indexcol,
- index);
+ index,
+ orclauses);
}
indexarg++;
@@ -2999,6 +3036,8 @@ match_funcclause_to_indexcol(PlannerInfo *root,
* get_index_clause_from_support()
* If the function has a planner support function, try to construct
* an IndexClause using indexquals created by the support function.
+ * If the support function returns OR clauses, collect them into
+ * orclauses for bitmap scans.
*/
static IndexClause *
get_index_clause_from_support(PlannerInfo *root,
@@ -3006,7 +3045,8 @@ get_index_clause_from_support(PlannerInfo *root,
Oid funcid,
int indexarg,
int indexcol,
- IndexOptInfo *index)
+ IndexOptInfo *index,
+ List **orclauses)
{
Oid prosupport = get_func_support(funcid);
SupportRequestIndexCondition req;
@@ -3045,6 +3085,14 @@ get_index_clause_from_support(PlannerInfo *root,
{
Expr *clause = (Expr *) lfirst(lc);
+ if (is_orclause(clause))
+ {
+ if (orclauses)
+ *orclauses = lappend(*orclauses,
+ make_simple_restrictinfo(root, clause));
+ continue;
+ }
+
indexquals = lappend(indexquals,
make_simple_restrictinfo(root, clause));
}
--
2.34.1
On Fri, 20 Dec 2024 03:22:26 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
Hi,
Currently, btree indexes cannot used for ILIKE (~~*) operator if the pattern
has case-varying characters although LIKE (~~) expression can be converted
to indexable clauses by the planner support function (if the collation
is "C" or operator class 'text_pattern_ops' is used).For example, "t ~~ '123foo%'" is converted to "(t >= '123foo' AND t < '123fop')"
and index scan can be used for this condition. On the other hand, "t ~~* '123foo'"
cannot be converted and sequential scan is used.Even in this case, we can use a bitmap index scan for the condition
"(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')" followed by
recheck by the original condition "t ~~* '123foo'", and this could be faster
than seqscan.However, even if the support function would return OR clauses, the current
planner implementation cannot not build bitmap scan paths using them.The attached patches allow to ILIKE (~~*) forward matching to use btree index.
The patch 0001 enables get_index_clause_from_support to receive OR clauses
from support functions and use them to build bitmap index scan later. OR clauses
returned by support functions are collected in the new argument 'orclauses" of
match_restriction_clauses_to_index(), and they are passed to
generate_bitmap_or_paths() later to build bitmap scan paths.The patch 0002 modifies the support function to return OR clauses as an example
above when ILIKE's pattern has case-varying characters in forward matching. The
OR clauses contains two index conditions for the upper and the lower letter of
the first case-varying character in the pattern respectively. Although the
subsequent characters are not considered in the index scans, it could enough be
faster then sequent scan.Here is an example.
1. Create a table with random text records
=# CREATE TABLE tbl (t text);
=# INSERT INTO tbl SELECT CASE WHEN i%2=1 THEN upper(x) ELSE x END
FROM (SELECT i, md5(i::text) x FROM generate_series(1,5000000) i);2. Create an index
=# CREATE INDEX ON tbl (t text_pattern_ops);3. Before applying patch: Seq Scan was selected. It takes about 4 sec.
=# EXPLIN ANALYZE SELECT * FROM tbl WHERE t ILIKE '12ab%';
I am sorry, the example in my previous main was wrong. I showed the plan
with enable_index_scan = off. Indead, if the pattern starts with
case-insensitive characters like '12', an index scan can be used even with
ILIKE.
postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..69129.61 rows=500 width=33) (actual time=36.317..4034.770 rows=1188 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=99 read=41939
-> Parallel Seq Scan on tbl (cost=0.00..68079.61 rows=208 width=33) (actual time=19.908..4023.668 rows=396 loops=3)
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 1666271
Buffers: shared hit=99 read=41939
Planning Time: 0.726 ms
Execution Time: 4035.101 ms
(10 rows)
4. After applying patch: Bitmap Index Scan was selected.
postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=12563.66..58314.53 rows=500 width=33) (actual time=156.485..1266.789 rows=1188 loops=1)
Recheck Cond: (((t ~>=~ 'A'::text) AND (t ~<~ 'B'::text) AND (t ~~* 'abc%'::text)) OR ((t ~>=~ 'a'::text) AND (t ~<~ 'b'::text) AND (t ~~* 'abc%'::text)))
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 311473
Heap Blocks: exact=42010
Buffers: shared hit=1 read=44600
-> BitmapOr (cost=12563.66..12563.66 rows=297029 width=0) (actual time=136.979..136.980 rows=0 loops=1)
Buffers: shared hit=1 read=2590
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71 rows=148515 width=0) (actual time=80.548..80.549 rows=158375 loops=1)
Index Cond: ((t ~>=~ 'A'::text) AND (t ~<~ 'B'::text))
Buffers: shared read=1272
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71 rows=148515 width=0) (actual time=56.427..56.427 rows=157042 loops=1)
Index Cond: ((t ~>=~ 'a'::text) AND (t ~<~ 'b'::text))
Buffers: shared hit=1 read=1318
Planning Time: 0.592 ms
Execution Time: 1267.162 ms
(16 rows)
What do you think about it?
I think another application of OR-clause returning support function might be
allowing to use an index for NOT LIKE (!~~) expression because, for example,
"t !~~ '123foo%'" could be converted to "(t < '123foo' OR t >= '123fooz')".
(The second condition is a bit loose but this would be safe and not problem
since tuples are filtered by the original condition after the index scan.)
However, it would not very useful unless the distribution is much skew so that
NOT LIKE expression's selectivity is enough small.
Regards,
Yugo Nagata
--
Yugo Nagata <nagata@sraoss.co.jp>
On Tue, 24 Dec 2024 16:04:42 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:
On Fri, 20 Dec 2024 03:22:26 +0900
Yugo Nagata <nagata@sraoss.co.jp> wrote:Hi,
Currently, btree indexes cannot used for ILIKE (~~*) operator if the pattern
has case-varying characters although LIKE (~~) expression can be converted
to indexable clauses by the planner support function (if the collation
is "C" or operator class 'text_pattern_ops' is used).For example, "t ~~ '123foo%'" is converted to "(t >= '123foo' AND t < '123fop')"
and index scan can be used for this condition. On the other hand, "t ~~* '123foo'"
cannot be converted and sequential scan is used.Even in this case, we can use a bitmap index scan for the condition
"(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')" followed by
recheck by the original condition "t ~~* '123foo'", and this could be faster
than seqscan.However, even if the support function would return OR clauses, the current
planner implementation cannot not build bitmap scan paths using them.The attached patches allow to ILIKE (~~*) forward matching to use btree index.
The patch 0001 enables get_index_clause_from_support to receive OR clauses
from support functions and use them to build bitmap index scan later. OR clauses
returned by support functions are collected in the new argument 'orclauses" of
match_restriction_clauses_to_index(), and they are passed to
generate_bitmap_or_paths() later to build bitmap scan paths.The patch 0002 modifies the support function to return OR clauses as an example
above when ILIKE's pattern has case-varying characters in forward matching. The
OR clauses contains two index conditions for the upper and the lower letter of
the first case-varying character in the pattern respectively. Although the
subsequent characters are not considered in the index scans, it could enough be
faster then sequent scan.Here is an example.
1. Create a table with random text records
=# CREATE TABLE tbl (t text);
=# INSERT INTO tbl SELECT CASE WHEN i%2=1 THEN upper(x) ELSE x END
FROM (SELECT i, md5(i::text) x FROM generate_series(1,5000000) i);2. Create an index
=# CREATE INDEX ON tbl (t text_pattern_ops);3. Before applying patch: Seq Scan was selected. It takes about 4 sec.
=# EXPLIN ANALYZE SELECT * FROM tbl WHERE t ILIKE '12ab%';
I am sorry, the example in my previous main was wrong. I showed the plan
with enable_index_scan = off. Indead, if the pattern starts with
case-insensitive characters like '12', an index scan can be used even with
ILIKE.postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..69129.61 rows=500 width=33) (actual time=36.317..4034.770 rows=1188 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=99 read=41939
-> Parallel Seq Scan on tbl (cost=0.00..68079.61 rows=208 width=33) (actual time=19.908..4023.668 rows=396 loops=3)
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 1666271
Buffers: shared hit=99 read=41939
Planning Time: 0.726 ms
Execution Time: 4035.101 ms
(10 rows)4. After applying patch: Bitmap Index Scan was selected.
postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=12563.66..58314.53 rows=500 width=33) (actual time=156.485..1266.789 rows=1188 loops=1)
Recheck Cond: (((t ~>=~ 'A'::text) AND (t ~<~ 'B'::text) AND (t ~~* 'abc%'::text)) OR ((t ~>=~ 'a'::text) AND (t ~<~ 'b'::text) AND (t ~~* 'abc%'::text)))
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 311473
Heap Blocks: exact=42010
Buffers: shared hit=1 read=44600
-> BitmapOr (cost=12563.66..12563.66 rows=297029 width=0) (actual time=136.979..136.980 rows=0 loops=1)
Buffers: shared hit=1 read=2590
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71 rows=148515 width=0) (actual time=80.548..80.549 rows=158375 loops=1)
Index Cond: ((t ~>=~ 'A'::text) AND (t ~<~ 'B'::text))
Buffers: shared read=1272
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71 rows=148515 width=0) (actual time=56.427..56.427 rows=157042 loops=1)
Index Cond: ((t ~>=~ 'a'::text) AND (t ~<~ 'b'::text))
Buffers: shared hit=1 read=1318
Planning Time: 0.592 ms
Execution Time: 1267.162 ms
(16 rows)What do you think about it?
I think another application of OR-clause returning support function might be
allowing to use an index for NOT LIKE (!~~) expression because, for example,
"t !~~ '123foo%'" could be converted to "(t < '123foo' OR t >= '123fooz')".
(The second condition is a bit loose but this would be safe and not problem
since tuples are filtered by the original condition after the index scan.)
However, it would not very useful unless the distribution is much skew so that
NOT LIKE expression's selectivity is enough small.
I added a new patch 0003 that enables ILIKE forward matching to use a SP-Gist index.
Similar to 0002, this generates BitmapOr Index Scan for two index clauses that use
"^@" operator for upper letter and lower letter pattern respectively.
* Before applying the patch:
postgres=# explain analyze select* from tbl where t ilike 'abc%';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..18899.52 rows=101 width=33) (actual time=41.799..930.382 rows=253 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=96 read=12533
-> Parallel Seq Scan on tbl (cost=0.00..17889.42 rows=42 width=33) (actual time=26.041..917.570 rows=84 loops=3)
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 336582
Buffers: shared hit=96 read=12533
Planning Time: 0.690 ms
Execution Time: 930.545 ms
(10 rows)
* After applying the patch:
postgres=# explain analyze select* from tbl where t ilike 'abc%';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=3307.96..16702.11 rows=101 width=33) (actual time=69.449..215.636 rows=253 loops=1)
Recheck Cond: (((t ^@ 'A'::text) AND (t ~~* 'abc%'::text)) OR ((t ^@ 'a'::text) AND (t ~~* 'abc%'::text)))
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 62992
Heap Blocks: exact=12437
Buffers: shared hit=18529
-> BitmapOr (cost=3307.96..3307.96 rows=61212 width=0) (actual time=62.567..62.568 rows=0 loops=1)
Buffers: shared hit=6092
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..1653.96 rows=30606 width=0) (actual time=41.893..41.893 rows=31536 loops=1)
Index Cond: (t ^@ 'A'::text)
Buffers: shared hit=2461
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..1653.96 rows=30606 width=0) (actual time=20.671..20.671 rows=31709 loops=1)
Index Cond: (t ^@ 'a'::text)
Buffers: shared hit=3631
Planning Time: 1.414 ms
Execution Time: 215.903 ms
(16 rows)
--
Yugo NAGATA <nagata@sraoss.co.jp>
Attachments:
v2-0003-Allow-ILIKE-forward-matching-to-use-spgist-index.patchtext/x-diff; name=v2-0003-Allow-ILIKE-forward-matching-to-use-spgist-index.patchDownload
From a57da6ca83f85bdf692af6623f36a00b5facce97 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Tue, 14 Jan 2025 20:13:38 +0900
Subject: [PATCH v2 3/3] Allow ILIKE forward matching to use spgist index
Previously, ILIKE could not use spgist index if the pattern
has case-varying characters. To enable it, the planner
support function converts an ILINK operator expression to OR
clause that contains two index clauses for the upper letter and
the lower letter respectively.
---
src/backend/utils/adt/like_support.c | 63 +++++++++++++++++-----------
1 file changed, 38 insertions(+), 25 deletions(-)
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 2344615038..d9af798aa0 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -414,31 +414,8 @@ match_pattern_prefix(Node *leftop,
return NIL;
/*
- * Otherwise, we have a nonempty required prefix of the values. Some
- * opclasses support prefix checks directly, otherwise we'll try to
- * generate a range constraint.
- */
- if (OidIsValid(preopr) && op_in_opfamily(preopr, opfamily))
- {
- expr = make_opclause(preopr, BOOLOID, false,
- (Expr *) leftop, (Expr *) prefix,
- InvalidOid, indexcollation);
- result = list_make1(expr);
- return result;
- }
-
- /*
- * Since we need a range constraint, it's only going to work reliably if
- * the index is collation-insensitive or has "C" collation. Note that
- * here we are looking at the index's collation, not the expression's
- * collation -- this test is *not* dependent on the LIKE/regex operator's
- * collation.
- */
- if (collation_aware &&
- !pg_newlocale_from_collation(indexcollation)->collate_is_c)
- return NIL;
-
- /*
+ * Otherwise, we have a nonempty required prefix of the values.
+ *
* If the pattern for case-insensiive matching has a case-varying
* character, make two prefixes including the upper letter and the
* lower letter respectively. For example, if the pattern is
@@ -490,6 +467,42 @@ match_pattern_prefix(Node *leftop,
else
prefixes = list_make1(prefix);
+ /*
+ * Some opclasses support prefix checks directly, otherwise we'll try to
+ * generate a range constraint.
+ */
+ if (OidIsValid(preopr) && op_in_opfamily(preopr, opfamily))
+ {
+ foreach (lc, prefixes)
+ {
+ prefix = (Const *) lfirst(lc);
+ expr = make_opclause(preopr, BOOLOID, false,
+ (Expr *) leftop, (Expr *) prefix,
+ InvalidOid, indexcollation);
+ result = lappend(result, expr);
+ }
+
+ /*
+ * If case-insensitive pattern matching generated two clauses for
+ * upper and lower letters, OR them.
+ */
+ if (pstatus == Pattern_Prefix_Partial_IC && list_length(prefixes) > 1)
+ result = list_make1(make_orclause(result));
+
+ return result;
+ }
+
+ /*
+ * Since we need a range constraint, it's only going to work reliably if
+ * the index is collation-insensitive or has "C" collation. Note that
+ * here we are looking at the index's collation, not the expression's
+ * collation -- this test is *not* dependent on the LIKE/regex operator's
+ * collation.
+ */
+ if (collation_aware &&
+ !pg_newlocale_from_collation(indexcollation)->collate_is_c)
+ return NIL;
+
foreach (lc, prefixes)
{
List *clauses;
--
2.34.1
v2-0002-Allow-ILIKE-forward-matching-to-use-btree-index.patchtext/x-diff; name=v2-0002-Allow-ILIKE-forward-matching-to-use-btree-index.patchDownload
From 906282444a5f9eba1067d3515de7ab594ef0d900 Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Thu, 19 Dec 2024 23:41:07 +0900
Subject: [PATCH v2 2/3] Allow ILIKE forward matching to use btree index
Previously, ILIKE could not use btree index if the pattern
has case-varying characters. To enable it, the planner
support function converts an ILINK operator expression to OR
clause that contains two index clauses for the upper letter and
the lower letter respectively.
For example, "t ILIKE '123foo%'" can be converted to
"(t <= '123F'AND t > '123G') OR (t <= '123f' AND t < '123g')",
and bitmap index scan plan could be used for this.
---
src/backend/utils/adt/like_support.c | 141 ++++++++++++++----
src/test/regress/expected/btree_index.out | 15 +-
.../regress/expected/collate.icu.utf8.out | 22 ++-
src/test/regress/sql/collate.icu.utf8.sql | 4 +-
4 files changed, 137 insertions(+), 45 deletions(-)
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 8fdc677371..2344615038 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -66,7 +66,10 @@ typedef enum
typedef enum
{
- Pattern_Prefix_None, Pattern_Prefix_Partial, Pattern_Prefix_Exact,
+ Pattern_Prefix_None,
+ Pattern_Prefix_Partial,
+ Pattern_Prefix_Partial_IC,
+ Pattern_Prefix_Exact,
} Pattern_Prefix_Status;
static Node *like_regex_support(Node *rawreq, Pattern_Type ptype);
@@ -245,7 +248,7 @@ match_pattern_prefix(Node *leftop,
Oid opfamily,
Oid indexcollation)
{
- List *result;
+ List *result = NIL;
Const *patt;
Const *prefix;
Pattern_Prefix_Status pstatus;
@@ -259,6 +262,8 @@ match_pattern_prefix(Node *leftop,
Expr *expr;
FmgrInfo ltproc;
Const *greaterstr;
+ List *prefixes;
+ ListCell *lc;
/*
* Can't do anything with a non-constant or NULL pattern argument.
@@ -434,35 +439,112 @@ match_pattern_prefix(Node *leftop,
return NIL;
/*
- * We can always say "x >= prefix".
+ * If the pattern for case-insensiive matching has a case-varying
+ * character, make two prefixes including the upper letter and the
+ * lower letter respectively. For example, if the pattern is
+ * '123foo%', we get '123F' and '123f' as prefixes.
*/
- if (!op_in_opfamily(geopr, opfamily))
- return NIL;
- expr = make_opclause(geopr, BOOLOID, false,
- (Expr *) leftop, (Expr *) prefix,
- InvalidOid, indexcollation);
- result = list_make1(expr);
-
- /*-------
- * If we can create a string larger than the prefix, we can say
- * "x < greaterstr". NB: we rely on make_greater_string() to generate
- * a guaranteed-greater string, not just a probably-greater string.
- * In general this is only guaranteed in C locale, so we'd better be
- * using a C-locale index collation.
- *-------
- */
- if (!op_in_opfamily(ltopr, opfamily))
- return result;
- fmgr_info(get_opcode(ltopr), <proc);
- greaterstr = make_greater_string(prefix, <proc, indexcollation);
- if (greaterstr)
+ if (pstatus == Pattern_Prefix_Partial_IC)
+ {
+ int prefix_len;
+ Datum first_letter;
+ Datum upper_letter;
+ Datum lower_letter;
+ const char *prefix_l;
+ const char *prefix_u;
+
+ Assert(ptype == Pattern_Type_Like_IC);
+
+ /* get the first case-varying characketer in the pattern */
+ prefix_len = DatumGetInt32(DirectFunctionCall1Coll(textlen,
+ indexcollation,
+ prefix->constvalue));
+ first_letter = DirectFunctionCall3Coll(text_substr, indexcollation,
+ patt->constvalue,
+ Int32GetDatum(prefix_len + 1),
+ Int32GetDatum(1));
+
+
+ /* prefix followed by the upper letter */
+ upper_letter = DirectFunctionCall1Coll(upper, indexcollation,
+ first_letter);
+ prefix_u = TextDatumGetCString(DirectFunctionCall2Coll(textcat,
+ indexcollation,
+ prefix->constvalue,
+ upper_letter));
+ prefixes = list_make1(string_to_const(prefix_u, prefix->consttype));
+
+ /* prefix followed by the lower letter */
+ lower_letter = DirectFunctionCall1Coll(lower, indexcollation,
+ first_letter);
+ prefix_l = TextDatumGetCString(DirectFunctionCall2Coll(textcat,
+ indexcollation,
+ prefix->constvalue,
+ lower_letter));
+ if (!DatumGetBool(DirectFunctionCall2Coll(texteq,
+ indexcollation,
+ CStringGetTextDatum(prefix_u),
+ CStringGetTextDatum(prefix_l))))
+ prefixes = lappend(prefixes, string_to_const(prefix_l, prefix->consttype));
+ }
+ else
+ prefixes = list_make1(prefix);
+
+ foreach (lc, prefixes)
{
- expr = make_opclause(ltopr, BOOLOID, false,
- (Expr *) leftop, (Expr *) greaterstr,
+ List *clauses;
+
+ prefix = (Const *) lfirst(lc);
+
+ /*
+ * We can always say "x >= prefix".
+ */
+ if (!op_in_opfamily(geopr, opfamily))
+ return NIL;
+ expr = make_opclause(geopr, BOOLOID, false,
+ (Expr *) leftop, (Expr *) prefix,
InvalidOid, indexcollation);
- result = lappend(result, expr);
+ clauses = list_make1(expr);
+
+ /*-------
+ * If we can create a string larger than the prefix, we can say
+ * "x < greaterstr". NB: we rely on make_greater_string() to generate
+ * a guaranteed-greater string, not just a probably-greater string.
+ * In general this is only guaranteed in C locale, so we'd better be
+ * using a C-locale index collation.
+ *-------
+ */
+ if (op_in_opfamily(ltopr, opfamily))
+ {
+ fmgr_info(get_opcode(ltopr), <proc);
+ greaterstr = make_greater_string(prefix, <proc, indexcollation);
+ if (greaterstr)
+ {
+ expr = make_opclause(ltopr, BOOLOID, false,
+ (Expr *) leftop, (Expr *) greaterstr,
+ InvalidOid, indexcollation);
+ clauses = lappend(clauses, expr);
+ }
+ }
+
+ /*
+ * If case-insensitive pattern matching generated two clauses for
+ * uppder and lower letters, convert them to explicit AND because
+ * they will be under OR later.
+ */
+ if (pstatus == Pattern_Prefix_Partial_IC && list_length(prefixes) > 1)
+ result = lappend(result, make_ands_explicit(clauses));
+ else
+ result = clauses;
}
+ /*
+ * If case-insensitive pattern matching generated two clauses for upper and
+ * lower letters, OR them.
+ */
+ if (pstatus == Pattern_Prefix_Partial_IC && list_length(prefixes) > 1)
+ result = list_make1(make_orclause(result));
+
return result;
}
@@ -997,6 +1079,7 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
match_pos;
bool is_multibyte = (pg_database_encoding_max_length() > 1);
pg_locale_t locale = 0;
+ bool has_case_varying = false;
/* the right-hand const is type text or bytea */
Assert(typeid == BYTEAOID || typeid == TEXTOID);
@@ -1058,7 +1141,10 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
/* Stop if case-varying character (it's sort of a wildcard) */
if (case_insensitive &&
pattern_char_isalpha(patt[pos], is_multibyte, locale))
+ {
+ has_case_varying = true;
break;
+ }
match[match_pos++] = patt[pos];
}
@@ -1077,6 +1163,9 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
pfree(patt);
pfree(match);
+ if (case_insensitive && has_case_varying)
+ return Pattern_Prefix_Partial_IC;
+
/* in LIKE, an empty pattern is an exact match! */
if (pos == pattlen)
return Pattern_Prefix_Exact; /* reached end of pattern, so exact */
diff --git a/src/test/regress/expected/btree_index.out b/src/test/regress/expected/btree_index.out
index 8879554c3f..6ffad39dad 100644
--- a/src/test/regress/expected/btree_index.out
+++ b/src/test/regress/expected/btree_index.out
@@ -459,14 +459,19 @@ select proname from pg_proc where proname ilike '00%foo' order by 1;
explain (costs off)
select proname from pg_proc where proname ilike 'ri%foo' order by 1;
- QUERY PLAN
-----------------------------------------------
+ QUERY PLAN
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort
Sort Key: proname
- -> Seq Scan on pg_proc
- Disabled: true
+ -> Bitmap Heap Scan on pg_proc
+ Recheck Cond: (((proname >= 'R'::text) AND (proname < 'S'::text) AND (proname ~~* 'ri%foo'::text)) OR ((proname >= 'r'::text) AND (proname < 's'::text) AND (proname ~~* 'ri%foo'::text)))
Filter: (proname ~~* 'ri%foo'::text)
-(5 rows)
+ -> BitmapOr
+ -> Bitmap Index Scan on pg_proc_proname_args_nsp_index
+ Index Cond: ((proname >= 'R'::text) AND (proname < 'S'::text))
+ -> Bitmap Index Scan on pg_proc_proname_args_nsp_index
+ Index Cond: ((proname >= 'r'::text) AND (proname < 's'::text))
+(10 rows)
reset enable_seqscan;
reset enable_indexscan;
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index d4f327636f..f13f9e1245 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -988,14 +988,13 @@ SELECT relname, pg_get_indexdef(oid) FROM pg_class WHERE relname LIKE 'collate_t
set enable_seqscan = off;
explain (costs off)
select * from collate_test1 where b ilike 'abc';
- QUERY PLAN
--------------------------------
- Seq Scan on collate_test1
- Disabled: true
+ QUERY PLAN
+------------------------------------------------------
+ Index Scan using collate_test1_idx3 on collate_test1
Filter: (b ~~* 'abc'::text)
-(3 rows)
+(2 rows)
-select * from collate_test1 where b ilike 'abc';
+select * from collate_test1 where b ilike 'abc' order by 1;
a | b
---+-----
1 | abc
@@ -1004,14 +1003,13 @@ select * from collate_test1 where b ilike 'abc';
explain (costs off)
select * from collate_test1 where b ilike 'ABC';
- QUERY PLAN
--------------------------------
- Seq Scan on collate_test1
- Disabled: true
+ QUERY PLAN
+------------------------------------------------------
+ Index Scan using collate_test1_idx3 on collate_test1
Filter: (b ~~* 'ABC'::text)
-(3 rows)
+(2 rows)
-select * from collate_test1 where b ilike 'ABC';
+select * from collate_test1 where b ilike 'ABC' order by 1;
a | b
---+-----
1 | abc
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 5ee2da4e0e..948201c3e1 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -344,10 +344,10 @@ SELECT relname, pg_get_indexdef(oid) FROM pg_class WHERE relname LIKE 'collate_t
set enable_seqscan = off;
explain (costs off)
select * from collate_test1 where b ilike 'abc';
-select * from collate_test1 where b ilike 'abc';
+select * from collate_test1 where b ilike 'abc' order by 1;
explain (costs off)
select * from collate_test1 where b ilike 'ABC';
-select * from collate_test1 where b ilike 'ABC';
+select * from collate_test1 where b ilike 'ABC' order by 1;
reset enable_seqscan;
--
2.34.1
v2-0001-Allow-planner-support-function-to-return-index-co.patchtext/x-diff; name=v2-0001-Allow-planner-support-function-to-return-index-co.patchDownload
From 7d86be3dc57f5c8d3eb0f45f57dadd7e8a58cd8a Mon Sep 17 00:00:00 2001
From: Yugo Nagata <nagata@sraoss.co.jp>
Date: Wed, 18 Dec 2024 12:04:55 +0900
Subject: [PATCH v2 1/3] Allow planner support function to return index
conditions in OR clause
Previously, as a response to SupportRequestIndexCondition, planner
support functions could return only directly indexable condition but
not multiple index conditions formed in OR clause. To increase the
opportunity to use index in support function uses, this commit enables
get_index_clause_from_support receive OR clauses from support functions,
and use them to build bitmap index scan later
---
src/backend/optimizer/path/indxpath.c | 118 ++++++++++++++++++--------
1 file changed, 83 insertions(+), 35 deletions(-)
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 5f428e835b..628cf7d753 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -134,7 +134,8 @@ static double adjust_rowcount_for_semijoins(PlannerInfo *root,
static double approximate_joinrel_size(PlannerInfo *root, Relids relids);
static void match_restriction_clauses_to_index(PlannerInfo *root,
IndexOptInfo *index,
- IndexClauseSet *clauseset);
+ IndexClauseSet *clauseset,
+ List **orclauses);
static void match_join_clauses_to_index(PlannerInfo *root,
RelOptInfo *rel, IndexOptInfo *index,
IndexClauseSet *clauseset,
@@ -145,15 +146,18 @@ static void match_eclass_clauses_to_index(PlannerInfo *root,
static void match_clauses_to_index(PlannerInfo *root,
List *clauses,
IndexOptInfo *index,
- IndexClauseSet *clauseset);
+ IndexClauseSet *clauseset,
+ List **orclauses);
static void match_clause_to_index(PlannerInfo *root,
RestrictInfo *rinfo,
IndexOptInfo *index,
- IndexClauseSet *clauseset);
+ IndexClauseSet *clauseset,
+ List **orclauses);
static IndexClause *match_clause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index);
+ IndexOptInfo *index,
+ List **orclauses);
static bool IsBooleanOpfamily(Oid opfamily);
static IndexClause *match_boolean_index_clause(PlannerInfo *root,
RestrictInfo *rinfo,
@@ -161,17 +165,20 @@ static IndexClause *match_boolean_index_clause(PlannerInfo *root,
static IndexClause *match_opclause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index);
+ IndexOptInfo *index,
+ List **orclauses);
static IndexClause *match_funcclause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index);
+ IndexOptInfo *index,
+ List **orclauses);
static IndexClause *get_index_clause_from_support(PlannerInfo *root,
RestrictInfo *rinfo,
Oid funcid,
int indexarg,
int indexcol,
- IndexOptInfo *index);
+ IndexOptInfo *index,
+ List* *orclauses);
static IndexClause *match_saopclause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
@@ -244,6 +251,7 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
List *bitindexpaths;
List *bitjoinpaths;
List *joinorclauses;
+ List *orclauses;
IndexClauseSet rclauseset;
IndexClauseSet jclauseset;
IndexClauseSet eclauseset;
@@ -254,7 +262,7 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
return;
/* Bitmap paths are collected and then dealt with at the end */
- bitindexpaths = bitjoinpaths = joinorclauses = NIL;
+ bitindexpaths = bitjoinpaths = joinorclauses = orclauses = NIL;
/* Examine each index in turn */
foreach(lc, rel->indexlist)
@@ -274,9 +282,11 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
/*
* Identify the restriction clauses that can match the index.
+ * Also, collect OR clauses returnd by support functions for later.
*/
MemSet(&rclauseset, 0, sizeof(rclauseset));
- match_restriction_clauses_to_index(root, index, &rclauseset);
+ match_restriction_clauses_to_index(root, index,
+ &rclauseset, &orclauses);
/*
* Build index paths from the restriction clauses. These will be
@@ -321,7 +331,7 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
* restriction list. Add these to bitindexpaths.
*/
indexpaths = generate_bitmap_or_paths(root, rel,
- rel->baserestrictinfo, NIL);
+ rel->baserestrictinfo, NULL);
bitindexpaths = list_concat(bitindexpaths, indexpaths);
/*
@@ -332,6 +342,14 @@ create_index_paths(PlannerInfo *root, RelOptInfo *rel)
joinorclauses, rel->baserestrictinfo);
bitjoinpaths = list_concat(bitjoinpaths, indexpaths);
+ /*
+ * Likewise, generate BitmapOrPaths for any suitable OR-clauses present in
+ * the orclauses returned by a support function. Add these to bitjoinpaths.
+ */
+ indexpaths = generate_bitmap_or_paths(root, rel,
+ orclauses, rel->baserestrictinfo);
+ bitjoinpaths = list_concat(bitjoinpaths, indexpaths);
+
/*
* If we found anything usable, generate a BitmapHeapPath for the most
* promising combination of restriction bitmap index paths. Note there
@@ -1145,7 +1163,7 @@ build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
* Identify the restriction clauses that can match the index.
*/
MemSet(&clauseset, 0, sizeof(clauseset));
- match_clauses_to_index(root, clauses, index, &clauseset);
+ match_clauses_to_index(root, clauses, index, &clauseset, NULL);
/*
* If no matches so far, and the index predicate isn't useful, we
@@ -1157,7 +1175,7 @@ build_paths_for_OR(PlannerInfo *root, RelOptInfo *rel,
/*
* Add "other" restriction clauses to the clauseset.
*/
- match_clauses_to_index(root, other_clauses, index, &clauseset);
+ match_clauses_to_index(root, other_clauses, index, &clauseset, NULL);
/*
* Construct paths if possible.
@@ -2398,15 +2416,18 @@ approximate_joinrel_size(PlannerInfo *root, Relids relids)
/*
* match_restriction_clauses_to_index
* Identify restriction clauses for the rel that match the index.
- * Matching clauses are added to *clauseset.
+ * Matching clauses are added to *clauseset. Also, OR clauses
+ * returned by support functions are collected in orclauses.
*/
static void
match_restriction_clauses_to_index(PlannerInfo *root,
IndexOptInfo *index,
- IndexClauseSet *clauseset)
+ IndexClauseSet *clauseset,
+ List **orclauses)
{
/* We can ignore clauses that are implied by the index predicate */
- match_clauses_to_index(root, index->indrestrictinfo, index, clauseset);
+ match_clauses_to_index(root, index->indrestrictinfo, index,
+ clauseset, orclauses);
}
/*
@@ -2436,7 +2457,7 @@ match_join_clauses_to_index(PlannerInfo *root,
if (restriction_is_or_clause(rinfo))
*joinorclauses = lappend(*joinorclauses, rinfo);
else
- match_clause_to_index(root, rinfo, index, clauseset);
+ match_clause_to_index(root, rinfo, index, clauseset, NULL);
}
}
@@ -2474,20 +2495,21 @@ match_eclass_clauses_to_index(PlannerInfo *root, IndexOptInfo *index,
* since for non-btree indexes the EC's equality operators might not
* be in the index opclass (cf ec_member_matches_indexcol).
*/
- match_clauses_to_index(root, clauses, index, clauseset);
+ match_clauses_to_index(root, clauses, index, clauseset, NULL);
}
}
/*
* match_clauses_to_index
* Perform match_clause_to_index() for each clause in a list.
- * Matching clauses are added to *clauseset.
+ * Matching clauses are added to *clauseset. Also, OR clauses
+ * returned by support functions are collected in orclauses.
*/
static void
match_clauses_to_index(PlannerInfo *root,
List *clauses,
IndexOptInfo *index,
- IndexClauseSet *clauseset)
+ IndexClauseSet *clauseset, List **orclauses)
{
ListCell *lc;
@@ -2495,7 +2517,7 @@ match_clauses_to_index(PlannerInfo *root,
{
RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
- match_clause_to_index(root, rinfo, index, clauseset);
+ match_clause_to_index(root, rinfo, index, clauseset, orclauses);
}
}
@@ -2505,7 +2527,8 @@ match_clauses_to_index(PlannerInfo *root,
*
* If the clause is usable, add an IndexClause entry for it to the appropriate
* list in *clauseset. (*clauseset must be initialized to zeroes before first
- * call.)
+ * call.) Also, OR clauses returned by support functions are collected in
+ * orclauses.
*
* Note: in some circumstances we may find the same RestrictInfos coming from
* multiple places. Defend against redundant outputs by refusing to add a
@@ -2520,7 +2543,8 @@ static void
match_clause_to_index(PlannerInfo *root,
RestrictInfo *rinfo,
IndexOptInfo *index,
- IndexClauseSet *clauseset)
+ IndexClauseSet *clauseset,
+ List **orclauses)
{
int indexcol;
@@ -2559,7 +2583,8 @@ match_clause_to_index(PlannerInfo *root,
iclause = match_clause_to_indexcol(root,
rinfo,
indexcol,
- index);
+ index,
+ orclauses);
if (iclause)
{
/* Success, so record it */
@@ -2635,6 +2660,7 @@ match_clause_to_index(PlannerInfo *root,
* 'rinfo' is the clause to be tested (as a RestrictInfo node).
* 'indexcol' is a column number of 'index' (counting from 0).
* 'index' is the index of interest.
+ * 'orclauses" stores OR clauses returned by planner support functions.
*
* Returns an IndexClause if the clause can be used with this index key,
* or NULL if not.
@@ -2646,7 +2672,8 @@ static IndexClause *
match_clause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index)
+ IndexOptInfo *index,
+ List **orclauses)
{
IndexClause *iclause;
Expr *clause = rinfo->clause;
@@ -2677,11 +2704,13 @@ match_clause_to_indexcol(PlannerInfo *root,
*/
if (IsA(clause, OpExpr))
{
- return match_opclause_to_indexcol(root, rinfo, indexcol, index);
+ return match_opclause_to_indexcol(root, rinfo, indexcol, index,
+ orclauses);
}
else if (IsA(clause, FuncExpr))
{
- return match_funcclause_to_indexcol(root, rinfo, indexcol, index);
+ return match_funcclause_to_indexcol(root, rinfo, indexcol, index,
+ orclauses);
}
else if (IsA(clause, ScalarArrayOpExpr))
{
@@ -2839,7 +2868,8 @@ static IndexClause *
match_opclause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index)
+ IndexOptInfo *index,
+ List **orclauses)
{
IndexClause *iclause;
OpExpr *clause = (OpExpr *) rinfo->clause;
@@ -2895,7 +2925,8 @@ match_opclause_to_indexcol(PlannerInfo *root,
/*
* If we didn't find a member of the index's opfamily, try the support
- * function for the operator's underlying function.
+ * function for the operator's underlying function. OR clauses returned
+ * by the support function are collected in orclauses.
*/
set_opfuncid(clause); /* make sure we have opfuncid */
return get_index_clause_from_support(root,
@@ -2903,7 +2934,8 @@ match_opclause_to_indexcol(PlannerInfo *root,
clause->opfuncid,
0, /* indexarg on left */
indexcol,
- index);
+ index,
+ orclauses);
}
if (match_index_to_operand(rightop, indexcol, index) &&
@@ -2935,7 +2967,8 @@ match_opclause_to_indexcol(PlannerInfo *root,
/*
* If we didn't find a member of the index's opfamily, try the support
- * function for the operator's underlying function.
+ * function for the operator's underlying function. OR clauses returned
+ * by the support function are collected in orclauses.
*/
set_opfuncid(clause); /* make sure we have opfuncid */
return get_index_clause_from_support(root,
@@ -2943,7 +2976,8 @@ match_opclause_to_indexcol(PlannerInfo *root,
clause->opfuncid,
1, /* indexarg on right */
indexcol,
- index);
+ index,
+ orclauses);
}
return NULL;
@@ -2958,7 +2992,8 @@ static IndexClause *
match_funcclause_to_indexcol(PlannerInfo *root,
RestrictInfo *rinfo,
int indexcol,
- IndexOptInfo *index)
+ IndexOptInfo *index,
+ List **orclauses)
{
FuncExpr *clause = (FuncExpr *) rinfo->clause;
int indexarg;
@@ -2968,7 +3003,8 @@ match_funcclause_to_indexcol(PlannerInfo *root,
* We have no built-in intelligence about function clauses, but if there's
* a planner support function, it might be able to do something. But, to
* cut down on wasted planning cycles, only call the support function if
- * at least one argument matches the target index column.
+ * at least one argument matches the target index column. Also, OR clauses
+ * returned by the support function are collected in orclauses.
*
* Note that we don't insist on the other arguments being pseudoconstants;
* the support function has to check that. This is to allow cases where
@@ -2986,7 +3022,8 @@ match_funcclause_to_indexcol(PlannerInfo *root,
clause->funcid,
indexarg,
indexcol,
- index);
+ index,
+ orclauses);
}
indexarg++;
@@ -2999,6 +3036,8 @@ match_funcclause_to_indexcol(PlannerInfo *root,
* get_index_clause_from_support()
* If the function has a planner support function, try to construct
* an IndexClause using indexquals created by the support function.
+ * If the support function returns OR clauses, collect them into
+ * orclauses for bitmap scans.
*/
static IndexClause *
get_index_clause_from_support(PlannerInfo *root,
@@ -3006,7 +3045,8 @@ get_index_clause_from_support(PlannerInfo *root,
Oid funcid,
int indexarg,
int indexcol,
- IndexOptInfo *index)
+ IndexOptInfo *index,
+ List **orclauses)
{
Oid prosupport = get_func_support(funcid);
SupportRequestIndexCondition req;
@@ -3045,6 +3085,14 @@ get_index_clause_from_support(PlannerInfo *root,
{
Expr *clause = (Expr *) lfirst(lc);
+ if (is_orclause(clause))
+ {
+ if (orclauses)
+ *orclauses = lappend(*orclauses,
+ make_simple_restrictinfo(root, clause));
+ continue;
+ }
+
indexquals = lappend(indexquals,
make_simple_restrictinfo(root, clause));
}
--
2.34.1
On Wed, 2025-01-15 at 01:40 +0900, Yugo NAGATA wrote:
For example, "t ~~ '123foo%'" is converted to "(t >= '123foo' AND
t < '123fop')"
and index scan can be used for this condition. On the other hand,
"t ~~* '123foo'"
cannot be converted and sequential scan is used.Even in this case, we can use a bitmap index scan for the
condition
"(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')"
followed by
recheck by the original condition "t ~~* '123foo'", and this
could be faster
than seqscan.
In theory, there could be many OR clauses:
(t >= '123foo' AND t < '123fop') OR
(t >= '123foo' AND t < '123fop') OR
(t >= '123foo' AND t < '123fop') OR
(t >= '123foo' AND t < '123fop') OR
(t >= '123foo' AND t < '123fop') OR
(t >= '123foo' AND t < '123fop') OR
The patch 0001 enables get_index_clause_from_support to receive
OR clauses
from support functions and use them to build bitmap index scan
later. OR clauses
returned by support functions are collected in the new argument
'orclauses" of
match_restriction_clauses_to_index(), and they are passed to
generate_bitmap_or_paths() later to build bitmap scan paths.
If I understand correctly, this
Show quoted text
The patch 0002 modifies the support function to return OR clauses
as an example
above when ILIKE's pattern has case-varying characters in forward
matching. The
OR clauses contains two index conditions for the upper and the
lower letter of
the first case-varying character in the pattern respectively.
Although the
subsequent characters are not considered in the index scans, it
could enough be
faster then sequent scan.Here is an example.
1. Create a table with random text records
=# CREATE TABLE tbl (t text);
=# INSERT INTO tbl SELECT CASE WHEN i%2=1 THEN upper(x) ELSE x
END
FROM (SELECT i, md5(i::text) x FROM
generate_series(1,5000000) i);2. Create an index
=# CREATE INDEX ON tbl (t text_pattern_ops);3. Before applying patch: Seq Scan was selected. It takes about 4
sec.=# EXPLIN ANALYZE SELECT * FROM tbl WHERE t ILIKE '12ab%';
I am sorry, the example in my previous main was wrong. I showed the
plan
with enable_index_scan = off. Indead, if the pattern starts with
case-insensitive characters like '12', an index scan can be used
even with
ILIKE.postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%';
QUERY
PLAN
-------------------------------------------------------------------
------------------------------------------------------
Gather (cost=1000.00..69129.61 rows=500 width=33) (actual
time=36.317..4034.770 rows=1188 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=99 read=41939
-> Parallel Seq Scan on tbl (cost=0.00..68079.61 rows=208
width=33) (actual time=19.908..4023.668 rows=396 loops=3)
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 1666271
Buffers: shared hit=99 read=41939
Planning Time: 0.726 ms
Execution Time: 4035.101 ms
(10 rows)4. After applying patch: Bitmap Index Scan was selected.
postgres=# EXPLAIN ANALYZE SELECT * FROM tbl WHERE t ILIKE 'abc%';
QUERY
PLAN
-------------------------------------------------------------------
-------------------------------------------------------------------
------------------------
Bitmap Heap Scan on tbl (cost=12563.66..58314.53 rows=500
width=33) (actual time=156.485..1266.789 rows=1188 loops=1)
Recheck Cond: (((t ~>=~ 'A'::text) AND (t ~<~ 'B'::text) AND (t
~~* 'abc%'::text)) OR ((t ~>=~ 'a'::text) AND (t ~<~ 'b'::text) AND
(t ~~* 'abc%'::text)))
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 311473
Heap Blocks: exact=42010
Buffers: shared hit=1 read=44600
-> BitmapOr (cost=12563.66..12563.66 rows=297029 width=0)
(actual time=136.979..136.980 rows=0 loops=1)
Buffers: shared hit=1 read=2590
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71
rows=148515 width=0) (actual time=80.548..80.549 rows=158375
loops=1)
Index Cond: ((t ~>=~ 'A'::text) AND (t ~<~
'B'::text))
Buffers: shared read=1272
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..6281.71
rows=148515 width=0) (actual time=56.427..56.427 rows=157042
loops=1)
Index Cond: ((t ~>=~ 'a'::text) AND (t ~<~
'b'::text))
Buffers: shared hit=1 read=1318
Planning Time: 0.592 ms
Execution Time: 1267.162 ms
(16 rows)
What do you think about it?
I think another application of OR-clause returning support
function might be
allowing to use an index for NOT LIKE (!~~) expression because,
for example,
"t !~~ '123foo%'" could be converted to "(t < '123foo' OR t >=
'123fooz')".
(The second condition is a bit loose but this would be safe and
not problem
since tuples are filtered by the original condition after the
index scan.)
However, it would not very useful unless the distribution is much
skew so that
NOT LIKE expression's selectivity is enough small.I added a new patch 0003 that enables ILIKE forward matching to use a
SP-Gist index.
Similar to 0002, this generates BitmapOr Index Scan for two index
clauses that use
"^@" operator for upper letter and lower letter pattern respectively.* Before applying the patch:
postgres=# explain analyze select* from tbl where t ilike 'abc%';
QUERY
PLAN
---------------------------------------------------------------------
-------------------------------------------------
Gather (cost=1000.00..18899.52 rows=101 width=33) (actual
time=41.799..930.382 rows=253 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=96 read=12533
-> Parallel Seq Scan on tbl (cost=0.00..17889.42 rows=42
width=33) (actual time=26.041..917.570 rows=84 loops=3)
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 336582
Buffers: shared hit=96 read=12533
Planning Time: 0.690 ms
Execution Time: 930.545 ms
(10 rows)* After applying the patch:
postgres=# explain analyze select* from tbl where t ilike 'abc%';
QUERY
PLAN
---------------------------------------------------------------------
----------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=3307.96..16702.11 rows=101 width=33)
(actual time=69.449..215.636 rows=253 loops=1)
Recheck Cond: (((t ^@ 'A'::text) AND (t ~~* 'abc%'::text)) OR ((t
^@ 'a'::text) AND (t ~~* 'abc%'::text)))
Filter: (t ~~* 'abc%'::text)
Rows Removed by Filter: 62992
Heap Blocks: exact=12437
Buffers: shared hit=18529
-> BitmapOr (cost=3307.96..3307.96 rows=61212 width=0) (actual
time=62.567..62.568 rows=0 loops=1)
Buffers: shared hit=6092
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..1653.96
rows=30606 width=0) (actual time=41.893..41.893 rows=31536 loops=1)
Index Cond: (t ^@ 'A'::text)
Buffers: shared hit=2461
-> Bitmap Index Scan on tbl_t_idx (cost=0.00..1653.96
rows=30606 width=0) (actual time=20.671..20.671 rows=31709 loops=1)
Index Cond: (t ^@ 'a'::text)
Buffers: shared hit=3631
Planning Time: 1.414 ms
Execution Time: 215.903 ms
(16 rows)
My apologies, I sent the previous email prematurely. Let me try again:
On Wed, 2025-01-15 at 14:34 -0800, Jeff Davis wrote:
On Wed, 2025-01-15 at 01:40 +0900, Yugo NAGATA wrote:
For example, "t ~~ '123foo%'" is converted to "(t >= '123foo'
AND
t < '123fop')"
and index scan can be used for this condition. On the other
hand,
"t ~~* '123foo'"
cannot be converted and sequential scan is used.Even in this case, we can use a bitmap index scan for the
condition
"(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')"
followed by
recheck by the original condition "t ~~* '123foo'", and this
could be faster
than seqscan.
In theory, there could be many OR clauses:
(t >= '123foo' AND t < '123fop') OR
(t >= '123Foo' AND t < '123Fop') OR
(t >= '123fOo' AND t < '123fOp') OR
(t >= '123FOo' AND t < '123FOp') OR
...
How should that be limited?
Regards,
Jeff Davis
On Wed, 15 Jan 2025 14:40:19 -0800
Jeff Davis <pgsql@j-davis.com> wrote:
My apologies, I sent the previous email prematurely. Let me try again:
On Wed, 2025-01-15 at 14:34 -0800, Jeff Davis wrote:
On Wed, 2025-01-15 at 01:40 +0900, Yugo NAGATA wrote:
For example, "t ~~ '123foo%'" is converted to "(t >= '123foo'
AND
t < '123fop')"
and index scan can be used for this condition. On the other
hand,
"t ~~* '123foo'"
cannot be converted and sequential scan is used.Even in this case, we can use a bitmap index scan for the
condition
"(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')"
followed by
recheck by the original condition "t ~~* '123foo'", and this
could be faster
than seqscan.In theory, there could be many OR clauses:
(t >= '123foo' AND t < '123fop') OR
(t >= '123Foo' AND t < '123Fop') OR
(t >= '123fOo' AND t < '123fOp') OR
(t >= '123FOo' AND t < '123FOp') OR
...How should that be limited?
Instead of generating complete patterns considering every case-varying characters,
two clauses considering only the first case-varying character are generated.
For example, for the condition "t ILIKE '123foo%'", the generated condition is
"(t >= '123f' AND t < '123g') OR "(t >= '123F' AND t < '123G')".
Rows meeting "(t >= '123f' AND t < '123g')" includes those whose "t" start with
'123f', that is meeting the following;
(t >= '123foo' AND t < '123fop') OR
(t >= '123fOo' AND t < '123fOp') OR
...
, and rows meeting "(t >= '123F' AND t < '123G')" includes those whose "t" start
with '123F', that is meeting the following;
(t >= '123Foo' AND t < '123Fop') OR
(t >= '123FOo' AND t < '123FOp') OR
...
It is required that the second case-varying character and later are checked after
the index scan, and some rows are filtered, but it will be still faster than full
sequential scan.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>
On Thu, 2025-01-16 at 14:53 +0900, Yugo NAGATA wrote:
Instead of generating complete patterns considering every case-
varying characters,
two clauses considering only the first case-varying character are
generated.
Did you consider other approaches that integrate more deeply into the
indexing infrastructure? This feels almost like a skip scan, which
Petter Geoghegan is working on. If you model the disjunctions as skips,
and provide the right API that the index AM can use, then there would
be no limit.
For example:
start scanning at '123FOO'
when you encounter '123FOP' skip to '123FOo'
continue scanning
when you encounter '123FOp' skip to '123FoO'
...
Also, I'm working on better Unicode support to handle multiple case
variants in patterns. For instance, the Greek letter Sigma has three
case variants (one upper and two lower). What's the right API so that
the index AM knows which case variants will sort first, so that the
skips don't go backward?
Regards,
Jeff Davis
On Thu, 16 Jan 2025 09:41:35 -0800
Jeff Davis <pgsql@j-davis.com> wrote:
On Thu, 2025-01-16 at 14:53 +0900, Yugo NAGATA wrote:
Instead of generating complete patterns considering every case-
varying characters,
two clauses considering only the first case-varying character are
generated.Did you consider other approaches that integrate more deeply into the
indexing infrastructure? This feels almost like a skip scan, which
Petter Geoghegan is working on. If you model the disjunctions as skips,
and provide the right API that the index AM can use, then there would
be no limit.For example:
start scanning at '123FOO'
when you encounter '123FOP' skip to '123FOo'
continue scanning
when you encounter '123FOp' skip to '123FoO'
...
That seems true there is similarity between ILIKE search and skip scan,
although, on my understand, skip scan is for multi-column indexes
rather than text. I have no concrete idea how the infrastructure for skip
scan to improve ILIKE, anyway.
Also, I'm working on better Unicode support to handle multiple case
variants in patterns. For instance, the Greek letter Sigma has three
case variants (one upper and two lower). What's the right API so that
the index AM knows which case variants will sort first, so that the
skips don't go backward?
That seems true there is similarity between ILIKE search and skip scan,
although, on my understand, skip scan is for multi-column indexes
rather than text. I have no concrete idea how the infrastructure for skip
scan to improve ILIKE, anyway.
Also, I'm working on better Unicode support to handle multiple case
variants in patterns. For instance, the Greek letter Sigma has three
case variants (one upper and two lower). What's the right API so that
the index AM knows which case variants will sort first, so that the
skips don't go backward?
I don't have idea for handling letters with multiple case variants in my patch,
either. I overlooked such pattern at all. So, I'll withdraw the proposal, for now.
Regards,
Yugo Nagata
--
Yugo NAGATA <nagata@sraoss.co.jp>