Unexpected planner behavior with *_pattern_ops index matching
Greetings.
I'm in the process of indexing a virtual file system (on 9.2.9, build
info below) and I ran into what I perceive as an inconsistency in the
way index viability is assessed by the planner.
Perhaps I'm misinterpreting the docs, but it seems like stable functions
don't behave as per
http://www.postgresql.org/docs/9.2/static/xfunc-volatility.html when
*_pattern_ops operator classes are used (not sure about others).
Steps I followed to reproduce the anomaly:
geoop_prototype=# -- Build/platform Info:
geoop_prototype=# SELECT version();
version
--------------------------------------------------------------------------------------------------------------
PostgreSQL 9.2.9 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
(1 row)
geoop_prototype=#
----------------------------------------------------------------
geoop_prototype=#
geoop_prototype=# -- Small sample of data volume/distribution from the
involved table and column.
geoop_prototype=# SELECT COUNT(0) FROM inode_segments WHERE (full_path
IS NOT NULL);
count
--------
291019
(1 row)
geoop_prototype=# SELECT char_length(full_path) FROM inode_segments
WHERE (full_path IS NOT NULL) ORDER BY random() LIMIT 10;
char_length
-------------
80
126
108
75
116
71
70
76
137
113
(10 rows)
geoop_prototype=#
----------------------------------------------------------------
geoop_prototype=#
geoop_prototype=# -- Query plan without the operator class-specific
index. As expected the left anchored regex prompts a table scan
geoop_prototype=# EXPLAIN SELECT st_ino FROM inode_segments WHERE
full_path ~ '^/THIS/MATCHES/NOTHING/';
QUERY PLAN
-----------------------------------------------------------------------
Seq Scan on "inode_segments" (cost=0.00..27401.85 rows=29 width=8)
Filter: (("full_path")::"text" ~ '^/THIS/MATCHES/NOTHING/'::"text")
(2 rows)
geoop_prototype=#
----------------------------------------------------------------
geoop_prototype=#
geoop_prototype=# -- We now define an ad-hoc index
geoop_prototype=# CREATE INDEX ix_inode_segments_filter_by_subtree ON
gorfs.inode_segments USING BTREE(full_path varchar_pattern_ops);
CREATE INDEX
geoop_prototype=#
----------------------------------------------------------------
geoop_prototype=#
geoop_prototype=# -- Same query as above. Predictably, the index is now
being scanned instead
geoop_prototype=# EXPLAIN SELECT st_ino FROM inode_segments WHERE
full_path ~ '^/THIS/MATCHES/NOTHING/';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using "ix_inode_segments_filter_by_subtree" on
"inode_segments" (cost=0.00..8.49 rows=29 width=8)
Index Cond: ((("full_path")::"text" ~>=~
'/THIS/MATCHES/NOTHING/'::"text") AND (("full_path")::"text" ~<~
'/THIS/MATCHES/NOTHING0'::"text"))
Filter: (("full_path")::"text" ~ '^/THIS/MATCHES/NOTHING/'::"text")
(3 rows)
geoop_prototype=#
----------------------------------------------------------------
geoop_prototype=#
geoop_prototype=# -- However, when the comparison value comes from a
function that is marked as STABLE, the planner reverts to a full scan
geoop_prototype=# EXPLAIN SELECT st_ino FROM inode_segments WHERE
full_path ~ CONCAT('^/THIS/MATCHES/NOTHING/');
QUERY PLAN
-------------------------------------------------------------------------
Seq Scan on "inode_segments" (cost=0.00..28789.02 rows=29 width=8)
Filter: (("full_path")::"text" ~ "concat"('^/THIS/MATCHES/NOTHING/'))
(2 rows)
geoop_prototype=#
----------------------------------------------------------------
geoop_prototype=#
geoop_prototype=# -- Immutable functions are not affected...
geoop_prototype=# EXPLAIN SELECT st_ino FROM inode_segments WHERE
full_path ~ UPPER('^/THIS/MATCHES/NOTHING/');
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using "ix_inode_segments_filter_by_subtree" on
"inode_segments" (cost=0.00..8.49 rows=29 width=8)
Index Cond: ((("full_path")::"text" ~>=~
'/THIS/MATCHES/NOTHING/'::"text") AND (("full_path")::"text" ~<~
'/THIS/MATCHES/NOTHING0'::"text"))
Filter: (("full_path")::"text" ~ '^/THIS/MATCHES/NOTHING/'::"text")
(3 rows)
----------------------------------------------------------------
geoop_prototype=# -- ... nor are other operator classes (with
preexisting index. Note that CONCAT is again being used here)
geoop_prototype=# EXPLAIN SELECT st_ino FROM inode_segments WHERE
full_path > CONCAT('/THIS/MATCHES/NOTHINA/');
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Index Scan using "uc_no_duplicate_full_paths" on "inode_segments"
(cost=0.00..167.32 rows=418 width=8)
Index Cond: (("full_path")::"text" > '/THIS/MATCHES/NOTHINA/'::"text")
(2 rows)
As you can see, CONCAT()'s output isn't deemed suitable for an index
scan. The same happens for all type-compatible STABLE functions i tried.
Am I missing something here?
TIA and Regards
Fabio Venchiarutti
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Fabio Ugo Venchiarutti <fabio@vuole.me> writes:
As you can see, CONCAT()'s output isn't deemed suitable for an index
scan. The same happens for all type-compatible STABLE functions i tried.
Conversion of the pattern to an index qualification requires that the
pattern be a plan-time constant. STABLE functions, by definition,
are not that.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Conversion of the pattern to an index qualification requires that the
pattern be a plan-time constant. STABLE functions, by definition,
are not that.
Thank you very much
I guess it is correct to assume that the same applies to regular
expressions stored in pl/pgsql variables/arguments then, as they're
inspected after the plan is cached?
Is dynamic SQL the recommended/only way around this?
Regards
Fabio Ugo Venchiarutti
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Fabio Ugo Venchiarutti <fabio@vuole.me> writes:
Conversion of the pattern to an index qualification requires that the
pattern be a plan-time constant. STABLE functions, by definition,
are not that.
I guess it is correct to assume that the same applies to regular
expressions stored in pl/pgsql variables/arguments then, as they're
inspected after the plan is cached?
Not necessarily. Recent PG versions will generate custom plans (ie,
plans for the query with parameter values substituted as constants)
if that consistently offers a significant win over the generic plan.
Which it would as long as the parameter value always reduces to a
reasonably long left-anchored pattern.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general