Question on Trigram GIST indexes
I was trying to make Postgresql use a trigram gist index on a varchar
field, but to no avail.
Specifically, I was trying to replicate what is done in this blog post:
http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html
I use Postgresql 9.1.7 on Linux FC17 64bit, my locale is UTF8.
My full table definition is
CREATE TABLE "TEST"
(
"RECID" bigint NOT NULL DEFAULT next_id(),
"TST_PAYLOAD" character varying(255),
CONSTRAINT "PK_TEST" PRIMARY KEY ("RECID")
USING INDEX TABLESPACE local
)
WITH (
OIDS=FALSE
);
CREATE INDEX "TEST_PAYLOAD_PATTERN_1_IDX"
ON "TEST"
USING btree
("TST_PAYLOAD" COLLATE pg_catalog."default" varchar_pattern_ops)
TABLESPACE local;
CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIST_1_IDX"
ON "TEST"
USING gist
("TST_PAYLOAD" COLLATE pg_catalog."default" gist_trgm_ops)
TABLESPACE local;
CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIN_1_IDX"
ON "TEST"
USING gin
("TST_PAYLOAD" COLLATE pg_catalog."default" gin_trgm_ops)
TABLESPACE local;
The field "TST_PAYLOAD" contains 26389 names of cities, all in uppercase.
I have pg_tgrm installed - actually all extensions are present.
Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index
as it should.
Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST
index but do a full table scan instead.
(I am looking for names like 'SEATTLE')
I also tried dropping the btree index but that has no influence on the
behavior.
I'd be grateful if anybody could explain to me what I am doing wrong.
Thanks in advance.
ERR ORR wrote:
Specifically, I was trying to replicate what is done in this blog post:
http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html
Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index
as it should.
Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST
index but do a full table scan instead.
(I am looking for names like 'SEATTLE')
Have you run VACUUM ANALYZE with the index and data in place (as
shown in the blog post?
Another conspicuous difference is your explicit use of a COLLATE
clause in the index declaration.
-Kevin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Import Notes
Resolved by subject fallback
(forwarded to pgsql-general after it went to Kevin Grittner alone)
On 22 December 2012 22:46, Kevin Grittner <kgrittn@mail.com> wrote:
ERR ORR wrote:
Specifically, I was trying to replicate what is done in this blog post:
Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree
index
as it should.
Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use theGIST
index but do a full table scan instead.
(I am looking for names like 'SEATTLE')Have you run VACUUM ANALYZE with the index and data in place (as
shown in the blog post?Another conspicuous difference is your explicit use of a COLLATE
clause in the index declaration.-Kevin
a) Yes, I ran VACUUM ANALYZE after creating the indexes.
b) The COLLATE pg_catalog."default" clause is inserted by the DB, I run the
CREATE INDEX command without that.
"Default" collation for all my DBs in Postgres is en_US.UTF-8 and both the
system (Linux FC17) and the DB
use encoding UTF8.
I have texts/strings in different languages/charsets, so UTF8 looked like
the best decision to me, instead of, say, ISO-8859-15, which is limited to
just some European charsets. Specifically I am storing strings in European
languages (corresponding to the ISO-8859 series) including diacrites line
äöüñáéíóú ..., Russian, Arabic, Chinese etc. in one column instead of
making different columns/tables and using them via a view because that's my
use case and UTF8 should accommodate that IMHO (or is that an abuse of the
DB?)
Would it help to `ALTER DATABASE set lc_collate = 'C'`,supposing that is
possible? (Oracle doesn't allow that iirc)
Thanks for any insights, pointers ...
R.
Import Notes
Reply to msg id not found: CALtFtEJ6ycrxXXJywGnB8cW1-YXCrrWEGGFSGwzL1YBfoAAU2g@mail.gmail.com
ERR ORR <rd0002@gmail.com> writes:
Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree
index as it should.
Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the
GIST index but do a full table scan instead.
Are you sure it "should" use the index for that? That query doesn't
look very selective to me --- it might well be deciding that a seqscan
is cheaper. You could try forcing the issue with enable_seqscan = off
to see if the query is really unable to match the index, or it just
doesn't like the cost estimate.
Would it help to `ALTER DATABASE set lc_collate = 'C'`,supposing that is
possible? (Oracle doesn't allow that iirc)
FWIW, I think you do want the index to have the database's default
collation, otherwise it could only match LIKE clauses that explicitly
specify the same non-default collation.
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
* I think it "should" use that index based on trying to follow that
exercise.
* The part about changing the collation was an idea in the course of trying
out different things.
** enable_seqscan* is off, and the *sharedmem* and *temp_buffers* are set
so high that most things happen in RAM.
I wonder what it that the other gentleman, Merlin, found out in the
documentation and if he would share that.
I've also tried this on another table I have, with and without other
indexes, but no success :-(
Wondering ...
On 23 January 2013 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
ERR ORR <rd0002@gmail.com> writes:
Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree
index as it should.
Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the
GIST index but do a full table scan instead.Are you sure it "should" use the index for that? That query doesn't
look very selective to me --- it might well be deciding that a seqscan
is cheaper. You could try forcing the issue with enable_seqscan = off
to see if the query is really unable to match the index, or it just
doesn't like the cost estimate.Would it help to `ALTER DATABASE set lc_collate = 'C'`,supposing that is
possible? (Oracle doesn't allow that iirc)FWIW, I think you do want the index to have the database's default
collation, otherwise it could only match LIKE clauses that explicitly
specify the same non-default collation.regards, tom lane