text type has no default operator class for GIN?

Started by Bob Gobeilleover 16 years ago7 messagesgeneral
Jump to latest
#1Bob Gobeille
bob.gobeille@hp.com

CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
("ufile_name");

ERROR: data type text has no default operator class for access method
"gin"
HINT: You must specify an operator class for the index or define a
default operator class for the data type.

This is on a new 8.3 install.
Why is this happening? Isn't creating a GIN index on a text type a
common thing to do?

I'm not seeing any mention of this in 8.3.7 docs cha 12 (Full Text
Search).

Do I really have to CREATE OPERATOR CLASS to make this work?
I see http://www.postgresql.org/docs/8.3/interactive/sql-createopclass.html

many thanks,
bob

#2Bob Gobeille
bob.gobeille@hp.com
In reply to: Bob Gobeille (#1)
Re: text type has no default operator class for GIN?

On Aug 18, 2009, at 3:46 PM, Gobeille, Robert wrote:

CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
("ufile_name");

ERROR: data type text has no default operator class for access method
"gin"
HINT: You must specify an operator class for the index or define a
default operator class for the data type.

This is on a new 8.3 install.
Why is this happening? Isn't creating a GIN index on a text type a
common thing to do?

I'm not seeing any mention of this in 8.3.7 docs cha 12 (Full Text
Search).

Do I really have to CREATE OPERATOR CLASS to make this work?
I see http://www.postgresql.org/docs/8.3/interactive/sql-createopclass.html

I still don't understand the above, but the following works:

create index ufile_name_ginidx on uploadtree using
gin(to_tsvector('english', ufile_name));

Bob

#3Sam Mason
sam@samason.me.uk
In reply to: Bob Gobeille (#2)
Re: text type has no default operator class for GIN?

On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote:

On Aug 18, 2009, at 3:46 PM, Gobeille, Robert wrote:

CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
("ufile_name");

ERROR: data type text has no default operator class for access method
"gin"
HINT: You must specify an operator class for the index or define a
default operator class for the data type.

[..]

I still don't understand the above, but the following works:

create index ufile_name_ginidx on uploadtree using
gin(to_tsvector('english', ufile_name));

Not sure if understand very well myself, but GIN indexes can only speed
up specific access patterns and these are exposed through various
different operators.

When PG refuses to create a GIN index on a plain TEXT column it's saying
that it doesn't how to use those operators with a values of TEXT type.
As soon as you pull this value apart (with the to_tsvector) you end up
with something that PG can get some traction on and all is good.

Maybe a useful question to ask is, what are you expecting PG do to when
you create a GIN index on this TEXT column?

--
Sam http://samason.me.uk/

#4Tom Lane
tgl@sss.pgh.pa.us
In reply to: Sam Mason (#3)
Re: text type has no default operator class for GIN?

Sam Mason <sam@samason.me.uk> writes:

On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote:

CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
("ufile_name");
ERROR: data type text has no default operator class for access method
"gin"

Not sure if understand very well myself, but GIN indexes can only speed
up specific access patterns and these are exposed through various
different operators.

What GIN indexes are good for is indexing equality queries on the
components of something the database otherwise thinks of as a single
object. For instance you can GIN-index searches for arrays containing
a particular value as a member.

Now type text doesn't have any built-in notion of a component, other
than individual characters, which aren't normally that interesting
to search for. What I suppose the OP has in mind is full-text
searching, which is looking for component *words*. But "word" is a
very language- and context-dependent concept. And defining which words
are to be considered equal for searching purposes is even more so.
If we'd hard-wired one notion of "word" into datatype text, it wouldn't
be very flexible. The point of the tsvector layer is to have a
configurable way to extract searchable words from a chunk of text.
There are also some implementation advantages like not having to repeat
that processing constantly during a search --- but the main point is
having a place to define what a word is and what search equality means.

regards, tom lane

#5Sam Mason
sam@samason.me.uk
In reply to: Tom Lane (#4)
Re: text type has no default operator class for GIN?

On Tue, Aug 18, 2009 at 08:21:49PM -0400, Tom Lane wrote:

Sam Mason <sam@samason.me.uk> writes:

On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote:

CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
("ufile_name");
ERROR: data type text has no default operator class for access method
"gin"

Not sure if understand very well myself, but GIN indexes can only speed
up specific access patterns and these are exposed through various
different operators.

What GIN indexes are good for is indexing equality queries on the
components of something the database otherwise thinks of as a single
object. For instance you can GIN-index searches for arrays containing
a particular value as a member.

Yup, that's a much better description than I could muster!

What I suppose the OP has in mind is full-text
searching, which is looking for component *words*. But "word" is a
very language- and context-dependent concept.

That's what I was trying to get the OP to think about when I said "what
are you expecting PG do to when you create a GIN index on this TEXT
column" but reading it back now I was being my normal oblique self.

Writing nice emails is depressingly difficult!

--
Sam http://samason.me.uk/

#6Bob Gobeille
bob.gobeille@hp.com
In reply to: Tom Lane (#4)
Re: text type has no default operator class for GIN?

On Aug 18, 2009, at 6:21 PM, Tom Lane wrote:

What GIN indexes are good for is indexing equality queries on the
components of something the database otherwise thinks of as a single
object. For instance you can GIN-index searches for arrays containing
a particular value as a member.

Now type text doesn't have any built-in notion of a component, other
than individual characters, which aren't normally that interesting
to search for. What I suppose the OP has in mind is full-text
searching, which is looking for component *words*. But "word" is a
very language- and context-dependent concept. And defining which
words
are to be considered equal for searching purposes is even more so.
If we'd hard-wired one notion of "word" into datatype text, it
wouldn't
be very flexible. The point of the tsvector layer is to have a
configurable way to extract searchable words from a chunk of text.
There are also some implementation advantages like not having to
repeat
that processing constantly during a search --- but the main point is
having a place to define what a word is and what search equality
means.

Yes, I was looking for full text searching in english. Since my
postgresql.conf contained:

default_text_search_config = 'pg_catalog.english'

doesn't this specify the parser, dictionary, and template to use for
full text searching in english? I should have mentioned the above in
my post but since it was in the original conf file (debian install) I
didn't think of it.

Bob Gobeille
bobg@fossology.org

#7Tom Lane
tgl@sss.pgh.pa.us
In reply to: Bob Gobeille (#6)
Re: text type has no default operator class for GIN?

Bob Gobeille <bob.gobeille@hp.com> writes:

Yes, I was looking for full text searching in english. Since my
postgresql.conf contained:

default_text_search_config = 'pg_catalog.english'

doesn't this specify the parser, dictionary, and template to use for
full text searching in english?

It does, but that's a default for tsvector, not bare text.

regards, tom lane