Searching a gin index

Started by James Dooleyabout 17 years ago11 messagesgeneral
Jump to latest
#1James Dooley
jamdooley@gmail.com

Hi again,

I have set my configuration as default and I have created a GIN index on
three columns, like this

create index textsearch_index on products using gin(strip( to_tsvector(
'my_config', title || '' || description || '' || name)))

Searching these columns the way I have

... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')

seems not to be correct since it's taking as much time as non-indexed. Also
changing the word car to cars returns nothing even though

select * from plainto_tsquery('cars')

returns 'car'

Could you please point me in the right direction ?

James

#2Oleg Bartunov
oleg@sai.msu.su
In reply to: James Dooley (#1)
Re: Searching a gin index

James,

you provide us a little useful information. If you really want help
you need to provide all information about your setup, sample queries and
explain analyze, test data illustrated your problem.

I, personally, usually prefer to have copy-pasteable information.
I think we all are quite busy, so don't expect to get help if you can't
give yours attention.

For example, this post would be reduced to the simple test query

test=# select version();
version
------------------------------------------------------------------------
PostgreSQL 8.3.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.2.3
(1 row)

test=# select strip(to_tsvector('english', 'cars and vehicles')) @@
plainto_tsquery('english', 'cars');
?column?
----------
t

If not, you have problem.

Oleg
On Fri, 6 Feb 2009, James Dooley wrote:

Hi again,

I have set my configuration as default and I have created a GIN index on
three columns, like this

create index textsearch_index on products using gin(strip( to_tsvector(
'my_config', title || '' || description || '' || name)))

Searching these columns the way I have

... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')

seems not to be correct since it's taking as much time as non-indexed. Also
changing the word car to cars returns nothing even though

select * from plainto_tsquery('cars')

returns 'car'

Could you please point me in the right direction ?

James

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#3Richard Huxton
dev@archonet.com
In reply to: James Dooley (#1)
Re: Searching a gin index

James Dooley wrote:

Hi again,

I have set my configuration as default and I have created a GIN index on
three columns, like this

create index textsearch_index on products using gin(strip( to_tsvector(
'my_config', title || '' || description || '' || name)))

Searching these columns the way I have

... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')

seems not to be correct since it's taking as much time as non-indexed.

PG's planner isn't smart enough to transform a complex expression so as
to use a functional index (which is what you've got). You need to
mention the function explicitly.

So, if you had:
CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) )
You then need to search against "lower(mycolumn)" and not just expect
the planner to notice that mycolumn="abc" could use the index.

The following should work for you as a starting point:

CREATE TABLE tsearch_tbl (id SERIAL, title text, body text,
PRIMARY KEY (id));

INSERT INTO tsearch_tbl (title, body)
SELECT 'title number ' || n, 'This is body number ' || n
FROM generate_series(1,9999) n;

ANALYSE tsearch_tbl;

CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin (
to_tsvector('english', title || body) );

EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english',
title || body) @@ to_tsquery('17');
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tsearch_tbl  (cost=4.34..34.76 rows=10 width=45)
(actual time=0.067..0.067 rows=1 loops=1)
   Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@
to_tsquery('17'::text))
   ->  Bitmap Index Scan on tsearch_tbl_words_idx  (cost=0.00..4.34
rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1)
         Index Cond: (to_tsvector('english'::regconfig, (title || body))
@@ to_tsquery('17'::text))
 Total runtime: 0.121 ms

Note that you'll have problems if any of your text-fields contain nulls
(since null || anything = null).

Personally, unless I'm dealing with a large table, I like to have a
separate tsvector column which I keep up to date with triggers. It makes
it easier to debug problems.

--
Richard Huxton
Archonet Ltd

#4James Dooley
jamdooley@gmail.com
In reply to: Richard Huxton (#3)
Re: Searching a gin index

Oleg, but I am only interested in whether or not the syntax of my
search-query is correct.

Having created the index as I mentioned above, would the correct way of
searching and using that index be

... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')

or should it be as Richard just mentioned

... AND to_tsvector(title || '' || description || '' || name) @@
plainto_tsquery('car')

or some other way ?

On Fri, Feb 6, 2009 at 3:30 PM, Richard Huxton <dev@archonet.com> wrote:

Show quoted text

James Dooley wrote:

Hi again,

I have set my configuration as default and I have created a GIN index on
three columns, like this

create index textsearch_index on products using gin(strip( to_tsvector(
'my_config', title || '' || description || '' || name)))

Searching these columns the way I have

... AND (title || '' || description || '' || name) @@

plainto_tsquery('car')

seems not to be correct since it's taking as much time as non-indexed.

PG's planner isn't smart enough to transform a complex expression so as
to use a functional index (which is what you've got). You need to
mention the function explicitly.

So, if you had:
CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) )
You then need to search against "lower(mycolumn)" and not just expect
the planner to notice that mycolumn="abc" could use the index.

The following should work for you as a starting point:

CREATE TABLE tsearch_tbl (id SERIAL, title text, body text,
PRIMARY KEY (id));

INSERT INTO tsearch_tbl (title, body)
SELECT 'title number ' || n, 'This is body number ' || n
FROM generate_series(1,9999) n;

ANALYSE tsearch_tbl;

CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin (
to_tsvector('english', title || body) );

EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english',
title || body) @@ to_tsquery('17');
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tsearch_tbl  (cost=4.34..34.76 rows=10 width=45)
(actual time=0.067..0.067 rows=1 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@
to_tsquery('17'::text))
->  Bitmap Index Scan on tsearch_tbl_words_idx  (cost=0.00..4.34
rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (title || body))
@@ to_tsquery('17'::text))
Total runtime: 0.121 ms

Note that you'll have problems if any of your text-fields contain nulls
(since null || anything = null).

Personally, unless I'm dealing with a large table, I like to have a
separate tsvector column which I keep up to date with triggers. It makes
it easier to debug problems.

--
Richard Huxton
Archonet Ltd

#5Richard Huxton
dev@archonet.com
In reply to: James Dooley (#4)
Re: Searching a gin index

James - don't repeat the whole of the previous message in your replies,
trim it to the relevant part. The message is already archived on the list.

James Dooley wrote:

Oleg, but I am only interested in whether or not the syntax of my
search-query is correct.

Having created the index as I mentioned above, would the correct way of
searching and using that index be

... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')

That looks like a correct syntax, but it's unlikely to use your index -
your index was created on to_tsvector('my_config', ...)

or should it be as Richard just mentioned

... AND to_tsvector(title || '' || description || '' || name) @@
plainto_tsquery('car')

or some other way ?

Try playing around with the example I gave, and try variations on it. Do
you understand why you have to supply a config when creating the
functional index? Do you understand the difference between
plainto_tsquery() and to_tsquery()?

--
Richard Huxton
Archonet Ltd

#6Oleg Bartunov
oleg@sai.msu.su
In reply to: James Dooley (#4)
Re: Searching a gin index

James,

syntax is documented on
http://www.postgresql.org/docs/8.3/static/textsearch-tables.html#TEXTSEARCH-TABLES-SEARCH
and in the Introduction
http://www.postgresql.org/docs/8.3/static/textsearch-intro.html#TEXTSEARCH-MATCHING
text-search operator was specified for tsvector @@ tsquery.
You did wrong twice, you didn't specified type tsvector and you
forgot about coalesce.

There is general rule for partial indexes - you should use the same
expression in query as you used in create index command.

Oleg
On Fri, 6 Feb 2009, James Dooley wrote:

Oleg, but I am only interested in whether or not the syntax of my
search-query is correct.

Having created the index as I mentioned above, would the correct way of
searching and using that index be

... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')

or should it be as Richard just mentioned

... AND to_tsvector(title || '' || description || '' || name) @@
plainto_tsquery('car')

or some other way ?

On Fri, Feb 6, 2009 at 3:30 PM, Richard Huxton <dev@archonet.com> wrote:

James Dooley wrote:

Hi again,

I have set my configuration as default and I have created a GIN index on
three columns, like this

create index textsearch_index on products using gin(strip( to_tsvector(
'my_config', title || '' || description || '' || name)))

Searching these columns the way I have

... AND (title || '' || description || '' || name) @@

plainto_tsquery('car')

seems not to be correct since it's taking as much time as non-indexed.

PG's planner isn't smart enough to transform a complex expression so as
to use a functional index (which is what you've got). You need to
mention the function explicitly.

So, if you had:
CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) )
You then need to search against "lower(mycolumn)" and not just expect
the planner to notice that mycolumn="abc" could use the index.

The following should work for you as a starting point:

CREATE TABLE tsearch_tbl (id SERIAL, title text, body text,
PRIMARY KEY (id));

INSERT INTO tsearch_tbl (title, body)
SELECT 'title number ' || n, 'This is body number ' || n
FROM generate_series(1,9999) n;

ANALYSE tsearch_tbl;

CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin (
to_tsvector('english', title || body) );

EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english',
title || body) @@ to_tsquery('17');
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tsearch_tbl  (cost=4.34..34.76 rows=10 width=45)
(actual time=0.067..0.067 rows=1 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@
to_tsquery('17'::text))
->  Bitmap Index Scan on tsearch_tbl_words_idx  (cost=0.00..4.34
rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (title || body))
@@ to_tsquery('17'::text))
Total runtime: 0.121 ms

Note that you'll have problems if any of your text-fields contain nulls
(since null || anything = null).

Personally, unless I'm dealing with a large table, I like to have a
separate tsvector column which I keep up to date with triggers. It makes
it easier to debug problems.

--
Richard Huxton
Archonet Ltd

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#7James Dooley
jamdooley@gmail.com
In reply to: Richard Huxton (#5)
Re: Searching a gin index

On Fri, Feb 6, 2009 at 4:01 PM, Richard Huxton <dev@archonet.com> wrote:

James - don't repeat the whole of the previous message in your replies,
trim it to the relevant part. The message is already archived on the list.

James Dooley wrote:

Oleg, but I am only interested in whether or not the syntax of my
search-query is correct.

Having created the index as I mentioned above, would the correct way of
searching and using that index be

... AND (title || '' || description || '' || name) @@

plainto_tsquery('car')

That looks like a correct syntax, but it's unlikely to use your index -
your index was created on to_tsvector('my_config', ...)

Got it, I will use the same function and parameters as the one created on
the index. As a matter a fact now the index seems to be used after I added
the strip(to_tsvector(...))

Good!!

or should it be as Richard just mentioned

... AND to_tsvector(title || '' || description || '' || name) @@
plainto_tsquery('car')

or some other way ?

Try playing around with the example I gave, and try variations on it. Do
you understand why you have to supply a config when creating the
functional index? Do you understand the difference between
plainto_tsquery() and to_tsquery()?

Yes, I do understand the difference.

--
Richard Huxton
Archonet Ltd

Now everything is working, I added my config to

plainto_tsquery('my_config', 'cars')

and I get results back. I find that strange though since I set the default
to be by config, but appearantly that was just temporary and for the PID
that changed it. Running from my app seemed to keep using the old
configuration.

Thanks all for your help.

Now I have a complete and working fulltext search!

PostgreSQL rules!!

#8Richard Huxton
dev@archonet.com
In reply to: James Dooley (#7)
Re: Searching a gin index

James Dooley wrote:

Now everything is working, I added my config to

plainto_tsquery('my_config', 'cars')

and I get results back. I find that strange though since I set the default
to be by config, but appearantly that was just temporary and for the PID
that changed it. Running from my app seemed to keep using the old
configuration.

Not quite clear on what you mean here, but you always need to specify
the config to use a functional index. The default config when you run
the query isn't guaranteed to be the same as the default when you create
the index. Actually, you can change it mid-transaction if you like.

If you want to set the default permanently do something like "ALTER
DATABASE mydb SET .... = ..."

--
Richard Huxton
Archonet Ltd

#9James Dooley
jamdooley@gmail.com
In reply to: Oleg Bartunov (#6)
Re: Searching a gin index

BTW, Oleg I don't need colasque since those values can't be null.

On Fri, Feb 6, 2009 at 4:11 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Show quoted text

James,

syntax is documented on

http://www.postgresql.org/docs/8.3/static/textsearch-tables.html#TEXTSEARCH-TABLES-SEARCH
and in the Introduction
http://www.postgresql.org/docs/8.3/static/textsearch-intro.html#TEXTSEARCH-MATCHING
text-search operator was specified for tsvector @@ tsquery.
You did wrong twice, you didn't specified type tsvector and you forgot
about coalesce.

There is general rule for partial indexes - you should use the same
expression in query as you used in create index command.

Oleg
On Fri, 6 Feb 2009, James Dooley wrote:

Oleg, but I am only interested in whether or not the syntax of my

search-query is correct.

Having created the index as I mentioned above, would the correct way of
searching and using that index be

... AND (title || '' || description || '' || name) @@
plainto_tsquery('car')

or should it be as Richard just mentioned

... AND to_tsvector(title || '' || description || '' || name) @@
plainto_tsquery('car')

or some other way ?

On Fri, Feb 6, 2009 at 3:30 PM, Richard Huxton <dev@archonet.com> wrote:

James Dooley wrote:

Hi again,

I have set my configuration as default and I have created a GIN index on
three columns, like this

create index textsearch_index on products using gin(strip( to_tsvector(
'my_config', title || '' || description || '' || name)))

Searching these columns the way I have

... AND (title || '' || description || '' || name) @@

plainto_tsquery('car')

seems not to be correct since it's taking as much time as non-indexed.

PG's planner isn't smart enough to transform a complex expression so as
to use a functional index (which is what you've got). You need to
mention the function explicitly.

So, if you had:
CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) )
You then need to search against "lower(mycolumn)" and not just expect
the planner to notice that mycolumn="abc" could use the index.

The following should work for you as a starting point:

CREATE TABLE tsearch_tbl (id SERIAL, title text, body text,
PRIMARY KEY (id));

INSERT INTO tsearch_tbl (title, body)
SELECT 'title number ' || n, 'This is body number ' || n
FROM generate_series(1,9999) n;

ANALYSE tsearch_tbl;

CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin (
to_tsvector('english', title || body) );

EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english',
title || body) @@ to_tsquery('17');
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tsearch_tbl  (cost=4.34..34.76 rows=10 width=45)
(actual time=0.067..0.067 rows=1 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@
to_tsquery('17'::text))
->  Bitmap Index Scan on tsearch_tbl_words_idx  (cost=0.00..4.34
rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (title || body))
@@ to_tsquery('17'::text))
Total runtime: 0.121 ms

Note that you'll have problems if any of your text-fields contain nulls
(since null || anything = null).

Personally, unless I'm dealing with a large table, I like to have a
separate tsvector column which I keep up to date with triggers. It makes
it easier to debug problems.

--
Richard Huxton
Archonet Ltd

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/&lt;http://www.sai.msu.su/%7Emegera/&gt;
phone: +007(495)939-16-83, +007(495)939-23-83

#10James Dooley
jamdooley@gmail.com
In reply to: Richard Huxton (#8)
Re: Searching a gin index

Richard, I was refereing to the plainto_tsquery

Oleg, I only need colasque if any of those columns can be null, which is not
the case here. Correct?

#11Oleg Bartunov
oleg@sai.msu.su
In reply to: James Dooley (#10)
Re: Searching a gin index

On Fri, 6 Feb 2009, James Dooley wrote:

Richard, I was refereing to the plainto_tsquery

Oleg, I only need colasque if any of those columns can be null, which is not
the case here. Correct?

yes, but for safety and generality I'd use coalesce.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83