Fulltext - multiple single column indexes

Started by esembaabout 17 years ago8 messagesgeneral
Jump to latest
#1esemba
esemba@gmail.com

Hi,
I have table with several columns and need to perform fulltext search over
volatile number of columns.
I can't use multicolumn gist index or gin index over concatenated columns,
so I've created several single column indexes (one for each column I want to
search) and now I need to query them like this:

to_tsvector('cs', coalesce(annotation, '')) || to_tsvector('cs',
coalesce(resume, '')) || ...
@@ to_tsquery('cs', 'Query text');

This query works, but EXPLAIN has shown me, that postgres doesn't use the
indexes, so the query over a table with several thousands of records last
very long time. I've figured out, that indexes probably cannot be used this
way. What is a recommendation for this scenario?
Indexes over static number of columns work fine, but I can't use them,
because in my application logic I want to let user choose which columns to
search.

Thank you for your reply.
--
View this message in context: http://www.nabble.com/Fulltext---multiple-single-column-indexes-tp22611952p22611952.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#2Richard Huxton
dev@archonet.com
In reply to: esemba (#1)
Re: Fulltext - multiple single column indexes

esemba wrote:

Hi,
I have table with several columns and need to perform fulltext search over
volatile number of columns.
I can't use multicolumn gist index or gin index over concatenated columns,
so I've created several single column indexes (one for each column I want to
search) and now I need to query them like this:

to_tsvector('cs', coalesce(annotation, '')) || to_tsvector('cs',
coalesce(resume, '')) || ...
@@ to_tsquery('cs', 'Query text');

This query works, but EXPLAIN has shown me, that postgres doesn't use the
indexes

[snip]

You're right in concluding this isn't really going to work. You could
have separate indexes for each column and check them all:

SELECT ... WHERE col1 @@ ... OR col2 @@ ...

Where it thinks it is sensible, PG should use a bitmap and combine the
different index scans. If you already have single-column indexes this
makes a lot of sense.

Alternatively, you could add a fulltext_blocks table with a "source"
column and keep it up to date via triggers. That way you could search
something like:

SELECT some_id FROM fulltext_blocks WHERE words @@ ... AND source IN
('col11', 'col2');

This is more effort, but has the advantage that you can add scores to
each column if you require. It also lets you be really clever and say to
users "you searched for 'foo' on columns 1,2,3 - no matches. There are
matches on other columns - show you these?"

HTH
--
Richard Huxton
Archonet Ltd

#3Oleg Bartunov
oleg@sai.msu.su
In reply to: esemba (#1)
Re: Fulltext - multiple single column indexes

On Thu, 19 Mar 2009, esemba wrote:

Hi,
I have table with several columns and need to perform fulltext search over
volatile number of columns.
I can't use multicolumn gist index or gin index over concatenated columns,
so I've created several single column indexes (one for each column I want to
search) and now I need to query them like this:

to_tsvector('cs', coalesce(annotation, '')) || to_tsvector('cs',
coalesce(resume, '')) || ...
@@ to_tsquery('cs', 'Query text');

alter table YOURTABLE add columnt fts tsvector;
update YOURTABLE set fts=
to_tsvector('cs', coalesce(annotation, '')) ||
to_tsvector('cs', coalesce(resume, '')) || ...
create index fts_idx on YOURTABLE using gin(fts);
vacuum analyze YOURTABLE;
select * from YOURTABLE where to_tsquery('cs', 'Query text') @@ fts;

This query works, but EXPLAIN has shown me, that postgres doesn't use the
indexes, so the query over a table with several thousands of records last
very long time. I've figured out, that indexes probably cannot be used this
way. What is a recommendation for this scenario?
Indexes over static number of columns work fine, but I can't use them,
because in my application logic I want to let user choose which columns to
search.

Thank you for your reply.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#4esemba
esemba@gmail.com
In reply to: Oleg Bartunov (#3)
Re: Fulltext - multiple single column indexes

Well, thank you both for response, but I'm not sure, I understand Oleg's
solution. This would work, but where is the variability of searched columns?
In your example, I create new indexed column with concatenated vectors of 2
columns. But I sometimes new to search only annotation, sometimes resume,
sometomes both.

Oleg Bartunov wrote:

On Thu, 19 Mar 2009, esemba wrote:

Hi,
I have table with several columns and need to perform fulltext search
over
volatile number of columns.
I can't use multicolumn gist index or gin index over concatenated
columns,
so I've created several single column indexes (one for each column I want
to
search) and now I need to query them like this:

to_tsvector('cs', coalesce(annotation, '')) || to_tsvector('cs',
coalesce(resume, '')) || ...
@@ to_tsquery('cs', 'Query text');

alter table YOURTABLE add columnt fts tsvector;
update YOURTABLE set fts=
to_tsvector('cs', coalesce(annotation, '')) ||
to_tsvector('cs', coalesce(resume, '')) || ...
create index fts_idx on YOURTABLE using gin(fts);
vacuum analyze YOURTABLE;
select * from YOURTABLE where to_tsquery('cs', 'Query text') @@ fts;

This query works, but EXPLAIN has shown me, that postgres doesn't use the
indexes, so the query over a table with several thousands of records last
very long time. I've figured out, that indexes probably cannot be used
this
way. What is a recommendation for this scenario?
Indexes over static number of columns work fine, but I can't use them,
because in my application logic I want to let user choose which columns
to
search.

Thank you for your reply.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
View this message in context: http://www.nabble.com/Fulltext---multiple-single-column-indexes-tp22611952p22617663.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#5Oleg Bartunov
oleg@sai.msu.su
In reply to: esemba (#4)
Re: Fulltext - multiple single column indexes

On Fri, 20 Mar 2009, esemba wrote:

Well, thank you both for response, but I'm not sure, I understand Oleg's
solution. This would work, but where is the variability of searched columns?
In your example, I create new indexed column with concatenated vectors of 2
columns. But I sometimes new to search only annotation, sometimes resume,
sometomes both.

if you assign different labels to the concatenated columns, you can
specify in query which columns you're interested in. Also, you
can explicitly specify weight=0 for columns you're not interested.

Oleg Bartunov wrote:

On Thu, 19 Mar 2009, esemba wrote:

Hi,
I have table with several columns and need to perform fulltext search
over
volatile number of columns.
I can't use multicolumn gist index or gin index over concatenated
columns,
so I've created several single column indexes (one for each column I want
to
search) and now I need to query them like this:

to_tsvector('cs', coalesce(annotation, '')) || to_tsvector('cs',
coalesce(resume, '')) || ...
@@ to_tsquery('cs', 'Query text');

alter table YOURTABLE add columnt fts tsvector;
update YOURTABLE set fts=
to_tsvector('cs', coalesce(annotation, '')) ||
to_tsvector('cs', coalesce(resume, '')) || ...
create index fts_idx on YOURTABLE using gin(fts);
vacuum analyze YOURTABLE;
select * from YOURTABLE where to_tsquery('cs', 'Query text') @@ fts;

This query works, but EXPLAIN has shown me, that postgres doesn't use the
indexes, so the query over a table with several thousands of records last
very long time. I've figured out, that indexes probably cannot be used
this
way. What is a recommendation for this scenario?
Indexes over static number of columns work fine, but I can't use them,
because in my application logic I want to let user choose which columns
to
search.

Thank you for your reply.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#6esemba
esemba@gmail.com
In reply to: Oleg Bartunov (#5)
Re: Fulltext - multiple single column indexes

Thanks, this works quite fine, but I've ran into some problems so far:
- It's not possible to specify more than 4 labels (just ABCD)
- In query I have to specify searched vectors for each lexem. I think It
would be better to specify searched vectors per-query.

Oleg Bartunov wrote:

On Fri, 20 Mar 2009, esemba wrote:

Well, thank you both for response, but I'm not sure, I understand Oleg's
solution. This would work, but where is the variability of searched
columns?
In your example, I create new indexed column with concatenated vectors of
2
columns. But I sometimes new to search only annotation, sometimes resume,
sometomes both.

if you assign different labels to the concatenated columns, you can
specify in query which columns you're interested in. Also, you
can explicitly specify weight=0 for columns you're not interested.

Oleg Bartunov wrote:

On Thu, 19 Mar 2009, esemba wrote:

Hi,
I have table with several columns and need to perform fulltext search
over
volatile number of columns.
I can't use multicolumn gist index or gin index over concatenated
columns,
so I've created several single column indexes (one for each column I
want
to
search) and now I need to query them like this:

to_tsvector('cs', coalesce(annotation, '')) || to_tsvector('cs',
coalesce(resume, '')) || ...
@@ to_tsquery('cs', 'Query text');

alter table YOURTABLE add columnt fts tsvector;
update YOURTABLE set fts=
to_tsvector('cs', coalesce(annotation, '')) ||
to_tsvector('cs', coalesce(resume, '')) || ...
create index fts_idx on YOURTABLE using gin(fts);
vacuum analyze YOURTABLE;
select * from YOURTABLE where to_tsquery('cs', 'Query text') @@ fts;

This query works, but EXPLAIN has shown me, that postgres doesn't use
the
indexes, so the query over a table with several thousands of records
last
very long time. I've figured out, that indexes probably cannot be used
this
way. What is a recommendation for this scenario?
Indexes over static number of columns work fine, but I can't use them,
because in my application logic I want to let user choose which columns
to
search.

Thank you for your reply.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
View this message in context: http://www.nabble.com/Fulltext---multiple-single-column-indexes-tp22611952p22627255.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

#7Oleg Bartunov
oleg@sai.msu.su
In reply to: esemba (#6)
Re: Fulltext - multiple single column indexes

On Fri, 20 Mar 2009, esemba wrote:

Thanks, this works quite fine, but I've ran into some problems so far:
- It's not possible to specify more than 4 labels (just ABCD)

this is well known limitation and we are certainly should think about it

- In query I have to specify searched vectors for each lexem. I think It
would be better to specify searched vectors per-query.

we provide low level interface, it's up to you to write your very own
query processing.

Oleg Bartunov wrote:

On Fri, 20 Mar 2009, esemba wrote:

Well, thank you both for response, but I'm not sure, I understand Oleg's
solution. This would work, but where is the variability of searched
columns?
In your example, I create new indexed column with concatenated vectors of
2
columns. But I sometimes new to search only annotation, sometimes resume,
sometomes both.

if you assign different labels to the concatenated columns, you can
specify in query which columns you're interested in. Also, you
can explicitly specify weight=0 for columns you're not interested.

Oleg Bartunov wrote:

On Thu, 19 Mar 2009, esemba wrote:

Hi,
I have table with several columns and need to perform fulltext search
over
volatile number of columns.
I can't use multicolumn gist index or gin index over concatenated
columns,
so I've created several single column indexes (one for each column I
want
to
search) and now I need to query them like this:

to_tsvector('cs', coalesce(annotation, '')) || to_tsvector('cs',
coalesce(resume, '')) || ...
@@ to_tsquery('cs', 'Query text');

alter table YOURTABLE add columnt fts tsvector;
update YOURTABLE set fts=
to_tsvector('cs', coalesce(annotation, '')) ||
to_tsvector('cs', coalesce(resume, '')) || ...
create index fts_idx on YOURTABLE using gin(fts);
vacuum analyze YOURTABLE;
select * from YOURTABLE where to_tsquery('cs', 'Query text') @@ fts;

This query works, but EXPLAIN has shown me, that postgres doesn't use
the
indexes, so the query over a table with several thousands of records
last
very long time. I've figured out, that indexes probably cannot be used
this
way. What is a recommendation for this scenario?
Indexes over static number of columns work fine, but I can't use them,
because in my application logic I want to let user choose which columns
to
search.

Thank you for your reply.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#8esemba
esemba@gmail.com
In reply to: Oleg Bartunov (#7)
Re: Fulltext - multiple single column indexes

Ok, so what solution for the 4 column limitation would you suggest? I'll
probably create two four-column indexes and OR search over them.

Oleg Bartunov wrote:

On Fri, 20 Mar 2009, esemba wrote:

Thanks, this works quite fine, but I've ran into some problems so far:
- It's not possible to specify more than 4 labels (just ABCD)

this is well known limitation and we are certainly should think about it

- In query I have to specify searched vectors for each lexem. I think It
would be better to specify searched vectors per-query.

we provide low level interface, it's up to you to write your very own
query processing.

Oleg Bartunov wrote:

On Fri, 20 Mar 2009, esemba wrote:

Well, thank you both for response, but I'm not sure, I understand
Oleg's
solution. This would work, but where is the variability of searched
columns?
In your example, I create new indexed column with concatenated vectors
of
2
columns. But I sometimes new to search only annotation, sometimes
resume,
sometomes both.

if you assign different labels to the concatenated columns, you can
specify in query which columns you're interested in. Also, you
can explicitly specify weight=0 for columns you're not interested.

Oleg Bartunov wrote:

On Thu, 19 Mar 2009, esemba wrote:

Hi,
I have table with several columns and need to perform fulltext search
over
volatile number of columns.
I can't use multicolumn gist index or gin index over concatenated
columns,
so I've created several single column indexes (one for each column I
want
to
search) and now I need to query them like this:

to_tsvector('cs', coalesce(annotation, '')) || to_tsvector('cs',
coalesce(resume, '')) || ...
@@ to_tsquery('cs', 'Query text');

alter table YOURTABLE add columnt fts tsvector;
update YOURTABLE set fts=
to_tsvector('cs', coalesce(annotation, '')) ||
to_tsvector('cs', coalesce(resume, '')) || ...
create index fts_idx on YOURTABLE using gin(fts);
vacuum analyze YOURTABLE;
select * from YOURTABLE where to_tsquery('cs', 'Query text') @@ fts;

This query works, but EXPLAIN has shown me, that postgres doesn't use
the
indexes, so the query over a table with several thousands of records
last
very long time. I've figured out, that indexes probably cannot be
used
this
way. What is a recommendation for this scenario?
Indexes over static number of columns work fine, but I can't use
them,
because in my application logic I want to let user choose which
columns
to
search.

Thank you for your reply.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
View this message in context: http://www.nabble.com/Fulltext---multiple-single-column-indexes-tp22611952p22633855.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.