Full Text Index Scanning
I'm in the process of migrating a project from Oracle to Postgres and have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly efficient,
it's much faster than full-scanning the raw data and is relatively quick.
It doesn't seem that Postgres works this way. Attempting to do this returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"
The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).
Is this something Postgres can do? Or is there a different way to do scan
the index?
TIA,
Matt
Matt, I'd try to use prefix search on original string concatenated with reverse string:
Just tried on some spare table
knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |
1. create index
knn=# create index clean_name_tlz_idx on spot_toulouse using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french', clean_name|| ' ' || reverse(clean_name) ) @@ to_tsquery('french','the:* | et:*');
Select looks cumbersome, but you can always write wrapper functions.
The only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same,
but again, it's possible to obtain tsvector by custom function, which
aware about reversing.
Good luck and let me know if this help you.
Oleg
On Fri, 28 Jan 2011, Matt Warner wrote:
I'm in the process of migrating a project from Oracle to Postgres and have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly efficient,
it's much faster than full-scanning the raw data and is relatively quick.It doesn't seem that Postgres works this way. Attempting to do this returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).Is this something Postgres can do? Or is there a different way to do scan
the index?TIA,
Matt
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Thanks Oleg. I'm going to have to experiment with this so that I understand
it better.
Matt
On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
Show quoted text
Matt, I'd try to use prefix search on original string concatenated with
reverse string:Just tried on some spare table
knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' || reverse(clean_name) ) @@ to_tsquery('french','the:* |
et:*');Select looks cumbersome, but you can always write wrapper functions. The
only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same, but
again, it's possible to obtain tsvector by custom function, which aware
about reversing.Good luck and let me know if this help you.
Oleg
On Fri, 28 Jan 2011, Matt Warner wrote:
I'm in the process of migrating a project from Oracle to Postgres and have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively quick.It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).Is this something Postgres can do? Or is there a different way to do scan
the index?TIA,
Matt
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Reverse isn't a built-in Postgres function, so I found one and installed it.
However, attempting to use it in creating an index gets me the message
"ERROR: functions in index expression must be marked IMMUTABLE", even
though the function declaration already has the immutable argument.
Is there a specific version of the reverse function you're using? Or am I
just missing something obvious? This is Postgres 9, BTW.
Thanks,
Matt
On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com>wrote:
Show quoted text
Thanks Oleg. I'm going to have to experiment with this so that I understand
it better.Matt
On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
Matt, I'd try to use prefix search on original string concatenated with
reverse string:Just tried on some spare table
knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' || reverse(clean_name) ) @@ to_tsquery('french','the:* |
et:*');Select looks cumbersome, but you can always write wrapper functions. The
only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same, but
again, it's possible to obtain tsvector by custom function, which aware
about reversing.Good luck and let me know if this help you.
Oleg
On Fri, 28 Jan 2011, Matt Warner wrote:
I'm in the process of migrating a project from Oracle to Postgres and
have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively quick.It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).Is this something Postgres can do? Or is there a different way to do scan
the index?TIA,
Matt
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
What version of Pg you run ? Try latest version.
Oleg
On Sat, 29 Jan 2011, Matt Warner wrote:
Reverse isn't a built-in Postgres function, so I found one and installed it.
However, attempting to use it in creating an index gets me the message
"ERROR: functions in index expression must be marked IMMUTABLE", even
though the function declaration already has the immutable argument.Is there a specific version of the reverse function you're using? Or am I
just missing something obvious? This is Postgres 9, BTW.Thanks,
Matt
On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com>wrote:
Thanks Oleg. I'm going to have to experiment with this so that I understand
it better.Matt
On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
Matt, I'd try to use prefix search on original string concatenated with
reverse string:Just tried on some spare table
knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' || reverse(clean_name) ) @@ to_tsquery('french','the:* |
et:*');Select looks cumbersome, but you can always write wrapper functions. The
only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same, but
again, it's possible to obtain tsvector by custom function, which aware
about reversing.Good luck and let me know if this help you.
Oleg
On Fri, 28 Jan 2011, Matt Warner wrote:
I'm in the process of migrating a project from Oracle to Postgres and
have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively quick.It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).Is this something Postgres can do? Or is there a different way to do scan
the index?TIA,
Matt
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
9.0.2
On Sat, Jan 29, 2011 at 9:35 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:
Show quoted text
What version of Pg you run ? Try latest version.
Oleg
On Sat, 29 Jan 2011, Matt Warner wrote:
Reverse isn't a built-in Postgres function, so I found one and installed
it.
However, attempting to use it in creating an index gets me the message
"ERROR: functions in index expression must be marked IMMUTABLE", even
though the function declaration already has the immutable argument.Is there a specific version of the reverse function you're using? Or am I
just missing something obvious? This is Postgres 9, BTW.Thanks,
Matt
On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com
wrote:
Thanks Oleg. I'm going to have to experiment with this so that I
understand
it better.Matt
On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
Matt, I'd try to use prefix search on original string concatenated with
reverse string:
Just tried on some spare table
knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' ||
reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' || reverse(clean_name) ) @@ to_tsquery('french','the:*
|
et:*');Select looks cumbersome, but you can always write wrapper functions. The
only drawback I see for now is that ranking function will a bit
confused,
since coordinates of original and reversed words will be not the same,
but
again, it's possible to obtain tsvector by custom function, which aware
about reversing.Good luck and let me know if this help you.
Oleg
On Fri, 28 Jan 2011, Matt Warner wrote:
I'm in the process of migrating a project from Oracle to Postgres and
have
run into a feature question. I know that Postgres has a full-text
search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively
quick.It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).Is this something Postgres can do? Or is there a different way to do
scan
the index?TIA,
Matt
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Doesn't seem to work either. Maybe something changed in 9.1?
create index test_idx on testtable using gin(to_tsvector(wordcolumn||'
'||reverse(wordcolumn)));
ERROR: functions in index expression must be marked IMMUTABLE
On Sun, Jan 30, 2011 at 3:28 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:
Show quoted text
I used 9.1dev, but you can try immutable function (from
http://andreas.scherbaum.la/blog/archives/10-Reverse-a-text-in-PostgreSQL.html
)create function reverse(text) returns text as $$
select case when length($1)>0
then substring($1, length($1), 1) || reverse(substring($1, 1,
length($1)-1))else '' end $$ language sql immutable strict;
On Sat, 29 Jan 2011, Matt Warner wrote:
9.0.2
On Sat, Jan 29, 2011 at 9:35 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:
What version of Pg you run ? Try latest version.
Oleg
On Sat, 29 Jan 2011, Matt Warner wrote:
Reverse isn't a built-in Postgres function, so I found one and installed
it.
However, attempting to use it in creating an index gets me the message
"ERROR: functions in index expression must be marked IMMUTABLE", even
though the function declaration already has the immutable argument.Is there a specific version of the reverse function you're using? Or am
I
just missing something obvious? This is Postgres 9, BTW.Thanks,
Matt
On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com
wrote:
Thanks Oleg. I'm going to have to experiment with this so that I
understand
it better.Matt
On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su>
wrote:Matt, I'd try to use prefix search on original string concatenated
withreverse string:
Just tried on some spare table
knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' ||
reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' || reverse(clean_name) ) @@
to_tsquery('french','the:*
|
et:*');Select looks cumbersome, but you can always write wrapper functions.
The
only drawback I see for now is that ranking function will a bit
confused,
since coordinates of original and reversed words will be not the same,
but
again, it's possible to obtain tsvector by custom function, which
aware
about reversing.Good luck and let me know if this help you.
Oleg
On Fri, 28 Jan 2011, Matt Warner wrote:
I'm in the process of migrating a project from Oracle to Postgres and
have
run into a feature question. I know that Postgres has a full-text
search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively
quick.It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).Is this something Postgres can do? Or is there a different way to do
scan
the index?TIA,
Matt
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru
),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Import Notes
Reply to msg id not found: Pine.LNX.4.64.1101301427360.31836@sn.sai.msu.ru
Matt Warner <matt@warnertechnology.com> writes:
Doesn't seem to work either. Maybe something changed in 9.1?
create index test_idx on testtable using gin(to_tsvector(wordcolumn||'
'||reverse(wordcolumn)));
ERROR: functions in index expression must be marked IMMUTABLE
That's not the same case he tested. The single-parameter form of
to_tsvector isn't immutable, because it depends on the default text
search configuration parameter. It should work, AFAICS, with the
two-parameter form.
regards, tom lane
Aha! Thanks for pointing that out. It's indexing now.
Thanks!
Matt
On Sun, Jan 30, 2011 at 9:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
Matt Warner <matt@warnertechnology.com> writes:
Doesn't seem to work either. Maybe something changed in 9.1?
create index test_idx on testtable using gin(to_tsvector(wordcolumn||'
'||reverse(wordcolumn)));
ERROR: functions in index expression must be marked IMMUTABLEThat's not the same case he tested. The single-parameter form of
to_tsvector isn't immutable, because it depends on the default text
search configuration parameter. It should work, AFAICS, with the
two-parameter form.regards, tom lane
If I understand this, it looks like this approach allows me to match the
beginnings and endings of words, but not the middle sections. Is that
correct? That is, if I search for "jag" I will find "jaeger" but not
"lobenjager".
Or am I (again) not understanding how this works?
TIA,
Matt
On Sun, Jan 30, 2011 at 9:59 AM, Matt Warner <matt@warnertechnology.com>wrote:
Show quoted text
Aha! Thanks for pointing that out. It's indexing now.
Thanks!
Matt
On Sun, Jan 30, 2011 at 9:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Matt Warner <matt@warnertechnology.com> writes:
Doesn't seem to work either. Maybe something changed in 9.1?
create index test_idx on testtable using gin(to_tsvector(wordcolumn||'
'||reverse(wordcolumn)));
ERROR: functions in index expression must be marked IMMUTABLEThat's not the same case he tested. The single-parameter form of
to_tsvector isn't immutable, because it depends on the default text
search configuration parameter. It should work, AFAICS, with the
two-parameter form.regards, tom lane
Matt Warner <matt@warnertechnology.com> writes:
If I understand this, it looks like this approach allows me to match the
beginnings and endings of words, but not the middle sections.
Yeah, probably. You might consider using contrib/pg_trgm instead if
you need arbitrary substrings.
regards, tom lane
Thanks. pg_trgm looks interesting, but after installing the pg_trgm.sql, I
get error messages when following the documentation.
sggeeorg=> create index test_idx on test using gist(columnname
gist_trgm_ops);
ERROR: operator class "gist_trgm_ops" does not exist for access method
"gist"
STATEMENT: create index test_idx on test using
gist(columnname gist_trgm_ops);
ERROR: operator class "gist_trgm_ops" does not exist for access method
"gist"
On Sun, Jan 30, 2011 at 10:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Show quoted text
Matt Warner <matt@warnertechnology.com> writes:
If I understand this, it looks like this approach allows me to match the
beginnings and endings of words, but not the middle sections.Yeah, probably. You might consider using contrib/pg_trgm instead if
you need arbitrary substrings.regards, tom lane