Full Text Index Scanning

Started by Matt Warnerabout 15 years ago12 messagesgeneral
Jump to latest
#1Matt Warner
matt@warnertechnology.com

I'm in the process of migrating a project from Oracle to Postgres and have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly efficient,
it's much faster than full-scanning the raw data and is relatively quick.

It doesn't seem that Postgres works this way. Attempting to do this returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do scan
the index?

TIA,

Matt

#2Oleg Bartunov
oleg@sai.msu.su
In reply to: Matt Warner (#1)
Re: Full Text Index Scanning

Matt, I'd try to use prefix search on original string concatenated with reverse string:

Just tried on some spare table

knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |

1. create index
knn=# create index clean_name_tlz_idx on spot_toulouse using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french', clean_name|| ' ' || reverse(clean_name) ) @@ to_tsquery('french','the:* | et:*');

Select looks cumbersome, but you can always write wrapper functions.
The only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same,
but again, it's possible to obtain tsvector by custom function, which
aware about reversing.

Good luck and let me know if this help you.

Oleg

On Fri, 28 Jan 2011, Matt Warner wrote:

I'm in the process of migrating a project from Oracle to Postgres and have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly efficient,
it's much faster than full-scanning the raw data and is relatively quick.

It doesn't seem that Postgres works this way. Attempting to do this returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do scan
the index?

TIA,

Matt

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#3Matt Warner
matt@warnertechnology.com
In reply to: Oleg Bartunov (#2)
Re: Full Text Index Scanning

Thanks Oleg. I'm going to have to experiment with this so that I understand
it better.

Matt

On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Show quoted text

Matt, I'd try to use prefix search on original string concatenated with
reverse string:

Just tried on some spare table

knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |

1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' || reverse(clean_name) ) @@ to_tsquery('french','the:* |
et:*');

Select looks cumbersome, but you can always write wrapper functions. The
only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same, but
again, it's possible to obtain tsvector by custom function, which aware
about reversing.

Good luck and let me know if this help you.

Oleg

On Fri, 28 Jan 2011, Matt Warner wrote:

I'm in the process of migrating a project from Oracle to Postgres and have

run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively quick.

It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do scan
the index?

TIA,

Matt

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#4Matt Warner
matt@warnertechnology.com
In reply to: Matt Warner (#3)
Re: Full Text Index Scanning

Reverse isn't a built-in Postgres function, so I found one and installed it.
However, attempting to use it in creating an index gets me the message
"ERROR: functions in index expression must be marked IMMUTABLE", even
though the function declaration already has the immutable argument.

Is there a specific version of the reverse function you're using? Or am I
just missing something obvious? This is Postgres 9, BTW.

Thanks,

Matt

On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com>wrote:

Show quoted text

Thanks Oleg. I'm going to have to experiment with this so that I understand
it better.

Matt

On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Matt, I'd try to use prefix search on original string concatenated with
reverse string:

Just tried on some spare table

knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |

1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' || reverse(clean_name) ) @@ to_tsquery('french','the:* |
et:*');

Select looks cumbersome, but you can always write wrapper functions. The
only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same, but
again, it's possible to obtain tsvector by custom function, which aware
about reversing.

Good luck and let me know if this help you.

Oleg

On Fri, 28 Jan 2011, Matt Warner wrote:

I'm in the process of migrating a project from Oracle to Postgres and

have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively quick.

It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do scan
the index?

TIA,

Matt

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#5Oleg Bartunov
oleg@sai.msu.su
In reply to: Matt Warner (#4)
Re: Full Text Index Scanning

What version of Pg you run ? Try latest version.

Oleg

On Sat, 29 Jan 2011, Matt Warner wrote:

Reverse isn't a built-in Postgres function, so I found one and installed it.
However, attempting to use it in creating an index gets me the message
"ERROR: functions in index expression must be marked IMMUTABLE", even
though the function declaration already has the immutable argument.

Is there a specific version of the reverse function you're using? Or am I
just missing something obvious? This is Postgres 9, BTW.

Thanks,

Matt

On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com>wrote:

Thanks Oleg. I'm going to have to experiment with this so that I understand
it better.

Matt

On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Matt, I'd try to use prefix search on original string concatenated with
reverse string:

Just tried on some spare table

knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |

1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' || reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' || reverse(clean_name) ) @@ to_tsquery('french','the:* |
et:*');

Select looks cumbersome, but you can always write wrapper functions. The
only drawback I see for now is that ranking function will a bit confused,
since coordinates of original and reversed words will be not the same, but
again, it's possible to obtain tsvector by custom function, which aware
about reversing.

Good luck and let me know if this help you.

Oleg

On Fri, 28 Jan 2011, Matt Warner wrote:

I'm in the process of migrating a project from Oracle to Postgres and

have
run into a feature question. I know that Postgres has a full-text search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively quick.

It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do scan
the index?

TIA,

Matt

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#6Matt Warner
matt@warnertechnology.com
In reply to: Oleg Bartunov (#5)
Re: Full Text Index Scanning

9.0.2

On Sat, Jan 29, 2011 at 9:35 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Show quoted text

What version of Pg you run ? Try latest version.

Oleg

On Sat, 29 Jan 2011, Matt Warner wrote:

Reverse isn't a built-in Postgres function, so I found one and installed

it.
However, attempting to use it in creating an index gets me the message
"ERROR: functions in index expression must be marked IMMUTABLE", even
though the function declaration already has the immutable argument.

Is there a specific version of the reverse function you're using? Or am I
just missing something obvious? This is Postgres 9, BTW.

Thanks,

Matt

On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com

wrote:

Thanks Oleg. I'm going to have to experiment with this so that I

understand
it better.

Matt

On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Matt, I'd try to use prefix search on original string concatenated with

reverse string:

Just tried on some spare table

knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |

1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' ||
reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' || reverse(clean_name) ) @@ to_tsquery('french','the:*
|
et:*');

Select looks cumbersome, but you can always write wrapper functions. The
only drawback I see for now is that ranking function will a bit
confused,
since coordinates of original and reversed words will be not the same,
but
again, it's possible to obtain tsvector by custom function, which aware
about reversing.

Good luck and let me know if this help you.

Oleg

On Fri, 28 Jan 2011, Matt Warner wrote:

I'm in the process of migrating a project from Oracle to Postgres and

have
run into a feature question. I know that Postgres has a full-text
search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively
quick.

It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do
scan
the index?

TIA,

Matt

Regards,

Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#7Matt Warner
matt@warnertechnology.com
In reply to: Matt Warner (#1)
Re: Full Text Index Scanning

Doesn't seem to work either. Maybe something changed in 9.1?

create index test_idx on testtable using gin(to_tsvector(wordcolumn||'
'||reverse(wordcolumn)));
ERROR: functions in index expression must be marked IMMUTABLE

On Sun, Jan 30, 2011 at 3:28 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:

Show quoted text

I used 9.1dev, but you can try immutable function (from
http://andreas.scherbaum.la/blog/archives/10-Reverse-a-text-in-PostgreSQL.html
)

create function reverse(text) returns text as $$

select case when length($1)>0

then substring($1, length($1), 1) || reverse(substring($1, 1,
length($1)-1))

else '' end $$ language sql immutable strict;

On Sat, 29 Jan 2011, Matt Warner wrote:

9.0.2

On Sat, Jan 29, 2011 at 9:35 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:

What version of Pg you run ? Try latest version.

Oleg

On Sat, 29 Jan 2011, Matt Warner wrote:

Reverse isn't a built-in Postgres function, so I found one and installed

it.
However, attempting to use it in creating an index gets me the message
"ERROR: functions in index expression must be marked IMMUTABLE", even
though the function declaration already has the immutable argument.

Is there a specific version of the reverse function you're using? Or am
I
just missing something obvious? This is Postgres 9, BTW.

Thanks,

Matt

On Sat, Jan 29, 2011 at 6:46 AM, Matt Warner <matt@warnertechnology.com

wrote:

Thanks Oleg. I'm going to have to experiment with this so that I

understand
it better.

Matt

On Fri, Jan 28, 2011 at 1:12 PM, Oleg Bartunov <oleg@sai.msu.su>
wrote:

Matt, I'd try to use prefix search on original string concatenated
with

reverse string:

Just tried on some spare table

knn=# \d spot_toulouse
Table "public.spot_toulouse"
Column | Type | Modifiers
---------------------+-------------------+-----------
clean_name | character varying |

1. create index knn=# create index clean_name_tlz_idx on spot_toulouse
using gin(to_tsvector('french', clean_name || ' ' ||
reverse(clean_name)));
2.
select clean_name from spot_toulouse where to_tsvector('french',
clean_name|| ' ' || reverse(clean_name) ) @@
to_tsquery('french','the:*
|
et:*');

Select looks cumbersome, but you can always write wrapper functions.
The
only drawback I see for now is that ranking function will a bit
confused,
since coordinates of original and reversed words will be not the same,
but
again, it's possible to obtain tsvector by custom function, which
aware
about reversing.

Good luck and let me know if this help you.

Oleg

On Fri, 28 Jan 2011, Matt Warner wrote:

I'm in the process of migrating a project from Oracle to Postgres and

have

run into a feature question. I know that Postgres has a full-text
search
feature, but it does not allow scanning the index (as opposed to the
data).
Specifically, in Oracle you can do "select * from table where
contains(colname,'%part_of_word%')>1". While this isn't terribly
efficient,
it's much faster than full-scanning the raw data and is relatively
quick.

It doesn't seem that Postgres works this way. Attempting to do this
returns
no rows: "select * from table where to_tsvector(colname) @@
to_tsquery('%part_of_word%')"

The reason I want to do this is that the partial word search does not
involve dictionary words (it's scanning names).

Is this something Postgres can do? Or is there a different way to do
scan
the index?

TIA,

Matt

Regards,

Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru
),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Regards,

Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#8Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matt Warner (#7)
Re: Full Text Index Scanning

Matt Warner <matt@warnertechnology.com> writes:

Doesn't seem to work either. Maybe something changed in 9.1?
create index test_idx on testtable using gin(to_tsvector(wordcolumn||'
'||reverse(wordcolumn)));
ERROR: functions in index expression must be marked IMMUTABLE

That's not the same case he tested. The single-parameter form of
to_tsvector isn't immutable, because it depends on the default text
search configuration parameter. It should work, AFAICS, with the
two-parameter form.

regards, tom lane

#9Matt Warner
matt@warnertechnology.com
In reply to: Tom Lane (#8)
Re: Full Text Index Scanning

Aha! Thanks for pointing that out. It's indexing now.

Thanks!

Matt

On Sun, Jan 30, 2011 at 9:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Matt Warner <matt@warnertechnology.com> writes:

Doesn't seem to work either. Maybe something changed in 9.1?
create index test_idx on testtable using gin(to_tsvector(wordcolumn||'
'||reverse(wordcolumn)));
ERROR: functions in index expression must be marked IMMUTABLE

That's not the same case he tested. The single-parameter form of
to_tsvector isn't immutable, because it depends on the default text
search configuration parameter. It should work, AFAICS, with the
two-parameter form.

regards, tom lane

#10Matt Warner
matt@warnertechnology.com
In reply to: Matt Warner (#9)
Re: Full Text Index Scanning

If I understand this, it looks like this approach allows me to match the
beginnings and endings of words, but not the middle sections. Is that
correct? That is, if I search for "jag" I will find "jaeger" but not
"lobenjager".

Or am I (again) not understanding how this works?

TIA,

Matt

On Sun, Jan 30, 2011 at 9:59 AM, Matt Warner <matt@warnertechnology.com>wrote:

Show quoted text

Aha! Thanks for pointing that out. It's indexing now.

Thanks!

Matt

On Sun, Jan 30, 2011 at 9:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Matt Warner <matt@warnertechnology.com> writes:

Doesn't seem to work either. Maybe something changed in 9.1?
create index test_idx on testtable using gin(to_tsvector(wordcolumn||'
'||reverse(wordcolumn)));
ERROR: functions in index expression must be marked IMMUTABLE

That's not the same case he tested. The single-parameter form of
to_tsvector isn't immutable, because it depends on the default text
search configuration parameter. It should work, AFAICS, with the
two-parameter form.

regards, tom lane

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Matt Warner (#10)
Re: Full Text Index Scanning

Matt Warner <matt@warnertechnology.com> writes:

If I understand this, it looks like this approach allows me to match the
beginnings and endings of words, but not the middle sections.

Yeah, probably. You might consider using contrib/pg_trgm instead if
you need arbitrary substrings.

regards, tom lane

#12Matt Warner
matt@warnertechnology.com
In reply to: Tom Lane (#11)
Re: Full Text Index Scanning

Thanks. pg_trgm looks interesting, but after installing the pg_trgm.sql, I
get error messages when following the documentation.

sggeeorg=> create index test_idx on test using gist(columnname
gist_trgm_ops);
ERROR: operator class "gist_trgm_ops" does not exist for access method
"gist"
STATEMENT: create index test_idx on test using
gist(columnname gist_trgm_ops);
ERROR: operator class "gist_trgm_ops" does not exist for access method
"gist"

On Sun, Jan 30, 2011 at 10:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Show quoted text

Matt Warner <matt@warnertechnology.com> writes:

If I understand this, it looks like this approach allows me to match the
beginnings and endings of words, but not the middle sections.

Yeah, probably. You might consider using contrib/pg_trgm instead if
you need arbitrary substrings.

regards, tom lane