tsearch thoughts

Started by Christopher Kings-Lynne about 23 years ago, 4 messages

chriskl@familyhealth.com.au

about 23 years ago

Is there any reason why the tsearch indexes couldn't be modified to just work
on TEXT fields and not TXTIDX fields? Is there really a reason to have the
TXTIDX type?

I mean, when the index is created over the text column, instead of just
indexing the text as-is, index the txt2txtidx'd version...?

That would vastly reduce the complexity of tsearch, and would make the
indexed text invisible, as it is in most other fti implementations...?

I tried to simulate this myself, although ideally it would be invisible to
the user:

test=# create table test (a text);
CREATE
test=# CREATE INDEX my_idx ON test USING gist(txt2txtidx(a));
ERROR: DefineIndex: index function must be marked iscachable

So txt2txtidx isn't marked iscachable; why's that?

Say it was marked iscachable, then I'd be able to query like this:

SELECT * FROM test WHERE txt2txtidx(a) ## 'apple';

This would mean that the on-disk index file would be large, but the table
file would stay small. It would also vastly reduce the size of pg_dumps...
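The proposal amounts to an expression index: the table keeps only the raw text, and the index stores keys derived from it by txt2txtidx. PostgreSQL requires an index function to be marked iscachable (spelled IMMUTABLE from 7.3 on) because the keys are computed once, at insert time, and must match what the same function returns at query time. The minimal Python sketch below illustrates that general requirement only; it does not claim to explain why txt2txtidx itself was left unmarked. `to_lexemes`, `STOPWORDS`, and `ExpressionIndex` are hypothetical names, and mutating the stopword list stands in for a function whose output changes over time.

```python
# Toy model of an expression index (an index over func(column)).
# `to_lexemes` and STOPWORDS are hypothetical stand-ins for txt2txtidx and
# its dictionaries; this is not how PostgreSQL or tsearch works internally.

STOPWORDS = {"a", "an", "the", "of"}

def to_lexemes(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return frozenset(w for w in text.lower().split() if w not in STOPWORDS)

class ExpressionIndex:
    def __init__(self, func):
        self.func = func
        self.postings = {}  # lexeme -> set of row ids

    def insert(self, row_id, text):
        # func runs once, at insert time; only its output goes into the index,
        # so the table itself keeps nothing but the raw text.
        for lexeme in self.func(text):
            self.postings.setdefault(lexeme, set()).add(row_id)

    def search(self, word):
        # The query side applies the same func to the search term and
        # looks up the resulting keys.
        keys = self.func(word)
        return set().union(*(self.postings.get(k, set()) for k in keys))

table = {1: "An apple a day", 2: "The pear of Anjou"}  # the only copy of the text
index = ExpressionIndex(to_lexemes)
for row_id, text in table.items():
    index.insert(row_id, text)

print(index.search("apple"))  # {1}

# Simulate func changing its output for the same input (e.g. a dictionary edit).
STOPWORDS.add("apple")
print(index.search("apple"))  # set(): the stored "apple" entry is now unreachable
```

The first lookup finds row 1; once the function's output changes, the same query silently misses it even though the index still holds the old key. A function that always returns the same output for the same input cannot drift out of sync with its index this way.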

Could we move towards something like:

CREATE FULLTEXT INDEX my_idx ON test (a);

Or something?

Chris

Oleg Bartunov

oleg@sai.msu.su

about 23 years ago

In reply to: Christopher Kings-Lynne (#1)

Re: tsearch thoughts

On Sat, 30 Nov 2002, Christopher Kings-Lynne wrote:

> Is there any reason why the tsearch indexes couldn't be modified to just work
> on TEXT fields and not TXTIDX fields? Is there really a reason to have the
> TXTIDX type?
>
> I mean, when the index is created over the text column, instead of just
> indexing the text as-is, index the txt2txtidx'd version...?
>
> That would vastly reduce the complexity of tsearch, and would make the
> indexed text invisible, as it is in most other fti implementations...?

Chris,

This is roughly what we had in mind for full text searching in PostgreSQL,
and what should happen as tsearch matures. We began with a contrib module
just to gain some experience, and we still need to do some research on the
underlying algorithms. Also, remember that the current GiST is not
concurrent, and we plan to work on this issue. We're very busy and need
somebody to help us with the interface (dictionaries, parser, PostgreSQL
internal interface).

> I tried to simulate this myself, although ideally it would be invisible to
> the user:
>
> test=# create table test (a text);
> CREATE
> test=# CREATE INDEX my_idx ON test USING gist(txt2txtidx(a));
> ERROR: DefineIndex: index function must be marked iscachable
>
> So txt2txtidx isn't marked iscachable; why's that?

I don't remember the reason, but you may try to define it as 'iscachable'
in tsearch.sql.

> Say it was marked iscachable, then I'd be able to query like this:
>
> SELECT * FROM test WHERE txt2txtidx(a) ## 'apple';
>
> This would mean that the on-disk index file would be large, but the table
> file would stay small. It would also vastly reduce the size of pg_dumps...
>
> Could we move towards something like:
>
> CREATE FULLTEXT INDEX my_idx ON test (a);
>
> Or something?
>
> Chris


Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Christopher Kings-Lynne

chriskl@familyhealth.com.au

about 23 years ago

In reply to: Oleg Bartunov (#2)

Re: tsearch thoughts

> This is roughly what we had in mind for full text searching in PostgreSQL,
> and what should happen as tsearch matures. We began with a contrib module
> just to gain some experience, and we still need to do some research on the
> underlying algorithms. Also, remember that the current GiST is not
> concurrent, and we plan to work on this issue. We're very busy and need
> somebody to help us with the interface (dictionaries, parser, PostgreSQL
> internal interface).

Hi Oleg,

I'm busy too :)

Is there, for instance, a specific thing that needs work?

Chris

Teodor Sigaev

teodor@stack.net

about 23 years ago

In reply to: Christopher Kings-Lynne (#1)

Re: tsearch thoughts

> I mean, when the index is created over the text column, instead of just
> indexing the text as-is, index the txt2txtidx'd version...?

For two reasons:
1. gist_txtidx_ops is lossy (it keeps less information so the index stays
small), so every match found through the index must be rechecked against
the original txtidx value. With "CREATE INDEX my_idx ON test USING
gist(txt2txtidx(a))" there is no stored txtidx value to recheck against, so
txt2txtidx(a) would have to be recomputed for each candidate row, which may
decrease performance :(
2. OpenFTS. We wanted txtidx to work with OpenFTS. Adding dictionaries,
txt2txtidx, the trigger, the mquery_txt type, etc. was an experiment.
--
Teodor Sigaev
teodor@stack.net