TSearch2 Questions

Started by Hannes Dorbath, over 20 years ago, 4 messages, general

light@theendofthetunnel.de

over 20 years ago

A few stupid questions:

Where to get the latest version?

Is http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ a dead site
and the latest versions are always "silently" distributed with PG inside
the contrib dir?

How can I find out what version of TSearch2 I'm running?

Is there active development?

Are the patches provided on the site above for backup still needed, or
are they already included in the versions that ship with 8.0.x? If not,
why not? =)

Or the better question, are any of those patches listed under
"Development" included in the version that ships with recent PG versions?

I'm playing a bit with it ATM. Indexing one gigabyte of plain text
worked well, but with 10 GB I still have some performance problems. I read
the TSearch Tuning Guide and will start optimizing some things, but is it a
realistic goal to index ~90 GB of plain text and get sub-second response
times on hardware that ~4000 EUR can buy?

Thanks in advance

--
Regards,
Hannes Dorbath

Oleg Bartunov

oleg@sai.msu.su

over 20 years ago

In reply to: Hannes Dorbath (#1)

Re: TSearch2 Questions

On Mon, 21 Nov 2005, Hannes Dorbath wrote:

> A few stupid questions:
>
> Where to get the latest version?
>
> Is http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ a dead site and
> the latest versions are always "silently" distributed with PG inside the
> contrib dir?

You should always use the tsearch2 distributed with PostgreSQL.
We keep our own version for testing purposes. Sometimes we publish backpatches
(from CVS HEAD) for stable releases.

> How can I find out what version of TSearch2 I'm running?
>
> Is there active development?

It's actively developed, see the CVS HEAD commits. The main problem being
tackled is full UTF-8 support. We also plan some other improvements.
See http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo

> Are the patches provided on the site above for backup still needed, or are
> they already included in the versions that ship with 8.0.x? If not, why not?
> =)

All patches are already applied.

> Or the better question, are any of those patches listed under "Development"
> included in the version that ships with recent PG versions?

Right now, there are no patches you should be aware of. We plan to release
a UTF-8 support patch for the 8.1 release.

> I'm playing a bit with it ATM. Indexing one gigabyte of plain text worked
> well, but with 10 GB I still have some performance problems. I read the
> TSearch Tuning Guide and will start optimizing some things, but is it a
> realistic goal to index ~90 GB of plain text and get sub-second response
> times on hardware that ~4000 EUR can buy?

What's ATM? As for the sub-second response times, it would depend heavily on
your data and queries. It would certainly be possible with our tsearch daemon,
which we postponed because we are inclined to implement inverted indices first
and then build the FTS index on top of the inverted index. But this is a
long-term plan.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
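
For readers unfamiliar with the "inverted index" mentioned in the reply above: it maps each word to the set of documents that contain it, so a query only has to look at the entries for its own words. The following is a minimal Python sketch of that general technique, not tsearch2 code and not the design Oleg describes; the function names and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents that contain every word of the query (AND)."""
    words = query.lower().split()
    if not words:
        return set()
    # Intersect the posting sets, smallest first, so intermediate
    # results stay as small as possible.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical sample data, for illustration only.
docs = {
    1: "full text search in PostgreSQL",
    2: "inverted index for text search",
    3: "GiST index tuning",
}
index = build_inverted_index(docs)
print(sorted(search_all(index, "text search")))
print(sorted(search_all(index, "index")))
```

A real full-text engine would also normalize words (stemming, stop words) and store positions for ranking; the sketch leaves all of that out.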

Bruno Wolff III

bruno@wolff.to

over 20 years ago

In reply to: Oleg Bartunov (#2)

Re: TSearch2 Questions

On Mon, Nov 21, 2005 at 16:50:00 +0300,
Oleg Bartunov <oleg@sai.msu.su> wrote:

> On Mon, 21 Nov 2005, Hannes Dorbath wrote:
>
> > I'm playing a bit with it ATM. Indexing one gigabyte of plain text worked
> > well, but with 10 GB I still have some performance problems. I read the
> > TSearch Tuning Guide and will start optimizing some things, but is it a
> > realistic goal to index ~90 GB of plain text and get sub-second response
> > times on hardware that ~4000 EUR can buy?
>
> What's ATM? As for the sub-second response times, it would depend heavily on
> your data and queries. It would certainly be possible with our tsearch daemon,
> which we postponed because we are inclined to implement inverted indices first
> and then build the FTS index on top of the inverted index. But this is a
> long-term plan.

I believe in this context, 'ATM' is an acronym for 'at the moment', which
has little impact on the meaning of the paragraph.

Hannes Dorbath

light@theendofthetunnel.de

over 20 years ago

In reply to: Bruno Wolff III (#3)

Re: TSearch2 Questions

On 21.11.2005 18:24, Bruno Wolff III wrote:

> On Mon, Nov 21, 2005 at 16:50:00 +0300, Oleg Bartunov <oleg@sai.msu.su> wrote:
>
> > On Mon, 21 Nov 2005, Hannes Dorbath wrote:
> >
> > > I'm playing a bit with it ATM. Indexing one gigabyte of plain text worked
> > > well, but with 10 GB I still have some performance problems. I read the
> > > TSearch Tuning Guide and will start optimizing some things, but is it a
> > > realistic goal to index ~90 GB of plain text and get sub-second response
> > > times on hardware that ~4000 EUR can buy?
> >
> > What's ATM? As for the sub-second response times, it would depend heavily on
> > your data and queries. It would certainly be possible with our tsearch daemon,
> > which we postponed because we are inclined to implement inverted indices first
> > and then build the FTS index on top of the inverted index. But this is a
> > long-term plan.
>
> I believe in this context, 'ATM' is an acronym for 'at the moment', which
> has little impact on the meaning of the paragraph.

For whatever reason I cannot find Oleg's reply on this server, so I am
replying to this post instead. Thanks for your time, Oleg; your answers
really helped me. I still have two questions about compound words and
UTF-8, but I'll create a new, specific post for them.

Thanks again.

--
Regards,
Hannes Dorbath