Tsearch2 / Create rule on select

Started by Jean-Michel Pouréabout 19 years ago7 messagesgeneral
Jump to latest
#1Jean-Michel Pouré
jmpoure@poure.com

Dear Friends,

I am very impressed by TSearch2 and would like to thank Oleg and the
team for their hard work.

I would like to migrate a phpBB forum with more that 200.000 messages to
TSeach. Full text searches have become a bottleneck. When there are more
than 200 users, the server slow dramatically. I need TSearch!

So ... I installed TSearch2, a French dictionnary, everything is okay.
My database is PostgreSQL 8.2.3, UTF-8.

My questions now:
1) Should I migrate phpBB to TSeach2 in PHP code, rewriting the queries,
which is quite obvious or easy.

OR

2) Is there a smart way to catch queries on the fly in PostgreSQL, for
example using

"CREATE RULE "_RETURN" AS
ON SELECT TO t1
DO INSTEAD "

or the like.

The idea would be that when I search on a text field, PostgreSQL rule system would rewrite the query to seach using Tsearch2.
Is there any tool in TSeach2 doing this query rewriting rule?
Obviously, it does not seem to exist, but I would like to be sure.

3) Gin indexes
Are Gin indexes recommended for large databases?
How to calculate the size in memory of an index?

Kind regards,
Jean-Michel

#2Magnus Hagander
magnus@hagander.net
In reply to: Jean-Michel Pouré (#1)
Re: Tsearch2 / Create rule on select

On Thu, Mar 08, 2007 at 09:56:23PM +0100, Jean-Michel Pour� wrote:

Dear Friends,

I am very impressed by TSearch2 and would like to thank Oleg and the
team for their hard work.

I would like to migrate a phpBB forum with more that 200.000 messages to
TSeach. Full text searches have become a bottleneck. When there are more
than 200 users, the server slow dramatically. I need TSearch!

So ... I installed TSearch2, a French dictionnary, everything is okay.
My database is PostgreSQL 8.2.3, UTF-8.

My questions now:
1) Should I migrate phpBB to TSeach2 in PHP code, rewriting the queries,
which is quite obvious or easy.

Yes.

OR

2) Is there a smart way to catch queries on the fly in PostgreSQL, for
example using

"CREATE RULE "_RETURN" AS
ON SELECT TO t1
DO INSTEAD "

or the like.

I don't think you can do that on-the-fly. But even if you can, if it's
reasonably easy to rewrite the query in the PHP, you should do that
instead IMHO.

3) Gin indexes
Are Gin indexes recommended for large databases?

Yes. At least if you have lots of search activity, they can be a *lot*
faster than GiST (see for example
http://people.planetpostgresql.org/mha/index.php?/archives/112-GIN-performance-postgresql.org-websearch-part-2.html)

How to calculate the size in memory of an index?

No idea. Assuming you want to do it beforehand. otherwise, just create
the index and see how large it got?

//Magnus

#3Jean-Michel Pouré
jmpoure@poure.com
In reply to: Magnus Hagander (#2)
Re: Tsearch2 / Create rule on select

Le vendredi 09 mars 2007 à 10:58 +0100, Magnus Hagander a écrit :

No idea. Assuming you want to do it beforehand. otherwise, just create
the index and see how large it got?

Thank you for your comments. I will add TSeach2 support to phpBB 3.x
soon.

I had incredible response time on simple queries on more than plain-text
200.000 row. About 1 millisecond! Incredible but true.

explain analyse verbose SELECT * FROM phpbb_posts_text WHERE idxfti @@
'jmp'::tsquery limit 100;

Limit (cost=0.00..444.47 rows=100 width=934) (actual time=0.046..0.824
rows=100 loops=1)
-> Index Scan using idxfti_idx on phpbb_posts_text
(cost=0.00..1053.38 rows=237 width=934) (actual time=0.040..0.418
rows=100 loops=1)
Index Cond: (idxfti @@ '''jmp'''::tsquery)
[color=red]Total runtime: 1.068 ms

TSearch2 will be used by wikimedia shortly (it is supported in beta
version).

IMHO, it would be better if PostgreSQL parser was able to find Tseach
indexes alone, rewritting the query automatically. Maybe it is on the
radar list of hackers.

Kind regards,
Jean-Michel

#4Oleg Bartunov
oleg@sai.msu.su
In reply to: Jean-Michel Pouré (#3)
Re: Tsearch2 / Create rule on select

On Fri, 9 Mar 2007, Jean-Michel Pour? wrote:

Le vendredi 09 mars 2007 ЪЪ 10:58 +0100, Magnus Hagander a ЪЪcrit :

No idea. Assuming you want to do it beforehand. otherwise, just create
the index and see how large it got?

Thank you for your comments. I will add TSeach2 support to phpBB 3.x
soon.

I had incredible response time on simple queries on more than plain-text
200.000 row. About 1 millisecond! Incredible but true.

explain analyse verbose SELECT * FROM phpbb_posts_text WHERE idxfti @@
'jmp'::tsquery limit 100;

Limit (cost=0.00..444.47 rows=100 width=934) (actual time=0.046..0.824
rows=100 loops=1)
-> Index Scan using idxfti_idx on phpbb_posts_text
(cost=0.00..1053.38 rows=237 width=934) (actual time=0.040..0.418
rows=100 loops=1)
Index Cond: (idxfti @@ '''jmp'''::tsquery)
[color=red]Total runtime: 1.068 ms

TSearch2 will be used by wikimedia shortly (it is supported in beta
version).

I've already seen tsearch2 in 1.9.X release.

IMHO, it would be better if PostgreSQL parser was able to find Tseach
indexes alone, rewritting the query automatically. Maybe it is on the
radar list of hackers.

btw, we're working on new FTS feature of PostgreSQL, it's basically
tsearch integrated into the pg core. But it has some new features and
new SQL commands for configuring of FTS, so if the matter is not pressing I'd recommend
to check http://mira.sai.msu.su/~megera/pgsql/ftsdoc

Kind regards,
Jean-Michel

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#5Jean-Michel Pouré
jmpoure@poure.com
In reply to: Oleg Bartunov (#4)
Re: Tsearch2 / Create rule on select

Le samedi 10 mars 2007 à 08:32 +0300, Oleg Bartunov a écrit :

btw, we're working on new FTS feature of PostgreSQL, it's basically
tsearch integrated into the pg core. But it has some new features and
new SQL commands for configuring of FTS, so if the matter is not
pressing I'd recommend
to check http://mira.sai.msu.su/~megera/pgsql/ftsdoc

Great. Thank you very much Oleg.
As suggested, I will play with tsearch_core-0.37.gz patch.

After testing first, I might be interested in using tseach_core on a
production database. It is a community forum with 200.000 messages.
tseach-core would only be used for searching messages. There is no money
at stake, the database is backed-up twice a day using cron jobs.

Whenever a bug happens, are there chances that PostgreSQL database gets
perverted? On only the parser? If there is no chance to destroy the
database, I may be interested in using Tsearch-core in production,
quickly.

When will tsearch-core become part of CVS-head?

Kind regards,
Jean-Michel

#6cedric
cedric@over-blog.com
In reply to: Oleg Bartunov (#4)
Re: Tsearch2 / Create rule on select

Le samedi 10 mars 2007 06:32, Oleg Bartunov a écrit :

On Fri, 9 Mar 2007, Jean-Michel Pour? wrote:

Le vendredi 09 mars 2007 ЪЪ 10:58 +0100, Magnus Hagander a ЪЪcrit :

No idea. Assuming you want to do it beforehand. otherwise, just create
the index and see how large it got?

Thank you for your comments. I will add TSeach2 support to phpBB 3.x
soon.

I had incredible response time on simple queries on more than plain-text
200.000 row. About 1 millisecond! Incredible but true.

explain analyse verbose SELECT * FROM phpbb_posts_text WHERE idxfti @@
'jmp'::tsquery limit 100;

Limit (cost=0.00..444.47 rows=100 width=934) (actual time=0.046..0.824
rows=100 loops=1)
-> Index Scan using idxfti_idx on phpbb_posts_text
(cost=0.00..1053.38 rows=237 width=934) (actual time=0.040..0.418
rows=100 loops=1)
Index Cond: (idxfti @@ '''jmp'''::tsquery)
[color=red]Total runtime: 1.068 ms

TSearch2 will be used by wikimedia shortly (it is supported in beta
version).

I've already seen tsearch2 in 1.9.X release

unfortunely, rev 20094 wich start to correctly handle tsearch2 operator in
query parser is not in the 1.9.3 mediawiki svn tag but in the branch and
trunk.

Show quoted text

IMHO, it would be better if PostgreSQL parser was able to find Tseach
indexes alone, rewritting the query automatically. Maybe it is on the
radar list of hackers.

btw, we're working on new FTS feature of PostgreSQL, it's basically
tsearch integrated into the pg core. But it has some new features and
new SQL commands for configuring of FTS, so if the matter is not pressing
I'd recommend to check http://mira.sai.msu.su/~megera/pgsql/ftsdoc

Kind regards,
Jean-Michel

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#7Magnus Hagander
magnus@hagander.net
In reply to: Jean-Michel Pouré (#3)
Re: Tsearch2 / Create rule on select

Jean-Michel Pouré wrote:

IMHO, it would be better if PostgreSQL parser was able to find Tseach
indexes alone, rewritting the query automatically. Maybe it is on the
radar list of hackers.

How would you "rewrite a query automatically"? PostgreSQL will find it
automatically when you attempt to do a match using the fulltext search
operator, which is the only one supported by tsearch. I'm not sure what
else it could do?

//Magnus