TSEARCH2: disable stemming in indexes and triggers

Started by Erwin Molleralmost 19 years ago4 messagesgeneral
Jump to latest
#1Erwin Moller
erwin@darwine.nl

Hi all,

I installed TSEARCH2 on Postgres8.1 (Debian).
It runs all fine.
I have 2 tables indexed, and created triggers to keep the vectorcolumns
up
to date.

However, the text I indexed is a mix of Dutch and English and German.
The default stemmingprocess is an annoyance for me.
I would like to disable it.
I found out that using 'simple' instead of 'default' when using
to_tsvector() does excactly that, but I don't know how to change my
triggers and indexes to keep doing the same (using 'simple').

My TSEARCH2 and trigger-skills are developing, but I lack the confidence
to
change thing on the site (which is live), so I thought I check in here.

I have a table named tblvacature, that contains a bunch of columns that
need
to be indexed by TSEARCH.

This is what I did (and it works, except for the fact I want to disable
stemming)

1) ALTER TABLE tblvacature ADD COLUMN idxFTI tsvector;
2) UPDATE tblvacature SET idxFTI=to_tsvector('simple',
coalesce(title,'') ||' '||
coalesce(shortintro,'') ||' '||
coalesce(werkgever,'') ||' '||
coalesce(vacaturesteller,'') ||' '||
coalesce(standplaats,'') ||' '||
coalesce(divafdelingwerkgever,'') );

3) VACUUM FULL ANALYZE;

4) CREATE INDEX idxFTIvacture_idx ON tblvacature USING gist(idxFTI);

5) VACUUM FULL ANALYZE;

6) CREATE TRIGGER tsvectorupdate_vacature BEFORE UPDATE OR INSERT ON
tblvacature FOR EACH ROW EXECUTE
PROCEDURE tsearch2(idxFTI, title, shortintro, werkgever,
vacaturesteller,
standplaats, divafdelingwerkgever);

Step 2 worked fine: no stemming.
But how do I cange my step6 (trigger) so it keeps using 'simple'?
Or do I have to change the index itself (step4) too? Or both?

Please advise.

Thanks for your time.

Regards,
Erwin Moller

-----------------------
PS: This is my first posting to pgsql-general: if I screw something up,
please let me know.

#2Oleg Bartunov
oleg@sai.msu.su
In reply to: Erwin Moller (#1)
Re: TSEARCH2: disable stemming in indexes and triggers

On Thu, 31 May 2007, Erwin Moller wrote:

Hi all,

I installed TSEARCH2 on Postgres8.1 (Debian).
It runs all fine.
I have 2 tables indexed, and created triggers to keep the vectorcolumns
up
to date.

However, the text I indexed is a mix of Dutch and English and German.
The default stemmingprocess is an annoyance for me.
I would like to disable it.
I found out that using 'simple' instead of 'default' when using
to_tsvector() does excactly that, but I don't know how to change my
triggers and indexes to keep doing the same (using 'simple').

My TSEARCH2 and trigger-skills are developing, but I lack the confidence
to
change thing on the site (which is live), so I thought I check in here.

I have a table named tblvacature, that contains a bunch of columns that
need
to be indexed by TSEARCH.

This is what I did (and it works, except for the fact I want to disable
stemming)

1) ALTER TABLE tblvacature ADD COLUMN idxFTI tsvector;
2) UPDATE tblvacature SET idxFTI=to_tsvector('simple',
coalesce(title,'') ||' '||
coalesce(shortintro,'') ||' '||
coalesce(werkgever,'') ||' '||
coalesce(vacaturesteller,'') ||' '||
coalesce(standplaats,'') ||' '||
coalesce(divafdelingwerkgever,'') );

3) VACUUM FULL ANALYZE;

4) CREATE INDEX idxFTIvacture_idx ON tblvacature USING gist(idxFTI);

5) VACUUM FULL ANALYZE;

6) CREATE TRIGGER tsvectorupdate_vacature BEFORE UPDATE OR INSERT ON
tblvacature FOR EACH ROW EXECUTE
PROCEDURE tsearch2(idxFTI, title, shortintro, werkgever,
vacaturesteller,
standplaats, divafdelingwerkgever);

Step 2 worked fine: no stemming.
But how do I cange my step6 (trigger) so it keeps using 'simple'?
Or do I have to change the index itself (step4) too? Or both?

Please advise.

tsearch trigger is just an example, you could write your own function !
for example,

create function my_update() returns trigger as
$$
BEGIN
NEW.idxFTI=to_tsvector('simple',
coalesce(title,'') ||' '||
coalesce(shortintro,'') ||' '||
coalesce(werkgever,'') ||' '||
coalesce(vacaturesteller,'') ||' '||
coalesce(standplaats,'') ||' '||
coalesce(divafdelingwerkgever,'') );
RETURN NEW;
END;
$$
language plpgsql;

btw, why do you don't use setweight() function to assign different weights to
the different attributes ?

Thanks for your time.

Regards,
Erwin Moller

-----------------------
PS: This is my first posting to pgsql-general: if I screw something up,
please let me know.

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

#3Teodor Sigaev
teodor@sigaev.ru
In reply to: Erwin Moller (#1)
Re: TSEARCH2: disable stemming in indexes and triggers

I found out that using 'simple' instead of 'default' when using
to_tsvector() does excactly that, but I don't know how to change my
triggers and indexes to keep doing the same (using 'simple').

Suppose, your database is initialized with C locale. So, just mark
simple configuration as default:

# update pg_ts_cfg set locale=null where ts_name='default';
# update pg_ts_cfg set locale='C' where ts_name='simple';

If your locale setting is not C then mark needed configuration with your
locale.

#4Erwin Moller
erwin@darwine.nl
In reply to: Teodor Sigaev (#3)
Re: TSEARCH2: disable stemming in indexes and triggers

On Thu, 2007-05-31 at 20:27, Teodor Sigaev wrote:

I found out that using 'simple' instead of 'default' when using
to_tsvector() does excactly that, but I don't know how to change my
triggers and indexes to keep doing the same (using 'simple').

Suppose, your database is initialized with C locale. So, just mark
simple configuration as default:

# update pg_ts_cfg set locale=null where ts_name='default';
# update pg_ts_cfg set locale='C' where ts_name='simple';

Thanks Teodor.

That did the trick. :-)
At first I didn't see a change, but after reconnecting to the database
it worked for some reason beyound my meager knowledge.

Thanks to Oleg Bartunov too for his suggestion about writing my own
procedure.
This solution seemed simpler and worked the first time right away.

Thanks for your help!

Regards,
Erwin Moller

Show quoted text

If your locale setting is not C then mark needed configuration with your
locale.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org/