tsearch comments
Hi,
I was browsing the archives for anything related to tsearch and stumbled
across a post from Christopher Kings-Lynne in Nov 2002. He suggested to have
txt2txtidx as an index function, but he had some problems implementing his
idea.
But his idea works very fine. Just apply the attached patch (against 7.3.1),
install tsearch as normal, and then
CREATE INDEX my_test_idx ON mytable using gist(txt2txtidx(mycolumn));
assuming "mycolumn" is a text-type column.
Then you can search "mycolumn" with
SELECT * FROM mytable WHERE txt2txtidx(mycolumn) ## 'patch&gist';
This method has several advantages:
- you don't have to extend your table with another column
- pg_dump's output won't be bloated
- you don't have to use a special trigger, the index keeps itself up to
date!
Besides, we tested tsearch also with arabian and chinese input, and it works
perfectly with unicode databases as well (although it must be the exact same
phrase, since the builtin stemmer cannot stem chinese :).
Oleg: Can you search your memory why txt2txtidx possibly should not be
marked as "iscachable"?
The rest: Is there perhaps a problem with this approach that I am not aware
of?
Regards,
Bjoern
Attachments:
tsearch_txt2txtidx_patch.txttext/plain; name=tsearch_txt2txtidx_patch.txtDownload+1-1
Oleg: Can you search your memory why txt2txtidx possibly should not be
marked as "iscachable"?
Well I now see where the problem is, the function does not always have to
return the same if the arguments are the same (the data might have changed
and there may be more or less matching entries). Any hints how to overcome
this?
Regards,
Bjoern
Import Notes
Resolved by subject fallback
On Wed, 22 Jan 2003, [iso-8859-1] BjО©╫rn Metzdorf wrote:
Oleg: Can you search your memory why txt2txtidx possibly should not be
marked as "iscachable"?Well I now see where the problem is, the function does not always have to
return the same if the arguments are the same (the data might have changed
and there may be more or less matching entries). Any hints how to overcome
this?
No way, Bjoern. Think about stemming, for example.
Regards,
Bjoern---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Well I now see where the problem is, the function does not always have
to
return the same if the arguments are the same (the data might have
changed
and there may be more or less matching entries). Any hints how to
overcome
this?
No way, Bjoern. Think about stemming, for example.
Do index functions always have to be marked "iscachable" ?
Btw, the error message was confusing. It said that the function has to be
marked "isImmutable", but there is no such attribute, instead "iscachable"
had to be used. This seems to be a bug.
Regards,
Bjoern
Do index functions always have to be marked "iscachable" ?
Btw, the error message was confusing. It said that the function has to be
marked "isImmutable", but there is no such attribute, instead "iscachable"
had to be used. This seems to be a bug.
The manual says:
"All functions and operators used in an index definition must be immutable,
that is, their results must depend only on their input arguments and never
on any outside influence (such as the contents of another table or the
current time). This restriction ensures that the behavior of the index is
well-defined. To use a user-defined function in an index, remember to mark
the function immutable when you create it."
Well, in the tsearch case the results don't depend on any influence outside
of the function. The stemmer is integrated in the function and we don't use
the current time nor another table. So it should be safe to mark that
function "iscachable" or when it is fixed "isimmutable", am I right?
Regards,
Bjoern
Bjorn,
you'll get problem with your approach not to create additional columns
if index will not used ! In our approach sequential scan will use
txtidx column. I don't understand what error you're comment.
Probably I miss something. We are working on new version of tsearch
and alpha version will be available next week. Are you sure
'iscacheability' should be configureable somehow ? Teodor, what do
you think ?
Oleg
On Thu, 23 Jan 2003, Bjorn Metzdorf wrote:
Do index functions always have to be marked "iscachable" ?
Btw, the error message was confusing. It said that the function has to be
marked "isImmutable", but there is no such attribute, instead "iscachable"
had to be used. This seems to be a bug.The manual says:
"All functions and operators used in an index definition must be immutable,
that is, their results must depend only on their input arguments and never
on any outside influence (such as the contents of another table or the
current time). This restriction ensures that the behavior of the index is
well-defined. To use a user-defined function in an index, remember to mark
the function immutable when you create it."Well, in the tsearch case the results don't depend on any influence outside
of the function. The stemmer is integrated in the function and we don't use
the current time nor another table. So it should be safe to mark that
function "iscachable" or when it is fixed "isimmutable", am I right?Regards,
Bjoern
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
txtidx column. I don't understand what error you're comment.
I think "iscachable" and "isimmutable" are mixed up. "CREATE FUNTION" does
not accept "isimmutable".
Probably I miss something. We are working on new version of tsearch
and alpha version will be available next week. Are you sure
Great news! Any comments on the new version? Snowball support? txt2txtidx
without morphology?
Regards,
Bjoern
Yes, I don't see that problem either. The locale and the configuration of
dictionaries do not change while running the postmaster, do they?
I forgot parser too.
Parser and dictionaries will be changed after gmake install (contrib/tsearch)
:), without restarting postmaster.
In common case, we can't set iscachable option to txt2txtidx, but if you know
what you do you can set it. May be
some comments must be included in README.
Bj�rn Metzdorf wrote:
txtidx column. I don't understand what error you're comment.
I think "iscachable" and "isimmutable" are mixed up. "CREATE FUNTION" does
not accept "isimmutable".Probably I miss something. We are working on new version of tsearch
and alpha version will be available next week. Are you sureGreat news! Any comments on the new version? Snowball support? txt2txtidx
without morphology?
Snowball, ISpell....
SQL-level for configure parsers and dictionaries.
SQL-level for configure map (which dictionary for each type of lexem).
Just wait :)
--
Teodor Sigaev
teodor@stack.net
Yes, I don't see that problem either. The locale and the configuration
of
dictionaries do not change while running the postmaster, do they?
I forgot parser too.
Parser and dictionaries will be changed after gmake install
(contrib/tsearch)
:), without restarting postmaster.
In common case, we can't set iscachable option to txt2txtidx, but if you
know
what you do you can set it. May be
some comments must be included in README.
Ok, but then it should be sufficient to recreate the txt2txtidx function
and/or the index after a change of parser and dictionaries. So generally
spoken, the index function approach to tsearch works, if you take care of
that. I gladly recreate the function and/or index from time to time, if I
can do without that additional column, bloated dump and slow trigger.
This is very good news, as this is a very easy approach to having an easy to
use fulltextsearch in postgresql.
Great news! Any comments on the new version? Snowball support?
txt2txtidx
without morphology?
Snowball, ISpell....
SQL-level for configure parsers and dictionaries.
SQL-level for configure map (which dictionary for each type of lexem).
Just wait :)
Wow, sounds great!
Regards,
Bjoern
As someone who is just getting started with PostygreSQL from years working
with MySQL, it appears that the timestamp data type does not behave in the
way it did with MySQL. I got used to just defining a column as a timestamp
and letting the database throw the latest time stamp in there whenever a row
was updated. Is there anything simular in PosgreSQL? How can I accomplish
something simular inside the database, or am I stuck populating the field in
some manner as in the following example
update blah blah blah timestamp = NOW()
Thanks,
Alan
As someone who is just getting started with PostygreSQL from years working
with MySQL, it appears that the timestamp data type does not behave in the
way it did with MySQL. I got used to just defining a column as a timestamp
and letting the database throw the latest time stamp in there whenever a
row
was updated. Is there anything simular in PosgreSQL? How can I accomplish
something simular inside the database, or am I stuck populating the field
in
some manner as in the following example
There is no such datatype in postgresql. If you just need the current time
inserted on INSERT and not on UPDATE, then you can declare that column with
"default now()". Else you will need to install a trigger, then you can have
exactly the same behaviour as with mysql.
Regards,
Bjoern
postgreSQL actually has a better implementation of timestamps.... In
mySQL -- You can have only 1 timestamp field...
In postgreSQL -- you can have as many "create_dt" timestamp default 'now()'
fields as you want and on an insert -- they ALL get stamped whereas in
mySQL --ONLY 1 COLUMN gets updated....
""Alan T. Miller"" <amiller@hollywood101.com> wrote in message
news:003a01c2c2da$a90d10a0$6e01a8c0@webdev...
As someone who is just getting started with PostygreSQL from years working
with MySQL, it appears that the timestamp data type does not behave in the
way it did with MySQL. I got used to just defining a column as a timestamp
and letting the database throw the latest time stamp in there whenever a
row
was updated. Is there anything simular in PosgreSQL? How can I accomplish
something simular inside the database, or am I stuck populating the field
in
Show quoted text
some manner as in the following example
update blah blah blah timestamp = NOW()
Thanks,
Alan
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Ok, but then it should be sufficient to recreate the txt2txtidx function
and/or the index after a change of parser and dictionaries. So generally
spoken, the index function approach to tsearch works, if you take care of
that. I gladly recreate the function and/or index from time to time, if I
can do without that additional column, bloated dump and slow trigger.This is very good news, as this is a very easy approach to having an easy
to
use fulltextsearch in postgresql.
I talked a bit more with Oleg and Teodor about this index function approach
and we came to the conclusion that it is safe to use (if you take care of
the above), but it might be a bit slower than the original column based
approach. That it because the used operators are defined with RECHECK, and
with the index function approach the RECHECK is against an (expensive)
function instead of "raw" data in a column.
Btw. the "iscachable" has another advantage, it seems to really speed up the
search. I have done tests again 250000 entries, mostly nicknames based on
fantasy (so the english stemmer has not much to do), and the first search
for a name takes about 0.5 - 2.5 seconds whereas all subsequent searches for
the same name (even with fresh inserted data) take less than 0.1 seconds. Or
does this have nothing to do with "iscachable"?
Regards,
Bjoern
"Bjorn Metzdorf" <bm@turtle-entertainment.de> writes:
Btw, the error message was confusing. It said that the function has to be
marked "isImmutable", but there is no such attribute, instead "iscachable"
had to be used. This seems to be a bug.
Actually the preferred syntax in 7.3 is
CREATE FUNCTION ... LANGUAGE foo IMMUTABLE
For awhile during 7.3 development you had to write WITH (isImmutable)
but we changed the syntax to be more SQL-spec-compatible. This error
message seems not to have gotten fixed --- thanks for pointing it out.
regards, tom lane
As someone who is just getting started with PostygreSQL from years working
with MySQL, it appears that the timestamp data type does not behave in the
way it did with MySQL.
Much as I like MySQL, it can sometimes be a little *too* helpful.
I got used to just defining a column as a timestamp
and letting the database throw the latest time stamp in there whenever a
row
was updated. Is there anything simular in PosgreSQL?
When you create the table do something like:
CREATE TABLE foo (
bar timestamp DEFAULT now(),
...
);
You can of course do this with any column-type and value. See the
SQL-reference for details.
- Richard Huxton
On Thu, Jan 23, 2003 at 04:26:31AM -0800, Alan T. Miller wrote:
As someone who is just getting started with PostygreSQL from
years working with MySQL, it appears that the timestamp data
type does not behave in the way it did with MySQL. I got used
to just defining a column as a timestamp and letting the
database throw the latest time stamp in there whenever a row
was updated. Is there anything simular in PosgreSQL? How can I
accomplish something simular inside the database, or am I stuck
populating the field in some manner as in the following exampleupdate blah blah blah timestamp = NOW()
triggers or rules can do that with any timestamp (or timestamp(0)
-- no partial-seconds) field. here's a "rules" approach:
create table _something (
id serial,
dat text,
freshened timestamp(0),
primary key ( id )
);
create view something as
select
id,
dat,
freshened
from
_something;
create rule something_add as
on insert to something
do instead (
insert into _something (
--id,
dat,
freshened
) values (
--let id take care of itself,
NEW.dat,
current_timestamp
);
);
create rule something_edit as
on update to something
do instead (
update _something set
--id = leave it alone,
dat = NEW.dat,
freshened = current_timestamp
where
id = NEW.id
;
);
then you can just
insert into something (dat) values ('yada yada');
update something set dat = 'here we go' where id = 23978;
and "freshened" takes care of itself.
--
There are 10 kinds of people:
ones that get binary, and ones that don't.
will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!
Looking for a firewall? Do you think smoothwall sucks? You're
probably right... Try the folks at http://clarkconnect.org/ !
Of course the proposed solution solves the INSERTs....you can also set a
trigger that
on UPDATE set the value to 'now()'
dev@archonet.com wrote:
Show quoted text
As someone who is just getting started with PostygreSQL from years working
with MySQL, it appears that the timestamp data type does not behave in the
way it did with MySQL.Much as I like MySQL, it can sometimes be a little *too* helpful.
I got used to just defining a column as a timestamp
and letting the database throw the latest time stamp in there whenever a
row
was updated. Is there anything simular in PosgreSQL?When you create the table do something like:
CREATE TABLE foo (
bar timestamp DEFAULT now(),
...
);You can of course do this with any column-type and value. See the
SQL-reference for details.- Richard Huxton
---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster
One word of caution, I *think* you want quotes around the 'now()'
statement in your table definition. Otherwise your default value will be
the instant the *table* was created, not the instant your insert
happened.
I too had an adjustment period when switching from MySQL to postgres.
However, I think you'll find that if you use a RULE to implement this
feature you will soon become addicted to PostgreSQL's advanced feature
set.
You can learn more about rules by going to
http://www.ca.postgresql.org/docs/aw_pgsql_book/node124.html where you
can find a brief description and example, or for more detail, you can go
to
http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/rules-insert
.html
I attained all of my database knowledge through hands on experience;
much of it on MySQL. Switching to PostgreSQL was somewhat traumatic
because it has a lot more power and a lot more features. Once I got a
handle on things like Views, Triggers and Rules, I have become somewhat
dependant on them and I haven't been able to use MySQL for anything more
than the most basic of applications. I guess that's just a warning...
There may be no turning back.
--
Matthew Nuzum
www.bearfruit.org
cobalt@bearfruit.org
-----Original Message-----
From: Björn Metzdorf [mailto:bm@turtle-entertainment.de]
Sent: Thursday, January 23, 2003 7:32 AM
To: Alan T. Miller; pgsql-general@postgresql.org
Subject: Re: I was spoiled by the MySQL timestamp fieldAs someone who is just getting started with PostygreSQL from years
working
with MySQL, it appears that the timestamp data type does not behave
in
the
way it did with MySQL. I got used to just defining a column as a
timestamp
and letting the database throw the latest time stamp in there
whenever a
row
was updated. Is there anything simular in PosgreSQL? How can I
accomplish
something simular inside the database, or am I stuck populating the
field
insome manner as in the following example
There is no such datatype in postgresql. If you just need the current
time
Show quoted text
inserted on INSERT and not on UPDATE, then you can declare that column
with
"default now()". Else you will need to install a trigger, then you can
have
exactly the same behaviour as with mysql.Regards,
Bjoern
Why not just use a DEFAULT NOW()?
ie:
CREATE TABLE blah (
time TIMESTAMP NOT NULL DEFAULT NOW()
);
I know it works for DATETIME types, don't know about TIMESTAMP but I assume
it would be the same.
Luke.
----- Original Message -----
From: "will trillich" <will@serensoft.com>
To: <pgsql-general@postgresql.org>
Sent: Friday, January 24, 2003 7:13 AM
Subject: Re: [GENERAL] I was spoiled by the MySQL timestamp field
Show quoted text
On Thu, Jan 23, 2003 at 04:26:31AM -0800, Alan T. Miller wrote:
As someone who is just getting started with PostygreSQL from
years working with MySQL, it appears that the timestamp data
type does not behave in the way it did with MySQL. I got used
to just defining a column as a timestamp and letting the
database throw the latest time stamp in there whenever a row
was updated. Is there anything simular in PosgreSQL? How can I
accomplish something simular inside the database, or am I stuck
populating the field in some manner as in the following exampleupdate blah blah blah timestamp = NOW()
triggers or rules can do that with any timestamp (or timestamp(0)
-- no partial-seconds) field. here's a "rules" approach:create table _something (
id serial,
dat text,
freshened timestamp(0),
primary key ( id )
);create view something as
select
id,
dat,
freshened
from
_something;create rule something_add as
on insert to something
do instead (
insert into _something (
--id,
dat,
freshened
) values (
--let id take care of itself,
NEW.dat,
current_timestamp
);
);create rule something_edit as
on update to something
do instead (
update _something set
--id = leave it alone,
dat = NEW.dat,
freshened = current_timestamp
where
id = NEW.id
;
);then you can just
insert into something (dat) values ('yada yada');
update something set dat = 'here we go' where id = 23978;and "freshened" takes care of itself.
--
There are 10 kinds of people:
ones that get binary, and ones that don't.will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!Looking for a firewall? Do you think smoothwall sucks? You're
probably right... Try the folks at http://clarkconnect.org/ !---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
See http://www.brasileiro.net:8080/postgres/cookbook/view-one-recipe.adp?recipe_id=10
Matthew Nuzum wrote:
Show quoted text
One word of caution, I *think* you want quotes around the 'now()'
statement in your table definition. Otherwise your default value will be
the instant the *table* was created, not the instant your insert
happened.I too had an adjustment period when switching from MySQL to postgres.
However, I think you'll find that if you use a RULE to implement this
feature you will soon become addicted to PostgreSQL's advanced feature
set.You can learn more about rules by going to
http://www.ca.postgresql.org/docs/aw_pgsql_book/node124.html where you
can find a brief description and example, or for more detail, you can go
to
http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/rules-insert
.htmlI attained all of my database knowledge through hands on experience;
much of it on MySQL. Switching to PostgreSQL was somewhat traumatic
because it has a lot more power and a lot more features. Once I got a
handle on things like Views, Triggers and Rules, I have become somewhat
dependant on them and I haven't been able to use MySQL for anything more
than the most basic of applications. I guess that's just a warning...
There may be no turning back.--
Matthew Nuzum
www.bearfruit.org
cobalt@bearfruit.org
=20-----Original Message-----
From: Bj=F6rn Metzdorf [mailto:bm@turtle-entertainment.de]
Sent: Thursday, January 23, 2003 7:32 AM
To: Alan T. Miller; pgsql-general@postgresql.org
Subject: Re: I was spoiled by the MySQL timestamp field
=20As someone who is just getting started with PostygreSQL from years
working
with MySQL, it appears that the timestamp data type does not behave
in
the
way it did with MySQL. I got used to just defining a column as a
timestamp
and letting the database throw the latest time stamp in there
whenever a
row
was updated. Is there anything simular in PosgreSQL? How can I
accomplish
something simular inside the database, or am I stuck populating the
field
insome manner as in the following example
=20
There is no such datatype in postgresql. If you just need the currenttime
inserted on INSERT and not on UPDATE, then you can declare that column
with
"default now()". Else you will need to install a trigger, then you can
have
exactly the same behaviour as with mysql.
=20
Regards,
Bjoern---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?