tsearch in core patch, for inclusion
We (Oleg and I) are glad to present the tsearch-in-core patch for pgsql. Basically, the
layout, functions, methods, types, etc. are the same as in the current tsearch2, with a
lot of improvements:
- pg_ts_* tables are now in pg_catalog
- parsers, dictionaries, and configurations now have an owner and a namespace, like
other pgsql objects such as tables, operator classes, etc.
- the current tsearch configuration is managed with the GUC variable
tsearch_conf_name.
- the tsearch configuration can be chosen by locale separately for each schema
- tsearch configurations are managed with SQL commands rather than with
insert/update/delete statements. This makes it possible to track dependencies
and to dump and drop configurations correctly.
- psql support via the \dF* commands
- all available Snowball stemmers and the corresponding configurations are added
- every dictionary now frees memory correctly
This work is sponsored by EnterpriseDB's PostgreSQL Development Fund.
patch: http://www.sigaev.ru/misc/tsearch_core-0.33.gz
docs: http://mira.sai.msu.su/~megera/pgsql/ftsdoc/ (not yet complete, and
not yet a patch, just SGML source)
Implementation details:
- directory layout
src/backend/utils/adt/tsearch - all IO functions and simple operations
src/backend/utils/tsearch - complex processing functions, including
language processing and dictionaries
- most of the Snowball dictionaries are placed in a separate .so library,
and they are plugged into the database the same way the character
conversion library is.
If there are no objections, we plan to commit the patch tomorrow or the day after.
Before committing, I'll change the OIDs from 5000+ to lower values to avoid holes
in the OID range. After that, I'll remove the tsearch2 contrib module.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Teodor Sigaev wrote:
> If there aren't objections then we plan commit patch tomorrow or
> after tomorrow.
I still haven't heard any argument for why this would be necessary or
desirable at all, other than that it looks better for marketing
reasons, which I will counter by saying that it looks worse for
marketing reasons because our hailed plugin mechanism is apparently so
poor that it can't support some practical extension module such as
this.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut wrote:
> Teodor Sigaev wrote:
>> If there aren't objections then we plan commit patch tomorrow or
>> after tomorrow.
>
> I still haven't heard any argument for why this would be necessary or
> desirable at all, other than that it looks better for marketing
> reasons, which I will counter by saying that it looks worse for
> marketing reasons because our hailed plugin mechanism is apparently so
> poor that it can't support some practical extension module such as
> this.
To which I will counter that we don't have a hailed plugin mechanism. We
have contrib, which professionals generally consider untested and not
part of PostgreSQL.
I am constantly running into this:
Q. Does PostgreSQL have full text indexing?
A. Yes it is in contrib.
Q. But that isn't part of core.
A. *sigh*
Where on the website can I see what "plugins" are included with PostgreSQL?
Where on the website can I see the Official PostgreSQL Documentation for
Full Text Indexing?
With TSearch2 in core, will that fix the many upgrade problems associated
with using TSearch2?
Sincerely,
Joshua D. Drake
--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Teodor Sigaev wrote:
> If there aren't objections then we plan commit patch tomorrow or
> after tomorrow.
This is a fairly large patch and I would like the chance to review it
before it goes in --- "we'll commit tomorrow" is not exactly a decent
review window.
Peter Eisentraut <peter_e@gmx.net> writes:
> I still haven't heard any argument for why this would be necessary or
> desirable at all, other than that it looks better for marketing
> reasons,
One possible argument for this over the contrib version is a saner
approach to dumping and restoring configurations. However, as against
that:
1) what's the upgrade path for getting an existing tsearch2
configuration into this implementation?
2) once we put this in core we are going to be stuck with supporting its
SQL API forever. Are we convinced that this API is the one we want?
I don't recall even having seen any proposal or discussion. It was OK
for tsearch2's API to change every release while it was in contrib, but
the expectation of stability is a whole lot higher for core features.
regards, tom lane
Joshua D. Drake wrote:
> Of which I will counter that we don't have a hailed plugin mechanism. We
> have a contrib which professionals generally consider untested and not
> part of PostgreSQL.
>
> I am constantly running into this:
> Q. Does PostgreSQL have full text indexing?
> A. Yes it is in contrib.
> Q. But that isn't part of core.
> A. *sigh*
>
> Where on the website can I see what "plugins" are included with PostgreSQL?
> Where on the website can I see the Official PostgreSQL Documentation for
> Full Text Indexing?
> With TSearch2 in core will that fix the many upgrade problems associated
> with using TSearch2?
contrib is a horrible misnomer. Can we maybe bite the bullet and call it
something else?
cheers
andrew
On Wed, 2007-01-24 at 19:15 +0100, Peter Eisentraut wrote:
> Teodor Sigaev wrote:
>> If there aren't objections then we plan commit patch tomorrow or
>> after tomorrow.
>
> I still haven't heard any argument for why this would be necessary or
> desirable at all, other than that it looks better for marketing
> reasons, which I will counter by saying that it looks worse for
> marketing reasons because our hailed plugin mechanism is apparently so
> poor that it can't support some practical extension module such as
> this.
On that point, why do we have /contrib? It's for "plugins" that are so
version-dependent that they can't exist as a separate project, as I
understand it.
But what we want when we say we have a plugin mechanism is something
more like CPAN, where software is developed on its own timeline and can
be added seamlessly into any version of PostgreSQL that supports the
needs of the project.
PostGIS is a good example of this. You don't have to wait for a
PostgreSQL release to upgrade PostGIS, and they don't have to discuss
the intricacies of spatial queries and data on -hackers.
If tsearch2 really does need to be in lockstep with the PostgreSQL
releases (although I don't see why it does), I don't see a problem
putting it in core. It's an important feature, and we're already giving
up a lot of the benefits of plugins anyway by distributing it with the
project.
Regards,
Jeff Davis
On Wed, Jan 24, 2007 at 01:53:54PM -0500, Andrew Dunstan wrote:
> contrib is a horrible misnomer. Can we maybe bite the bullet and
> call it something else?
Some version of "version-dependent plugins?"
Cheers,
D (who hasn't come up with anything shorter just yet)
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter
Remember to vote!
On Wed, 24 Jan 2007, Peter Eisentraut wrote:
> Teodor Sigaev wrote:
>> If there aren't objections then we plan commit patch tomorrow or
>> after tomorrow.
>
> I still haven't heard any argument for why this would be necessary or
> desirable at all, other than that it looks better for marketing
> reasons, which I will counter by saying that it looks worse for
> marketing reasons because our hailed plugin mechanism is apparently so
> poor that it can't support some practical extension module such as
> this.
I for one am greatly looking forward to tsearch2 being in core. I was
very fond of the plugin mechanism, until I signed up with a hosting
provider. I do not have superuser privileges on the database cluster, and
they will not install any plugins due to unspecified "security concerns".
So ATM, if I want full text indexing, my only choice would be to avail
myself of their MySQL instance, which has it built in. So I have become
jaded, and my opinion of optional plugins has gone from "wow, this is
neat" to "man, this is a pain". They do not install plpgsql, so I cannot
write any triggers; they don't install tsearch2, so I don't get full text
indexing; so all of the great features of postgres I have come to enjoy on
my own box are suddenly taken away :(
Sorry for the rant; I am just looking forward to 8.3 so I can get full
text indexing...
--
ARCHDUKE FERDINAND FOUND ALIVE --
FIRST WORLD WAR A MISTAKE
Jeremy Drake wrote:
> I for one am greatly looking forward to tsearch2 being in core. I was
> very fond of the plugin mechanism, until I signed up with a hosting
> provider. I do not have superuser privileges on the database cluster, and
> they will not install any plugins due to unspecified "security concerns".
You could move to Hub or Command Prompt ;)
Joshua D. Drake
On Wed, 2007-01-24 at 13:49 -0500, Tom Lane wrote:
> 2) once we put this in core we are going to be stuck with supporting its
> SQL API forever. Are we convinced that this API is the one we want?
> I don't recall even having seen any proposal or discussion.
There has been some prior discussion:
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00919.php
But I agree that we need considerably more discussion before committing
the patch. I'm personally not sold on the need for modifications to the
SQL grammar, for example, as opposed to just using a set of SQL-callable
functions and some new system catalogs.
Another question that would be easier to resolve before the patch is
committed is naming: the patch currently uses a mix of "full text" and
"tsearch[2]" as the name of the full-text search feature. If we're going
to bless this as "the" integrated full-text search in PG, it might make
more sense to use "full text search" and "FTS" exclusively.
-Neil
Jeremy Drake wrote:
> On Wed, 24 Jan 2007, Peter Eisentraut wrote:
>> I still haven't heard any argument for why this would be necessary or
>> desirable at all, other than that it looks better for marketing
>> reasons, which I will counter by saying that it looks worse for
>> marketing reasons because our hailed plugin mechanism is apparently so
>> poor that it can't support some practical extension module such as
>> this.
>
> I for one am greatly looking forward to tsearch2 being in core.
For goodness' sake! This is work that's been sponsored! Are we going to
turn around now and reject it? We'd be a laughing stock.
cheers
andrew
Andrew Dunstan wrote:
> contrib is a horrible misnomer. Can we maybe bite the bullet and call
> it something else?
plugins?
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Jeff Davis wrote:
> On that point, why do we have /contrib? It's for "plugins" that are
> so version-dependent that they can't exist as a separate project, as
> I understand it.
No. (I don't know a better and succinct answer, but that is not it.)
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Jeremy Drake wrote:
> I for one am greatly looking forward to tsearch2 being in core. I
> was very fond of the plugin mechanism, until I signed up with a
> hosting provider.
Yes, you have told us about your hosting provider before. Just make
sure your next hosting provider does not refuse to install database
objects whose OID is a multiple of 13 because of bad luck, or you might
miss out on full-text indexing again.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut wrote:
> Jeremy Drake wrote:
>> I for one am greatly looking forward to tsearch2 being in core. I
>> was very fond of the plugin mechanism, until I signed up with a
>> hosting provider.
>
> Yes, you have told us about your hosting provider before. Just make
> sure your next hosting provider does not refuse to install database
> objects whose OID is a multiple of 13 because of bad luck, or you might
> miss out on full-text indexing again.
Well, we just turn off OIDs to help prevent that possible curse.
Sincerely,
Joshua D. Drake
Neil Conway wrote:
> On Wed, 2007-01-24 at 13:49 -0500, Tom Lane wrote:
>> 2) once we put this in core we are going to be stuck with supporting its
>> SQL API forever. Are we convinced that this API is the one we want?
>> I don't recall even having seen any proposal or discussion.
>
> There has been some prior discussion:
> http://archives.postgresql.org/pgsql-hackers/2006-12/msg00919.php
> But I agree that we need considerably more discussion before committing
> the patch. I'm personally not sold on the need for modifications to the
> SQL grammar, for example, as opposed to just using a set of SQL-callable
> functions and some new system catalogs.
I think one can find arguments for both variants; one of the questions
might even be how other databases do it and whether the proposed
syntax resembles any of them.
> Another question that would be easier to resolve before the patch is
> committed is naming: the patch currently uses a mix of "full text" and
> "tsearch[2]" as the name of the full-text search feature. If we're going
> to bless this as "the" integrated full-text search in PG, it might make
> more sense to use "full text search" and "FTS" exclusively.
Making this consistent makes a lot of sense, and I agree that it might be
a good idea to just call it FTS (or similar).
On the other hand, we would then have to go as far as renaming
TSVECTOR/TSQUERY to FTSVECTOR/FTSQUERY or similar, which might cause
considerable headaches for people upgrading from the contrib/ version.
Stefan
Neil Conway wrote:
> But I agree that we need considerably more discussion before
> committing the patch. I'm personally not sold on the need for
> modifications to the SQL grammar, for example, as opposed to just
> using a set of SQL-callable functions and some new system catalogs.
In particular, I would think that unless one is affiliated with The New
COBOL World Order, one would *prefer* a set of functions over new SQL
statements. And using functions to manage extensions seems to be the
established way in Oracle land, if that matters at all.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Peter Eisentraut wrote:
> Andrew Dunstan wrote:
>> contrib is a horrible misnomer. Can we maybe bite the bullet and call
>> it something else?
>
> plugins?
standard-plugins might be more informative. I think of them as being
like Perl's standard modules, things that are part of the standard Perl
distribution as opposed to all the other stuff on CPAN.
Maybe it needs to be split in two: things that are genuine plugins and
other stuff (e.g. start-scripts).
cheers
andrew
Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
> Neil Conway wrote:
>> Another question that would be easier to resolve before the patch is
>> committed is naming: the patch currently uses a mix of "full text" and
>> "tsearch[2]" as the name of the full-text search feature. If we're going
>> to bless this as "the" integrated full-text search in PG, it might make
>> more sense to use "full text search" and "FTS" exclusively.
>
> making this consistent makes a lot of sense and I agree that it might be
> a good idea to just call it FTS (or similiar).
> But on the other side would have to go as far as renaming
> TSVECTOR/TSQUERY to FTSVECTOR/FTSQUERY or similiar which might pose some
> considerable headache for people upgrading from the contrib/ version.
If we use "text search" (abbrev TS) as the key phrase we can avoid that.
But this reiterates my point that the upgrade path for existing tsearch2
users is an important thing to consider.
regards, tom lane
Peter Eisentraut wrote:
> Jeremy Drake wrote:
>> I for one am greatly looking forward to tsearch2 being in core. I
>> was very fond of the plugin mechanism, until I signed up with a
>> hosting provider.
>
> Yes, you have told us about your hosting provider before. Just make
> sure your next hosting provider does not refuse to install database
> objects whose OID is a multiple of 13 because of bad luck, or you might
> miss out on full-text indexing again.
Sure, that ISP is a bit stupid (especially wrt plpgsql), but tsearch2 in
its current version actually imposes some additional (often
non-trivial) complexity on things like database restores and upgrades,
so I can see an ISP wanting to avoid that altogether.
A fully integrated full-text search could make that much easier (in a few
years, when most distributions have picked up 8.3), and just telling
people they should switch hosting ISPs is not always an immediately
workable solution (think contracts, migration costs, legacy apps).
Stefan