Proposal: syntax of operation with tsearch's configuration
Hi!
Oleg and I are now working on moving tsearch into core.
Please review the suggested syntax. Comments, suggestions, and objections will be appreciated.
1) parser operation (pg_ts_parser table)
CREATE PARSER prsname (
START = funcname,
GETTOKEN = funcname,
END = funcname,
LEXTYPES = funcname
[ , HEADLINE = funcname ]
);
DROP PARSER [IF EXISTS] prsname [ CASCADE | RESTRICT ];
ALTER PARSER prsname RENAME TO newprsname;
COMMENT ON PARSER prsname IS text;
2) dictionaries (pg_ts_dict)
CREATE DICTIONARY dictname (
INIT = funcname,
LEXIZE = funcname,
OPT = text
);
-- create a new dictionary like an existing one, but with different
-- options, for example
CREATE DICTIONARY dictname [(
[ INIT = funcname ]
[ , LEXIZE = funcname ]
[ , OPT = text ]
)] LIKE template_dictname;
DROP DICTIONARY [IF EXISTS] dictname [ CASCADE | RESTRICT ];
ALTER DICTIONARY dictname RENAME TO newdictname;
ALTER DICTIONARY dictname SET OPT=text;
COMMENT ON DICTIONARY dictname IS text;
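As an illustration only, the LIKE form above can be read as "copy the template entry and override whichever of INIT, LEXIZE, and OPT are given". The following Python sketch models that reading; the `DictEntry` class, the function name, and all sample values (function names, option strings) are hypothetical and are not part of the proposal.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DictEntry:
    """Hypothetical stand-in for one row of pg_ts_dict."""
    name: str
    init: str    # INIT = funcname
    lexize: str  # LEXIZE = funcname
    opt: str     # OPT = text


def create_dictionary_like(template: DictEntry, name: str, **overrides: str) -> DictEntry:
    """Copy the template under a new name, replacing only the given fields."""
    return replace(template, name=name, **overrides)


# Hypothetical template and a copy that differs only in its options.
en_template = DictEntry("en_template", "ispell_init", "ispell_lexize", "DictFile=en.dict")
en_custom = create_dictionary_like(en_template, "en_custom", opt="DictFile=custom.dict")
```

Under this reading the template itself is left unchanged, and any field not mentioned in the new statement is inherited from it.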
3) configuration (pg_ts_cfg [,pg_ts_cfgmap])
CREATE TSEARCH CONFIGURATION cfgname (
PARSER = prsname
[, LOCALE = localename]
);
-- create a new configuration and optionally copy
-- the map of lexeme types to dictionaries
CREATE TSEARCH CONFIGURATION cfgname [(
LOCALE = localename
)] LIKE template_cfg [WITH MAP];
DROP TSEARCH CONFIGURATION [IF EXISTS] cfgname [ CASCADE | RESTRICT ];
ALTER TSEARCH CONFIGURATION cfgname RENAME TO newcfgname;
ALTER TSEARCH CONFIGURATION cfgname SET LOCALE=localename;
ALTER TSEARCH CONFIGURATION cfgname SET PARSER=prsname;
COMMENT ON TSEARCH CONFIGURATION cfgname IS text;
4) mapping of lexeme types to lists of dictionaries
CREATE TSEARCH MAPPING ON cfgname FOR lexemetypename USE dictname1[, dictname2 [..] ];
DROP TSEARCH MAPPING [IF EXISTS] ON cfgname FOR lexemetypename;
ALTER TSEARCH MAPPING ON cfgname FOR lexemetypename USE dictname1[, dictname2 [..] ];
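To make concrete what a mapping expresses, here is a minimal Python sketch (not tsearch code). It assumes, as in the existing tsearch2 module, that the dictionaries listed for a lexeme type are tried in order and the first one that recognizes the token wins. The two dictionary functions, the lexeme type name "word", and the sample data are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A dictionary takes a token and returns lexemes, or None if it does not
# recognize the token (assumed convention for this sketch).
DictFn = Callable[[str], Optional[List[str]]]


def synonym_dict(token: str) -> Optional[List[str]]:
    """Hypothetical dictionary: knows only a tiny synonym table."""
    table = {"pgsql": ["postgresql"], "postgres": ["postgresql"]}
    return table.get(token.lower())


def simple_dict(token: str) -> Optional[List[str]]:
    """Hypothetical catch-all dictionary: lowercases, drops stop words."""
    stop_words = {"the", "a", "an"}
    word = token.lower()
    return [] if word in stop_words else [word]


# Stand-in for: CREATE TSEARCH MAPPING ON cfgname FOR word USE synonym, simple
mapping: Dict[str, List[DictFn]] = {
    "word": [synonym_dict, simple_dict],
}


def lexize(lexeme_type: str, token: str) -> List[str]:
    """Try each mapped dictionary in order; the first that answers wins."""
    for dictionary in mapping.get(lexeme_type, []):
        result = dictionary(token)
        if result is not None:
            return result
    return []  # unmapped type or unrecognized token: nothing is produced
```

In this sketch the order of the dictionary list matters: a narrow dictionary placed first can claim a token before a catch-all dictionary sees it.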
Next, tsearch's configuration will be readable via psql backslash commands (F
means fulltext):
\dF - list of configurations
\dF PATTERN - describe configuration, including the parser used and the lexeme mapping
\dFd - list of dictionaries
\dFd PATTERN - describe dictionary
\dFp - list of parsers
\dFp PATTERN - describe parser
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Teodor Sigaev <teodor@sigaev.ru> writes:
Now we (Oleg and me) are working on moving tsearch into core.
Pls, review suggested syntax. Comments, suggestions, objections will be appreciated.
Is it really necessary to invent a batch of special-purpose commands?
Seems like this will add some thousands of lines of code and no actual
new functionality; not to mention loss of backwards compatibility for
existing tsearch2 users.
regards, tom lane
Hmm, IMHO it's needed for a consistent interface: nobody adds a new column to a table
by editing pg_class & pg_attribute, and nobody looks up the description of a table by
selecting values from the system tables.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/
Teodor Sigaev wrote:
Hmm, IMHO, it's needed for consistent interface: nobody adds new
column to table by editing pg_class & pg_attribute, nobody looks for
description of table by selection values from system table.
Tom Lane wrote:
Teodor Sigaev <teodor@sigaev.ru> writes:
Now we (Oleg and me) are working on moving tsearch into core.
Pls, review suggested syntax. Comments, suggestions, objections will
be appreciated.
Is it really necessary to invent a batch of special-purpose commands?
Seems like this will add some thousands of lines of code and no actual
new functionality; not to mention loss of backwards compatibility for
existing tsearch2 users.
Thousands of lines seems a high estimate, but maybe I'm wrong. I guess
an alternative would be to do this in some builtin functions, but that
seems a tad unclean.
I am also a bit concerned that the names of the proposed objects
(parser, dictionary) don't convey their purpose adequately. Maybe
TS_DICTIONARY and TS_PARSER might be better if we in fact need to name them.
cheers
andrew
On Fri, 17 Nov 2006, Andrew Dunstan wrote:
Teodor Sigaev wrote:
Hmm, IMHO, it's needed for consistent interface: nobody adds new column to
table by editing pg_class & pg_attribute, nobody looks for description of
table by selection values from system table.
Tom Lane wrote:
Teodor Sigaev <teodor@sigaev.ru> writes:
Now we (Oleg and me) are working on moving tsearch into core.
Pls, review suggested syntax. Comments, suggestions, objections will be
appreciated.
Is it really necessary to invent a batch of special-purpose commands?
Seems like this will add some thousands of lines of code and no actual
new functionality; not to mention loss of backwards compatibility for
existing tsearch2 users.
Thousands of lines seems a high estimate, but maybe I'm wrong. I guess an
alternative would be to do this in some builtin functions, but that seems a
tad unclean.
As Teodor already wrote, we want to be consistent with the current interface to
the system catalogs, since full text search is going into the pg core.
We aren't inventing anything new; we are just extending the current user interface
to support full text search.
I am also a bit concerned that the names of the proposed objects (parser,
dictionary) don't convey their purpose adequately. Maybe TS_DICTIONARY and
TS_PARSER might be better if we in fact need to name them.
this looks reasonable to me.
cheers
andrew
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Oleg Bartunov wrote:
On Fri, 17 Nov 2006, Andrew Dunstan wrote:
I am also a bit concerned that the names of the proposed objects (parser,
dictionary) don't convey their purpose adequately. Maybe TS_DICTIONARY and
TS_PARSER might be better if we in fact need to name them.
this looks reasonable to me.
Huh, but we don't use keywords with ugly abbreviations and underscores.
How about "FULLTEXT DICTIONARY" and "FULLTEXT PARSER"? (Using
"FULLTEXT" instead of "FULL TEXT" means you don't create common
reserved words, and furthermore you don't collide with an existing type
name.)
I also think the "thousands of lines" is an exaggeration :-) The
grammar should take a couple dozen at most. The rest of the code would
go to their own files.
We should also take the opportunity to discuss new keywords for the XML
support -- will we use new grammar, or functions?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
We should also take the opportunity to discuss new keywords for the
XML support -- will we use new grammar, or functions?
The XML stuff is defined in the SQL standard and there are existing
implementations, so any nonstandard syntax is going to be significantly
less useful. (The other problem is that you can't implement most of
the stuff in functions anyway.)
I don't see any comparable arguments about this full-text search stuff.
In particular I don't see any arguments why a change would be necessary at
all, including why moving to core would be necessary in the first
place.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Alvaro Herrera wrote:
Oleg Bartunov wrote:
On Fri, 17 Nov 2006, Andrew Dunstan wrote:
I am also a bit concerned that the names of the proposed objects (parser,
dictionary) don't convey their purpose adequately. Maybe TS_DICTIONARY and
TS_PARSER might be better if we in fact need to name them.
this looks reasonable to me.
Huh, but we don't use keywords with ugly abbreviations and underscores.
How about "FULLTEXT DICTIONARY" and "FULLTEXT PARSER"? (Using
"FULLTEXT" instead of "FULL TEXT" means you don't create common
reserved words, and furthermore you don't collide with an existing type
name.)
good point. this works for me.
We should also take the opportunity to discuss new keywords for the XML
support -- will we use new grammar, or functions?
Well, it will have to be keywords if we want to be able to do anything
like the spec, IIRC.
cheers
andrew
Peter Eisentraut <peter_e@gmx.net> writes:
I don't see any comparable arguments about this full-text search stuff.
In particular I don't see any arguments why a change would necessary at
all, including why moving to core would be necessary in the first
place.
AFAIR the only argument in favor of that is basically a marketing one:
users perceive a feature as more real, or more supported, if it's in
core. I don't find this argument especially compelling myself.
regards, tom lane
On Fri, 17 Nov 2006, Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
I don't see any comparable arguments about this full-text search stuff.
In particular I don't see any arguments why a change would necessary at
all, including why moving to core would be necessary in the first
place.
AFAIR the only argument in favor of that is basically a marketing one:
users perceive a feature as more real, or more supported, if it's in
core. I don't find this argument especially compelling myself.
I am currently in the position that my hosting provider is apprehensive
about installing modules in contrib because they believe they are less
secure. They cited (real or imagined) "security holes" as the reason they
would not install tsearch2, or any other contrib module. This leaves me
without any fulltext indexing option, as it requires a superuser to
install. I have currently worked around this by running my own postgres
instance from my home directory, as they provide shell access and allow
running background processes, but I was really happy when I heard that
tsearch2 was going to be integrated into core in 8.3.
I think I would settle for some sort of assurance somewhere by someone who
sounds authoritative that the contrib modules are not less secure than
postgres core, and are fully supported by the developers. I think if I
could point them at that, I may be able to convince them that it is safe.
Alvaro Herrera <alvherre@commandprompt.com> writes:
I also think the "thousands of lines" is an exaggeration :-)
I think a reasonable comparison point is the operator-class commands,
which are at least in the same general ballpark of complexity.
opclasscmds.c is currently 1075 lines, and that's not counting the
grammar additions, nor miscellaneous bits of support in places like
backend/nodes/, dependency.c if you expect to be able to DROP the
objects, namespace.c if they live in schemas, aclchk.c if they have
owners or permissions, comment.c, etc. Teodor is proposing to add not
one but four new kinds of system objects. In round numbers I would
bet that such a patch will add a lot closer to 10000 lines than 1000.
It may be worth doing anyway --- certainly CREATE OPERATOR CLASS was a
huge improvement over the previous ways of doing it --- but don't
underestimate the size of what we're talking about.
regards, tom lane
Tom Lane wrote:
Alvaro Herrera <alvherre@commandprompt.com> writes:
I also think the "thousands of lines" is an exaggeration :-)
I think a reasonable comparison point is the operator-class commands,
which are at least in the same general ballpark of complexity.
opclasscmds.c is currently 1075 lines, and that's not counting the
grammar additions, nor miscellaneous bits of support in places like
backend/nodes/, dependency.c if you expect to be able to DROP the
objects, namespace.c if they live in schemas, aclchk.c if they have
owners or permissions, comment.c, etc. Teodor is proposing to add not
one but four new kinds of system objects. In round numbers I would
bet that such a patch will add a lot closer to 10000 lines than 1000.
It may be worth doing anyway --- certainly CREATE OPERATOR CLASS was a
huge improvement over the previous ways of doing it --- but don't
underestimate the size of what we're talking about.
Hmm, actually the tsearch2 directory contains 16500 lines of code
(generated using David A. Wheeler's 'SLOCCount'), so I didn't doubt that
it's a big piece of code as a whole -- but I thought what was being
discussed was the size of the grammar changes, which is why I mentioned
the "a couple dozen" figure.
Having the supporting code in core does not make much of a difference
otherwise from having it in contrib, does it?
--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, 17 Nov 2006, Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
I don't see any comparable arguments about this full-text search stuff.
In particular I don't see any arguments why a change would necessary at
all, including why moving to core would be necessary in the first
place.
AFAIR the only argument in favor of that is basically a marketing one:
users perceive a feature as more real, or more supported, if it's in
core. I don't find this argument especially compelling myself.
Marketing is not always a "swear word" :) We live in the real world, and there are
many situations where marketing is the deciding vote. Not everyone is
Tom Lane, who could convince a customer that there is no difference
between a contrib module and a core feature, or that PostgreSQL is a mature
database with an fts add-on that can be installed separately (with
superuser rights).
I think this is a good question for the next poll on postgresql.org.
Regards,
Oleg
Alvaro Herrera <alvherre@commandprompt.com> writes:
Tom Lane wrote:
It may be worth doing anyway --- certainly CREATE OPERATOR CLASS was a
huge improvement over the previous ways of doing it --- but don't
underestimate the size of what we're talking about.
Hmm, actually the tsearch2 directory contains 16500 lines of code
(generated using David A. Wheeler's 'SLOCCount'), so I didn't doubt that
it's a big piece of code as a whole -- but I thought what was being
discussed was the size of the grammar changes, which is why I mentioned
the "a couple dozen" figure.
No, what I was on about was the cost of inventing custom-SQL-statement
manipulation of the catalog entries that drive tsearch2. The analogy to
operator classes is fairly exact, because before 7.3 you had to
manipulate those using direct insertions of catalog entries. The
manipulation commands are just about independent of the actual use of
the catalog entries --- my count of "support" didn't include any of the
planner or index AM code that actually uses operator classes, and in the
same way the existing tsearch2 code doesn't have any particular
relationship to this new code that'd have to be written to support the
manipulation commands.
Having the supporting code in core does not make much of a difference
otherwise from having it in contrib, does it?
Given the nonextensibility of gram.y and keywords.c, it has to be in
core to even think about having special syntax :-(
regards, tom lane
Tom Lane wrote:
Peter Eisentraut <peter_e@gmx.net> writes:
I don't see any comparable arguments about this full-text search stuff.
In particular I don't see any arguments why a change would necessary at
all, including why moving to core would be necessary in the first
place.
AFAIR the only argument in favor of that is basically a marketing one:
users perceive a feature as more real, or more supported, if it's in
core. I don't find this argument especially compelling myself.
On the flip side of that argument: the more non-SQL-standard pieces
are in core, the more "non-real" the pieces not in core appear.
People seem to have few doubts regarding CPAN or Ruby Gems.
I believe that's in large part because a lot of very
important and well supported functionality exists outside of their
core distributions. The less that's pre-baked into core, I think
the more people will be aware of the rich set of extensions postgresql
enables.
From a marketing point of view (should I have moved this to .advocacy?),
it seems to me the biggest problem is the name "contrib". If it were
called "optional" or "advanced" or "extra" I think it'd be seen less
suspiciously by hosting companies (who seem to have the biggest problem
with contrib) and we wouldn't need as many discussions of which contribs
to move into core.
Ron M
On 11/17/06, Peter Eisentraut <peter_e@gmx.net> wrote:
Alvaro Herrera wrote:
We should also take the opportunity to discuss new keywords for the
XML support -- will we use new grammar, or functions?
The XML stuff is defined in the SQL standard and there are existing
implementations, so any nonstandard syntax is going to be significantly
less useful. (The other problem is that you can't implement most of
the stuff in functions anyway.)
Yes, it's better not to mix the XML syntax discussion and the Tsearch2
configuration syntax discussion in one place, and not only because these
are different things. Here we have a discussion of syntax for catalog
manipulation commands, whereas the XML stuff (at least what I was working on
during the summer and am going to continue) is about the functionality itself.
And in the case of XML we have some things to stick to: the standard
papers and existing implementations...
However, Alvaro made me recall my old thoughts: when I first
started to use Tsearch2, I wondered why I should explicitly create
a column for the index, since in other databases I don't have to. Indeed,
this is an index and, ideally, all I should have to do is write "CREATE
INDEX ...", maybe with some custom (fulltext-specific) additions
(and something like "fulltext" instead of "gist").
So, is it possible to let people avoid the explicit "ALTER TABLE .. ADD
COLUMN ... tsvector"? Maybe it would be "syntactic sugar" too, but I
suppose that (especially for postgres novices) it would simplify the
overall use of Tsearch. For me such changes are more important than
syntax for manipulating the catalog (i.e., I could live with "insert
into ts_cfg ..." for one or two more years :-) ). However, I'm sure that
Oleg and Teodor have already considered this feature, and there must be
some things that prevent letting users write only "CREATE INDEX"
w/o ALTERing tables...
I don't see any comparable arguments about this full-text search stuff.
In particular I don't see any arguments why a change would necessary at
all, including why moving to core would be necessary in the first
place.
Many hosters with PostgreSQL support (e.g. goDaddy, one of the
biggest hosters) don't provide any contrib modules, so people have to
live w/o fulltext search. Also, many sysadmins are afraid of the word
"contrib"... So there is no doubt for me that adding it to core is
a really good thing :-)
--
Best regards,
Nikolay
On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:
Having the supporting code in core does not make much of a difference
otherwise from having it in contrib, does it?
Given the nonextensibility of gram.y and keywords.c, it has to be in
core to even think about having special syntax :-(
Has anyone ever heard of extensible grammars? Just thinking wildly, you
could decree that commands beginning with @ are extensions and are parsed
by the module listed next. Then your command set becomes:
@tsearch CREATE PARSER ....
Then contrib modules can add their own parser. You'd have the overhead
of multiple lex/yacc parsers, but you wouldn't have to change the main
parser for every extension.
Has anyone ever heard of something like this?
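A rough Python sketch of the dispatch idea above, for illustration only: a statement starting with @name is routed to a parser registered under that name, and everything else goes to the main parser. The registry, the function names, and the toy "parsers" (which merely tag the text they receive) are all hypothetical; nothing like this exists in PostgreSQL.

```python
from typing import Callable, Dict, Tuple

ParseFn = Callable[[str], Tuple[str, str]]

# Registry of extension parsers, keyed by module name (hypothetical).
extension_parsers: Dict[str, ParseFn] = {}


def register_parser(module: str, parser: ParseFn) -> None:
    """Let a module announce the parser that handles its '@module' commands."""
    extension_parsers[module] = parser


def core_parse(statement: str) -> Tuple[str, str]:
    """Stand-in for the main grammar."""
    return ("core", statement)


def tsearch_parse(statement: str) -> Tuple[str, str]:
    """Stand-in for a grammar shipped by a contrib module."""
    return ("tsearch", statement)


def dispatch(command: str) -> Tuple[str, str]:
    """Route '@module rest...' to that module's parser, else to core."""
    command = command.strip()
    if not command.startswith("@"):
        return core_parse(command)
    module, _, rest = command[1:].partition(" ")
    if module not in extension_parsers:
        raise ValueError(f"no parser registered for extension {module!r}")
    return extension_parsers[module](rest.strip())


register_parser("tsearch", tsearch_parse)
```

Note that in this sketch each extension parser sees only its own statement text and has no access to the main parser's rules.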
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes:
On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:
Given the nonextensibility of gram.y and keywords.c, it has to be in
core to even think about having special syntax :-(
Has anyone ever heard of extensible grammars?
Yeah, I worked with systems that could do that at Hewlett-Packard, nigh
thirty years ago ... but they were much less pleasant to use than bison,
and if memory serves, slower and more limited in what they could parse
(something narrower than LALR(1), IIRC, which would make certain parts
of SQL even hairier to parse than they are now). I'm not in a big hurry
to go there, even though it would certainly take some of the steam out
of "I want this in core" arguments.
... decree that commands beginning with @ are extensions and are parsed
by the module listed next. Then your command set becomes:
@tsearch CREATE PARSER ....
This'd only work well for trivial standalone commands; as a counterexample
consider CREATE INDEX, which requires access to the core sub-grammars
for typename and expression. The SQL2003 XML additions couldn't be
handled this way either.
regards, tom lane
Jeremy Drake wrote:
I am currently in the position that my hosting provider is
apprehensive about installing modules in contrib because they believe
they are less secure.
Using irrational and unfounded statements one can of course make
arguments for just about anything, but that won't help us.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Oleg Bartunov wrote:
marketing is not always "swear-word" :) We live in real world and
there are many situations where marketing is the deciding vote.
I don't know about you, but I market PostgreSQL partially using
1. sane design, not driven by random demands
2. extensibility
which would be completely contradicted by moving any module into core
for "marketing" reasons.
Not all are Tom Lane, who could convince customer saying there is no
difference between contrib module and core feature, or that
PostgreSQL is a mature database with fts add-on, which could be
installed separately (with supersuser rights).
It's not like PostgreSQL is the first software product in the world to
provide a module or plugin mechanism. (It is incidentally the first
DBMS to do so.) People who refuse to understand that are idiots, and
we don't design for idiots.
--
Peter Eisentraut
http://developer.postgresql.org/~petere/